be used to decorrelate the temporal correlation within each data stream and a spatial model can be used to decorrelate
the spatial correlation between data streams before applying monitoring schemes, following most papers on the topic,
we assume that the Xi,t are independent both within and between data streams. When the system is in control (IC), the
underlying distribution of {Xi,1, Xi,2, … } (i = 1, … , m) is called the IC distribution, denoted by F0,i. Following this setup,
at a given time t, we observe Xi,1, Xi,2, … , Xi,t, i = 1, … , m. The task of our online monitoring scheme at time t is to
determine if the distribution of Xi,1, Xi,2, … , Xi,t is the same as F0,i for all i = 1, 2, … , m.
The above task can be carried out by tracking a global monitoring statistic Gt, which contains information collected
from all data streams up to time t. If Gt is within the preset control limit, we will declare that all data streams are IC and
continue monitoring. If Gt exceeds the control limit, we will raise an alarm suggesting that some of the data streams are
out of control (OC). In the following, we propose a novel class of global monitoring statistics that can work with any type
of data and be effective for different types of OC scenarios.
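As an illustration of the alarm rule described above, the following minimal Python sketch scans a sequence of global monitoring statistics and reports the first time at which the control limit is exceeded. The function name first_alarm_time, the example values of Gt, and the control limit h = 3.0 are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Optional


def first_alarm_time(global_stats: Iterable[float], h: float) -> Optional[int]:
    """Return the first time t (1-based) at which G_t > h, or None if no alarm."""
    for t, g in enumerate(global_stats, start=1):
        if g > h:
            return t  # alarm: some data streams may be OC
    return None  # every G_t stayed within the control limit


# Hypothetical values of G_t for t = 1, ..., 5 and a hypothetical control limit.
example_stats = [0.4, 1.1, 0.9, 3.7, 5.2]
alarm_time = first_alarm_time(example_stats, h=3.0)
```

The returned time is the run length of the scheme on that sequence; averaging run lengths over many IC sequences would give an estimate of the IC average run length.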
2.2 Proposed global monitoring statistics
Because the change point can happen at different times for different data streams, a popular approach in the literature
for developing the global monitoring statistic Gt is to first choose an appropriate local monitoring statistic for tracking
each data stream and then combine those local monitoring statistics in a way that produces a single global monitoring
statistic. We will follow this approach. More specifically, let Wi,t be the local monitoring statistic for the ith data stream at
time t that summarizes the evidence regarding a possible local change based on the observations, Xi,1, … , Xi,t. Without
loss of generality, we assume that a larger Wi,t indicates a higher probability of the ith data stream being OC. Although
our proposed global monitoring statistic Gt can work with any choice of Wi,t, in order for Gt to be efficient for detecting
changes in any data stream, the Wi,t should be chosen to be efficient for detecting local changes. Choosing a good
local monitoring statistic Wi,t is equivalent to choosing an appropriate monitoring statistic for a univariate data stream;
there is a rich literature on this topic (see, for example, Qiu7), from which we can easily find an appropriate monitoring statistic
to use as Wi,t for any particular application. Therefore, in the following, we assume that the Wi,t have
been constructed, and our focus is how to combine these local monitoring statistics Wi,t into a powerful global monitoring
statistic.
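To make the role of the local monitoring statistics concrete, the sketch below uses a one-sided CUSUM recursion as one possible choice of Wi,t. This choice is an assumption made purely for illustration: the text above leaves Wi,t unspecified, and the reference value k = 0.5, the standardized IC observations, and the function names are all hypothetical.

```python
from typing import List, Sequence


def cusum_update(prev_w: float, x: float, k: float = 0.5) -> float:
    """One-sided CUSUM step: W_t = max(0, W_{t-1} + x_t - k).

    Assumes x is standardized to have IC mean 0; k is an assumed reference value.
    """
    return max(0.0, prev_w + x - k)


def local_statistics(prev: Sequence[float], obs: Sequence[float], k: float = 0.5) -> List[float]:
    """Update the local statistic of every data stream with its new observation."""
    return [cusum_update(w, x, k) for w, x in zip(prev, obs)]


# Three hypothetical data streams observed at t = 1 and t = 2.
w = [0.0, 0.0, 0.0]
for obs in ([0.2, -1.0, 1.5], [0.1, 0.3, 2.0]):
    w = local_statistics(w, obs)
```

Only the third stream, whose observations are consistently large, accumulates a positive statistic, which matches the convention that a larger Wi,t indicates stronger evidence of a local change.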
Note that at any time t, we have calculated W1,t, … , Wm,t. Without loss of generality, we assume that the Wi,t are
independent and identically distributed when the system is IC. As mentioned in Section 1, Liu et al6 recently proposed
an SUM-shrinkage approach to construct the global monitoring statistic based on W1,t, … , Wm,t. In their approach,
W1,t, … , Wm,t are compared with some pre-specified threshold, and only those that exceed the threshold are used to construct
the global test statistic. However, as with all other thresholding methods, it is impossible to choose a threshold
in advance that works well for all OC scenarios.
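The sensitivity to the threshold can be illustrated with a small Python sketch. It uses a plain hard-thresholded sum as a stand-in for thresholding methods in general; it is not claimed to be the exact shrinkage function of Liu et al, and the thresholds and values of the local statistics are hypothetical.

```python
from typing import Sequence


def hard_threshold_sum(w: Sequence[float], b: float) -> float:
    """Sum only those local statistics that exceed the threshold b."""
    return sum(x for x in w if x > b)


# Hypothetical OC scenarios for m = 10 data streams.
sparse = [0.1] * 9 + [6.0]  # one stream with a large change
dense = [1.5] * 10          # every stream with a moderate change

# A high threshold keeps the sparse signal but discards the dense one.
high_sparse = hard_threshold_sum(sparse, b=2.0)
high_dense = hard_threshold_sum(dense, b=2.0)

# A low threshold keeps the dense signal, but it would also admit more IC noise.
low_dense = hard_threshold_sum(dense, b=0.5)
```

With b = 2.0 the dense scenario contributes nothing to the statistic, whereas with b = 0.5 it contributes fully, so no single threshold suits both scenarios.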
Instead of comparing W1,t, … , Wm,t with some pre-specified threshold, we propose to compare their order statistics
with their respective expected values when the system is IC. More specifically, let W(1),t ≤ W(2),t ≤ … ≤ W(m),t be the
order statistics of W1,t, … , Wm,t. Note that W(i),t can also be considered as the observed (i − 3∕4)∕(m − 1∕2) quantile of the
underlying distribution of the Wi,t. Here, (i − 3∕4)∕(m − 1∕2) is a common continuity correction of i∕m. Let q(i),t denote
the expected (i − 3∕4)∕(m − 1∕2) quantile of the IC distribution of the Wi,t. Then, q(i),t can be considered as the expected
value of W(i),t. A natural statistic that summarizes the differences between the W(i),t and their respective expected values q(i),t
is simply ∑_{i=1}^{m} (W(i),t − q(i),t)^2. Since a larger Wi,t indicates a higher probability of the ith data stream being OC, W(i),t
may indicate abnormality in the system only when it is larger than its expected value q(i),t. Therefore, we include the
difference in our global monitoring statistic only when W(i),t is larger than its expected value q(i),t, and the new class of global
monitoring statistics we propose is
G_t = \sum_{i=1}^{m} \left( W_{(i),t} - q_{(i),t} \right)^2 I\{ W_{(i),t} > q_{(i),t} \},    (1)
where I{A} is the indicator function, which equals 1 if A is true and 0 otherwise. Our proposed monitoring scheme then
plots Gt over time t and raises an alarm if Gt > h, where h is the control limit determined by the desired IC
average run length (denoted by ARL0).
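The computation of Gt at a single time point can be sketched in Python as follows. The sketch assumes, purely for illustration, that the IC distribution of the Wi,t is standard normal, so that the expected quantiles q(i),t can be obtained from the standard library; in practice the IC distribution depends on the chosen local statistic. The function names, the example values, and the control limit h = 3.0 are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def expected_ic_quantiles(m: int, ic_dist: NormalDist = NormalDist()) -> List[float]:
    """q_(i): IC quantile at level (i - 3/4)/(m - 1/2), for i = 1, ..., m.

    The standard normal default is an assumption made for illustration only.
    """
    return [ic_dist.inv_cdf((i - 0.75) / (m - 0.5)) for i in range(1, m + 1)]


def global_statistic(w: Sequence[float], q: Sequence[float]) -> float:
    """G_t: sum of squared exceedances of the ordered W's over their IC quantiles."""
    ordered = sorted(w)  # W_(1),t <= ... <= W_(m),t
    return sum((wi - qi) ** 2 for wi, qi in zip(ordered, q) if wi > qi)


m = 4
q = expected_ic_quantiles(m)

# Hypothetical local statistics: equal to their expected values, except that
# the largest one is shifted upward by 2.
w_shifted = q[:-1] + [q[-1] + 2.0]
g = global_statistic(w_shifted, q)

h = 3.0  # hypothetical control limit; in practice chosen to achieve ARL0
alarm = g > h
```

Here only the largest order statistic exceeds its expected value, so it alone contributes to Gt, and order statistics at or below their expected values are ignored by the indicator.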
At first glance, our proposed global monitoring statistic Gt is a sum-type statistic, so it is expected to be effective when
a moderate or large number of data streams are abnormal. However, as shown in our simulation studies in Section 3, our global
monitoring statistic Gt is efficient not only for a large number of abnormal data streams but also for a few abnormal data
streams. The reason why Gt can be efficient when only a few data streams are abnormal is as follows. In general, the extreme
order statistics W(i),t with small i or large i have larger variabilities than the order statistics in the middle. Therefore,