INTRODUCTION
For online transactions, whether a transaction succeeds is influenced by many factors. For example, when you dial a number on your cell phone, whether the call can proceed depends on whether you entered a correct phone number, whether your account has enough money, whether the network is good, whether the receiver is in a region with signal coverage and so on. Although a failed transaction may be caused by the systems themselves or by other factors, a stable transactional system should show a stable distribution of responses from a statistical perspective. In fact, online transaction systems are continually improved so that they can provide richer and more reliable services to their customers; it is also possible that mistakes are left in those systems by careless actions. Such inadvertent mistakes are very difficult to discover since they only occur under specific conditions and are not visible to all customers. For example, you may often receive a message such as "The number you dialed cannot be connected" when using your cell phone; obviously, this message gives no guidance on what to do next. Although such uncertain messages are inevitable, they should have a stable distribution over a large number of historical records. If some operators suddenly show a high probability of such uncertain messages, their systems may have been changed.
It is an important problem for third parties to monitor whether changes have been made in the systems of any party involved in transactions. Anomaly detection on distributions is an effective mechanism for such monitoring. In this study, we treat response signals as indicators; over a large number of transactions, all indicators should follow a stable distribution if the corresponding systems are unchanged. A successful change should let a system provide better services while still presenting a stable distribution; a bad change will cause the system to show unstable indicator distributions. The problem of detecting system changes and system stability is thus transformed into a comparison of indicator distributions.
Classification and clustering are two primary technologies for data analysis.
Hannan et al. (2008) applied classification method
on tire pressure and temperature data for intelligent vehicle performance evaluation.
Hussain et al. (2008) provided a method to detect
drowsiness through combining the Electrooculogram and Head Nodding signals,
which can be analyzed and represented as feature vector of two states, alertness
and sleepiness. Arora et al. (2009), Ranjan
and Khalil (2007), Premalatha and Natarajan (2010)
applied clustering methods on different data. Anomaly detection is similar
to outlier detection and skyline queries and has been applied in areas such as network and
intrusion detection (Thottan and Ji, 2003; Teodoro
et al., 2009), privacy preserving (Vaidya and
Clifton, 2004), software bugs (Hangal and Lam, 2002)
and so on. Elahi et al. (2009) researched how
to detect outstanding outliers from data streams with grid-based technology. Abe
et al. (2006) proposed a selective sampling mechanism based on active
learning for outlier detection. Stibor et al. (2005)
and Gonzalez et al. (2002) used classification
methods for anomaly detection and focused on the negative selection algorithm. Song
et al. (2007) made use of users' domain knowledge to discover
the anomalies in which users are really interested, a goal shared with
our paper. Chan and Mahoney (2005) provided algorithms
for anomaly detection in monitoring applications, in which the time factor is considered.
More information on anomaly detection is covered by Chandola
et al. (2009). Uysal (2007) presented applications
and a comparison of two models for the prediction of time series values,
Radial Basis Function Networks and Autoregressive Integrated Moving Average
models. Sharma et al. (2007) researched interestingness
measures to extract interesting results from a large number of classification
results.
Our contribution is to model system stability as indicator distributions and to use distance metrics and anomaly detection on those distributions to discover system changes. Based on a large number of indicators, our method can judge whether changes have happened in the corresponding systems.
PROBLEM SETTING
From a statistical perspective, the behavior of a system can be represented as a series of distributions on its indicators. Given an indicator set I = {i_{1}, i_{2}, ..., i_{n}} and two distributions on I, D_{1} and D_{2}, D_{1} = {<i_{1}, p_{11}>, <i_{2}, p_{12}>, ..., <i_{n}, p_{1n}>}, D_{2} = {<i_{1}, p_{21}>, <i_{2}, p_{22}>, ..., <i_{n}, p_{2n}>}:
D_{1} and D_{2} correspond to different system states and reflect different system characteristics. Generally, for a given indicator set, only one indicator, such as i_{1}, represents success and the others represent failures. Given a time set T = {t_{1}, t_{2}, ..., t_{m}}, if all distributions are time-related, we have TD = {<t_{1}, D_{1}>, <t_{2}, D_{2}>, ..., <t_{m}, D_{m}>}. The aim of anomaly detection is to find abnormal distributions and locate the abnormal indicators based on a series of given distributions representing the stability of a system; such anomalies can be considered strong evidence of bad system changes.
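The definitions above can be sketched as plain data; the indicator names and probabilities below are purely illustrative:

```python
# Hypothetical indicator set, purely for illustration.
I = ["i1", "i2", "i3"]

# Two distributions on I; each maps an indicator to its probability.
D1 = {"i1": 0.90, "i2": 0.06, "i3": 0.04}
D2 = {"i1": 0.70, "i2": 0.25, "i3": 0.05}

# Time-related distributions TD: a list of (time, distribution) pairs.
TD = [(1, D1), (2, D2)]

for _, D in TD:
    assert abs(sum(D.values()) - 1.0) < 1e-9  # probabilities sum to 1
```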
Problem definition: Given a set of time-related distributions TDS =
{TD_{1}, TD_{2}, ..., TD_{m}}, anomaly detection is
to find whether there is a distribution change on TDS over the time series
T = {t_{1}, t_{2}, ..., t_{m}}. A distribution change on time series T means
that there are two distribution sets DS_{1} and DS_{2}, where the
time set related with DS_{1} is T_{1} and DS_{2} corresponds
to T_{2}, such that T = T_{1}∪T_{2}, T_{1}∩T_{2}
= ø and they satisfy MAX (T_{1}) < MIN (T_{2}) or
MAX (T_{2}) < MIN (T_{1}).
A distribution change can also be understood as a transfer from one steady state to another. If the second steady state is formed by bad system changes, it is valuable to capture that state, since the system should be corrected immediately. In fact, anomaly detection includes two stages: the first is to define steady distributions on TD_{1}, TD_{2}, ..., TD_{m}; the second is to discover the detailed anomaly through comparison between the steady distribution and the current distribution TD_{m+1}. In the remainder of this study, we focus on the failure indicators, since the successful indicator and the set of failure indicators are complementary.
Example 1: Suppose a system has 17 indicators, expressed as A_{0}, A_{1}, A_{2}, ..., A_{16}, where A_{0} is the successful indicator and A_{1}, A_{2}, ..., A_{16} are failure indicators. In Fig. 1a, there are 5 different indicator distributions whose related times satisfy t_{1}<t_{2}<t_{3}<t_{4}<t_{5}; indicators A_{1}, A_{4} and A_{7} are becoming high, which shows that the system may have been changed. Figure 1b presents the comparison between the baseline, which is established from historical distributions, and the current distribution TD_{5}; A_{1}, A_{4} and A_{7} are detected as anomalies since their probabilities are higher than the baseline's.
BASIC AND IMPROVED METHOD
Distance metrics are used to judge whether a current indicator distribution is inconsistent with the baseline distribution and anomaly detection is used to discover which indicator is bad.
There are two situations for anomaly detection: in the first, the successful indicator has a great change; in the second, the successful indicator has no apparent change. Usually, only the first situation, namely a sharp decline of the good indicator, attracts attention. In this study, we also introduce distance metrics to detect the second situation.
Distance metrics: Distance metrics are often used to cluster similar
points in multidimensional space. The classical distance metrics include Dice
distance, Jaccard distance, Cosine distance, Euclidean distance and so on.

Fig. 1(a-b):
An illustration of indicator distribution and anomaly detection,
(a): Time-related indicator distributions, (b): A comparison of indicator
distributions
These distance metrics have a wide range of applications, such as clustering
analysis (Schenker et al., 2003), natural language
processing (Lee et al., 2005; Dumais
et al., 1998), text retrieval (Kim and Choi,
1999), image recognition (Salleh et al., 2011)
and so on. Here we consider a distribution on a set of indicators as a point
in multidimensional space; because all dimensions are in the same range, each
being a probability between 0 and 1, Euclidean distance is well suited to
measure their similarity. Given a set of such points P = {p_{1},
p_{2}, ..., p_{n}}, where each p_{i} is a
distribution on a set of indicators, we assume that all points in P are normal,
i.e., these points are gathered when the system is steady and has no
anomaly; if a point p is far from all points in P, we can say that
there is an anomaly in p.
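As a concrete sketch (hypothetical indicator names, threshold chosen purely for illustration), the Euclidean distance over distributions and the far-from-all-normal-points test look like:

```python
import math

def euclidean(d1, d2):
    """Euclidean distance between two distributions on the same indicator set."""
    return math.sqrt(sum((d1[i] - d2[i]) ** 2 for i in d1))

def is_anomaly(p, normal_points, threshold):
    """p is anomalous if it is far from every known-normal distribution."""
    return all(euclidean(p, q) > threshold for q in normal_points)

# Two known-normal distributions and one suspicious current distribution.
P = [{"i1": 0.90, "i2": 0.06, "i3": 0.04},
     {"i1": 0.89, "i2": 0.07, "i3": 0.04}]
current = {"i1": 0.70, "i2": 0.25, "i3": 0.05}
```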
In fact, it is difficult to find enough good distributions for anomaly detection,
especially for third parties, since they cannot know whether a change has
been made to the systems. The usual situation is that we have a group of distributions
and need to distinguish the abnormal distributions from the normal ones. In order
to find which point is an outlier, we use Euclidean distance to compute the distance
between points and then group points with small distances into a set; obviously,
every set of points should belong to one steady distribution. Time information
is then checked to judge whether a system has changed. Suppose we get two sets, S_{1}
and S_{2}, after distance computation, S_{1} = {<t_{11},
p_{11}>, <t_{12}, p_{12}>, ..., <t_{1n},
p_{1n}>}, S_{2} = {<t_{21}, p_{21}>, <t_{22},
p_{22}>, ..., <t_{2m}, p_{2m}>}, and the time sets
related with them are T_{1} = {t_{11}, t_{12}, ...,
t_{1n}} and T_{2} = {t_{21}, t_{22}, ..., t_{2m}};
if MAX (T_{1}) < MIN (T_{2}) or MAX (T_{2}) <
MIN (T_{1}), we can say there is a change in the corresponding system. Here,
we only consider MAX (T_{1}) < MIN (T_{2}). If all failure
indicators in S_{2} have lower probability than in S_{1}, we say
that this is a successful change to the corresponding system; our aim is to find
failed changes, which means that some indicators in S_{2}
have a higher probability than in S_{1}.
The partition is done by distance computation: we first pick the point pair with the maximum distance and take the two points as the first elements of the corresponding categories. Each remaining point is then assigned according to its average distance to the current categories; a point is allocated to the set with the minimum average distance. The average distance between a point p and a set S is defined as:

avg_dist (p, S) = (1/|S|) Σ_{q∈S} d (p, q)

where d (p, q) is the Euclidean distance between p and q.
The detailed process for distribution partition is presented in algorithm
1.
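The partition procedure described above can be sketched as follows; this is our reading of Algorithm 1, with hypothetical function names and example data:

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points (indicator distributions)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def avg_distance(p, group):
    """Average distance between point p and every point already in a group."""
    return sum(euclidean(p, q) for _, q in group) / len(group)

def partition(td):
    """Split time-related points td = [(time, point), ...] into two categories:
    seed with the pair at maximum distance, then assign each remaining point
    to the category with the minimum average distance."""
    pairs = [(euclidean(td[i][1], td[j][1]), i, j)
             for i in range(len(td)) for j in range(i + 1, len(td))]
    _, i, j = max(pairs)
    s1, s2 = [td[i]], [td[j]]
    for k, item in enumerate(td):
        if k not in (i, j):
            (s1 if avg_distance(item[1], s1) <= avg_distance(item[1], s2)
             else s2).append(item)
    return s1, s2

def is_change(s1, s2):
    """There is a system change if one category strictly precedes the other."""
    t1, t2 = [t for t, _ in s1], [t for t, _ in s2]
    return max(t1) < min(t2) or max(t2) < min(t1)

# Five distributions over 4 indicators; the last two look shifted.
td = [(1, (0.90, 0.05, 0.03, 0.02)),
      (2, (0.89, 0.06, 0.03, 0.02)),
      (3, (0.91, 0.04, 0.03, 0.02)),
      (4, (0.70, 0.20, 0.06, 0.04)),
      (5, (0.71, 0.19, 0.06, 0.04))]
s1, s2 = partition(td)
```

With this data, the first three time points group together, the last two form a second category and `is_change` reports a change.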
Example 2: Table 1 illustrates five distributions on an indicator set {A_{0}, A_{1}, A_{2}, A_{3}}, which are {<t_{1}, D_{1}>, <t_{2}, D_{2}>, <t_{3}, D_{3}>, <t_{4}, D_{4}>, <t_{5}, D_{5}>}, where t_{1} to t_{5} is an increasing time sequence.
According to algorithm 1, five distributions can be divided
into S_{1} = {<t_{1}, D_{1}>, <t_{2},
D_{2}>, <t_{3}, D_{3}>} and S_{2} =
{<t_{4}, D_{4}>, <t_{5}, D_{5}>}.
If A_{0} represents the successful indicator, there is a change
in the corresponding system since S_{2} presents a different distribution
from S_{1}.
Algorithm 1: 
Abnormal distribution detection 

Table 1: 
Distributions on an indicator set 

Anomaly detection: Using distance metrics, we can only identify whether there are changes to systems; we do not know the change details, namely which indicator has an offset across a series of distributions. In this section, we discover the change details through anomaly detection. In fact, distance metrics measure changes from a multidimensional perspective, while anomaly detection finds change details from a concrete one-dimensional perspective.
We use distribution trends to detect anomalies before and after a system change. Whether an indicator is abnormal depends on whether it has different distribution trends before and after the change. Here, distribution trends are measured by a baseline and a fluctuation range: the baseline is represented by the average and the fluctuation range by the variance. Given two distributions of the same indicator, corresponding to the two phases before and after the system change, we can compute their averages, base_{1} and base_{2}, and variances, var_{1} and var_{2}; if the average has an obvious increase or the variance has changed markedly, we can say that the indicator is abnormal. In fact, an abnormal indicator other than the successful indicator has two forms:
• base_{2} > base_{1}+β, namely the baseline
has increased beyond the expected range after the system change
• |var_{2}−var_{1}| > ε, namely the
indicator presents an unsteady state
For the successful indicator, the first rule should be base_{2}<base_{1}−β
and the second rule remains the same. Here, β and ε are thresholds which are
set experimentally. The detailed process is presented in algorithm
2.
Algorithm 2: 
Anomaly detection 
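The two rules can be sketched as follows; the function name, the threshold values and the absolute-difference form of the variance rule are our assumptions, since the second formula is elided in the text:

```python
from statistics import mean, pvariance

def is_abnormal(before, after, beta, eps):
    """Check one failure indicator's probability samples before/after a change.
    Rule 1: the baseline (average) rose beyond the expected range beta.
    Rule 2 (assumed form): the fluctuation range (variance) shifted by more
    than eps, i.e. the indicator became unsteady."""
    base1, base2 = mean(before), mean(after)
    var1, var2 = pvariance(before), pvariance(after)
    return base2 > base1 + beta or abs(var2 - var1) > eps

# A failure indicator whose probability jumped after the change.
jumped = is_abnormal([0.05, 0.06, 0.05], [0.20, 0.21, 0.19],
                     beta=0.05, eps=0.01)
```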

Improvement on distance metrics and complexity: Though a steady system often presents a steady distribution on its indicators, exceptions are inevitable: a distribution may seem abnormal although no system change occurred. Strict distance metrics are not accurate enough to handle this problem,
since it is possible that a distribution from before a change is divided into the distribution set from after the change and vice versa. Here, we introduce a probability threshold to solve this problem. Suppose time sets T_{1} and T_{2} are collected from two time-related distribution sets S_{1} and S_{2}, where S_{1} corresponds to the distribution before a system change and S_{2} to the distribution after it; if:

(|{t∈T_{1}: t > MIN (T_{2})}| + |{t∈T_{2}: t < MAX (T_{1})}|) / (|T_{1}|+|T_{2}|) ≤ λ

we still say that there is a change in the corresponding system. Here λ is set empirically.
Another improvement concerns computation complexity. Strict distance metrics need distance computation between every pair of distributions, whose time complexity is O (n^{2}). Since the maximum distance is most likely to occur between the two distributions at the start and end of a time series, we can pick these two points to establish the initial categories. In addition, pair distances can be precomputed and stored in a hash structure, which improves the partition speed.
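A sketch of the improved partition under these two ideas (seeding from the start and end of the time series, caching pair distances in a hash table); the function names are ours:

```python
import math

def seeded_partition(td):
    """Improved partition sketch: seed the two categories with the first and
    last points of the time series instead of searching all O(n^2) pairs,
    and memoize pair distances in a hash table."""
    td = sorted(td)                      # order by time
    s1, s2 = [td[0]], [td[-1]]           # initial categories
    cache = {}                           # precomputed pair distances

    def dist(a, b):
        key = (a, b)
        if key not in cache:
            cache[key] = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        return cache[key]

    def avg(p, group):
        return sum(dist(p, q) for _, q in group) / len(group)

    for item in td[1:-1]:
        (s1 if avg(item[1], s1) <= avg(item[1], s2) else s2).append(item)
    return s1, s2

# A steady period followed by a shifted period, as in the earlier example.
td = [(1, (0.90, 0.05, 0.03, 0.02)),
      (2, (0.89, 0.06, 0.03, 0.02)),
      (3, (0.91, 0.04, 0.03, 0.02)),
      (4, (0.70, 0.20, 0.06, 0.04)),
      (5, (0.71, 0.19, 0.06, 0.04))]
g1, g2 = seeded_partition(td)
```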
EXPERIMENTS
Here, we carried out experiments to verify the efficiency and effectiveness
of the proposed method.

Fig. 2: 
Accuracy under steady and unsteady distributions
The experiments use two datasets: one transformed from real transactional
data and the other generated automatically based on the former. All experiments
ran on a Core Duo 2.2 GHz CPU with 2 GB of memory.
Accuracy: In order to verify the effectiveness of our method, we
carried out a group of experiments on the first dataset, which is transformed
from a real one. This dataset involves 6 institutions and 30 days'
transactions. The number of transactions per day varies from tens of thousands
to several millions and the number of indicators per institution is from 30 to 40.
Here, we assume that every institution corresponds to one system since we cannot
know how many systems really exist in an institution. Every experiment is carried
out in two phases: the first phase finds whether there is a change at any
institution and the second phase finds the change details. We compare
the accuracy in two situations: one tests whether the proposed method
reports anomalies under a steady distribution; the other tests whether the
proposed method can locate anomalies accurately under distributions after
system changes. The experimental results are presented in Fig.
2. We find that when all distributions are steady, namely no system
change, the proposed method is perfect in that it does not give any false positives.
When checking whether there is an anomaly after a system change, the proposed method
maintains an accuracy above 90%.
Time performance: In this group of experiments, we expanded the dataset
to verify the efficiency of our method. We expanded the number of institutions
to 5000, 10000, 15000 and 20000, each of which still has 30 days' transactions.

Fig. 3: 
Computation time comparison between original and improved
program 
The number of indicators still varies from 30 to 40. The original
program already has good performance, but the improved version performs
better still. The experimental results are shown in Fig. 3.
The execution time of the original program rises more quickly than that of the
improved one as the number of institutions grows. More time may be consumed
when the number of transaction days increases greatly. In this dataset,
we only expanded the number of institutions based on their distributions.
CONCLUSIONS
In this study, we model a steady system as a steady indicator distribution, so a system change can be detected through distribution offset. All information returned by a system is modeled as an indicator distribution; by monitoring the indicator distribution, a system change can be discovered effectively. We use a two-stage method to locate system changes: the first stage decides whether there is a change to the system and the second stage detects which indicator has an abnormal distribution. The proposed method can help to find system changes effectively, which is very significant for the quality analysis of online transactions.
ACKNOWLEDGMENT
We gratefully acknowledge the support of the Education Department Foundation of Guangxi under grant No. 201010LX154.