INTRODUCTION
The accuracy of intrusion detection systems is limited by unstable and time-consuming
detection algorithms. In recent years, intrusion detection technology can be
divided into two categories: intellectualization (Macia-Perez
et al., 2011) and distribution (Rehak et al.,
2009). With the development of computer technology, research
on anomaly detection methods (Mabu et al., 2009)
plays an increasingly significant role in intrusion detection systems. Based on statistical
learning theory, an intrusion detection system is capable of continuous learning,
which enhances its recognition rate and reliability.
In reality, when the network scale is large, the training time becomes too long to detect incoming attacks. Besides this main issue, the complexity of matrix operations makes training memory-hungry. In order to monitor traffic on a high-speed link continuously, a quick-response SVM classifier is needed. However, it is not practical for SVM to handle an unlabeled data stream. For these reasons, this study assists the security auditor in preprocessing and analyzing audit data.
The difficulty in an SVM-based IDS is optimizing the parameters during the training phase of the classifier. Because of this, an optimization algorithm based on grid search is proposed. Using the preprocessed KDD CUP 99 data as the source for these experiments, the related parameters were selected rapidly and appropriately. The test results show that the classification accuracy has been improved significantly.
PRINCIPLE OF SVM FOR MULTICLASSIFICATION
Support Vector Machine (SVM), a kind of machine learning algorithm, can efficiently
solve classification problems. Since it can obtain good performance under
small-sample training conditions, SVM possesses strong generalization capability
(Keerthi et al., 2000). SVM has been widely used
on small datasets and in high-dimensional feature spaces due to its good applicability
and high efficiency, especially in anomaly-based intrusion detection.
SVM is a linear classification machine based on the premise that the problem is linearly separable. For two-class samples in the training set, a linear separating function works perfectly if the samples can be divided by a hyperplane. However, there are circumstances in which the samples are not linearly separable.
Figure 1 indicates that it is impossible to separate the
white samples from the black ones that surround them with a straight line;
the classifier would apparently fall into an infinite loop. In order to obtain an
available optimal hyperplane, the data should be logically mapped into a high-dimensional
feature space (Liejun et al., 2008). As Fig.
1 depicts, a dotted parabola can be used to separate the two classes according
to the function value. Suppose that <x, z> is the inner product of the original
features. After mapping to <Φ(x), Φ(z)>, the kernel function
can be defined as follows:
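The equation itself did not survive extraction; a reconstruction of the standard kernel definition, consistent with the surrounding text (the equation number is assumed from the paper's numbering pattern), is:

```latex
K(x, z) = \langle \Phi(x), \Phi(z) \rangle \quad (1)
```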
According to the Mercer Principle, different kernel functions generate different classification machines.

Fig. 1: 
The linearly inseparable data samples 
RBF (Radial Basis Function) is one of the kernel functions and it can be defined as follows:
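The RBF definition did not survive extraction; a reconstruction of its standard form, using the kernel parameter σ referenced later in the paper (equation number assumed), is:

```latex
K(x, z) = \exp\left(-\frac{\|x - z\|^{2}}{2\sigma^{2}}\right) \quad (2)
```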
In related research (Lijia et al., 2011; El-Kouatly
and Salman, 2008; Shuchun et al., 2011),
RBF has advantages over other kernel functions for three reasons: Keerthi
proved that the linear kernel, with one penalty parameter, and RBF, with a penalty
parameter and a kernel parameter, can provide the same performance. Moreover,
it is hard to distinguish the performance of the Sigmoid kernel from that of RBF.
Finally, RBF is able to deal with samples in which the relationship between the class
label and the features is nonlinear.
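As an illustration (not part of the original paper), the RBF kernel discussed above can be computed as follows; the function name and the default σ are our own choices:

```python
import math

def rbf_kernel(x, z, sigma=1.0):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# A sample is maximally similar to itself: K(x, x) = 1
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # → 1.0
```

Kernel values decay toward 0 as samples move apart, with σ controlling how quickly.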
The classification model might be severely affected by noise: an outlier will result in hyperplane dislocation. Under the largest-interval law, the interval distance between the hyperplanes is reduced by the deflected support vectors. To avoid this, a non-negative slack variable ξ_{i} is introduced into the largest-interval formulation and noise samples are partially tolerated so as to maximize the interval distance. Here is the expression of the so-called soft-margin separating hyperplane:
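The expression did not survive extraction; a reconstruction of the standard soft-margin formulation, consistent with the parameters C and ξ_{i} discussed around it (equation number assumed), is:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad \text{s.t.}\quad y_{i}\left(\langle w, x_{i}\rangle + b\right) \ge 1 - \xi_{i},
\quad \xi_{i} \ge 0 \quad (3)
```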
In expression 3, the target function value can be largely affected by the parameter
C, which is the weight of outliers: the larger the C value, the bigger the
function value. Since the number of outliers is controlled by the accumulated
slack term, most of the sample points obey the constraint. The corresponding
Lagrange formula can be defined as follows:
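The formula did not survive extraction; a reconstruction of the standard Lagrangian for the soft-margin problem, consistent with the multipliers α_{i} and r_{i} named in the next paragraph (equation number assumed), is:

```latex
L(w, b, \xi, \alpha, r) = \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_{i}
- \sum_{i=1}^{n}\alpha_{i}\left[y_{i}\left(\langle w, x_{i}\rangle + b\right) - 1 + \xi_{i}\right]
- \sum_{i=1}^{n} r_{i}\,\xi_{i} \quad (4)
```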
In expression 4, both α_{i} and r_{i} are Lagrange multipliers. The expressions for w and b can be derived by taking partial derivatives of this formula with respect to its parameters:
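The derivative expressions did not survive extraction; the standard stationarity conditions, consistent with the references to expressions 6 and 7 in the following paragraph (the numbering is assumed), would be:

```latex
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n}\alpha_{i} y_{i} x_{i} \quad (5)
\qquad
\frac{\partial L}{\partial \xi_{i}} = 0 \;\Rightarrow\; C = \alpha_{i} + r_{i} \quad (6)
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n}\alpha_{i} y_{i} = 0 \quad (7)
```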
According to expression 6 and the constraint in expression 7, the anti-noise ability
can be improved and the effect of outliers reduced by looking for the
largest interval. For the problem of multiclass classification, Hsu
and Lin (2002) compared the theoretical results of several approaches with experimental
data. Some of the typical methods, such as 1-vs-1, 1-vs-rest and DAG-SVM, are highly
effective. Centered on the idea of two-class classification (Yang
et al., 2011), it is necessary to pretreat the input data before training an
effective multiclass classifier. After the data preprocessing phase, parameter
optimization is the most important part of this study.
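As a minimal sketch of the 1-vs-1 scheme mentioned above (not from the original paper): each pairwise classifier casts a vote for one of its two classes and the class with the most votes wins. The stand-in decision rule below is purely illustrative; a real implementation would use trained two-class SVMs.

```python
from itertools import combinations
from collections import Counter

def one_vs_one_predict(x, classes, pairwise_decide):
    """1-vs-1 voting: each pairwise classifier votes for one of its two
    classes; the class with the most votes wins."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    return votes.most_common(1)[0][0]

# Stand-in decision rule: pick whichever class index is closer to x
# (a trained SVM's sign(w.x + b) would be used in practice).
decide = lambda a, b, x: a if abs(x - a) <= abs(x - b) else b
print(one_vs_one_predict(1.2, [0, 1, 2, 3], decide))  # → 1
```

For k classes, 1-vs-1 trains k(k-1)/2 pairwise classifiers, which is the scheme compared by Hsu and Lin (2002).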
DATA PREPROCESSING FOR INTRUSION DETECTION MODEL
The security audit dataset KDD 99, processed by Lee (Stolfo
et al., 2001), is a noted dataset for intrusion detection
systems. Aimed at evaluating intrusion detection models (Muda et al.,
2011), this audit dataset provides TCP-dump-format
data recorded from a simulated local network and serves as training input to regulate
the SVM-based multiclass classifier. There are 38 kinds of attacks in this text-format
file. Basic, content-based, 2-second traffic and host-based stream characteristics are
included in this dataset. Because more than a million records are included
in the KDD dataset, 10% of the data are selected in this study.
The relative merit of data preprocessing is highly related to the precision
of classification (Gu et al., 2008). Because some
non-numeric features, such as character data, are included in the KDD dataset,
all features should be numericalized before the classification phase. For example,
the protocol type, whose values are TCP, ICMP and UDP, is replaced by
three integer values from 1 to 3. Similarly, numericalization is implemented
for the third, fourth and forty-second features. The forty-second feature individually
serves as the classification label. Across the remaining 41 features of the dataset,
the metrics of the features differ in scale. To avoid the
impact of unbalanced data, a phenomenon in which some of the smaller values are dominated
by the larger values, it is essential to normalize the value of each attribute
from its original metric to a unit-free range by the following formula:
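The formula did not survive extraction; a reconstruction of the standard min-max normalization, term-by-term consistent with the symbols defined in the next paragraph (equation number assumed), is:

```latex
y = y_{\min} + \frac{\left(y_{\max} - y_{\min}\right)\left(x - x_{\min}\right)}{x_{\max} - x_{\min}} \quad (8)
```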
In expression 8 for normalization, y_{max} is the upper limit of the target range whereas y_{min} is its lower limit. x_{max} and x_{min} are the maximum and minimum values of the original feature. With input x, the value of the corresponding attribute in the current record, the output y is the normalized value.
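The two preprocessing steps can be sketched as follows (not from the original paper); the protocol-to-integer mapping order is our assumption, and a real pipeline would apply these functions to every record:

```python
PROTOCOL_MAP = {"tcp": 1, "icmp": 2, "udp": 3}  # assumed ordering of codes

def numericalize_protocol(value):
    """Replace a symbolic protocol-type feature with an integer code."""
    return PROTOCOL_MAP[value.lower()]

def min_max_normalize(x, x_min, x_max, y_min=0.0, y_max=1.0):
    """Expression 8: linearly rescale x from [x_min, x_max] to [y_min, y_max]."""
    return y_min + (y_max - y_min) * (x - x_min) / (x_max - x_min)

print(numericalize_protocol("TCP"))       # → 1
print(min_max_normalize(5.0, 0.0, 10.0))  # → 0.5
```

After these two steps, every attribute is numeric and shares a common scale, so no feature dominates the others during training.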
However, it is complicated for a computer to calculate the enormous matrix generated
from the 10% KDD dataset after numericalization and normalization. To reduce memory
consumption in large-scale computation, feature extraction (Al-Tameemi
et al., 2011) is utilized to accelerate convergence. According to
related research, rough set theory is applied to reduce the KDD dataset,
following the methods provided by Zaina et al. (2006).
Six attributes are included in the extracted dataset:
3, 4, 5, 24, 32 and 33. To train the SVM multiclassifier, this experiment selected
2251 samples, including 1000 normal behaviors, 817 denial-of-service attacks,
23 probe behaviors, 37 user-to-root behaviors and 374 remote-to-local behaviors.
TRAINING SVM-BASED MULTICLASSIFIER WITH GS
Cross validation provides a reliable basis for evaluating the trained multiclassifier
and its stability. In this phase, SVM classification accuracy can
be greatly affected by the kernel parameter σ in expression 2 and the error
penalty parameter C in expression 3. In order to obtain the best combination
of these parameters, an improved GA (genetic algorithm) has been applied to parameter
optimization (Chen and Wang, 2008). For large-scale and
unbalanced datasets, such as the KDD dataset, GA provides a global optimal solution
depending on the fitness function. However, it easily falls into a locally
optimal combination because of the randomness of genetic variation.
To avoid the weak local search ability of the traditional genetic
algorithm, a GS algorithm (coarse-to-refined grid search) is proposed. The steps
are as follows:
Step 1: 
First, search over a large range. Parameter C is selected from
a geometric series whose range is 2^{-4} to 2^{8}
with common ratio 2^{0.5}. Parameter σ is selected from
a geometric series whose range is 2^{-4} to 2^{4}
with common ratio 2^{0.5}. According to the best accuracy
in cross validation, the optimal combination of C and σ is
recorded as C* and σ*. The distribution map of parameters and accuracy
and its three-dimensional counterpart are shown in Fig. 2 

Fig. 2: 
Distribution map of parameters and accuracy and that in three-dimensional
coordinates using rough GS; (a) distribution map of parameters and accuracy
and (b) distribution map of parameters and accuracy in three-dimensional
coordinates 

Fig. 3: 
Three-dimensional distribution map of parameters and accuracy
using refined GS 
As Fig. 2a shows, the cross-validation accuracies are distributed along the curves. The combination of C and σ gradually converges to a smaller area.
Step 2: 
Then, search over a small range around the selected C* and σ*.
The new parameter C is selected from a geometric series whose range is
C*×2^{-1} to C*×2^{1} with common ratio 2^{0.125}.
Similarly, the new parameter σ is selected from a geometric series whose
range is σ*×2^{-1} to σ*×2^{1} with common
ratio 2^{0.125}. According to the best accuracy in cross validation,
the new optimal combination of C and σ is recorded as C* and σ*.
The distribution map of parameters and accuracy and its three-dimensional
counterpart are shown in Fig. 3 
As Fig. 3 shows, the distribution of cross-validation accuracies lies in close proximity to plateau areas. In the process of selecting C and σ, overfitting may arise (the multiclassifier runs with high accuracy on the training dataset but poor accuracy on the test dataset). In order to avoid this phenomenon within the highest plateau area, the minimum C is selected as the best parameter C* and the maximum σ as the best parameter σ*.
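The two GS steps can be sketched as follows (not from the original paper). The accuracy function here is a toy stand-in with a known peak; a real run would train the SVM and measure cross-validation accuracy at each grid point. The parameter ranges follow the steps described above.

```python
import math

def toy_cv_accuracy(C, sigma):
    """Stand-in for cross-validation accuracy of an SVM trained with
    penalty C and RBF parameter sigma; peaks near C = 8, sigma = 0.5."""
    return 1.0 / (1.0 + (math.log2(C) - 3) ** 2 + (math.log2(sigma) + 1) ** 2)

def geometric(lo_exp, hi_exp, step):
    """Geometric series 2^lo_exp .. 2^hi_exp with common ratio 2^step."""
    n = int(round((hi_exp - lo_exp) / step))
    return [2.0 ** (lo_exp + i * step) for i in range(n + 1)]

def grid_search(score, c_range, s_range):
    """Return (best accuracy, best C, best sigma) over the grid."""
    return max((score(C, s), C, s) for C in c_range for s in s_range)

# Step 1: coarse search, common ratio 2^0.5
acc1, C1, s1 = grid_search(toy_cv_accuracy,
                           geometric(-4, 8, 0.5), geometric(-4, 4, 0.5))
# Step 2: refined search around (C1, s1), common ratio 2^0.125
acc2, C2, s2 = grid_search(toy_cv_accuracy,
                           geometric(math.log2(C1) - 1, math.log2(C1) + 1, 0.125),
                           geometric(math.log2(s1) - 1, math.log2(s1) + 1, 0.125))
assert acc2 >= acc1  # refinement never does worse on the same surface
```

The coarse pass locates the promising region cheaply; the refined pass then searches it at eight times the resolution, which is what gives GS its speed advantage over an exhaustive fine grid.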
SIMULATION RESULTS
The 5% KDD dataset, selected and pretreated as test data, is utilized for classification. Table 1 compares the SVM multiclassifier based on the Grid Search algorithm with that based on the Genetic Algorithm.
The experimental results show that the testing accuracy optimized by GS is better
than that optimized by GA and that the best C and best σ selected by GS are
smaller than those selected by GA.

Fig. 4: 
The training time and cross-validation accuracy of GS and
GA during the parameter optimization phase 
Table 1: 
Grid search algorithm vs. Genetic algorithm 

Although the cross-validation accuracies are the same, GS is advantageous
in both training and testing time. The training result is shown in Fig.
4.
From Table 1 and Fig. 4, we can see that the training time of the GA-based multiclassifier is three times that of the GS-based one. Thus, the intrusion detection model based on GS-SVM is an effective system with faster response, higher accuracy and stronger generalization.
CONCLUSION
This study focused on the SVM multiclassifier applied to the intrusion detection module. At the scale of the KDD dataset, accuracy is strongly influenced by the parameters C and σ. After sample numericalization and normalization, feature extraction based on a rough-set theoretical model was implemented to reduce the memory consumption resulting from high dimensionality. In order to improve the generalization capacity of SVM, a parameter selection algorithm was proposed. Computation on simulation examples and comparison with the genetic algorithm demonstrated that GS doubled the speed of convergence in the cross-validation phase. Moreover, the classification accuracy was improved by 0.2669%. For these reasons, this paper recommends GS-SVM for intrusion detection systems, for the purpose of reducing training time and enhancing the stability of the anomaly detection model.
ACKNOWLEDGMENT
This study was supported in part by the National Natural Science Foundation of China, Grant No. 60961002.