ABSTRACT
The study presents a Fault Detection and Isolation (FDI) scheme with a particular emphasis placed on sensor fault diagnosis in nonlinear dynamic systems. The non-analytical FDI scheme is based on a two-step procedure. Two methods are proposed for the first step, called residual generation: one uses fuzzy sets and the second a neural network. A fuzzy neural network performs the second step, called residual evaluation. Some simulation results are given for efficiency assessment of this fault diagnosis approach.
DOI: 10.3923/jas.2006.2020.2030
URL: https://scialert.net/abstract/?doi=jas.2006.2020.2030
INTRODUCTION
The problem of Fault Detection and Isolation (FDI) is a crucial issue for the safety, reliability and performance of industrial processes.
The FDI procedure consists basically of two main steps: generation of residuals which should be useful fault indicators and residual evaluation which involves decision making.
The model-based FDI approach, also referred to as the analytical approach, has received intensive attention and uses mainly state and parameter estimation techniques (Benloucif and Mehennaoui, 1992; Benloucif et al., 1998; Frank, 1990; Patton and Chen, 1997). The main drawback of the analytical approach is the requirement of an accurate model for reliable diagnostic decisions (a minimum rate of missed detections and false alarms).
A fundamental aspect in the design of model-based methods is thus the problem of robustness with respect to model uncertainties arising in the form of modelling errors and unknown external disturbances. As far as linear systems are concerned, the problem of robust residual generation may be considered mature (Benloucif and Staroswiecki, 2002; Frank, 1990; Patton et al., 1997), whereas the FDI problem for nonlinear dynamic systems has been investigated to a lesser extent (Garcia and Frank, 1997; Jiang et al., 2001).
Alternately, FDI can be performed using qualitative techniques such as expert systems, fuzzy logic, neural networks (Alexandru et al., 2000; Benloucif and Staroswiecki, 2002; Chen and Lee, 2002; Evsukoff et al., 1999; Frank, 1994; Isermann, 1998; Schneider and Frank, 1996; Theilliol et al., 1997).
In Benloucif and Mehennaoui (2002), a fault diagnosis procedure for linear systems combined an analytical residual generator (a Kalman filter) with a fuzzy neural network for residual evaluation. This paper extends this work to the nonlinear case. The main difference lies in the identification of a nonlinear model. On the other hand, fuzzy systems (Hellendoorn and Driankov, 1997) and neural networks (Norgaard et al., 2000) are now known to be capable of identifying nonlinear systems.
Once the model is obtained, a neural network performs the decision-making, which consists in detecting and isolating a fault when it occurs. This neural network coupled to a fuzzy inference block acts as an on-line fault classifier.
RESIDUAL GENERATION
There are several different approaches to modelling of complex nonlinear systems. The main distinction can be made between global and local methods.
In this study, we present both approaches: a neural network for the global approach and fuzzy sets for the local one.
The residual generation procedure is depicted in Fig. 1.
Residual generation by fuzzy sets: The fuzzy-set methods partition the process domain into a number of fuzzy regions. For each region of the input space, a rule is defined that specifies the model output. The rules can be seen as local submodels of the system. The rules used in this paper are Takagi-Sugeno (TS) rules, whose consequents are locally affine submodels.
Fig. 1: General procedure of residual generation
Takagi-Sugeno model: The affine Takagi-Sugeno (TS) fuzzy model consists of rules Ri with the following structure:
Ri: If x is Ai then yi = ai^T x + bi,  i = 1, 2, …, K        (1)
where x ∈ X ⊂ Rp is a crisp input vector, Ai is a (multidimensional) fuzzy set with membership function μAi(x): X → [0, 1], yi ∈ R is the scalar output of the i-th rule, ai ∈ Rp is a parameter vector and bi is a scalar offset. K is the number of rules in the rule base.
Given the outputs of the individual consequents yi, the global output y of the TS model (1) is computed using the weighted (fuzzy) mean formula
y = Σi=1..K βi(x) yi / Σi=1..K βi(x)        (2)
Here βi(x) denotes the degree of fulfilment of the i-th rule's antecedent, βi(x) = μAi(x).
For building fuzzy models from data, generated by poorly understood dynamic systems, the input-output representation is often applied. The most common structure is the NARX (Nonlinear AutoRegressive with eXogenous input) model.
In terms of rules, the model is given by
Ri: If y(k) is Ai1 and … and y(k−ny+1) is Ai,ny and u(k) is Bi1 and … and u(k−nu+1) is Bi,nu
then yi(k+1) = Σj=1..ny aij y(k−j+1) + Σj=1..nu bij u(k−j+1) + ci        (3)
where k denotes the discrete time sample, ny and nu are integers (fixed by the user) related to the system's order and ai, bi, ci are consequent parameters. The NARX model can represent MISO systems directly and MIMO systems in a decomposed form, as a set of coupled MISO models.
By choosing the structure of the model, the identification problem is transformed into a static nonlinear regression y = F(x). The model input x is called the regressor, the output y the regressand and the product space of the regressor and the regressand, Z = (X × Y) ⊂ Rn, is called the regression space, where n = p+1 is the dimension of this space. Recall that p is the dimension of the regressor vector x. In this space, the equation y = F(x) defines a hypersurface (a subspace of dimension p), called the regression surface. Geometrically, the consequents of the affine TS model (1) can be seen as hyperplanes in the regression space. By means of the antecedent fuzzy sets, the regression space is partitioned into smaller regions, in which the regression surface can be locally approximated by these hyperplanes. The purpose of identification is to find the number, locations and parameters of the hyperplanes such that the regression surface is accurately approximated. This is achieved by applying a class of fuzzy clustering methods called subspace clustering algorithms. In this study, the Gustafson-Kessel (GK) algorithm is used.
Gustafson-Kessel (GK) algorithm: First, we have to construct a matrix Z of data to be clustered. This is achieved by concatenating a matrix containing the regression vectors in its columns and a vector containing the regressands.
As an example, consider a SISO system for which a set of N measurements is available:
Postulating, for instance, a second order NARX structure, y(k+1) = F(y(k), y(k-1), u(k), u(k-1)), the data set for clustering is constructed as:
Z = [ y(2)  y(3)  …  y(N−1)
      y(1)  y(2)  …  y(N−2)
      u(2)  u(3)  …  u(N−1)
      u(1)  u(2)  …  u(N−2)
      y(3)  y(4)  …  y(N)   ]        (4)
The first four rows contain the regressors and the last row the regressand. The vector in the k-th column of the matrix Z will be denoted by zk.
The set of vectors zk, k = 1, 2, …, N will be partitioned into c clusters, represented by their prototypical vectors vi = (vi,1, …, vi,n)^T ∈ Rn, i = 1, …, c.
Denote by V ∈ Rn×c the matrix having vi in its columns; V is called the prototype matrix. The fuzzy partitioning of the data among the c clusters is represented by the fuzzy partition matrix U ∈ Rc×N, whose elements μi,k ∈ [0, 1] are the membership degrees of the data vector zk in the i-th cluster. This class of clustering algorithms searches for the partition matrix and the cluster prototypes such that the following objective function is minimized
J(Z; U, V) = Σi=1..c Σk=1..N (μi,k)^m d²(zk, vi)        (5)
subject to the following constraints
Σi=1..c μi,k = 1,  1 ≤ k ≤ N        (6)

0 < Σk=1..N μi,k < N,  1 ≤ i ≤ c        (7)
In (5), m > 1 is a parameter that controls the fuzziness of the clusters. The usual setting m = 2 is suitable for most applications. The function d(zk, vi) is the distance of the data vector zk from the cluster prototype vi. The constraint (6) avoids the trivial solution U = 0 and the constraint (7) guarantees that clusters are neither empty nor contain all the points to degree 1.
The optimization problem defined by the functional (5) subject to the constraints (6) and (7) can be solved by different nonlinear optimization techniques. The most popular one is the so-called fuzzy c-means algorithm (Bezdec et al., 1987). Gustafson and Kessel extended the c-means algorithm to an inner-product norm
d²(zk, vi) = (zk − vi)^T Mi (zk − vi)        (8)
where Mi is a positive definite matrix adapted to the actual shapes of the individual clusters, described approximately by the cluster covariance matrices Fi
Fi = [Σk=1..N (μi,k)^m (zk − vi)(zk − vi)^T] / [Σk=1..N (μi,k)^m]        (9)
The distance inducing matrix Mi is calculated as the normalized inverse of the cluster covariance matrix
Mi = [det(Fi)]^(1/n) Fi^(−1)        (10)
In the iterative optimization scheme of the GK algorithm below, the superscript (l) denotes the value of a variable at the l-th iteration.
Gustafson-Kessel fuzzy clustering algorithm: Given the data matrix Z, choose the number of clusters 1 < c < N, the weighting exponent m > 1 and the termination tolerance ε > 0. Initialize the fuzzy partition matrix U(0) randomly, such that it satisfies the conditions (6) and (7).
Repeat for l = 1, 2, …
Step 1: Compute the cluster prototypes (means):
vi^(l) = [Σk=1..N (μi,k^(l−1))^m zk] / [Σk=1..N (μi,k^(l−1))^m],  1 ≤ i ≤ c        (11)
Step 2: Compute the cluster covariance matrices
Fi^(l) = [Σk=1..N (μi,k^(l−1))^m (zk − vi^(l))(zk − vi^(l))^T] / [Σk=1..N (μi,k^(l−1))^m]        (12)
Step 3: Compute the distances:
d²(zk, vi^(l)) = (zk − vi^(l))^T [det(Fi^(l))]^(1/n) (Fi^(l))^(−1) (zk − vi^(l)),  1 ≤ i ≤ c,  1 ≤ k ≤ N        (13)
Step 4: Update the fuzzy partition matrix:
μi,k^(l) = 1 / Σj=1..c (d(zk, vi^(l)) / d(zk, vj^(l)))^(2/(m−1))        (14)
If di,k = 0 for some i = s, set μs,k = 1 and μi,k = 0 for all i ≠ s.
Until ||U^(l) − U^(l−1)|| < ε.
This algorithm simply loops through the estimates of the cluster centres V, the covariance matrices F and the fuzzy partition matrix U. We explain, now, how to derive fuzzy models from these matrices.
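The loop described above can be sketched in Python with NumPy. This is an illustrative implementation of steps 1-4 under the stated formulas, not the authors' code; the function name `gk_clustering` and the random initialization scheme are our own choices.

```python
import numpy as np

def gk_clustering(Z, c, m=2.0, tol=1e-4, max_iter=100, seed=0):
    """Sketch of the Gustafson-Kessel algorithm.
    Z: (n, N) data matrix whose columns are the vectors z_k."""
    n, N = Z.shape
    rng = np.random.default_rng(seed)
    U = rng.random((c, N))
    U /= U.sum(axis=0)                       # constraint (6): columns sum to 1
    for _ in range(max_iter):
        U_old = U.copy()
        W = U ** m                           # weighted memberships (mu^m)
        V = (Z @ W.T) / W.sum(axis=1)        # step 1: prototypes, Eq. (11)
        D = np.zeros((c, N))
        for i in range(c):
            diff = Z - V[:, [i]]
            F = (W[i] * diff) @ diff.T / W[i].sum()               # Eq. (12)
            M = np.linalg.det(F) ** (1.0 / n) * np.linalg.inv(F)  # Eq. (10)
            D[i] = np.einsum('jk,jl,lk->k', diff, M, diff)        # step 3: d^2
        D = np.fmax(D, 1e-12)                # guard the d = 0 special case
        E = D ** (1.0 / (m - 1))
        U = 1.0 / (E * (1.0 / E).sum(axis=0))                     # Eq. (14)
        if np.abs(U - U_old).max() < tol:    # termination test on ||U - U_old||
            break
    return U, V
```

Each column of the returned partition matrix sums to one, as required by constraint (6).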
Estimation of consequent parameters: There are several methods to obtain the consequent parameters. Since the model should serve as a numerical predictor, we use the global least-squares approach, which gives the smallest prediction error.
In order to obtain an optimal global predictor, the aggregation of the rules should be taken into account. When using the fuzzy mean defuzzification (2), which is a convex linear combination, a global least squares problem can be solved to obtain the consequent parameter estimates.
The membership degrees βi,k = μAi(xk), representing the degree of fulfilment of the i-th rule for each data point, can be obtained from the fuzzy partition matrix U. Recall that each row of U contains a point-wise definition of the membership function for the data in the product space X × Y. In order to obtain the membership function Ai in the regressor space X, the i-th row of U, denoted U(i), must be projected onto the regressor space
μAi(xk) = proj(U(i))        (15)
where proj(.) is the point-wise projection operator.
The result of the projection step is that data vectors with repeated regressors xk are assigned the maximum membership degree from this set. In order to write (2) in matrix form for all data (xk, yk), 1 ≤ k ≤ N, denote by Bi the diagonal matrix in RN×N having the normalized membership degrees γi,k as its k-th diagonal element and by Xe the regressor matrix extended with a unit column. Finally, denote by X the matrix in RN×c(p+1) composed of the products of Bi and Xe:
X = [B1 Xe  B2 Xe  …  Bc Xe]        (16)
Denote θ the vector in Rc(p+1) given by
θ = [θ1^T, θ2^T, …, θc^T]^T        (17)
where θi = (ai^T, bi)^T for 1 ≤ i ≤ c.
The resulting global least-squares problem Xθ ≈ y has the solution
θ = (X^T X)^(−1) X^T y        (18)
From (17) the parameters ai and bi are obtained by
ai = [θ(i−1)(p+1)+1, …, θ(i−1)(p+1)+p]^T,  bi = θi(p+1)        (19)
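The construction of Eqs. (16)-(19) can be sketched as a short NumPy routine. This is a minimal sketch under the definitions above; the function name `consequent_ls` is ours and `numpy.linalg.lstsq` replaces the explicit normal-equation form of Eq. (18) for numerical robustness.

```python
import numpy as np

def consequent_ls(Xr, y, Beta):
    """Global least-squares estimate of the TS consequent parameters.
    Xr: (N, p) regressors, y: (N,) regressands,
    Beta: (c, N) normalized degrees of fulfilment (columns sum to 1)."""
    N, p = Xr.shape
    c = Beta.shape[0]
    Xe = np.hstack([Xr, np.ones((N, 1))])              # extended regressor [x 1]
    blocks = [Beta[i][:, None] * Xe for i in range(c)] # B_i X_e products
    X = np.hstack(blocks)                              # (N, c(p+1)), Eq. (16)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)      # solves Eq. (18)
    theta = theta.reshape(c, p + 1)                    # split per rule, Eq. (19)
    return theta[:, :p], theta[:, p]                   # a_i, b_i
```

With a single cluster (all memberships equal to one) the routine reduces to ordinary linear regression, which is a convenient sanity check.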
Deriving antecedent membership functions: The fuzzy partition matrix U, projected onto the antecedent space, defines the membership functions point-wise, for the available data. In order to obtain a prediction model, the antecedent membership functions need to be expressed in a form that allows one to compute the membership degrees for any input data. This can be achieved by using an inverse of the distance function of the clustering algorithm in the antecedent product space.
The degrees of fulfilment of the rules are computed by evaluating the distance function, Eq. (8), only for the regressor x and the regressor part vi^x of the cluster prototype, using the corresponding partition of the cluster covariance matrix
vi = [(vi^x)^T, vi^y]^T,  Fi = [Fi^x  Fi^xy; Fi^yx  Fi^y]        (20)
The inner-product norm then measures the distance of the antecedent vector from the projection of the cluster center onto the antecedent space. It can be evaluated as
d²(x, vi^x) = (x − vi^x)^T [det(Fi^x)]^(1/p) (Fi^x)^(−1) (x − vi^x)        (21)
and transformed into the membership degree (degree of fulfilment), using some kind of inversion. One possible choice is to use the same formula as in the clustering algorithm
βi(x) = 1 / Σj=1..c (d(x, vi^x) / d(x, vj^x))^(2/(m−1))        (22)
which takes into account all the rules and computes the degree of fulfilment of one rule relative to the other rules. The sum of the membership degrees also equals one, as with clustering; hence γi = βi.
Summary of the identification procedure: The identification procedure can be summarized in the following steps:
Step 1: Design identification experiments and collect a set of representative measurements.
Step 2: Choose the model structure, Eq. (3).
Step 3: Cluster the data by using the GK algorithm.
Step 4: Generate the rules by computing the consequent parameters and the antecedent membership functions.
Step 5: Validate the model.
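Once the rules are identified, the model is evaluated by combining Eqs. (21), (22) and (2). The sketch below illustrates this evaluation step under the definitions above; the function name `ts_predict` and its argument layout are our own.

```python
import numpy as np

def ts_predict(x, centers, Ms, a, b, m=2.0):
    """Evaluate an affine TS model at regressor x.
    centers: (c, p) antecedent prototypes v_i^x,
    Ms: (c, p, p) norm-inducing matrices, a: (c, p), b: (c,)."""
    diff = centers - x
    d2 = np.einsum('ij,ijk,ik->i', diff, Ms, diff)  # inner-product norms, Eq. (21)
    d2 = np.fmax(d2, 1e-12)                         # guard zero distance
    beta = 1.0 / ((d2[:, None] / d2[None, :]) ** (1.0 / (m - 1))).sum(axis=1)  # Eq. (22)
    yi = a @ x + b                                  # local affine consequents, Eq. (1)
    return beta @ yi / beta.sum()                   # weighted fuzzy mean, Eq. (2)
```

When all local models agree (identical consequents), the prediction is independent of the membership degrees, which is a quick way to check the interpolation logic.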
Residual generation by neural network: It is relevant to use the high potential of neural networks for nonlinear system modelling in the context of fault diagnosis of nonlinear dynamic systems. The most commonly used neural network architecture is the multilayer perceptron (MLP) network (Norgaard et al., 2000).
Its implementation goes through the following steps:
• Off-line construction of a database using expert knowledge of the process characteristics under different operating conditions.

Fig. 2: Two-layer neural network

• Selection of the neural network structure: the NNARX model is recommended (Chen and Lee, 2002; Norgaard et al., 2000) when the system under consideration is deterministic or weakly noisy. The NNARX model may be represented by the general form:
ŷ(k) = g(φ(k), θ)        (23)
where φ(k) is the regression vector and the nonlinear function g can be realized by a suitable MLP network.
A multivariable NNARX model can be adequately implemented as a feedforward two-layer perceptron network having one hidden layer and an output layer as shown in Fig. 2.
The vector φ(k) of delayed outputs and inputs of the system is applied to the network inputs. (n1, …, nn, m1, …, mm, d) are the structural indices, also referred to as the lag space of the neural model. The input delay d is generally taken as one.
The hidden layer includes a sufficient number nh of sigmoid units (nh must be specified experimentally) and the output layer contains linear units.
W = (W1 W2) is the weight matrix relating the inputs to the hidden layer units and Z is the weight matrix relating the hidden layer units to the output units.
The neural network outputs are given by:
hj(k) = φj(Σl wj,l φl(k) + wj0),  j = 1, …, nh        (24)

ŷi(k) = ψi(Σj=1..nh zi,j hj(k) + zi0)        (25)
where φj are sigmoid-type activation functions, ψi are linear-type activation functions and (wj0, zi0) are the biases.
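The forward pass of Eqs. (24)-(25) amounts to a sigmoid hidden layer followed by a linear output layer. A minimal sketch under these definitions (the function name `mlp_forward` is ours):

```python
import numpy as np

def mlp_forward(phi, W, w0, Z, z0):
    """Forward pass of the two-layer NNARX network.
    phi: regression vector, W/w0: hidden weights and biases,
    Z/z0: output weights and biases."""
    h = 1.0 / (1.0 + np.exp(-(W @ phi + w0)))  # sigmoid hidden units, Eq. (24)
    return Z @ h + z0                          # linear output units, Eq. (25)
```

With all weights zero, every hidden unit outputs 0.5, which gives a simple check of the layer arithmetic.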
Network training: The network weights and biases (randomly initialized) are adjusted using a suitable minimisation algorithm of the following mean square error criterion:
E = (1/2N) Σk=1..N ||y(k) − ŷ(k)||²        (26)
where N is the length of the training data set. The Levenberg-Marquardt algorithm is recommended, as pointed out in Norgaard et al. (2000).
Network validation: In this stage the resulting neural model is evaluated to decide whether it adequately represents the system. This is done by testing the trained network on a data set different from the one used for training. If the trained network is judged unsatisfactory after the validation tests, it is necessary to go back in the procedure by retraining the network with different weight initializations, by generating additional training data, or by modifying the network structure (by redefining the regression vector and the number of hidden units).
As in the case of residual generation by fuzzy sets, all these steps are accomplished off-line. When the neural network is validated, it may be utilized for online residual generation.
RESIDUAL EVALUATION
The task of residual evaluation can be achieved by a fuzzy neural decision scheme (Alexandru et al., 2000; Benloucif and Mehennaoui, 2002) as represented in Fig. 3.
A fuzzy neural network is based on the association of fuzzy logic inference and the learning ability of neural networks. The fuzzy neural approach is a powerful tool for solving important problems encountered in the design of fuzzy systems such as: determining and learning membership functions, determining fuzzy rules, adapting to the system environment. The main points of the residual evaluation procedure are described below:
Residual fuzzification: It consists in converting the numerical values of the residuals into linguistic variables. Each input (residual) may be described by three linguistic variables (Negative, Zero, Positive). Each linguistic variable is represented by a membership function, which generally has a triangular or trapezoidal shape.
Fig. 3: Neural fuzzy decision scheme

Fig. 4: Example of RNN used for residual evaluation
The linguistic variable zero defines the range where the residual may be considered to be unaffected by a fault. The linguistic variables Negative and Positive define the residual amplitude ranges indicating the presence of a fault. The corresponding membership functions give the extent to which a residual is or is not affected by a fault.
Neural network structure: For fault diagnosis, it is desirable to use a neural network to model the nonlinear relationship between the fuzzified residuals and the fault decision functions. A multilayer perceptron network is therefore a good candidate. Moreover, to account for memory in the decision process it is necessary to use a recurrent neural network (RNN). The RNN may be implemented as a NNARX model described by:
Dk(fi) = g(R1(k), …, Rnr(k), Dk−1(f1), …, Dk−1(fnf)),  i = 1, …, nf        (27)
Dk(fi), i = 1, …, nf, are the fault decision functions, also referred to as fault indicators and fi are the faults acting on the process. The regression vector φ(k) contains the fuzzified residuals Rj(k), j = 1, …, nr and the delayed decisions Dk−1(fi), i = 1, …, nf. Because of the feedback introduced, the recurrent NNARX model may be realized by a three-layer MLP. This is illustrated by the example given in Fig. 4, which shows a residual evaluation scheme processing three residuals (r1, r2, r3) to diagnose three faults (f1, f2, f3).
The corresponding neural network has the following architecture: an input layer with 12 units representing all possible states of the fuzzy residuals together with the past decisions, a hidden layer having 4 units and an output layer with 3 units each assigned to a decision function. The use of this RNN architecture ensures reliable dynamic decision-making (Alexandru et al., 2000; Benloucif and Mehennaoui, 2002; Evsukoff et al., 1999).
Training: Prior to on-line use, network training is performed for all possible fault scenarios. During training, a residual pattern corresponding, e.g., to fault f1 is applied to the network input and a one is assigned to the corresponding output. The network weights are then adjusted by an appropriate algorithm, thus enabling the neural network to learn the imposed input-output pattern. The use of the backpropagation algorithm is recommended (Evsukoff et al., 1999). The ultimate goal of the training is to extract and select the necessary parameters defining the if-then inference rules.
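The backpropagation step described above can be sketched for a small feedforward decision network. This is a generic batch-gradient sketch, not the authors' training code; the sizes, learning rate and function name `train_decision_net` are illustrative, and a sigmoid output layer is assumed so that decisions stay in [0, 1].

```python
import numpy as np

def train_decision_net(X, T, nh=4, lr=0.5, epochs=3000, seed=0):
    """Backpropagation sketch: sigmoid hidden and output layers,
    trained on residual patterns X (N, nin) with fault targets T (N, nf)."""
    rng = np.random.default_rng(seed)
    nin, nf = X.shape[1], T.shape[1]
    W1 = rng.normal(0.0, 0.5, (nin, nh)); b1 = np.zeros(nh)
    W2 = rng.normal(0.0, 0.5, (nh, nf));  b2 = np.zeros(nf)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    for _ in range(epochs):
        H = sig(X @ W1 + b1)                # hidden activations
        Y = sig(H @ W2 + b2)                # decision outputs
        dY = (Y - T) * Y * (1 - Y)          # output delta (squared-error loss)
        dH = (dY @ W2.T) * H * (1 - H)      # hidden delta (backpropagated)
        W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2
```

After training on a few fault-signature patterns, the output assigned to the injected fault should rise toward one while the others stay near zero.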
NUMERICAL RESULTS
Simulation results are next presented to assess the capacity of this diagnosis approach, based on neural and fuzzy techniques, to detect and isolate sensor faults in a nonlinear process. The nonlinear process considered here is composed of three identical tanks having section Q, connected in series by a pipe of section q, with outlet at height H. The system outputs are the three tank levels yi = hi, i = 1, …, 3, satisfying the condition h1 > h2 > h3 > H > 0.
This system is governed by the following nonlinear differential equations:
(28)
where g is the gravity constant and u = 1.222 m3 sec-1 is the constant input flow. This simulation study is carried out with a sampling time Ts = 10 sec and with the initial conditions h10 = 6.9 m, h20 = 5.5 m, h30 = 4.3 m.
Method using fuzzy sets
Residual generation: The structure of the fuzzy model is selected by using insight into the physical structure of the system as follows:
Output 1: ny11 = 1, ny12 = 1, ny13 = 0, u11 = 1
Output 2: ny21 = 1, ny22 = 1, ny23 = 1, u12 = 0
Output 3: ny31 = 0, ny32 = 1, ny33 = 1, u13 = 0
Fig. 5: Membership functions (output 1)

Fig. 6: Membership functions (output 2)

Fig. 7: Membership functions (output 3)

Table 1: Cluster centers (output 1)

Table 2: Cluster centers (output 2)

Table 3: Cluster centers (output 3)
As an example, the orders selected for output 1 state that the level h1 depends on h1, h2 and u, but not on h3 (Eq. 28).
The number of clusters is c = 2; hence the number of rules is also 2. The fuzzy TS models obtained are:
Output 1:
R1: If y1(k-1) is A11 and y2(k-1) is A12 and u is A13
Then y1(k) = 0.96 y1(k-1)+0.05 y2(k-1)+0.13u-0.07
R2: If y1 (k-1) is A21 and y2(k-1) is A22 and u is A23
Then y1(k) = 0.97 y1(k-1)+0.04 y2(k-1)+ 0.12 u-0.08
The cluster centers are given in Table 1.
The antecedent membership functions obtained are shown in Fig. 5.
Output 2:
R1: If y1(k-1) is A11 and y2(k-1) is A12 and y3(k-1) is A13 Then y2(k) = 0.035 y1(k-1) + 0.922 y2(k-1) + 0.05 y3(k-1) - 0.002
R2: If y1(k-1) is A21 and y2(k-1) is A22 and y3(k-1) is A23 Then y2(k) = 0.037 y1(k-1) + 0.926 y2(k-1) + 0.036 y3(k-1) + 0.006
For the second output, the cluster centers are summarized in Table 2 and the antecedent membership functions are depicted in Fig. 6.
Output 3:
R1: If y2(k-1) is A11 and y3(k-1) is A12. Then
y3(k) = 0.05 y2(k-1)+0.907 y3(k-1)+0.144
R2: If y2(k-1) is A21 and y3(k-1) is A22. Then
y3(k) = 0.04 y2(k-1)+0.920 y3(k-1)+0.121
For the third output, the cluster centers are summarized in Table 3 and the antecedent membership functions are depicted in Fig. 7.
After validation, this NNARX fuzzy model is used to generate the residuals ri(k) = yi(k) − ŷi(k), i = 1, …, 3. In normal operation, the residuals are near zero, as shown in Fig. 8.
Fig. 8: Residuals by fuzzy sets (normal operation)

Fig. 9: Residuals by fuzzy sets (case 1)

Fig. 10: Decision functions (case 1)

Table 4: Inference table
Residual evaluation: The linguistic variables describing the fuzzified residuals are defined by the following Membership Functions (MF):
• N: negative residual with trapezoidal MF,
• Z: zero residual with triangular MF,
• P: positive residual with trapezoidal MF.
The membership functions for each residual are given below:
The RNN used in this simulation study is shown in Fig. 4. Its training is based on the rules summarized in Table 4, which have been obtained after many simulation tests.
The learning operation realized by the back propagation algorithm converged after 3266 iterations with a mean square error E = 0.001.
Sensor fault diagnosis of the three-tank process: Various simulation tests have been performed in order to validate the efficiency of this diagnosis scheme and the results are quite conclusive. For illustrative purposes, only the two fault scenarios summarized in Tables 5 and 6 are discussed.
Case 1: Bias type faults are injected in sensors 1 and 2 as described in Table 5. The corresponding residuals are shown in Fig. 9.
The fault f1 on sensor 1 affects positively the residual r1 and negatively the residuals r2 and r3 at time t = 12000 s, whereas the fault f2 on sensor 2 affects positively the residual r2 and negatively the residuals r1 and r3 at time t = 9000 s.
Table 5: Case 1

Table 6: Case 2

Fig. 11: Residuals by fuzzy sets (case 2)

Fig. 12: Decision functions (case 2)
The obtained decision functions detect the faults f1 and f2 well, as shown in Fig. 10. This is made possible by the use of the fuzzified residuals and the network training operation.
Case 2: This fault scenario, with faults in sensors 2 and 3, is described in Table 6. The corresponding residuals are shown in Fig. 11.
The fault f2 on sensor 2 affects positively the residual r2 and negatively the residuals r1 and r3 at time t = 13000 s, whereas the fault f3 on sensor 3 affects positively the residual r3 and negatively the residuals r1 and r2 at time t = 10000 s.
Fig. 13: Residuals by neural network (normal operation)

Fig. 14: Residuals by neural network (case 1)

Fig. 15: Residuals by neural network (case 2)
As shown in Fig. 12, the fault indicators detect and isolate successfully the faulty sensors.
Method using neural network
Residual generation: A NNARX model having the architecture shown in Fig. 2 has been used with the following parameters:

n1 = n2 = n3 = m1 = 1, d = 1, nφ = 4, nh = 4.
Training of this network was done with the Levenberg-Marquardt algorithm and the mean square error reached after 500 iterations is E = 2.365×10-4. After validation, this NNARX model is used to generate the residuals ri(k) = yi(k) − ŷi(k), i = 1, …, 3. In normal operation, the residuals are near zero, as shown in Fig. 13.
Residual evaluation: In this case, the membership functions are given as follows:
We use the same RNN shown in Fig. 4. Its training is based on the rules summarized in Table 4. Note that the decision logic is the same for both methods.
Sensor fault diagnosis of the three-tank process: Case 1: Bias-type faults are injected in sensors 1 and 2, as described in Table 5. The corresponding residuals are shown in Fig. 14.
We notice that the effects of the faults on these residuals are similar to those on the residuals obtained by the fuzzy-set method.
With this method too, the decision functions isolate the two faults and we obtain the same decision functions as shown in Fig. 10.
Case 2: This fault scenario is the same as that described in Table 6. The corresponding residuals are shown in Fig. 15.
With this method, the faulty sensors are also isolated successfully and we obtain the same decision functions as shown in Fig. 12.
CONCLUSIONS
A fuzzy neural scheme for on-line fault diagnosis was presented. A NNARX model is used for residual generation; this model can be obtained either by fuzzy sets or by a neural network. A recurrent fuzzy neural network performs the residual evaluation task. Fault diagnosis is achieved by training the network to recognize the pattern of the fault signatures. Preliminary simulation results show the efficiency of the developed scheme for detecting and isolating sensor faults in a nonlinear system. The applicability of this qualitative diagnostic approach to system actuator and component faults is currently under study.
REFERENCES
- Bezdec, J.C., R. Hathaway, R.E. Howard, C.A. Wilson and M.P. Windham, 1987. Local convergence analysis of a grouped variable version of coordinate descent. J. Optimiz. Theory Applic., 54: 471-477.
- Chen, Y.M. and L.M. Lee, 2002. Neural networks based scheme for system failure detection and diagnosis. Math. Comput. Simulat., 58: 101-109.
- Frank, P.M., 1990. Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy: A survey and some new results. Automatica, 26: 459-474.
- Garcia, E. and P. Frank, 1997. Deterministic nonlinear observer based approaches to fault diagnosis: A survey. Control Eng. Practice, 5: 663-670.
- Isermann, R., 1998. On fuzzy logic applications for automatic control, supervision and fault diagnosis. IEEE Trans. Syst. Man Cybernet., 28: 221-235.
- Patton, R. and J. Chen, 1997. Observer-based fault detection and isolation: Robustness and applications. Control Eng. Practice, 5: 671-682.
- Schneider, H. and P.M. Frank, 1996. Observer-based supervision and fault detection in robots using nonlinear and fuzzy logic residual evaluation. IEEE Trans. Control Syst. Technol., 4: 274-282.