Research Article
A Novel Fast and Efficient Evolutionary Method for Optimal Design of Proportional Integral Derivative Controllers for Automatic Voltage Regulator Systems

S.M.A. Mohammadi, A.A. Gharaveisi and M. Mashinchi

An efficient and powerful design method for calculating optimal Proportional-Integral-Derivative (PID) controllers for Automatic Voltage Regulator (AVR) systems is proposed. The method is an improved version of Discrete Action Reinforcement Learning Automata (DARLA) in which the discrete probability functions (DPFs) of the design variables are not treated as independent. The results of the proposed method, called Extended Discrete Action Reinforcement Learning Automata (EDARLA), are compared with those obtained by the well-known Ziegler-Nichols (ZN) method, conventional DARLA, Genetic Algorithms (GA) and conventional Continuous Action Reinforcement Learning Automata (CARLA). Extensive simulation results show the superiority of the proposed design method in terms of optimality, efficiency, computational burden and sensitivity to the ranges considered for the design variables, that is, the search space. Besides succeeding in providing globally optimal results, the proposed approach, owing to its high efficiency and low computation time, is a promising candidate for designing and tuning optimal adaptive PID controllers for many practical systems.


  How to cite this article:

S.M.A. Mohammadi, A.A. Gharaveisi and M. Mashinchi, 2009. A Novel Fast and Efficient Evolutionary Method for Optimal Design of Proportional Integral Derivative Controllers for Automatic Voltage Regulator Systems. Asian Journal of Applied Sciences, 2: 275-295.

DOI: 10.3923/ajaps.2009.275.295



Reinforcement learning (RL) approaches are relatively new but promising, offering fresh scientific insight into the intelligent systems area with many practical applications (Oh et al., 2000; Kamal and Murata, 2008; Duan et al., 2007; Charvillat and Grigoras, 2007; Howell et al., 1997, 2000, 2001). Reinforcement learning differs from supervised learning, the kind of learning most widely studied in current research on machine learning, statistical pattern recognition and artificial neural networks. Supervised learning means learning from examples provided by a knowledgeable external supervisor. This is an important kind of learning, but on its own it is not adequate for learning from interaction. Interactive problems are often off-line. An agent with its goal embedded in an environment learns how to transform one environmental state into another. Agents able to perform this task with minimal human supervision are called autonomous. Learning from an environment is more robust because agents are directly affected by the dynamics of the environment (the system under control).

Generally, process parameters can be adapted and tuned by either Continuous Action Reinforcement Learning Automata (CARLA) or Discrete Action Reinforcement Learning Automata (DARLA) (Howell et al., 1997). DARLA uses a discrete action space, which makes it more appropriate for discrete engineering applications. CARLA was developed as an extension of the discrete stochastic learning automata methodology. Both DARLA and CARLA operate through interaction with a random or unknown environment by selecting actions in a stochastic trial-and-error process. CARLA replaces the discrete action space with a continuous one, using continuous probability distributions, which makes it more appropriate for engineering applications with continuous variables. The only interconnection between the individual DARLA automata is provided through the environment, via a shared performance or evaluation function. In each iteration, every action has an associated discrete probability function (DPF) that is used as the basis for its selection. The calculation is done separately for n disjoint DPFs, where n is the number of parameters that must be tuned so that an index function is optimized.

In this study we consider an n-dimensional search space for the state of the environment (system), with a joint multi-variable discrete probability function over the cells of the search space that is updated at discrete samples. As will be shown, the DARLA and CARLA methods suffer significantly from being too sensitive to the ranges considered for the design variables. If the ranges are too small, there is little chance of reaching globally optimal results. On the other hand, large ranges for the variables make the method time consuming. The proposed approach (EDARLA) has the potential to achieve better results and can be adjusted more easily so that the speed of convergence is significantly improved.

Numerous control methods such as fuzzy control, adaptive control and neuro-fuzzy control have been studied by Kim and Han (2006), Ying (2000) and Seng et al. (1999). Among control methods, the best known is the Proportional-Integral-Derivative (PID) controller, which has been widely used in industry because of its simple structure and robust performance over a wide range of operating conditions. Unfortunately, it is still rather difficult to tune the parameters of PID controllers properly, because many industrial plants are burdened with problems such as high order and nonlinearities (Ho et al., 2006). One of the first classical tuning rules was proposed by Ziegler and Nichols (Guillermo et al., 2005). In general, it is often hard to determine optimal or near-optimal PID parameters with the ZN method in many industrial plants. For these reasons, it is highly desirable to extend the capabilities of PID controllers through new research.

The Genetic Algorithm (GA) has recently received much interest for its high efficiency in searching for globally optimal solutions in a search space (Chou, 2006; Lin and Xu, 2006). Owing to its high potential for global optimization, GA has received great attention in control systems, for example in the search for optimal PID controller parameters. Although GA has been applied widely to control systems, its natural genetic operations still result in enormous computational effort (Chou, 2006). Moreover, although GA methods have been employed successfully to solve complex optimization problems, recent research has identified some deficiencies in GA performance. The degradation in efficiency is most apparent in applications where the parameters being optimized are highly correlated. In such cases the crossover and mutation operations cannot ensure better fitness of offspring, because the chromosomes of the population have similar structures and their average fitness is high toward the end of the evolutionary process (Liu, 2008). Moreover, the premature convergence of GA degrades its performance and reduces its search capability (Liu, 2008).

To explore the merits of the proposed optimization approach, the EDARLA method has been applied to the design of a PID controller for an Automatic Voltage Regulator (AVR) system of a power generation system, an important industrial plant. The generator excitation system regulates the generator’s voltage and controls the reactive power flow using an AVR system. The role of the AVR is to hold the terminal voltage magnitude of a synchronous generator at a pre-specified level. Hence, the performance and stability of the AVR system seriously affect the security of the whole power system. In this paper, besides demonstrating how to employ the classic CARLA and DARLA methods and the proposed EDARLA method to obtain the optimal PID controller parameters, it is shown that the proposed method performs better than the conventional reinforcement learning methods.


Here, both CARLA and DARLA techniques are briefly reviewed (Howell et al., 1997).


In order to implement CARLA in practice, the probability distributions fi(xi) are stored and updated at successive sample points. The sampled action xi must be updated after each iteration k according to its updated probability distribution fi(xi, k + 1). Every action set that improves the system performance achieves a higher performance score, denoted by β(k), and its probability of reselection is increased through the corresponding learning sub-system. This is achieved by modifying fi(x) with a Gaussian neighborhood function centered at the successful action value. The neighborhood function increases the probability of the original successful action as well as the probability of all actions close to it. The assumption is that the performance surface over the range of each action is continuous and stationary or slowly varying. As the system learns, each probability distribution usually converges to a single Gaussian. For the ith action (parameter), each xi is defined within a pre-specified range [xi(min), xi(max)]. In each iteration, each new action is chosen at random according to its probability distribution function fi(xi, k), which is initially uniform:


The new action i is selected by:


where z(k) takes random values distributed uniformly within the range [0,1]. Once the set of all updated actions is available, the set is evaluated in the environment for an appropriate time frame and the scalar function Jcal(k) is calculated. Here Jcal(k) is the cost function at the kth iteration, calculated from the Performance Index (PI) to be optimized. The PI can generally be defined in terms of the desired characteristics of the system under control, such as steady-state error, amplitude of the control signal, overshoot and settling time. The multiplier β(k) is then calculated as follows:


As seen, the cost Jcal(k) is compared with both the median and minimum costs, Jmed and Jmin, calculated from a memory set of R previous values. The algorithm uses a reward/inaction rule: an action set that generates a cost no better than the current median level has no effect (β(k) = 0), and the maximum reinforcement (reward) is capped at β(k) = 1.
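The reinforcement rule itself is not reproduced in this copy of the article. A minimal sketch follows, assuming the form commonly given for CARLA (Howell et al., 1997), β(k) = min{max{0, (Jmed − Jcal(k)) / (Jmed − Jmin)}, 1}. The function name `reinforcement`, the list-based memory and the handling of the degenerate case Jmed = Jmin are illustrative assumptions, not part of the source.

```python
from statistics import median

def reinforcement(j_cal, memory):
    """Return beta(k) in [0, 1] from the latest cost and R stored costs.

    Assumed rule: beta = min(max(0, (J_med - J_cal) / (J_med - J_min)), 1).
    """
    j_med = median(memory)
    j_min = min(memory)
    if j_med == j_min:
        # Degenerate memory (assumption): no scale to compare against.
        return 0.0
    beta = (j_med - j_cal) / (j_med - j_min)
    return min(max(0.0, beta), 1.0)

memory = [4.0, 6.0, 8.0]                   # hypothetical previous costs
beta_half = reinforcement(5.0, memory)     # between median and minimum
beta_none = reinforcement(7.0, memory)     # worse than the median: inaction
beta_full = reinforcement(1.0, memory)     # better than the minimum: capped
```

A cost halfway between the median and the minimum earns half of the maximum reward, which matches the reward/inaction behaviour described in the text.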

After performance evaluation, each probability density function is updated according to the following rule:


where, H(x,r) is a symmetric Gaussian neighborhood function centered at the action choice r = x(k):


Here gh and gw are adjustable constants that influence the speed and resolution of the learning process by adjusting the normalized height and width of H.

Fig. 1: Learning system by CARLA

The parameter α(k) is chosen according to Eq. 6 to renormalize the distribution functions for iteration k + 1:


For practical implementation, each distribution function is stored at discrete points with equal inter-sample probability, and linear interpolation is usually used to determine the values at intermediate positions. A typical layout of the method is shown in Fig. 1.
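Since Eq. 1 to 6 are not reproduced in this copy, the CARLA cycle for a single parameter can be sketched as below. The sketch assumes the usual formulation: a uniform initial density 1/(xmax − xmin), action selection by solving the cumulative integral for z(k), and the update f(x, k + 1) = α(k)[f(x, k) + β(k)H(x, r)] with H(x, r) = gh/(xmax − xmin) · exp(−(x − r)² / (2(gw(xmax − xmin))²)). For simplicity it stores the density on an equally spaced grid rather than at points of equal inter-sample probability. All function names are hypothetical, and the defaults for gh and gw copy the values listed with Table 7.

```python
import math

def uniform_density(x_min, x_max, n_points=201):
    """Grid and uniform density on [x_min, x_max] (initial distribution)."""
    step = (x_max - x_min) / (n_points - 1)
    xs = [x_min + i * step for i in range(n_points)]
    return xs, [1.0 / (x_max - x_min)] * n_points

def area(xs, fs):
    """Trapezoidal integral of the sampled density."""
    return sum(0.5 * (fs[i] + fs[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

def select_action(xs, fs, z):
    """First grid point at which the cumulative probability reaches z."""
    cumulative = 0.0
    for i in range(len(xs) - 1):
        cumulative += 0.5 * (fs[i] + fs[i + 1]) * (xs[i + 1] - xs[i])
        if cumulative >= z:
            return xs[i + 1]
    return xs[-1]

def reinforce(xs, fs, r, beta, g_h=0.75, g_w=0.008):
    """Add beta * H(x, r) to the density, then renormalise with alpha."""
    span = xs[-1] - xs[0]
    sigma = g_w * span
    raised = [f + beta * (g_h / span)
              * math.exp(-((x - r) ** 2) / (2.0 * sigma ** 2))
              for x, f in zip(xs, fs)]
    alpha = 1.0 / area(xs, raised)       # keeps the total probability at 1
    return [alpha * f for f in raised]

xs, fs = uniform_density(0.0, 1.5)
x_new = select_action(xs, fs, 0.5)       # z(k) would normally be random
fs_next = reinforce(xs, fs, x_new, beta=1.0)
```

After one rewarded step the density is raised around the selected action and lowered slightly elsewhere, so later draws concentrate near successful values.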


In order to implement DARLA, each DPF fi(d) must be stored and updated at discrete sample points. The most efficient data storage can be achieved using equal inter-sample probability rather than equal sampling at d = 1, 2, 3, ..., N, but this imposes some additional computational burden. As in CARLA, any action set that improves the system performance receives a higher performance score β(k), and thus its probability of reselection is increased. This is achieved by modifying fi(d) with an Inverse Exponential neighborhood function centered at the most recent successful action. The neighborhood function increases the probability of the original action as well as the probability of all actions close to it. The assumption is that the performance surface over the range of each action is discrete and slowly varying. Within each iteration k of the algorithm, each action is chosen according to the corresponding probability distribution function fi(d, k), which is initially uniform:


The action d is selected by solving:


where the cumulative probability target is selected uniformly at random within the range [0,1]. When all n actions have been selected, the set is evaluated in the environment for a suitable time and the scalar cost Jcal(k) is calculated. Performance evaluation is then carried out using Eq. 9 and 10:


Again, the algorithm uses a reward/inaction rule. The action sets generating a cost not better than the current average level receive no reward (β(k) = 0) and the maximum reinforcement (reward) is also capped at β(k) = 1. After performance evaluation, each discrete probability function is updated according to the following rule:


where, Q is a symmetric Inverse Exponential neighborhood function centered on the action choice, r = di(k) :


Here λ is an adjustable parameter that influences the speed and resolution of the learning. The parameter α(k) is calculated by Eq. 12 to renormalize the distribution functions at iteration k + 1:


A typical layout of the learning system by DARLA is shown in Fig. 2.

Fig. 2: Learning system by DARLA
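The discrete counterpart can be sketched in the same way. Because the DARLA equations are not reproduced in this copy, the exact form of the Inverse Exponential neighborhood function Q is unknown here; the sketch assumes Q(d, r) = exp(−λ|d − r|) purely for illustration, and the function names are hypothetical. Action selection takes the smallest index whose cumulative probability reaches the random target.

```python
import math

def select_action(dpf, z):
    """Smallest index d whose cumulative probability reaches z."""
    cumulative = 0.0
    for d, p in enumerate(dpf):
        cumulative += p
        if cumulative >= z:
            return d
    return len(dpf) - 1

def update_dpf(dpf, r, beta, lam=1.0):
    """f(d, k+1) = alpha * (f(d, k) + beta * Q(d, r)).

    Q(d, r) = exp(-lam * |d - r|) is an assumed neighbourhood shape.
    """
    raised = [p + beta * math.exp(-lam * abs(d - r))
              for d, p in enumerate(dpf)]
    alpha = 1.0 / sum(raised)            # renormalise to total probability 1
    return [alpha * p for p in raised]

dpf = [0.2] * 5                          # uniform initial DPF over 5 slots
d_sel = select_action(dpf, 0.5)          # the target would normally be random
dpf_next = update_dpf(dpf, d_sel, beta=1.0)
```

With n parameters, conventional DARLA keeps n such lists and updates each one independently, which is the independence assumption that the proposed method removes.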

The Proposed Extended DARLA Method (EDARLA)
Let n be the number of parameters that must be adjusted so that the PI function reaches its minimum value. We can search for the controller’s parameters in an n-dimensional space using a common discrete probability function (CDPF) fX1,X2,...,Xn(x1, x2, ..., xn) for each cell rather than n separate DPFs. The idea behind this strategy is that a CDPF carries far more information than n separate DPFs. Each cell in the n-dimensional space must be stored and updated at discrete sample points; here, for simplicity, each cell is updated either to a new value or to zero. A typical layout of the proposed method is shown in Fig. 3.

In this method, instead of calculating DPFs n times, we calculate one matrix function, so that the speed of convergence can be increased by efficient matrix calculation algorithms. More importantly, this also potentially improves the chance of reaching results closer to the global optimum. Suppose n = 3; then the three-dimensional probability function forms a cubic space as shown in Fig. 4. The numbers of distinct values in each dimension are denoted Ndiv1, Ndiv2 and Ndiv3, respectively.

Within each iteration, each action has an associated probability density function fX(x) being used as the basis for its selection. The action sets improving the system performance, receive a higher-performance score, thus their probability of reselection increases through the learning sub-system. This is achieved by modifying fX(x) through the use of a three-dimensional neighborhood function centered at the successful action. The neighborhood function increases the probability of the original action, as well as the probability of the actions close to the point selected. With all n actions selected, the set is now evaluated in the environment for a suitable time and a scalar cost value Jcal(k) is calculated and compared with a memory set of previous values as described before.

Fig. 3: Learning system by Proposed Algorithm

Fig. 4: A typical three-dimensional probability functional cube

After the performance evaluation, each probability density function is updated according to a specified rule. Furthermore, EDARLA can use several loops for searching. For example, if the number of loops is two, the optimization is done in two stages (loops). In the first stage, the best subsection of the search space is found; in the second stage, the algorithm searches for the best results within that subsection. In this way the algorithm becomes much faster and less sensitive to the size of the initial search space, as the simulation results will show. The proposed method can be summarized as follows:

Step 1
Initialize the number of design variables (Npar), which depends on the system under study, the number of divisions for each parameter (Ndiv), the number of iterations (Niter) and finally the number of loops (Nloop), which determines the number of subsections.

Step 2
Start the loop.

Step 3
Generate an Npar-dimensional space and set up an Npar-dimensional uniform CDPF matrix with (Ndiv)^Npar cells. Each cell has its own CDPF value, which is initially the same for all cells but is updated in each iteration. The calculation of the CDPF is very important, as the best cell has the highest CDPF.

Step 4
Start the iterations for the current loop; their number depends on the specified number of iterations and the number of loops.

Step 5
Select a cell at random from the cubic structure so that its selection probability is proportional to its CDPF. There are different ways to make this selection; the roulette wheel selection method is used in this study (Liu, 2008). This step is performed according to the following three sub-steps:

Consider a circle with (Ndiv)^Npar sectors, each sector corresponding to one cell. The angle of each sector is calculated by:


Now, select an angle θrand at random within [0°, 360°], so that sectors are chosen according to their CDPFs:


If θi < θrand ≤ θi+1 for some i, then sector i in the original search space is selected; it is identified by its index (i1, i2, ..., iNpar) in the cubic structure.

Step 6
The selected index is applied to Eq. 15 to obtain the Npar parameters (Npar actions); the set is then evaluated in the environment (control system) and the scalar cost Jcal(k) is calculated.


where, P(i), l(i) and G(i) are the value of the ith parameter or action, length of ith interval and beginning of ith interval, respectively.

Step 7
Evaluate the cost function according to the performance index of the system under control, which may include terms such as time and rate of error, control input, overshoot and steady-state error, each weighted by its own coefficient. After the transient response, calculate the cost function Jcal(k) from a record of the system response; the cost is lowest for the best cell. The procedure compares the calculated cost of the selected cell with the minimum cost found so far. In the first iteration, the minimum performance index is simply the performance index just calculated. The parameter β(k) is updated according to the following rule, in which k is the iteration number:


If the calculated cost function has improved (Jcal(k) < Jmin), go to step 8; otherwise set the CDPF of the selected cell to zero and go to step 9. This modification improves the efficiency of EDARLA.

Step 8
Take the calculated cost function Jcal(k) as the new minimum cost function Jmin and the selected cell as the New-Group, and update the CDPF matrix as described below:


where Indx_Cell is the index of each cell and New_Group(k) is the center of the current group. The procedure can be compared with the original DARLA update (Eq. 11). Normalization is performed in the same way as in Eq. 12.

Step 9
If the iterations of the current loop are finished, go to step 2 for another loop and repeat the process until the Final-Group is reached; otherwise go to step 5.

The flowchart of the proposed reinforcement learning method is given by Fig. 5.

Fig. 5: The flowchart of the proposed reinforcement learning method (EDARLA)
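The nine steps can be sketched compactly as follows. Eq. 13 to 17 are not reproduced in this copy, so several details are assumptions labeled in the comments: each cell is represented by its center, a rewarded cell receives the full reward with an exponential neighborhood exp(−λ · distance), and each new loop restarts the search inside the best cell of the previous loop. The function `edarla`, its arguments and the quadratic toy cost are hypothetical; in the article the cost would come from simulating the AVR loop.

```python
import itertools
import math
import random

def edarla(cost, ranges, n_div=5, n_iter=30, n_loop=2, lam=1.0, seed=0):
    """Minimise cost(params) over a box with a joint (n-dimensional) CDPF."""
    rng = random.Random(seed)
    ranges = [tuple(r) for r in ranges]
    best_params, best_cost, history = None, math.inf, []
    cdpf = {}
    for _ in range(n_loop):                                  # Step 2
        cells = list(itertools.product(range(n_div), repeat=len(ranges)))
        cdpf = {c: 1.0 / len(cells) for c in cells}          # Step 3: uniform
        widths = [(hi - lo) / n_div for lo, hi in ranges]
        best_cell = None
        for _ in range(n_iter):                              # Step 4
            # Step 5: roulette wheel, probability proportional to the CDPF.
            cell = rng.choices(cells, weights=[cdpf[c] for c in cells])[0]
            # Step 6 (assumed mapping): use the centre of the selected cell.
            params = [lo + (i + 0.5) * w
                      for (lo, _), i, w in zip(ranges, cell, widths)]
            j_cal = cost(params)                             # Step 7
            if j_cal < best_cost:                            # Step 8: reward
                best_cost, best_params, best_cell = j_cal, params, cell
                for c in cells:
                    dist = sum(abs(a - b) for a, b in zip(c, cell))
                    cdpf[c] += math.exp(-lam * dist)         # assumed shape
            else:
                cdpf[cell] = 0.0                             # no improvement
            total = sum(cdpf.values())
            for c in cells:
                cdpf[c] /= total                             # renormalise
            history.append(best_cost)
        if best_cell is not None:
            # Step 9 (assumption): the next loop searches inside the best cell.
            ranges = [(lo + i * w, lo + (i + 1) * w)
                      for (lo, _), i, w in zip(ranges, best_cell, widths)]
    return best_params, best_cost, history, cdpf

def toy_cost(p, target=(0.6, 0.4, 0.2)):
    """Hypothetical stand-in for the AVR performance index."""
    return sum((x - t) ** 2 for x, t in zip(p, target))

params, j_min, history, cdpf = edarla(
    toy_cost, [(0.0, 1.5), (0.0, 1.0), (0.0, 1.0)])
```

Each iteration costs exactly one evaluation of the plant, and with 5 divisions per parameter and 3 parameters the joint CDPF holds 125 cells, which is small enough to update in full at every step.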


So far, PID controllers have been widely used in process control. Despite their simple structure, they can effectively control a variety of large industrial processes. There are many tuning approaches for these controllers, but each has its own disadvantages or limitations. As a result, the design of PID controllers remains a considerable challenge for researchers. In simple terms, the PID controller is used to improve the dynamic response and to reduce or eliminate the steady-state error. The derivative term normally adds a finite zero to the open-loop plant transfer function and can improve the transient response in most cases.

The integral term adds a pole at the origin, which increases the system type and therefore reduces the steady-state error. Furthermore, the PID controller is often regarded as a fairly robust controller and, as a result, it may also be used to control uncertain processes. The well-known PID controller transfer function is as follows:
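The equation is not reproduced in this copy of the article. Assuming the standard parallel (non-interacting) form, with the gains Kp, Ki and Kd that are tuned later in the text, it can be written as:

```latex
G_{PID}(s) = K_p + \frac{K_i}{s} + K_d\, s
```

In this form the integral term contributes the pole at the origin and the derivative term contributes the finite zero mentioned above.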


One of the important approaches used to design and tune PID controllers, including those used in AVR systems, is the well-known Ziegler-Nichols (ZN) approach (Guillermo et al., 2005). ZN is a popular approach originated by Ziegler and Nichols in 1942 (Guillermo, 2005) and later extended in 1984 by Astrom and Hagglund (Astrom, 2006). In this paper the proposed EDARLA is used as an automatic technique for the optimal design of the PID parameters of a practical high-order AVR system. The design method is fast, robust and adaptable. The application results and comparisons with the DARLA and ZN methods are given in the next section.

AVR System Under Study
The responsibility of an AVR is to hold the terminal voltage of a synchronous generator at a specified level. Hence, the performance and stability of the AVR seriously affects the security of the power system. In this paper, the AVR system under study has been modeled based on IEEE standard 421.5 (Gaing, 2004). The model takes into account all the major time constants and saturation effect and other nonlinearities.

The transfer functions of the AVR components can be represented as follows (Gaing, 2004):

Amplifier Model
The amplifier model is represented by a gain KA and a time constant τA, the transfer function is given by:


Typical values of KA are in the range of 10 to 400. The amplifier time constant is very small, ranging from 0.02 to 0.1 sec.

Exciter Model
The transfer function of a modern exciter may be represented by a gain KE and a single time constant τE as:


Typical value of KE is in the range of 10 to 400. The time constant τE is in the range of 0.5 to 1.0 sec.

Generator Model
In the linearized model, the transfer function relating the generator terminal voltage to its field voltage can be represented by a gain KG and a time constant τG as:


These constants are load dependent: KG may vary between 0.7 and 1.0 and τG between 1.0 and 2.0 sec from full load to no load, respectively.

Fig. 6: Block diagram of the AVR System with PID controller

Sensor Model
The sensor is modeled by a simple first-order transfer function, given by:


where τR is very small, ranging from 0.001 to 0.06 sec.

The block diagram of the AVR system with the PID controller is shown in Fig. 6. The figure does not show the saturation effects, but they are fully considered in all design steps and simulations.
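To make the loop concrete, the following sketch simulates the unit-step response of the linear part of the model. It assumes that each block is a first-order lag K/(1 + τs), as the gain and time-constant descriptions above suggest, and that the PID controller acts on the difference between the reference and the sensor output. The plant values are illustrative assumptions chosen to give a stable loop, not the values of Table 1 (which is not reproduced in this copy), and saturation is not modeled. The function `simulate_avr` is hypothetical.

```python
def simulate_avr(kp, ki, kd, t_end=20.0, dt=0.001,
                 ka=10.0, tau_a=0.1, ke=1.0, tau_e=0.4,
                 kg=1.0, tau_g=1.0, kr=1.0, tau_r=0.01):
    """Terminal-voltage samples for a 1 p.u. reference step (forward Euler).

    All plant gains and time constants are illustrative assumptions.
    """
    va = ve = vt = vs = 0.0          # amplifier, exciter, generator, sensor
    integral = 0.0
    prev_err = 1.0 - vs              # suppresses the derivative kick at t = 0
    response = []
    for _ in range(int(round(t_end / dt))):
        err = 1.0 - vs               # reference minus measured voltage
        integral += err * dt
        u = kp * err + ki * integral + kd * (err - prev_err) / dt
        prev_err = err
        # Derivatives of the four first-order lags, from the current state.
        d_va = (ka * u - va) / tau_a
        d_ve = (ke * va - ve) / tau_e
        d_vt = (kg * ve - vt) / tau_g
        d_vs = (kr * vt - vs) / tau_r
        va += dt * d_va
        ve += dt * d_ve
        vt += dt * d_vt
        vs += dt * d_vs
        response.append(vt)
    return response

open_tuning = simulate_avr(1.0, 0.0, 0.0)   # unity controller: no PID action
with_pid = simulate_avr(0.6, 0.4, 0.2)      # hypothetical PID gains
```

With a unity controller and these assumed values the loop gain is 10, so the output settles near 10/11 of the reference; the integral term in the second run removes that steady-state error.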

Performance Index
In many traditional optimal PID controller design approaches, well-known performance indexes or criteria such as the integral of absolute error (IAE), the integral of squared error (ISE) and the integral of time-weighted squared error (ITSE) are widely used. Each of the three integral performance criteria has its own features. For example, a disadvantage of the IAE and ISE criteria is that their minimization can result in a response with relatively small overshoot but a long settling time. That is because the ISE performance criterion weights the error equally over time. The ITSE criterion can overcome this disadvantage, but the analytical derivation of the controller is rather complicated and time consuming. The classic IAE, ISE and ITSE performance criteria are shown as follows, where e(t) is the error between the desired and real output quantities:
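The three formulas are not reproduced in this copy. Their standard definitions, assumed here, are IAE = ∫|e(t)|dt, ISE = ∫e²(t)dt and ITSE = ∫t·e²(t)dt, each taken from 0 to the end of the record. A minimal sketch of their numerical evaluation from a sampled error signal follows; the function name and the exponential test signal are illustrative only.

```python
import math

def integral_criteria(errors, dt):
    """Return (IAE, ISE, ITSE) for error samples taken every dt seconds.

    Uses the trapezoidal rule on the standard definitions (assumed).
    """
    iae = ise = itse = 0.0
    for k in range(len(errors) - 1):
        t0, t1 = k * dt, (k + 1) * dt
        e0, e1 = errors[k], errors[k + 1]
        iae += 0.5 * (abs(e0) + abs(e1)) * dt
        ise += 0.5 * (e0 ** 2 + e1 ** 2) * dt
        itse += 0.5 * (t0 * e0 ** 2 + t1 * e1 ** 2) * dt
    return iae, ise, itse

# Hypothetical error e(t) = exp(-t) sampled over 20 seconds.
errors = [math.exp(-k * 0.001) for k in range(20001)]
iae, ise, itse = integral_criteria(errors, 0.001)
```

For this test signal the exact values are close to 1, 1/2 and 1/4, which shows how the time weighting in ITSE discounts the large but early part of the error.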




In addition to the classic performance indexes above, a more effective time-domain performance criterion is suggested here for designing the PID controller. The new performance criterion J is defined as follows and considers some broader requirements in a more explicit manner:


where T is the total simulation time (T = 20 sec in this study), e(t) is the tracking error, uc(t) is the control input, Mp is the overshoot, ESS is the steady-state error at t = T and the G-coefficients are the weighting elements (Ge = 10, Gu = 1, GM = 10, Gs = 10, Gd = 5). The parameters of the PID controller are also calculated by the following approaches and the results are compared:

Conventional DARLA
Conventional Ziegler-Nichols
Genetic Algorithm (GA)
Conventional CARLA (Kashki et al., 2006)


Performance of the EDARLA
For the practical AVR system under study, the results show that the proposed method can search for the optimal PID controller parameters quickly and effectively. As will be shown through extensive simulation studies, one drawback of DARLA is that it is quite sensitive to the initial ranges of the design variables, which form the overall search space. The proposed method is more efficient as well as more robust to the volume of the search space. For this comparative study, the normal ranges of the three controller parameters Kp, Ki and Kd are available from another reference (Gaing, 2004). The AVR system parameters are shown in Table 1 (Gaing, 2004). Some other symbols used in this study are shown in Table 2. Figure 7 shows the original step response of the AVR system without a controller. In this case, Mp% = 50.51, Ess% = 9.06, Tr = 0.273 sec and Ts = 5.565 sec. As seen, the results are not satisfactory for a practical system.

The PID controller parameters calculated by the Ziegler-Nichols method are as follows:


Here, the parameters of the PID controller are calculated by the proposed EDARLA. Considering the results already reported by Kashki et al. (2006), the ranges of the three parameters Kp, Ki and Kd are taken as [0,1.5], [0,1] and [0,1], respectively.

Table 1: The AVR system parameters

Table 2: The used symbols

To calculate the PID parameters, each interval is divided into 5 slots in the first, second and third loops. The simulation results of EDARLA for different numbers of loops and different total numbers of iterations are summarized in Table 3. Good results can be obtained in fewer than 50 iterations. Moreover, comparing cases 2 and 3 (both with 20 iterations) reveals that two loops rather than one clearly give better results, as reflected in the final cost functions of these cases. Figures 8 and 9 show the convergence characteristics of the proposed method and the terminal voltage step response of the AVR system, respectively, for different simulation conditions. As seen, within about 50 iterations the EDARLA method converges and provides good performance. The results show that the proposed method (EDARLA) can search for optimal PID controller parameters quickly and efficiently.

Table 3: The Results of Simulation for proposed method (EDARLA)

Fig. 7: Step response of the AVR system without controller

Fig. 8: Convergence tendency of the EDARLA (case 7)

Fig. 9: Terminal voltage step response of the AVR system with optimal PID controller (EDARLA)

Fig. 10: Comparison of ZN and EDARLA methods

Comparison between the Proposed EDARLA, DARLA and ZN Methods
Figures 10-12 compare the results of the ZN, DARLA and EDARLA methods. EDARLA was run for only 50 iterations in total, while DARLA was trained for 200 iterations; as seen, EDARLA provides much better results.

Fig. 11: The Control Signal of ZN and EDARLA methods

Fig. 12: Performance index of DARLA and EDARLA

Fig. 13: The responses of DARLA and EDARLA methods

Also, Fig. 13 shows the convergence curves and step response of the AVR for both DARLA and EDARLA methods after 200 iterations. Again, one can clearly see that the EDARLA has provided much better results with faster convergence.

Robustness to the Design Variables Ranges
As already noted, one of the key issues in both the DARLA and CARLA methods is that the range of each design variable needs to be pre-defined. This is not a trivial task, as the designer cannot always guess appropriate ranges. If the ranges are too small, the overall search space becomes so small that the procedure may fail to find acceptable optimal points. On the other hand, large ranges can make the algorithm too time consuming. Even worse, for a fixed number of slots (Ndiv), larger ranges may lead to worse results, as shown in Tables 4 and 5.

Table 4: Comparison of proposed method (EDARLA) and conventional DARLA run for 200 iterations

Table 5: Effect of the search space: A comparison between the proposed EDARLA and the conventional DARLA

The tables show the comparative results of the conventional DARLA and the proposed EDARLA. Both methods were run for 200 iterations. As the results show, DARLA fails to reach good results even for small ranges. Moreover, as the ranges increase, the overall performance index increases. The changes in the performance index, rise time and overshoot are notably smaller for EDARLA. This feature is quite important because, as already mentioned, in most cases the best ranges for the search space cannot be determined exactly. Hence, the need for a heuristic approach such as EDARLA is apparent. The results demonstrate the robustness of EDARLA and show how sensitive the conventional DARLA is to the ranges taken for its variables. The most efficient, robust and optimal results are therefore obtained by the proposed method (EDARLA). In the next subsections, the performance, efficiency and sensitivity of the GA and CARLA approaches are also investigated.

Comparison between the Proposed EDARLA and GA Methods
To further highlight the advantages of the proposed method, we also implemented the GA (Genetic Algorithm) method (Gaing, 2004). The characteristics of the two controllers are compared using the same performance index, as defined by Eq. 22. The GA parameters are as follows:

Population size = 25
Crossover rate (Pc) = 0.75
Mutation rate (Pm) = 0.0075

Table 6 shows the results of the GA approach for different ranges of the parameters. As seen, the GA does not provide good results even after 200 generations. It is also more sensitive to the ranges of the variables. Figures 14-16 compare the step responses of the AVR system when the GA and EDARLA methods are used to tune the PID controller for different ranges of the parameters. As seen, EDARLA provides the best results in comparison with the other methods discussed. Figure 17 compares the convergence curves of the GA and EDARLA methods. It must be noted that, while both methods are executed for 200 iterations, EDARLA needs 200 fitness function evaluations whereas the GA needs 200 × 25 = 5000 function evaluations, because each population has 25 elements. That makes the GA much more time consuming than DARLA and EDARLA.

Comparison Between the Proposed EDARLA and CARLA Methods
To give further insight into the evolutionary optimization approaches used for the design of controllers in general, and of PID controllers for AVR systems in particular, the optimization performance of the classic CARLA (Kashki et al., 2006) is also evaluated.

Table 6: Simulation results for designing PID controller by GA: sensitivity analysis

Fig. 14: The responses of GA and EDARLA methods (range of parameters : [0 1.5])

Fig. 15: The responses of GA and EDARLA methods (range of parameters : [0 10])

Fig. 16: The responses of GA and EDARLA methods (range of parameters : [0 20]) (Number of Iterations for EDARLA and Number of Generations for GA : 200)

The results can be compared with those of the other methods already discussed, including EDARLA. Table 7 shows the performance of CARLA run for 200 iterations. As seen, it provides good results when the search space is suitably small (cases 1 and 2). It is, however, very sensitive to the ranges taken for the design parameters and provides poor results for large ranges.

Table 7: Simulation results for designing PID controller by CARLA
gh = 0.75, gw = 0.008

Fig. 17: Performance index of GA and EDARLA methods (range of parameters : [0 20]) (Number of Iterations for EDARLA and Number of Generations for GA : 200)

Fig. 18: The responses of CARLA and EDARLA (range of parameters : [0 1.5])

Fig. 19: The responses of CARLA and EDARLA (range of parameters : [0 5])

CARLA nevertheless provides better results than the GA. It should also be noted that each iteration of CARLA is slightly more time consuming than an iteration of DARLA. Figures 18-20 compare the step responses of the AVR system when the CARLA and EDARLA methods are used.

Fig. 20: The responses of CARLA and EDARLA (range of parameters : [0 15])

Fig. 21: Comparison of the performance index for CARLA and EDARLA (range of parameters : [0 15])

Fig. 22: Step response of the system before and after retuning by EDARLA

As seen, EDARLA provides the best results among the methods compared. Fig. 21 also compares the convergence curves of the CARLA and EDARLA methods, confirming the superior behavior of the proposed method.

Adaptive PID Controller Tuning Using EDARLA
Thanks to the efficiency of EDARLA, the controller parameters can be retuned quickly once a change occurs in the plant parameters. That change, of course, should be identified by a proper parameter identification method. Here the generator parameters are changed to KG = 0.7 and TG = 2. This is a major change: the generator is considered to be at no load with a high time constant, which significantly affects the overall transient response of the AVR system. The EDARLA algorithm is run for only 20 more iterations to retune the PID controller. The updated PID parameters are Kp = 0.9293, Ki = 0.1507 and Kd = 0.452. Figure 22 shows the step response of the system before and after retuning; as seen, the adaptation mechanism provides much better results.
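As a rough way to inspect the retuned gains, the following sketch simulates the closed-loop unit-step response of a linear AVR model with the changed generator parameters. It is not the simulation used in the study. The model structure (PID, amplifier, exciter and generator in the forward path, sensor in the feedback path, each block a first-order lag) and the amplifier, exciter and sensor values (KA = 10, TA = 0.1, KE = 1, TE = 0.4, KS = 1, TS = 0.01) are assumptions chosen as typical benchmark values, and the function names are hypothetical. Only KG, TG, Kp, Ki and Kd come from the text.

```python
def poly_mul(a, b):
    """Multiply two polynomials (coefficient lists, highest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def closed_loop_tf(kp, ki, kd, ka, ta, ke, te, kg, tg, ks, ts):
    """Return (num, den) of Vt/Vref for the assumed AVR loop with a PID."""
    pid_num = [kd, kp, ki]                     # Kd s^2 + Kp s + Ki (PID = this / s)
    gain = ka * ke * kg
    lag_den = poly_mul(poly_mul([ta, 1.0], [te, 1.0]), [tg, 1.0])
    sensor_den = [ts, 1.0]
    num = [gain * c for c in poly_mul(pid_num, sensor_den)]
    den = poly_mul(poly_mul([1.0, 0.0], lag_den), sensor_den)
    feedback = [gain * ks * c for c in pid_num]
    offset = len(den) - len(feedback)
    for i, c in enumerate(feedback):           # add the loop-gain term to den
        den[offset + i] += c
    return num, den


def step_response(num, den, t_end=30.0, dt=0.002):
    """Unit-step response via controllable canonical form and RK4."""
    n = len(den) - 1
    a_rev = [c / den[0] for c in den[1:]][::-1]
    b_rev = ([0.0] * (n - len(num)) + [c / den[0] for c in num])[::-1]

    def deriv(x):
        # Last state derivative: unit input minus the feedback of all states.
        top = 1.0 - sum(ai * xi for ai, xi in zip(a_rev, x))
        return x[1:] + [top]

    x = [0.0] * n
    ys = []
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(x)
        k2 = deriv([xi + 0.5 * dt * d for xi, d in zip(x, k1)])
        k3 = deriv([xi + 0.5 * dt * d for xi, d in zip(x, k2)])
        k4 = deriv([xi + dt * d for xi, d in zip(x, k3)])
        x = [xi + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
             for xi, d1, d2, d3, d4 in zip(x, k1, k2, k3, k4)]
        ys.append(sum(bi * xi for bi, xi in zip(b_rev, x)))
    return ys


# Kp, Ki, Kd, KG and TG are from the text; all other values are assumptions.
num, den = closed_loop_tf(kp=0.9293, ki=0.1507, kd=0.452,
                          ka=10.0, ta=0.1, ke=1.0, te=0.4,
                          kg=0.7, tg=2.0, ks=1.0, ts=0.01)
response = step_response(num, den)
print(f"peak = {max(response):.3f}, value at t = 30 s = {response[-1]:.3f}")
```

Because the controller contains integral action and the assumed sensor gain is 1, the terminal voltage should settle at the reference value. The peak and settling behavior printed by the sketch depend on the assumed block parameters and need not match Fig. 22.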


This study introduces a new intelligent method for optimizing the parameters of the PID controller for an AVR system. The proposed method, called EDARLA, is an extended version of the DARLA method that optimizes the controller parameters without treating the design variables as independent, as opposed to the classic DARLA. By using matrix calculations, the speed of convergence can be increased, so that the method can be used in many real-time applications. The superior performance, robustness and efficiency of the proposed method have been demonstrated through extensive simulation results, including comparisons with the CARLA, GA, Ziegler-Nichols and DARLA methods. These studies clearly show that the proposed approach is an excellent candidate for various control optimization problems, including adaptive control systems, thanks to its high efficiency, speed and robustness.


The authors wish to thank the Fuzzy Systems and Applications Center of Excellence, Shahid Bahonar University of Kerman, Kerman, Iran.

Astrom, K.J. and T. Hagglund, 2006. Advanced PID Control. illustrated Edn., ISA., USA., ISBN-10: 1556179421, pp: 461.

Charvillat, V. and R. Grigoras, 2007. Reinforcement learning for dynamic multimedia adaptation. J. Net. Comput. Appl., 30: 1034-1058.

Chou, C.H., 2006. Genetic algorithm-based optimal fuzzy controller design in the linguistic space. IEEE Trans. Fuzzy Syst., 14: 372-385.

Duan, Y., Q. Liu and X. Xu, 2007. Application of reinforcement learning in robot soccer. Eng. Appl. Artific. Intell., 20: 936-950.

Gaing, Z.L., 2004. A particle swarm optimization approach for optimum design of PID controller in AVR system. IEEE Trans. Energy Conver., 19: 384-391.

Guillermo, J.S., D. Aniruddha and S.P. Bhattacharyya, 2005. PID Controllers for Time-Delay Systems. 1st Edn., Birkhauser, Boston, ISBN: 0-8176-4266-8, pp: 330.

Ho, S.J., S. Li-Sun and H. Shinn-Ying, 2006. Optimizing fuzzy neural networks for tuning PID controllers using an orthogonal simulated annealing algorithm OSA. IEEE Trans. Fuzzy Syst., 14: 421-434.

Howell, M.N. and M.C. Best, 2000. On-line PID tuning for engine idle-speed control using continuous action reinforcement learning automata. Control Eng. Practice, 8: 147-154.

Howell, M.N. and T.J. Gordon, 2001. Continuous action reinforcement learning automata and their application to adaptive digital filter design. Eng. Appl. Artific. Intell., 14: 549-561.

Howell, M.N., G.P. Frost, T.J. Gordon and Q.H. Wu, 1997. Continuous action reinforcement learning applied to vehicle suspension control. Mechatronics, 7: 263-276.

Kamal, M.A.S. and J. Murata, 2008. Reinforcement learning for problems with symmetrical restricted states. Robotics Autonomous Syst., 56: 717-727.

Kashki, M., A.A. Gharaveisi and F. Kharaman, 2006. Application of cdcarla technique in designing takagi-sugeno fuzzy logic power system stabilizer (PSS). Proceedings of the International Conference on Power and Energy, November 28-29, 2006, IEEE Computer Society Press, pp: 280-285.

Kim, S.M. and W.Y. Han, 2006. Induction motor servo drive using robust PID-like neuro-fuzzy controller. Cont. Engin. Pract., 14: 481-487.

Lin, C.J. and Y.J. Xu, 2006. A novel genetic reinforcement learning for nonlinear fuzzy control problems. Neurocomputing, 69: 2078-2089.

Liu, B., 2008. Theory and Practice of Uncertain Programming. 3rd Edn., Uncertainty Theory Laboratory, Department of Mathematical Sciences Tsinghua University Beijing, China.

Oh, S.Y., J.H. Lee and D.H. Choi, 2000. A new reinforcement learning vehicle control architecture for vision-based road following. IEEE Trans. Vehicular Technol., 49: 997-1005.

Seng, T.L., M.B. Khalid and R. Yusof, 1999. Tuning of a neuro-fuzzy controller by genetic algorithm. IEEE Trans. Syst. Man Cybern. Part B, 29: 226-236.

Ying, H., 2000. Theory and application of a novel fuzzy PID controller using a simplified takagi-sugeno rule scheme. Inform. Sci., 123: 281-293.
