INTRODUCTION
Reinforcement learning (RL) is a relatively new but quite promising family of approaches,
offering new scientific insight into the area of intelligent systems and having immense
practical applications (Oh et al., 2000; Kamal
and Murata, 2008; Duan et al., 2007; Charvillat
and Grigoras, 2007; Howell et al., 1997, 2000,
2001). Reinforcement learning differs from supervised
learning, the kind of learning most widely studied in current research
in machine learning, statistical pattern recognition and artificial neural networks.
Supervised learning basically means learning from examples provided by a knowledgeable
external supervisor. This is an important kind of learning, but alone it is
not adequate, because it lacks learning from interaction. In interactive problems,
it is often impractical to obtain in advance examples of desired behavior that are
both correct and representative. An agent with its goal embedded in an environment learns how to transform
one environmental state into another. Agents able to perform
this task with minimal human supervision are called autonomous. Learning from
the environment is more robust because the agents are directly affected by the dynamics
of the environment (the system under control).
Generally, adaptation and tuning of process parameters can be performed by
either Continuous Action Reinforcement Learning Automata (CARLA) or Discrete
Action Reinforcement Learning Automata (DARLA) (Howell et
al., 1997). DARLA uses a discrete action space, making it more appropriate
for discrete engineering applications. CARLA was developed as an
extension of the discrete stochastic learning automata methodology. Both DARLA
and CARLA operate through interaction with a random or unknown environment,
selecting actions in a stochastic trial-and-error process. CARLA replaces
the discrete action space with a continuous one, using continuous probability
distributions, which makes it more appropriate for engineering applications
with continuous-time variables. The only interconnection mechanism between the automata
is provided through the environment, via a shared performance or evaluation
function. In each iteration, every action has an associated discrete probability
function (DPF) that is used as the basis for its evaluation. The calculation is
done separately for n disjoint DPFs, where n is the number of parameters that must
be tuned so that an index function is optimized.
In this study, we consider an n-dimensional search space for the state of the environment (system), with a joint multivariable discrete probability function for each cell in the search space, updated at discrete samples. As will be shown, the DARLA and CARLA methods suffer significantly from being too sensitive to the ranges considered for the design variables. If the ranges are too small, there is little chance of reaching globally optimal results. On the other hand, large ranges for the variables make the methods time consuming. The proposed approach (EDARLA) is potentially able to achieve better results and can be adjusted more easily, so that the speed of convergence is significantly improved.
Numerous control methods such as fuzzy control, adaptive control and neuro-fuzzy
control have been studied by Kim and Han (2006), Ying
(2000) and Seng et al. (1999). Among them,
the best known is the Proportional-Integral-Derivative (PID) controller, which
has been widely used in industry because of its simple structure and robust
performance over a wide range of operating conditions. Unfortunately, it is still
rather difficult to properly tune the parameters of PID controllers, because
many industrial plants are burdened with problems such as high
order and nonlinearities (Ho et al.,
2006). One of the first classical tuning rules was proposed
by Ziegler and Nichols (Guillermo et al., 2005).
In general, it is often hard to determine optimal or near-optimal PID parameters
with the ZN method in many industrial plants. For these reasons, it is highly
desirable to increase the capabilities of PID controllers through new research.
The Genetic Algorithm (GA) has recently received much interest for its high
efficiency in searching for globally optimal solutions in the search space (Chou,
2006; Lin and Xu, 2006). Due to its high potential
for global optimization, GA has received great attention in control systems,
such as in the search for optimal PID controller parameters. Although GA has been
widely applied to many control systems, its natural genetic operations still
demand enormous computational effort (Chou, 2006).
Though GA methods have been employed successfully to solve complex optimization
problems, recent research has identified some deficiencies in GA performance.
This degradation in efficiency is more apparent in applications in which the parameters
being optimized are highly correlated. In that case, the crossover and mutation operations
cannot ensure better fitness of the offspring, because the chromosomes of the population
have similar structures and their average fitness is high toward the end of
the evolutionary process (Liu, 2008). Moreover, premature
convergence degrades GA performance and reduces its search
capability (Liu, 2008).
To explore the superiority of the proposed optimization approach, the EDARLA method has been applied to design a PID controller for an Automatic Voltage Regulator (AVR) system for power generation, an important industrial plant. The generator excitation system regulates the generator's voltage and controls the reactive power flow using an AVR system. The role of the AVR is to hold the terminal voltage magnitude of a synchronous generator at a pre-specified level. Hence, the performance and stability of the AVR system seriously affect the security of the whole power system. In this paper, besides demonstrating how to employ the classic CARLA and DARLA as well as the proposed EDARLA methods to obtain the optimal PID controller parameters, it is shown that the proposed method has better performance compared to the conventional reinforcement learning methods.
CARLA AND DARLA TECHNIQUES
Here, both CARLA and DARLA techniques are briefly reviewed (Howell
et al., 1997).
CARLA
In order to practically implement CARLA, the probability distributions f_{i}(x_{i})
are stored and updated at successive sample points. The sampled vector x_{i}
must be updated after each iteration k according to its updated probability
distribution f_{i}(x_{i}, k + 1). Every action set producing
some improvement in the system performance achieves a higher performance score,
denoted by β(k), and its probability of reselection is increased through
the corresponding learning subsystem. This is achieved by modifying f_{i}(x)
with a Gaussian neighborhood function centered at the successful action value. The
neighborhood function increases the probability of the original
successful action as well as the probability of all actions close to it. The
assumption is that the performance surface over the range of each action is continuous
and stationary, or slowly varying. As the system learns, each probability
distribution usually converges to a single Gaussian. Referring to the ith action
(parameter), each x_{i} is defined within a pre-specified range [x_{i}(min),
x_{i}(max)]. In each iteration, each new action is
randomly chosen based on its probability distribution function f_{i}(x_{i},
k), which is initially a uniform function:
The new action i is selected by:
where, z(k) takes random values uniformly within the range [0,1]. When the
set of all updated actions is available, the set is again evaluated in
the environment for an appropriate timeframe and the scalar function J_{cal}(k)
is calculated, where J_{cal}(k) is the cost function at the
kth iteration, calculated based on the Performance Index (PI) to be optimized.
This PI function can generally be defined based on the desired characteristics
of the system under control, such as the steady-state error, the amplitude of the
control signal, the overshoot and the settling time. Then the multiplier β(k)
is calculated as follows:
As seen, the cost J_{cal}(k) is compared with both the median and minimum
costs J_{med}, J_{min}, calculated from a memory set of R
previous values. The algorithm uses a reward/inaction rule: an action set
that generates a cost above the current median level has no effect (β(k) = 0),
and the maximum reinforcement (reward) is capped at β(k) = 1.
After performance evaluation, each probability density function is updated
according to the following rule:
where, H(x,r) is a symmetric Gaussian neighborhood function centered at the
action choice r = x(k):
And g_{h} and g_{w} are adjustable constants that influence
the speed and resolution of the learning process by adjusting the normalized
'height' and width of H.

Fig. 1: 
Learning system by CARLA 
The parameter α(k) is chosen according to Eq. 6 to
renormalize the distribution functions for the (k+1)th iteration:
For practical implementation, the distribution functions are each stored at
discrete points with equal inter-sample probability, and linear interpolation
is usually used to determine the values at intermediate positions. A typical
layout of the method is shown in Fig. 1.
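To make the update cycle concrete, the following is a minimal single-parameter sketch of the CARLA loop described above. The action range, the learning constants g_h and g_w, the memory length R and the quadratic placeholder cost are all illustrative assumptions; β(k) is formed from the median and minimum of the recent-cost memory as described in the text.

```python
import numpy as np

# Minimal one-parameter CARLA sketch (range, g_h, g_w, R and cost are assumptions).
rng = np.random.default_rng(0)

x_min, x_max = 0.0, 1.5                 # pre-specified action range (assumed)
N = 200                                 # discrete storage points for f(x)
xs = np.linspace(x_min, x_max, N)
f = np.full(N, 1.0 / (x_max - x_min))   # initially uniform density
g_h, g_w = 0.3, 0.02                    # neighborhood height/width (assumed)
memory, R = [], 25                      # memory of R previous costs

def cost(x):                            # placeholder performance index (assumption)
    return (x - 0.9) ** 2

for k in range(300):
    # action selection: invert the cumulative distribution at a uniform z(k)
    cdf = np.cumsum(f)
    x_k = xs[np.searchsorted(cdf, rng.uniform() * cdf[-1])]

    J = cost(x_k)
    memory = (memory + [J])[-R:]
    J_med, J_min = np.median(memory), min(memory)

    # reward/inaction: beta = 0 for costs above the median, capped at 1
    beta = 0.0 if J_med == J_min else float(np.clip((J_med - J) / (J_med - J_min), 0, 1))

    # Gaussian neighborhood H centred at the successful action x_k
    H = g_h * np.exp(-0.5 * ((xs - x_k) / (g_w * (x_max - x_min))) ** 2)
    f = f + beta * H
    f /= f.sum() * (xs[1] - xs[0])      # alpha(k): renormalize to a density

best = xs[np.argmax(f)]
```

As the loop runs, the density mass concentrates around the low-cost region, illustrating the convergence of each distribution toward a single Gaussian noted above.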
DARLA
In order to implement DARLA, each DPF f_{i}(d) must be stored and updated
at discrete sample points. The most efficient data storage is achieved using
equal inter-sample probability rather than equal sampling at d = 1,2,3,...,N,
but this imposes some additional computational burden. As in CARLA, any action set
that produces an improvement in the system performance receives a higher performance
score β(k) and thus its probability of reselection is increased. This is
achieved by modifying f_{i}(d) through the use of an inverse exponential
neighborhood function centered at the most recent successful action. The neighborhood
function increases the probability of the original action as well as the probability
of all actions 'close' to the one selected. The assumption
is that the performance surface over the range of each action is discrete and
slowly varying. Within each iteration k of the algorithm, each action is chosen
based on the corresponding probability distribution function f_{i}(d,
k), which is initially chosen to be a uniform function:
The action d is selected by solving:
where, the cumulative probability value is uniformly selected at random within the range
[0,1]. When all n actions are selected, the set is evaluated in the environment
for a suitable time and the scalar cost J_{cal}(k) is calculated. Performance
evaluation is then carried out using Eq. 9 and 10:
Again, the algorithm uses a reward/inaction rule: action sets generating
a cost not better than the current average level receive no reward (β(k)
= 0) and the maximum reinforcement (reward) is capped at β(k) = 1.
After performance evaluation, each discrete probability function is updated
according to the following rule:
where, Q is a symmetric inverse exponential neighborhood function centered
on the action choice r = d_{i}(k):
And λ is an adjustable parameter that influences the speed and resolution
of the learning. The parameter α(k) is also calculated by Eq.
12 to renormalize the distribution functions in the (k+1)th iteration:
A typical layout of the learning system by DARLA is shown in Fig. 2.

Fig. 2: 
Learning system by DARLA 
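A corresponding discrete sketch of one DARLA automaton follows; the number of actions, the decay λ and the placeholder cost are illustrative assumptions, while the selection, reward/inaction and inverse exponential update mirror the description above.

```python
import numpy as np

# Minimal one-parameter DARLA sketch (N, lam and cost are assumptions).
rng = np.random.default_rng(1)

N = 50                            # discrete actions d = 0..N-1
f = np.full(N, 1.0 / N)           # initially uniform DPF
lam = 0.5                         # inverse exponential decay (assumed)
memory, R = [], 25                # memory of R previous costs

def cost(d):                      # placeholder performance index (assumption)
    return abs(d - 30)

for k in range(400):
    # draw an action from the current DPF via its cumulative sum
    cdf = np.cumsum(f)
    d_k = int(np.searchsorted(cdf, rng.uniform() * cdf[-1]))

    J = cost(d_k)
    memory = (memory + [J])[-R:]
    J_med, J_min = np.median(memory), min(memory)
    beta = 0.0 if J_med == J_min else float(np.clip((J_med - J) / (J_med - J_min), 0, 1))

    # inverse exponential neighborhood Q centred on the chosen action d_k
    Q = np.exp(-lam * np.abs(np.arange(N) - d_k))
    f = f + beta * Q / Q.sum()
    f /= f.sum()                  # alpha(k): renormalization
```

After the run, the mode of f marks the action the automaton has learned to prefer; the sharper decay λ trades exploration width for resolution, as noted in the text.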
THE PROPOSED EXTENDED DARLA METHOD (EDARLA)
Let n be the number of parameters that must be adjusted so that the PI function
reaches its minimum value. We can search for the controller's parameters
in an n-dimensional space using a common discrete probability function
(CDPF) f_{X1,X2,...,Xn}(x_{1}, x_{2},
...,x_{n}) for each cell rather than n separate DPFs. The idea behind
this strategy is that a CDPF carries far more information than n separate
DPFs. Each cell in the n-dimensional space must be stored and updated at discrete
sample points, which are here updated to either a new value or zero for simplicity.
A typical layout for the proposed method is shown in Fig. 3.
In this method, instead of calculating n DPFs, we calculate one matrix function, so that the speed of convergence increases through efficient matrix calculation algorithms. More importantly, it also potentially improves the chance of reaching results closer to the globally optimal ones. Suppose n = 3; then each three-dimensional probability function forms a cubic space as shown in Fig. 4. The number of distinct values along each dimension is denoted as Ndiv1, Ndiv2 and Ndiv3, respectively.
Within each iteration, each action has an associated probability density function
f_{X}(x) used as the basis for its selection. Action sets that
improve the system performance receive a higher performance score, so their
probability of reselection increases through the learning subsystem. This is
achieved by modifying f_{X}(x) through the use of a three-dimensional
neighborhood function centered at the successful action. The neighborhood function
increases the probability of the original action, as well as the probability
of the actions close to the point selected. With all n actions selected,
the set is then evaluated in the environment for a suitable time and a scalar
cost value J_{cal}(k) is calculated and compared with a memory set of
previous values as described before.

Fig. 3: 
Learning system by Proposed Algorithm 

Fig. 4: 
A typical three dimensional probability functional cube 
After the performance evaluation process, each probability density
function is updated according to a specified rule. Furthermore, EDARLA can consider
several loops for searching. For example, if the number of loops is two, then
the optimization is done through two stages (loops). In the first stage, the
best subsection of the search space is found; then, in the second stage, the
algorithm proceeds to search for the best results within the determined subsection.
This way, the algorithm is much faster and less sensitive to the size
of the initially defined search space, as will be demonstrated by various simulation
results. The proposed method can be summarized as follows:
Step 1
Initialize the number of design variables (Npar) depending on the system
under study, the number of divisions for each parameter (Ndiv), the number
of iterations (Niter) and finally the number of loops (Nloop), which determines
the number of subsections.
Step 2
Start the loop.
Step 3
Generate an Npar-dimensional space and then set an Npar-dimensional uniform
CDPF matrix which has (Ndiv)^{Npar} cells; each cell has its own
CDPF value, initially the same for all cells, but updated in each
iteration. The calculation of the CDPF is very important, as the best cell has the
highest CDPF.
Step 4
Start the iterations for the current loop; the number of iterations per loop
depends on the total number of iterations specified and the number of loops.
Step 5
Select at random a cell with respect to its CDPF from the cubic structure shown,
so that the selection probability is proportional to its CDPF. It must be noted
that there are different ways to make this selection, such as the roulette wheel
selection method used in this study (Liu, 2008). This step is performed
according to the following three substeps:
• 
Consider a circle that has (N_{div})^{Npar}
sectors, where each sector of the circle corresponds to a cell. The angle of
each sector is calculated by: 
• 
Now, select a sector θ_{rand} at random (according
to the CDPFs) within [0°, 360°]: 
• 
If θ_{i} ≤ θ_{rand} ≤
θ_{i+1} for some i, then cell i in the original search
space is selected, recognized by its index (i_{1}, i_{2},...,
i_{Npar}) in the cubic structure. 
Step 6
The selected index is applied to Eq. 15 and the N_{par}
parameters (N_{par} actions) are obtained; the set is evaluated in the
environment (control system) and the scalar cost J_{cal}(k) is calculated.
where, P(i), l(i) and G(i) are the value of the ith parameter or action, the length
of the ith interval and the beginning of the ith interval, respectively.
Step 7
Evaluate the cost function according to the performance index of the system
under control, considering such quantities as the error and its rate, the control
input, the overshoot and the steady-state error, combined through weighting
coefficients. After the transient response, using a record of the system response,
calculate the cost function (J_{cal}(k)), which is minimum for the best cell.
The procedure compares the calculated cost function of the selected cell with the
minimum of the cost function. It must be noted that in the first iteration, the
minimum performance index is set to the just-calculated performance index. The
parameter β(k) is updated according to the following rule, in which k is the
iteration number:
If the calculated cost function is improved (J_{cal}(k) < J_{min}),
then go to step 8; else set the CDPF of the selected cell to zero and then go
to step 9. This modification has improved the efficiency of EDARLA.
Step 8
Take the calculated cost function (J_{cal}(k)) as the minimum cost
function (J_{min}), take the selected cell as the NewGroup and update the CDPF
matrix as described below:
where, Indx_Cell is the index of each cell and New_Group(k) is the current group's center. One can compare this procedure with the original DARLA rule (Eq. 11). The same normalization task is performed similarly to Eq. 12.
Step 9
If the iterations are finished, go to step 2 for another loop and repeat
the process until the FinalGroup is reached; else go to step 5.
The flowchart of the proposed reinforcement learning method is given by Fig. 5.

Fig. 5: 
The flowchart of the proposed reinforcement learning method
(EDARLA) 
THE PROPOSED OPTIMAL PID CONTROLLER FOR AVR SYSTEM
So far, PID controllers have been widely used in process control. Despite their simple structure, they can effectively control various large industrial processes. There are many tuning approaches for these controllers, but each has its own disadvantages or limitations. As a result, the design of PID controllers still remains a remarkable challenge for researchers. In simple words, the PID controller is used to improve the dynamic response as well as to reduce or eliminate the steady-state error. The derivative term normally adds a finite zero to the open-loop plant transfer function and can improve the transient response in most cases.
The integral term adds a pole at the origin, increasing the system
type and therefore reducing the steady-state error. Furthermore, this controller
is often regarded as an almost robust controller; as a result, it may also
control uncertain processes. The well-known PID controller transfer function
is as follows:
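The action of the three terms can be seen in a few lines of simulation. The first-order plant, the gains and the step size below are illustrative assumptions, unrelated to the AVR model used later:

```python
# Discrete-time PID loop on an assumed first-order plant (tau = 1), forward Euler.
dt, T = 0.001, 20.0
Kp, Ki, Kd = 2.0, 1.0, 0.05             # illustrative gains (assumption)

y, integ, prev_e = 0.0, 0.0, 1.0        # prev_e = e(0) avoids a derivative kick
for _ in range(int(T / dt)):
    e = 1.0 - y                         # unit-step reference minus output
    integ += e * dt                     # integral term: accumulates error
    u = Kp * e + Ki * integ + Kd * (e - prev_e) / dt
    prev_e = e
    y += dt * (-y + u)                  # plant: dy/dt = -y + u
```

With the integral term present, the output settles at the reference with no steady-state error, exactly the behavior attributed to the pole at the origin above.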
One of the important approaches used to design and tune PID controllers,
including those used in AVR systems, is the well-known Ziegler and
Nichols (ZN) approach (Guillermo et al., 2005).
ZN is a popular approach originated by Ziegler and Nichols
in 1942 (Guillermo et al., 2005) and later extended in 1984 by
Astrom and Hagglund (Astrom, 2006). In this paper, the
proposed EDARLA is used as an automatic technique for optimally designing the
PID parameters of a practical high-order AVR system. The design method is fast,
robust and capable of adaptation. The application results and comparisons
with the DARLA and ZN methods are given in the next section.
AVR System Under Study
The responsibility of an AVR is to hold the terminal voltage of a synchronous
generator at a specified level. Hence, the performance and stability of the
AVR seriously affect the security of the power system. In this paper, the AVR
system under study has been modeled based on IEEE standard 421.5 (Gaing,
2004). The model takes into account all the major time constants, the saturation
effect and other nonlinearities.
The transfer functions of the AVR components can be represented as follows (Gaing,
2004):
Amplifier Model
The amplifier model is represented by a gain K_{A} and a time constant
τ_{A}; the transfer function is given by:
Typical values of K_{A} are in the range of 10 to 400. The amplifier
time constant is very small, ranging from 0.02 to 0.1 sec.
Exciter Model
The transfer function of a modern exciter may be represented by a gain K_{E}
and a single time constant τ_{E} as:
Typical values of K_{E} are in the range of 10 to 400. The time constant
τ_{E} is in the range of 0.5 to 1.0 sec.
Generator Model
In the linearized model, the transfer function relating the generator terminal
voltage to its field voltage can be represented by a gain K_{G} and
a time constant τ_{G} as:
These constants are load dependent; K_{G} may vary between 0.7 and 1.0
and τ_{G} between 1.0 and 2.0 sec from full load to no load, respectively.

Fig. 6: 
Block diagram of the AVR System with PID controller 
Sensor Model
The sensor is modeled by a simple first-order transfer function, given by:
where, τ_{R} is very small, ranging from 0.001 to 0.06 sec.
The block diagram of the AVR system with the PID controller is shown in Fig. 6. Figure 6 does not show the saturation effects, but they are fully considered in all design steps and simulations.
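The four blocks above chain naturally into a simulated closed loop. The following is a hedged sketch of the unit-step response of such a loop, integrated with forward Euler; all gains, time constants and PID values are illustrative assumptions chosen so the loop is stable, and are not the paper's Table 1 data:

```python
# Hedged sketch: unit-step response of a PID-controlled AVR-style loop built
# from four first-order blocks. All numerical values are assumptions, not the
# paper's Table 1 parameters.
dt, T = 1e-4, 20.0
KA, tauA = 10.0, 0.1          # amplifier  KA/(1 + tauA*s)
KE, tauE = 1.0, 0.4           # exciter    KE/(1 + tauE*s)
KG, tauG = 1.0, 1.0           # generator  KG/(1 + tauG*s)
tauR = 0.01                   # sensor     1/(1 + tauR*s)
Kp, Ki, Kd = 0.6, 0.4, 0.2    # PID gains (assumption)

va = ve = vt = vs = 0.0       # amplifier, exciter, terminal and sensed voltages
integ, prev_e = 0.0, 1.0      # prev_e = e(0) avoids a derivative kick
for _ in range(int(T / dt)):
    e = 1.0 - vs              # error from the sensed terminal voltage
    integ += e * dt
    u = Kp * e + Ki * integ + Kd * (e - prev_e) / dt
    prev_e = e
    va += dt * (KA * u - va) / tauA    # amplifier state
    ve += dt * (KE * va - ve) / tauE   # exciter state
    vt += dt * (KG * ve - vt) / tauG   # generator terminal voltage
    vs += dt * (vt - vs) / tauR        # sensor state
```

Such a simulated record (vt over time, u over time) is exactly what the cost functions of the next subsection are evaluated on.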
Performance Index
Generally, in many traditional optimal PID controller design approaches,
well-known performance indexes or criteria such as the integrated
absolute error (IAE), the integral of squared error (ISE), or the integral
of time-weighted squared error (ITSE) are widely used. Each of the three integral
performance criteria has its own features. For example, a disadvantage of the
IAE and ISE criteria is that their minimization can result in a response with
relatively small overshoot but long settling time, because the ISE
criterion weights the error equally over time. The ITSE criterion can overcome
this disadvantage, but the analytical derivation of the controller is rather complicated
and time-consuming. The classic IAE, ISE and ITSE performance
criteria are given as follows, where e(t) is the error between the desired and
actual outputs:
Despite the existence of the classic performance indexes addressed above, a more
effective time-domain performance criterion is suggested here for designing
the PID controller. The new performance criterion J is defined as follows and
considers broader requirements in a more explicit manner:
where, T is the total simulation time (T = 20 sec in this study),
e(t) is the tracking error, u_{c}(t) is the control input, M_{p}
is the amount of overshoot, E_{SS} is the steady-state error at
t = T and the G coefficients are the weighting elements (G_{e} = 10, G_{u}
= 1, G_{M} = 10, G_{s} = 10, G_{d} = 5). The parameters
of the PID controller are calculated based on the approaches described below and the
results are compared:
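Once a response record is available, a cost of this general shape can be evaluated directly. The functional form below, and the use of only four of the five weights, are assumptions for illustration, not the paper's exact criterion:

```python
import numpy as np

def performance_index(t, y, u, ref=1.0, Ge=10.0, Gu=1.0, GM=10.0, Gs=10.0):
    """Illustrative time-domain cost built from the ingredients named above.
    The exact combination (and the role of G_d) is an assumption here."""
    e = ref - y
    dt = t[1] - t[0]
    iae = np.sum(np.abs(e)) * dt          # accumulated tracking error
    effort = np.sum(np.abs(u)) * dt       # accumulated control input
    Mp = max(0.0, float(y.max()) - ref)   # overshoot
    Ess = abs(float(e[-1]))               # steady-state error at t = T
    return Ge * iae + Gu * effort + GM * Mp + Gs * Ess

t = np.linspace(0, 20, 2001)              # T = 20 sec, as in the study
y = 1 - np.exp(-t)                        # toy response for demonstration only
u = 1 - y                                 # toy control record
J = performance_index(t, y, u)
```

Weighting the overshoot and steady-state terms heavily (G_M, G_s) is what lets such a criterion penalize behavior that an IAE- or ISE-only cost would tolerate.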
SIMULATION RESULTS
Performance of the EDARLA
For the practical AVR system under study, the results show that the proposed
method can quickly perform an excellent search for the optimal PID controller
parameters. As will be shown through extensive simulation studies, one of the drawbacks
of DARLA is that it is quite sensitive to the initial ranges considered
for the design variables forming the overall search space. The proposed method
is more efficient as well as more robust against the search space volume. For
this comparative study, the normal ranges of the three controller parameters,
i.e., K_{p}, K_{i} and K_{d}, considered in another reference
(Gaing, 2004) are available. The AVR system parameters
are shown in Table 1 (Gaing, 2004).
Some other symbols used in this study are shown in Table 2.
Figure 7 shows the original step response of the AVR system
without a controller. In this case, M_{p}% = 50.51, E_{ss}% =
9.06, T_{r} = 0.273 sec and T_{s} = 5.565 sec. As seen, the results
are not satisfactory for a practical system.
Also, the PID controller parameters calculated by the Ziegler-Nichols method are
as follows:
Here, the parameters of the PID controller are calculated by the proposed EDARLA.
Considering the results already reported by Kashki et
al. (2006), the ranges of the three parameters K_{p}, K_{i}
and K_{d} are taken as [0, 1.5], [0, 1] and [0, 1], respectively.
Table 1: 
The AVR system parameters 

Table 2: 
The used symbols 

For the calculation of the PID parameters, each interval is divided into 5 slots
in the first, second and third loops. The simulation results of the EDARLA for
different numbers of loops and different total numbers of iterations are summarized
in Table 3. It can be seen that the final results are quite
interesting and excellent results can be obtained with even fewer than 50
iterations. Moreover, comparing cases 2 and 3 (with the same number of iterations,
equal to 20) reveals that using two loops rather than one clearly yields
better results, as reflected in the final cost functions of the addressed
case studies. Also, Fig. 8 and 9 show the
convergence characteristics of the proposed method and the terminal voltage step
response of the AVR system, respectively, for different simulation conditions.
As seen, within about 50 iterations, the EDARLA method successfully converges
and provides good performance. The results show that the proposed method (EDARLA)
can search for the optimal PID controller parameters quickly and efficiently.
Table 3: 
The Results of Simulation for proposed method (EDARLA) 


Fig. 7: 
Step response of the AVR system without controller 

Fig. 8: 
Convergence tendency of the EDARLA (case 7) 

Fig. 9: 
Terminal voltage step response of the AVR system with optimal
PID controller (EDARLA) 

Fig. 10: 
Comparison of ZN and EDARLA methods 
Comparison between the Proposed EDARLA, DARLA and ZN Methods
Figure 10-12 compare the results of the ZN,
DARLA and EDARLA methods. The EDARLA has run for only 50 iterations in total
while the DARLA was trained for 200 iterations; as seen, EDARLA provides much
better results.

Fig. 11: 
The Control Signal of ZN and EDARLA methods 

Fig. 12: 
Performance index of DARLA and EDARLA 

Fig. 13: 
The responses of DARLA and EDARLA methods 
Also, Fig. 13 shows the convergence curves and the step response of the AVR for
both the DARLA and EDARLA methods after 200 iterations. Again, one can clearly see
that the EDARLA has provided much better results with faster convergence.
Robustness to the Design Variables Ranges
As already addressed, one of the key issues in both the DARLA and CARLA methods
is that the range of each design variable needs to be predefined. This is not a trivial
task, as the designer cannot always guess appropriate ranges. If the ranges
are too small, the overall search space becomes so small that the procedure
may fail to find acceptable optimal points. On the other hand, large ranges
can make the algorithm too time consuming. Even worse, for a fixed number of
slots (N_{div}), larger ranges may lead to still worse results, as shown
in Table 4 and 5.
Table 4: 
Comparison of proposed method (EDARLA) and conventional DARLA
run for 200 iterations 

Table 5: 
Effect of the search space: A comparison between the proposed
EDARLA and the conventional DARLA 

The tables show the comparative results of the conventional DARLA and the proposed
EDARLA. Both methods have run for 200 iterations. As the results show, the DARLA
fails to reach good results even for small ranges. Moreover, as the ranges increase,
the overall performance index increases. The changes in the performance index,
rise time and overshoot are notably smaller for the EDARLA. This feature
is quite important because, as already mentioned, in most cases the best ranges
for the search space cannot be determined exactly. Hence, the need for a heuristic
approach such as EDARLA is apparent. The results demonstrate the good robustness of
the EDARLA and show that the conventional DARLA is too sensitive to the ranges
taken for its variables. Thus, the most efficient, robust and optimal results can
be gained by the proposed method (EDARLA). In the next subsections, the performance,
efficiency and sensitivity of the GA and CARLA approaches are also investigated.
Comparison between the Proposed EDARLA and GA Methods
To further highlight the advantages of the proposed method, we also
implemented the Genetic Algorithm (GA) method (Gaing, 2004).
The characteristics of the two controllers using the same performance index
as defined by Eq. 22 are compared. The GA parameters are as follows:
Population size = 25
Crossover rate (Pc) = 0.75
Mutation rate (Pm) = 0.0075
Table 6 shows the results of the GA approach for different ranges
of the parameters. As seen, the GA does not provide good results even over 200
generations. It is also more sensitive to the variable ranges. Figure
14-16 compare the step responses of the AVR system when the
GA and EDARLA methods are used to train the PID controller for several different
ranges of the parameters. As seen, EDARLA provides the best results in comparison
with the other methods discussed. Also, Fig. 17
compares the convergence curves of the GA and EDARLA methods. It must be noted that,
while both methods are executed for 200 iterations, the EDARLA needs 200 fitness
function evaluations, whereas the GA needs 200 x 25 function evaluations, because
each population has 25 elements. That makes the GA very time consuming in comparison
to DARLA and EDARLA.
Comparison Between the Proposed EDARLA and CARLA Methods
To give further insight into the horizon of the evolutionary optimization
approaches used for the design of controllers in general, and of PID controllers
for AVR systems in particular, the optimization performance of the classic CARLA (Kashki
et al., 2006) is also evaluated.
Table 6: 
Simulation results for designing PID controller by GA: sensitivity
analysis 


Fig. 14: 
The responses of GA and EDARLA methods (range of parameters
: [0 1.5]) 

Fig. 15: 
The responses of GA and EDARLA methods (range of parameters
: [0 10]) 

Fig. 16: 
The responses of GA and EDARLA methods (range of parameters
: [0 20]) (Number of Iterations for EDARLA and Number of Generations for
GA : 200) 
The results can be compared to those of the other methods already discussed, including
the EDARLA. Table 7 depicts the performance of the CARLA run for 200 iterations.
As seen, it provides good results when the search space is suitably small (cases
1 and 2). It is, however, very sensitive to the ranges taken for the design parameters
and provides poor results for large ranges.
Table 7: 
Simulation results for designing PID controller by CARLA 

g_{h} = 0.75, g_{w} = 0.008 

Fig. 17: 
Performance index of GA and EDARLA methods (range of parameters
: [0 20]) (Number of Iterations for EDARLA and Number of Generations for
GA : 200) 

Fig. 18: 
The responses of CARLA and EDARLA (range of parameters : [0
1.5]) 

Fig. 19: 
The responses of CARLA and EDARLA (range of parameters : [0
5]) 
However, it provides better results than the GA. It should also be noted that each
iteration of the CARLA is somewhat more time consuming than DARLA's. Figure
18-20 compare the step responses of the AVR system when the
CARLA and EDARLA methods are used.

Fig. 20: 
The responses of CARLA and EDARLA (range of parameters : [0
15]) 

Fig. 21: 
Comparison of the performance index for CARLA and EDARLA (range
of parameters : [0 15]) 

Fig. 22: 
Step response of the system before and after retuning by EDARLA 
As seen, EDARLA provides the best results in comparison with those of
the other methods. Also, Fig. 21 compares the convergence curves of the CARLA and
EDARLA methods, confirming the superior behavior of the proposed method.
Adaptive PID Controller Tuning Using EDARLA
Thanks to the efficiency of the EDARLA, it is quite easy to quickly retune
the controller parameters once a change occurs in the plant parameters. That
change, of course, should be identified by a proper parameter identification
method. The generator parameters are changed to K_{G} = 0.7 and τ_{G}
= 2, a major change: the generator is considered at no load with
a high time constant, which significantly affects the overall transient response
of the AVR system. The EDARLA algorithm has run for only 20 more iterations to retune
the PID controller. The updated PID parameters are: K_{p} = 0.9293,
K_{i} = 0.1507 and K_{d} = 0.452. Figure 22
shows the step response of the system before and after retuning; as seen,
the adaptation mechanism has provided much better results.
CONCLUSION
This study introduces a new intelligent method for optimizing the parameters of the PID controller for an AVR system. The proposed method is an extended version of the DARLA method, called EDARLA, which optimizes the controller parameters without assuming the variables to be independent, as opposed to the classic DARLA. By using matrix calculation, the speed of convergence can be increased and the system can be used in many real-time applications. The superior performance, robustness and efficiency of the proposed method have been demonstrated through extensive simulation results, including comparisons with the CARLA, GA, Ziegler-Nichols and DARLA methods. The extensive studies carried out clearly show that the proposed approach is an excellent candidate for optimizing various control problems, including adaptive control systems, thanks to its high efficiency, speed and robustness.
ACKNOWLEDGMENT
The authors wish to thank the Fuzzy Systems and Applications Center of Excellence, Shahid Bahonar University of Kerman, Kerman, Iran.