Research Article
 

Automatic Tuning of Q-learning Algorithms Parameters



M.R. Meybodi and S. Hodjat
 
ABSTRACT

This paper describes a general approach for automatically tuning the parameters of reinforcement learning algorithms. In this approach, the parameters of a reinforcement learning agent are tuned by other, simpler reinforcement learning algorithms. We explain the approach by tuning one of the parameters of a Q-learning algorithm and of a statistical clustering algorithm, and illustrate the results with some simple examples. Comparing an algorithm that uses an automatically tuned parameter with algorithms that use fixed parameters shows that the former is generally more flexible and performs better in most cases.
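The abstract does not spell out the tuning mechanism, but the layered idea lends itself to a short sketch. The Python snippet below assumes a linear reward-inaction (L_R-I) learning automaton, one of the simpler reinforcement learning schemes, choosing among a handful of candidate values for the Q-learning rate alpha; the candidate set, step size, reward signal, and every name below are illustrative assumptions, not the authors' actual algorithm.

    import random

    # Sketch of the approach described above: a simple learning automaton
    # (linear reward-inaction, L_R-I) tunes one Q-learning parameter, the
    # learning rate alpha. The candidate values, step size, and reward
    # signal are illustrative assumptions, not the paper's exact scheme.

    CANDIDATE_ALPHAS = [0.05, 0.1, 0.3, 0.5]  # assumed discrete action set
    STEP = 0.1                                # assumed automaton step size

    class AlphaTuner:
        """L_R-I automaton over a discrete set of candidate learning rates."""

        def __init__(self, candidates):
            self.candidates = candidates
            self.probs = [1.0 / len(candidates)] * len(candidates)
            self.last = 0

        def choose(self):
            # Sample an alpha according to the current action probabilities.
            self.last = random.choices(range(len(self.candidates)),
                                       weights=self.probs)[0]
            return self.candidates[self.last]

        def reinforce(self, improved):
            # Reward-inaction update: move probability toward the chosen
            # action on success, leave the vector unchanged on failure.
            if improved:
                for i in range(len(self.probs)):
                    if i == self.last:
                        self.probs[i] += STEP * (1.0 - self.probs[i])
                    else:
                        self.probs[i] *= 1.0 - STEP

    # Usage inside a Q-learning loop (sketch):
    #   tuner = AlphaTuner(CANDIDATE_ALPHAS)
    #   alpha = tuner.choose()                       # pick alpha per episode
    #   Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    #   tuner.reinforce(episode_return > baseline)   # reward on improvement

The design point is the layered structure: the automaton treats the Q-learner's parameter as its own action and the Q-learner's performance as its reinforcement signal, so any improvement measure (here, episode return against a running baseline) can drive the tuning.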


 
  How to cite this article:

M.R. Meybodi and S. Hodjat, 2002. Automatic Tuning of Q-learning Algorithms Parameters. Journal of Applied Sciences, 2: 408-415.

DOI: 10.3923/jas.2002.408.415

URL: https://scialert.net/abstract/?doi=jas.2002.408.415

