Automatic Tuning of Q-learning Algorithms Parameters
Abstract:
This paper describes a general approach for automatically tuning the parameters of reinforcement learning algorithms. In this approach, a reinforcement learning agent's parameters are tuned by other, simpler reinforcement learning algorithms. We explain the approach by tuning one of the parameters of a Q-learning and statistical clustering algorithm, and illustrate the results of tuning this parameter with some simple examples. A comparison between an algorithm using an automatically tuned parameter and algorithms with fixed parameters shows that the former is generally more flexible and performs better in most cases.
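The abstract does not spell out the paper's exact mechanism, but the idea of one simple RL algorithm tuning a parameter of another can be sketched as follows. Here a reward-inaction learning automaton (in the spirit of the learning-automata literature the paper cites) selects the Q-learning step size alpha from a small candidate set and is rewarded whenever its choice reduced the TD error. The class names, candidate values, and toy two-state task are all illustrative assumptions, not the authors' implementation.

```python
import random

class QLearner:
    """Tabular Q-learning whose step size alpha is set externally."""
    def __init__(self, n_states, n_actions, gamma=0.9, alpha=0.5):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.gamma = gamma
        self.alpha = alpha

    def update(self, s, a, r, s_next):
        # Standard Q-learning update; returns |TD error| as a tuning signal.
        td_error = r + self.gamma * max(self.q[s_next]) - self.q[s][a]
        self.q[s][a] += self.alpha * td_error
        return abs(td_error)

class AlphaTuner:
    """Reward-inaction (L_{R-I}) automaton over candidate alphas (illustrative)."""
    def __init__(self, candidates, lr=0.1):
        self.candidates = candidates
        self.p = [1.0 / len(candidates)] * len(candidates)  # action probabilities
        self.lr = lr

    def choose(self):
        self.last = random.choices(range(len(self.candidates)), self.p)[0]
        return self.candidates[self.last]

    def reinforce(self, rewarded):
        # On reward, shift probability mass toward the chosen alpha;
        # on penalty, do nothing (the "inaction" part of L_{R-I}).
        if rewarded:
            for i in range(len(self.p)):
                if i == self.last:
                    self.p[i] += self.lr * (1.0 - self.p[i])
                else:
                    self.p[i] *= (1.0 - self.lr)

# Toy two-state chain: action 0 in state 0 yields reward 1 and moves to state 1.
random.seed(0)
agent = QLearner(n_states=2, n_actions=1)
tuner = AlphaTuner([0.1, 0.5, 0.9])
prev_err = float("inf")
for _ in range(200):
    agent.alpha = tuner.choose()
    err = agent.update(0, 0, 1.0, 1)
    agent.update(1, 0, 0.0, 0)
    tuner.reinforce(err < prev_err)  # reward the automaton if TD error shrank
    prev_err = err
```

The probability update keeps the distribution normalized, so the automaton gradually concentrates on whichever step size most often reduced the TD error, which is one plausible reading of "tuning a Q-learning parameter by a simpler reinforcement learning algorithm."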
How to cite this article
M.R. Meybodi and S. Hodjat, 2002. Automatic Tuning of Q-learning Algorithms Parameters. Journal of Applied Sciences, 2: 408-415.
REFERENCES
Dayan, P. and G.E. Hinton, 1993. Feudal Reinforcement Learning. In: Advances in Neural Information Processing Systems 5, Hanson, S.J., J.D. Cowan and C.L. Giles (Eds.). Morgan Kaufmann, San Mateo, CA.
Harmon, M.E. and S.S. Harmon, 1996. Reinforcement learning: A tutorial. http://www.nbu.bg/cogs/events/2000/Readings/Petrov/rltutorial.pdf.
Hodjat, S. and M.R. Meybodi, 1996. Fine tuning of Q-learning automata (in Farsi). Proceedings of the 2nd Annual Conference of Computer Society of Iran, (ACCSI'96), Tehran, Iran, pp: 209-220.
Hodjat, S., 1997. An artificial lab for creating and comparing learning algorithms. M.Sc. Thesis, Department of Computer Engineering, Amirkabir University, Tehran, Iran.
Kaelbling, L.P., M.L. Littman and A.W. Moore, 1996. Reinforcement learning: A survey. J. Artificial Intell. Res., 4: 237-285.
Krinskii, V.I., 1964. Asymptotically optimal automaton with exponential speed of convergence. Biofizika, 9: 484-487.
Krylov, V.Y., 1964. One stochastic automaton which is asymptotically optimal in random medium. Automation and Remote Control, 24: 1114-1116.
Mahadevan, S. and J. Connell, 1991. Scaling reinforcement learning to robotics by exploiting the subsumption architecture. Proceedings of the 8th International Workshop on Machine Learning, (IWML'91), Morgan Kaufmann, pp: 328-332.
Mahadevan, S. and J. Connell, 1991. Automatic programming of behavior-based robots using reinforcement learning. Proceedings of Artificial Intelligence, (AI'91), Pittsburgh, PA, pp: 311-365.
Meybodi, M.R. and S. Lakshmivarahan, 1982. ε-Optimality of a general class of absorbing barrier learning algorithms. Inform. Sci., 28: 1-20.
Narendra, K.S. and M.A.L. Thathachar, 1989. Learning Automata. Prentice Hall, Englewood Cliffs, NJ., USA.
Schalkoff, R., 1991. Pattern Recognition. Wiley International, New Jersey, USA.
Tsetlin, M.L., 1962. On the behavior of finite automata in random media. Automation and Remote Control, 22: 1345-1354.
Watkins, C.J.C.H., 1989. Learning from delayed rewards. Ph.D. Thesis, King's College, Cambridge, England.
© Science Alert. All Rights Reserved