Automatic Tuning of Q-learning Algorithms Parameters
Abstract:
This paper describes a general approach for automatically tuning the parameters of reinforcement learning algorithms. In this approach, a reinforcement learning agent's parameters are tuned by other, simpler reinforcement learning algorithms. We explain the approach by tuning one of the parameters of a Q-learning and statistical clustering algorithm, and illustrate the results of tuning this parameter with some simple examples. A comparison between an algorithm using an automatically tuned parameter and algorithms with fixed parameters shows that the former is generally more flexible and performs better in most cases.
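The abstract does not spell out the paper's exact mechanism, but the idea of one simple RL algorithm tuning a parameter of another can be sketched as follows. Here a reward-inaction learning automaton (in the spirit of the learning-automata literature the paper cites) selects the Q-learning step size alpha from a small candidate set and is rewarded whenever its choice reduced the TD error. The class names, candidate values, and toy two-state task are all illustrative assumptions, not the authors' implementation.

```python
import random

class QLearner:
    """Tabular Q-learning whose step size alpha is set externally."""
    def __init__(self, n_states, n_actions, gamma=0.9, alpha=0.5):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.gamma = gamma
        self.alpha = alpha

    def update(self, s, a, r, s_next):
        # Standard Q-learning update; returns |TD error| as a tuning signal.
        td_error = r + self.gamma * max(self.q[s_next]) - self.q[s][a]
        self.q[s][a] += self.alpha * td_error
        return abs(td_error)

class AlphaTuner:
    """Reward-inaction (L_{R-I}) automaton over candidate alphas (illustrative)."""
    def __init__(self, candidates, lr=0.1):
        self.candidates = candidates
        self.p = [1.0 / len(candidates)] * len(candidates)  # action probabilities
        self.lr = lr

    def choose(self):
        self.last = random.choices(range(len(self.candidates)), self.p)[0]
        return self.candidates[self.last]

    def reinforce(self, rewarded):
        # On reward, shift probability mass toward the chosen alpha;
        # on penalty, do nothing (the "inaction" part of L_{R-I}).
        if rewarded:
            for i in range(len(self.p)):
                if i == self.last:
                    self.p[i] += self.lr * (1.0 - self.p[i])
                else:
                    self.p[i] *= (1.0 - self.lr)

# Toy two-state chain: action 0 in state 0 yields reward 1 and moves to state 1.
random.seed(0)
agent = QLearner(n_states=2, n_actions=1)
tuner = AlphaTuner([0.1, 0.5, 0.9])
prev_err = float("inf")
for _ in range(200):
    agent.alpha = tuner.choose()
    err = agent.update(0, 0, 1.0, 1)
    agent.update(1, 0, 0.0, 0)
    tuner.reinforce(err < prev_err)  # reward the automaton if TD error shrank
    prev_err = err
```

The probability update keeps the distribution normalized, so the automaton gradually concentrates on whichever step size most often reduced the TD error, which is one plausible reading of "tuning a Q-learning parameter by a simpler reinforcement learning algorithm."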
How to cite this article
M.R. Meybodi and S. Hodjat, 2002. Automatic Tuning of Q-learning Algorithms Parameters. Journal of Applied Sciences, 2: 408-415.
REFERENCES
Dayan, P. and G.E. Hinton, 1993. Feudal Reinforcement Learning. In: Advances in Neural Information Processing Systems 5, Hanson, S.J., J.D. Cowan and C.L. Giles (Eds.). Morgan Kaufmann, San Mateo, CA.
Harmon, M.E. and S.S. Harmon, 1996. Reinforcement learning: A tutorial. http://www.nbu.bg/cogs/events/2000/Readings/Petrov/rltutorial.pdf.
Hodjat, S. and M.R. Meybodi, 1996. Fine tuning of Q-learning automata (in Farsi). Proceedings of the 2nd Annual Conference of Computer Society of Iran, (ACCSI'96), Tehran, Iran, pp: 209-220.
Hodjat, S., 1997. An artificial lab for creating and comparing learning algorithms. M.Sc. Thesis, Department of Computer Engineering, Amirkabir University, Tehran, Iran.
Kaelbling, L.P., M.L. Littman and A.W. Moore, 1996. Reinforcement learning: A survey. J. Artificial Intell. Res., 4: 237-285.
Krinskii, V.I., 1964. Asymptotically optimal automaton with exponential speed of convergence. Biofizika, 9: 484-487.
Krylov, V.Y., 1964. One stochastic automaton which is asymptotically optimal in random medium. Automation and Remote Control, 24: 1114-1116.
Mahadevan, S. and J. Connell, 1991. Scaling reinforcement learning to robotics by exploiting the subsumption architecture. Proceedings of the 8th International Workshop on Machine Learning, (IWML'91), Morgan Kaufmann, pp: 328-332.
Mahadevan, S. and J. Connell, 1991. Automatic programming of behavior-based robots using reinforcement learning. Proceedings of Artificial Intelligence, (AI'91), Pittsburgh, PA, pp: 311-365.
Meybodi, M.R. and S. Lakshmivarahan, 1982. ε-Optimality of a general class of absorbing barrier learning algorithms. Inform. Sci., 28: 1-20.
Narendra, K.S. and M.A.L. Thathachar, 1989. Learning Automata. Prentice Hall, Englewood Cliffs, NJ., USA.
Schalkoff, R., 1991. Pattern Recognition. Wiley International, New Jersey, USA.
Tsetlin, M.L., 1962. On the behavior of finite automata in random media. Automation and Remote Control, 22: 1345-1354.
Watkins, C.J.C.H., 1989. Learning from delayed rewards. Ph.D. Thesis, King's College, Cambridge, England.
© Science Alert. All Rights Reserved