1: Dayan, P. and G.E. Hinton, 1993. Feudal Reinforcement Learning. In: Advances in Neural Information Processing Systems 5, Hanson, S.J., J.D. Cowan and C.L. Giles (Eds.). Morgan Kaufmann, San Mateo, CA.
2: Harmon, M.E. and S.S. Harmon, 1996. Reinforcement learning: A tutorial. http://www.nbu.bg/cogs/events/2000/Readings/Petrov/rltutorial.pdf.
3: Hodjat, S. and M.R. Meybodi, 1996. Fine tuning of Q-learning automata (in Farsi). Proceedings of the 2nd Annual Conference of the Computer Society of Iran, (ACCSI'96), Tehran, Iran, pp: 209-220.
4: Hodjat, S., 1997. An artificial lab for creating and comparing learning algorithms. M.Sc. Thesis, Department of Computer Engineering, Amirkabir University, Tehran, Iran.
5: Kaelbling, L.P., M.L. Littman and A.W. Moore, 1996. Reinforcement learning: A survey. J. Artificial Intell. Res., 4: 237-285.
6: Krinskii, V.I., 1964. Asymptotically optimal automaton with exponential speed of convergence. Biofizika, 9: 484-487.
7: Krylov, V.Y., 1964. One stochastic automaton which is asymptotically optimal in a random medium. Automation and Remote Control, 24: 1114-1116.
8: Mahadevan, S. and J. Connell, 1991. Scaling reinforcement learning to robotics by exploiting the subsumption architecture. Proceedings of the 8th International Workshop on Machine Learning, Morgan Kaufmann, pp: 328-332.
9: Mahadevan, S. and J. Connell, 1991. Automatic programming of behavior-based robots using reinforcement learning. Proceedings of the National Conference on Artificial Intelligence, (AAAI'91), Pittsburgh, PA, pp: 311-365.
10: Meybodi, M.R. and S. Lakshmivarahan, 1982. ε-Optimality of a general class of absorbing barrier learning algorithms. Inform. Sci., 28: 1-20.
11: Narendra, K.S. and M.A.L. Thathachar, 1989. Learning Automata: An Introduction. Prentice Hall, Englewood Cliffs, NJ., USA.
12: Schalkoff, R., 1991. Pattern Recognition. Wiley International, New Jersey, USA.
13: Tsetlin, M.L., 1962. On the behavior of finite automata in random media. Automation and Remote Control, 22: 1345-1354.
14: Watkins, C.J.C.H., 1989. Learning from delayed rewards. Ph.D. Thesis, King's College, Cambridge, England.