This paper presents a modified distributed Q-learning algorithm, termed the Sequential Q-learning algorithm with Kalman Filtering (SQKF), for multi-robot decision making. Although Q-learning is commonly employed in the multi-robot domain to support robot operation in dynamic and unknown environments, it faces serious challenges there. Scaling the conventional single-agent Q-learning algorithm to the multi-robot domain is questionable because such an extension violates the Markov assumption on which the algorithm is based. Empirical results show that this violation can confuse the robots and render them unable to learn a good cooperative policy, owing to incorrect credit assignment among the robots and each robot's inability to observe the actions of the other robots in the same environment. The SQKF algorithm developed here has two basic characteristics: (1) the learning process is sequential rather than parallel; the robots do not make decisions simultaneously but learn and act according to a predefined sequence; (2) a robot does not update its Q-values with the observed global reward; instead, it employs a specific Kalman filter to extract its true local reward from the global reward and updates its Q-table with this local reward. The SQKF algorithm is intended to solve two problems in multi-robot Q-learning: credit assignment and behavior conflicts. The detailed procedure of the algorithm is presented and its application is illustrated. Empirical results show that it outperforms both the conventional single-agent Q-learning algorithm and the Team Q-learning algorithm in the multi-robot domain.
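To make the two ideas in the abstract concrete, the sketch below shows one plausible reading of them in Python: a scalar Kalman filter per robot that treats the observed global reward as a noisy measurement of that robot's local reward, and a fixed decision sequence in place of simultaneous action selection. This is a minimal illustration only; the class, all parameter names, and the random-walk reward model are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

class SQKFRobot:
    """Sketch of one robot under a sequential Q-learning scheme with
    Kalman-filtered rewards (hypothetical names, not the paper's code)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 q_noise=1.0, r_noise=4.0):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        # Kalman filter state: estimated local reward and its variance.
        self.r_hat, self.P = 0.0, 1.0
        self.q_noise = q_noise  # process noise: drift of the true local reward
        self.r_noise = r_noise  # measurement noise: teammates' share of the global reward

    def filter_reward(self, global_reward):
        """One scalar Kalman update: extract an estimated local reward
        from the observed global reward."""
        P_pred = self.P + self.q_noise             # predict (random-walk model)
        K = P_pred / (P_pred + self.r_noise)       # Kalman gain
        self.r_hat += K * (global_reward - self.r_hat)
        self.P = (1.0 - K) * P_pred
        return self.r_hat

    def update(self, s, a, global_reward, s_next):
        """Q-learning update driven by the filtered local reward,
        not by the raw global reward."""
        local_reward = self.filter_reward(global_reward)
        td_target = local_reward + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

def sequential_step(robots, state):
    """Robots choose actions one after another in a predefined order,
    rather than in parallel (greedy selection here for brevity)."""
    return [int(robot.Q[state].argmax()) for robot in robots]
```

Under this reading, the filter's measurement-noise variance stands in for the unobserved contributions of the other robots, which is one simple way to recover a per-robot credit signal from a single team-level reward.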
ASME 2007 International Mechanical Engineering Congress and Exposition
November 11–15, 2007
Seattle, Washington, USA
Conference Sponsors:
- ASME
ISBN:
0-7918-4303-3
PROCEEDINGS PAPER
A Modified Q-Learning Algorithm for Multi-Robot Decision Making
Ying Wang
University of British Columbia, Vancouver, BC, Canada
Clarence W. de Silva
University of British Columbia, Vancouver, BC, Canada
Paper No:
IMECE2007-41643, pp. 1275-1281; 7 pages
Published Online:
May 22, 2009
Citation
Wang, Y, & de Silva, CW. "A Modified Q-Learning Algorithm for Multi-Robot Decision Making." Proceedings of the ASME 2007 International Mechanical Engineering Congress and Exposition. Volume 9: Mechanical Systems and Control, Parts A, B, and C. Seattle, Washington, USA. November 11–15, 2007. pp. 1275-1281. ASME. https://doi.org/10.1115/IMECE2007-41643