In this paper we implement two ways of improving the performance of reinforcement learning algorithms. Reinforcement learning is concerned with identifying the optimal sequence of actions for an agent to take in order to reach an objective while maximizing its future score. Complex situations pose computational challenges, both in finding the best answer and in the training time required to do so. First, we propose a new equation for prioritizing transition samples to improve model accuracy. Second, we deploy a generalized solver of randomly generated two-dimensional mazes on a distributed computing platform, which makes our dual-network model available to others for further research and development. Our prioritization algorithm increased model accuracy by 7% versus a baseline model with no prioritization, and training with five workers on the Ray platform using RLlib was 4.5x faster than training with one worker.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License