Reinforcement learning approach to controlling a single inverted pendulum

Abstract: This project investigates the use of a reinforcement learning algorithm, the Soft Actor-Critic (SAC) algorithm, combined with a cascaded learning strategy, to learn how to swing up a pendulum attached to a cart to the vertical position. SAC is a model-free, off-policy algorithm that can learn complex policies for continuous action spaces. The agent receives feedback in the form of a reward function that incentivizes keeping the pendulum as close to the vertical position as possible.
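As an illustration, a simple shaping reward of this kind could take the form sketched below; the function name, state variables, and coefficients are assumptions for illustration, not the exact reward used in the project.

```python
import numpy as np

def pendulum_reward(theta, theta_dot, x, x_dot):
    """Hypothetical shaping reward: largest when the pendulum is upright
    (theta = 0) and nearly stationary, with small penalties on cart motion.
    Illustrative only; the project's actual reward may differ."""
    upright = np.cos(theta)                        # +1 upright, -1 hanging down
    spin_penalty = 0.01 * theta_dot ** 2           # discourage fast spinning
    cart_penalty = 0.05 * x ** 2 + 0.01 * x_dot ** 2  # keep the cart near the origin
    return upright - spin_penalty - cart_penalty
```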

The goal is to train the agent to learn the task from scratch, starting from any random location and configuration of the pendulum. To accomplish this, we use a cascaded learning approach, where the agent is first trained in a simulation environment on an easy task. The learned agent is then fine-tuned in a more complex environment with randomized positions and configurations of the pendulum-on-cart system.
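A minimal sketch of how such a two-stage curriculum could be wired up is shown below, assuming the Stable-Baselines3 SAC implementation and two hypothetical Gymnasium environment ids; the abstract does not name the tooling or the environments, so these are assumptions for illustration.

```python
# Cascaded (curriculum) training sketch, assuming Stable-Baselines3 SAC and two
# hypothetical environments: "CartPendulumEasy-v0" (pendulum starts near
# upright, fixed cart position) and "CartPendulumRandom-v0" (random initial
# pendulum angle and cart position). Neither name comes from the project.
import gymnasium as gym
from stable_baselines3 import SAC

# Stage 1: train from scratch on the easy task.
easy_env = gym.make("CartPendulumEasy-v0")        # hypothetical env id
model = SAC("MlpPolicy", easy_env, verbose=1)
model.learn(total_timesteps=200_000)
model.save("sac_pendulum_easy")

# Stage 2: fine-tune the learned policy on the harder, randomized task.
hard_env = gym.make("CartPendulumRandom-v0")      # hypothetical env id
model = SAC.load("sac_pendulum_easy", env=hard_env)
model.learn(total_timesteps=300_000, reset_num_timesteps=False)
model.save("sac_pendulum_random")
```

Loading the stage-1 model reuses its learned network weights, so stage 2 starts from a policy that can already swing the pendulum up rather than learning from scratch.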

Our results demonstrate that the cascaded learning approach using the SAC algorithm can effectively train the agent to swing up the pendulum from any random location and configuration. Moreover, the agent can maintain the pendulum in the upright position, displaying robustness and adaptability to changing environments.
In conclusion, our study highlights the potential of SAC with transfer learning for solving challenging control problems in robotics and other areas. Training the agent on an easy task before fine-tuning on a more complex one can be an effective strategy to accelerate learning and improve the agent's performance.