On Stationary Point Convergence of PPO-Clip

ICLR 2025 Conference Submission 168 Authors

24 Sept 2024 (modified: 24 Sept 2024), ICLR 2025 Conference Submission, CC BY 4.0

Keywords: PPO, PPO-Clip, stochastic optimization

Abstract: Proximal policy optimization (PPO) has gained popularity in reinforcement learning (RL). Its PPO-Clip variant is one of the most frequently implemented algorithms and is often among the first algorithms tried on RL tasks. This variant uses a clipped surrogate objective function not typically found in other algorithms. Many works have demonstrated the practical performance of PPO-Clip, but its theoretical understanding is limited to specific settings. In this work, we provide a comprehensive analysis that establishes the stationary point convergence of PPO-Clip and its convergence rate. Our analysis is new and overcomes several challenges, including the non-smooth nature of the clip operator, the potentially unbounded score function, and the involvement of the ratio of two stochastic policies. Our results and techniques may offer new insights into PPO-Clip.
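
For readers unfamiliar with the clipped surrogate objective mentioned in the abstract, the following is a minimal sketch of its commonly cited form, the sample average of min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the ratio of the new policy's action probability to the old policy's and A is an advantage estimate. The function names, the plain-list inputs, and the default eps = 0.2 are illustrative assumptions, not taken from the submission, and the exact formulation analyzed there may differ.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(x, hi))


def ppo_clip_surrogate(ratios, advantages, eps=0.2):
    """Sample average of the clipped surrogate (hypothetical helper).

    ratios:     per-sample probability ratios pi_new(a|s) / pi_old(a|s)
    advantages: per-sample advantage estimates
    eps:        clipping range (0.2 is an assumed, commonly used default)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    terms = [
        # Take the smaller of the unclipped and clipped terms for each sample.
        min(r * a, clip(r, 1.0 - eps, 1.0 + eps) * a)
        for r, a in zip(ratios, advantages)
    ]
    return sum(terms) / len(terms)
```

The min with the clipped term makes the objective flat in r once the ratio leaves [1 - eps, 1 + eps] in the direction that would increase it, which is the source of the non-smoothness the abstract refers to.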

Submission Number: 168
