Abstract: With the increasing military application of unmanned combat aircraft (UCAV), unmanned combat will become the main combat mode on the future air battlefield. In close-range air combat, the environment is complex and the combat situation changes rapidly. Methods based on game theory cannot meet real-time requirements because of the large amount of iterative computation they require, and data-driven methods suffer from long training time and low execution efficiency. To solve these problems, this paper proposes a UCAV maneuver decision method based on a deep reinforcement learning algorithm. First, a flight drive module is constructed on the basis of the UCAV three-degree-of-freedom model to form the state transition update mechanism. Then, on the basis of the proximal policy optimization (PPO) algorithm, Ornstein-Uhlenbeck (OU) random noise is added to improve the UCAV's ability to explore the unknown state space, and a long short-term memory (LSTM) network is incorporated to enhance the UCAV's ability to learn from sequential sample data, thereby improving the training efficiency and performance of the algorithm. Finally, the effectiveness and superiority of the proposed method are verified by three groups of close-range air combat simulation experiments and by comparing its performance with that of the PPO algorithm.
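As an illustration of the exploration noise mentioned in the abstract, the following is a minimal sketch of a discretized Ornstein-Uhlenbeck process whose samples are added to a policy action. The abstract does not state how the noise enters PPO or which parameters are used, so the class name `OUNoise`, the helper `perturb_action`, the values of `theta`, `sigma`, and `dt`, and the action bounds are all assumptions made for this example only.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (Euler-Maruyama step):
    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)

    All default parameter values are illustrative assumptions,
    not values taken from the paper.
    """

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=0.01, seed=None):
        self.dim = dim
        self.mu = mu          # long-run mean the noise reverts to
        self.theta = theta    # mean-reversion rate
        self.sigma = sigma    # noise scale
        self.dt = dt          # time step
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean, e.g. at the start of an episode."""
        self.state = [self.mu] * self.dim

    def sample(self):
        """Advance the process by one step and return a copy of the new state."""
        self.state = [
            x
            + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def perturb_action(action, noise, low=-1.0, high=1.0):
    """Add temporally correlated OU noise to an action and clip to the bounds.

    Adding the noise directly to the action is an assumption of this sketch.
    """
    eps = noise.sample()
    return [min(high, max(low, a + e)) for a, e in zip(action, eps)]


if __name__ == "__main__":
    noise = OUNoise(dim=3, seed=0)
    action = [0.5, -0.2, 0.0]  # hypothetical normalized control command
    for _ in range(3):
        print(perturb_action(action, noise))
```

Because successive samples are correlated, the perturbation drifts smoothly rather than jumping independently at each step, which is the usual motivation for preferring OU noise over white Gaussian noise in continuous control.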