Research on UAV Anti-Pursuit Maneuvering Decision Based on an Improved Twin Delayed Deep Deterministic Policy Gradient Method
CLC number: V279

Abstract:

Aiming at the problem of autonomous anti-pursuit maneuvering in close-range air combat, a Markov decision process model of UAV anti-pursuit is established. On this basis, an autonomous anti-pursuit maneuvering decision method for UAVs based on deep reinforcement learning is proposed. The new method improves the twin delayed deep deterministic policy gradient (TD3) algorithm through reconstruction of the experience replay buffer, and generates an optimal policy network by fitting the policy function and the state-action value function. Simulation experiments show that, under random initial position/attitude conditions, the intelligent UAV trained by this method achieves a win rate of over 93% against an opponent UAV using the pure-pursuit method; compared with the conventional TD3 and deep deterministic policy gradient (DDPG) algorithms, the method converges faster and is more stable.
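The abstract centers on an improved TD3 algorithm. As a point of reference, the sketch below shows the core mechanism of standard TD3 that the improvement builds on: twin critics, target-policy smoothing, and the clipped double-Q Bellman target. Linear models stand in for the neural networks, all dimensions and names are illustrative assumptions, and the paper's experience-replay reconstruction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACT_DIM = 4, 2   # illustrative state/action sizes
GAMMA = 0.99                # discount factor
POLICY_NOISE = 0.2          # std of target-policy smoothing noise
NOISE_CLIP = 0.5            # clip range for the smoothing noise
MAX_ACTION = 1.0            # action bound

# Twin target critics (linear stand-ins for the two Q networks)
w1 = rng.normal(size=STATE_DIM + ACT_DIM)
w2 = rng.normal(size=STATE_DIM + ACT_DIM)
# Target actor (linear stand-in for the policy network)
wa = rng.normal(size=(ACT_DIM, STATE_DIM))

def q(w, s, a):
    """Linear state-action value: Q(s, a) = w . [s; a]."""
    return float(w @ np.concatenate([s, a]))

def td3_target(r, s_next, done):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    # Target policy smoothing: perturb the target action with clipped noise
    noise = np.clip(rng.normal(0.0, POLICY_NOISE, ACT_DIM),
                    -NOISE_CLIP, NOISE_CLIP)
    a_next = np.clip(np.tanh(wa @ s_next) + noise, -MAX_ACTION, MAX_ACTION)
    # Minimum over the twin critics curbs Q-value overestimation
    q_min = min(q(w1, s_next, a_next), q(w2, s_next, a_next))
    return r + GAMMA * (1.0 - done) * q_min

s_next = rng.normal(size=STATE_DIM)
y = td3_target(1.0, s_next, 0.0)
```

In full TD3 the actor is additionally updated less frequently than the critics (delayed policy updates); only the target computation is sketched here.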


Cite this article:

GUO Wanchun, XIE Wujie, YIN Hui, DONG Wenhan. Research on UAV Anti-Pursuit Maneuvering Decision Based on an Improved Twin Delayed Deep Deterministic Policy Gradient Method[J]. Journal of Air Force Engineering University, 2021, 22(4): 15-21.

History
  • Online publication date: 2021-09-13