Research on Assisted Control Technology for Semi-Submersible Vehicle Recovery Based on Reinforcement Learning

Affiliation: No. 710 Research Institute, China Shipbuilding Industry Corporation

    Abstract:

    Recovering a semi-submersible vehicle is a lengthy process that demands considerable operator experience, has a high technical threshold, and carries a large trial-and-error cost. To address this problem, the VH-PPO algorithm is proposed, and its performance is analyzed in three respects: convergence, the upper and lower bounds on the expected convergence time, and time complexity. Historical data from successful manual operations supplies an initial probability distribution, and training proceeds from that basis; this removes the free-exploration, repeated trial-and-error phase, effectively reduces the expected convergence time, lets the training model converge faster, and thereby lowers the algorithm's time complexity. In addition, better-suited hyperparameters are selected for the different stages of training to prevent overshoot and undershoot, which helps the model converge more reliably, tightens the upper and lower bounds on the expected convergence time, and further reduces the time complexity. Reinforcement learning with this algorithm was carried out on OpenAI Gym; after training, the model was deployed in the control software and then validated and further tuned in a sea area near Sanya. The experimental results show that as the number of trials increases, the agent adapts increasingly well to the real environment, with assisted-control commands accounting for more than 50% of all control commands. This effectively reduces operator fatigue and lowers both the difficulty of training novices and the threshold for replacing operators.
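The warm start described in the abstract — seeding the policy with the action distribution observed in successful manual-recovery logs rather than starting from uniform free exploration — can be sketched as follows. This is a minimal illustration over a hypothetical discrete action set; the function name, smoothing scheme, and example log are assumptions for illustration, not the authors' implementation:

```python
import math
from collections import Counter

def warm_start_logits(demo_actions, n_actions, smoothing=1.0):
    """Initialize policy logits from action frequencies in successful
    human-operated recovery logs (Laplace-smoothed), so that early
    training samples actions the way experienced operators did."""
    total = len(demo_actions) + smoothing * n_actions
    counts = Counter(demo_actions)
    probs = [(counts.get(a, 0) + smoothing) / total for a in range(n_actions)]
    return [math.log(p) for p in probs]

# Hypothetical demonstration log where action 2 dominates.
logits = warm_start_logits([2, 2, 1, 2, 0, 2], n_actions=4)
probs = [math.exp(l) for l in logits]
# The most frequently demonstrated action receives the highest
# initial probability, while smoothing keeps every action reachable.
```

Biasing early sampling toward operator-like actions while keeping every action reachable is what shortens the expected convergence time relative to a uniform-random start.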

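The stage-dependent hyperparameter selection mentioned in the abstract (looser updates early, tighter clipping late to avoid overshoot and undershoot) could be organized as a simple schedule over training progress, paired here with the standard PPO clipped surrogate for context. The breakpoints and values below are illustrative assumptions, not the paper's settings:

```python
def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped objective for a single sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)

def stage_hyperparams(progress):
    """Illustrative stage-wise schedule over training progress in [0, 1]:
    larger steps early for exploration, tighter clipping late to
    prevent overshoot near convergence."""
    if progress < 0.3:        # early stage
        return {"lr": 3e-4, "clip_eps": 0.3}
    elif progress < 0.7:      # middle stage
        return {"lr": 1e-4, "clip_eps": 0.2}
    else:                     # late stage
        return {"lr": 3e-5, "clip_eps": 0.1}
```

Shrinking the clip range and learning rate as training progresses bounds late-stage policy updates more tightly, which is the mechanism the abstract credits with lowering the bounds on the expected convergence time.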

History
  • Received: 2023-10-23
  • Revised: 2023-11-13
  • Accepted: 2023-11-27