The recovery process of semi-submersible vehicles takes a long time, demands a high level of operator experience and skill, and carries a high cost of trial and error. To address this problem, the VH-PPO algorithm is proposed, and its performance is analyzed from three aspects: convergence, the upper and lower bounds of the expected convergence time, and time complexity. Using historical data from successful human-operated recoveries, an initial probability distribution is constructed and training proceeds from it. This eliminates the process of free exploration and continuous trial and error, effectively reducing the expected convergence time, so that the training model converges faster and the time complexity of the algorithm decreases. Choosing better hyperparameters for different stages of training to prevent overshoot and undershoot helps the training model converge more reliably, reduces the upper and lower bounds of the expected convergence time, and thus further lowers the time complexity of the algorithm. The algorithm is trained by reinforcement learning in OpenAI Gym and, after training, is applied to the control software. The model is validated and further adjusted in a certain sea area. The experimental results show that as the number of experiments increases, the adaptability of the intelligent agent in the real environment keeps improving, and auxiliary control commands account for more than 50% of the total control commands, which effectively relieves operator fatigue, reduces the difficulty of novice training, and lowers the threshold for replacing operators.
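The abstract does not give implementation details; a minimal Python sketch of the two ideas it names, initializing the policy from successful human-operation data and staging hyperparameters across training, might look like the following. All function names, state/action encodings, and numeric values here are illustrative assumptions, not the paper's actual VH-PPO implementation.

```python
def behavior_cloning_init(demos, n_states, n_actions, smoothing=1.0):
    """Estimate an initial policy from (state, action) pairs logged during
    successful human recoveries. Laplace smoothing keeps every action
    reachable, so the agent can still refine the policy during RL training
    instead of free exploration from scratch (an assumed design, for
    illustration only)."""
    counts = [[smoothing] * n_actions for _ in range(n_states)]
    for s, a in demos:
        counts[s][a] += 1.0
    # Normalize each row into a probability distribution over actions.
    return [[c / sum(row) for c in row] for row in counts]


def staged_hyperparams(step, total_steps):
    """Hypothetical two-stage schedule: a larger clip range and learning
    rate early for fast progress, smaller values later to avoid overshoot
    and undershoot near convergence. The stage boundary and values are
    assumptions, not taken from the paper."""
    if step / total_steps < 0.5:
        return {"clip_eps": 0.2, "lr": 0.05}   # early stage: explore faster
    return {"clip_eps": 0.1, "lr": 0.01}       # late stage: settle gently


# Toy usage: 4 discretized vehicle states, 3 control actions.
demos = [(0, 1), (0, 1), (0, 1), (1, 2), (2, 0)]
policy = behavior_cloning_init(demos, n_states=4, n_actions=3)
early, late = staged_hyperparams(10, 100), staged_hyperparams(90, 100)
```

Under this sketch, the RL loop would start from `policy` rather than a uniform distribution and would read `clip_eps` and `lr` from `staged_hyperparams` at each step, which is one plausible way to realize the convergence-time reduction the abstract claims.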