I have an artificial intelligence algorithm applied to a simple two-legged robot, and something strange is happening. The reward function is (1 - cos(swingAngle))/0.1077, where swingAngle converges to -0.38. I was therefore expecting a steady-state reward of 0.6624, but the plotted reward is 0.995. I changed the reward function, but the result is still miscalculated. (Please see the attached file: instantRewardSwingLeg is the Modelica custom component implementing the reward, and "reward plot" is the reward figure.) Could you advise me on how to resolve this problem?
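For reference, here is a minimal Modelica snippet evaluating the formula, assuming swingAngle is in radians (the model name is just a placeholder). With the angle in radians, the expression gives about 0.6624, matching my expectation:

```modelica
// Sanity check of the expected steady-state reward, assuming the
// angle is in radians (not degrees).
model RewardCheck
  Real reward = (1 - cos(-0.38))/0.1077; // approx. 0.6624
end RewardCheck;
```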
Another thing: I need to reinitialize both legs' angles and velocities when the swing leg hits the ground and both angles (swing and support) reach values appropriate for walking. Essentially, based on the measurements, I need to reinitialize both legs at each step so the walker can take the next one. Could you help me with this as well, please?
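In case it helps to show what I mean, here is a rough sketch of the kind of event-based reinitialization I have in mind, using Modelica's `when`/`reinit` construct. All names and the toy dynamics are placeholders, not the actual components in my model:

```modelica
// Sketch only: swap the two legs' states when the swing foot strikes
// the ground. The dynamics below are dummy placeholders.
model StepReinit
  Real swingAngle(start = 0.38, fixed = true);
  Real supportAngle(start = -0.38, fixed = true);
  Real swingRate(start = -1.0, fixed = true);
  parameter Real contactAngle = -0.38 "swing angle at foot strike";
equation
  der(swingAngle) = swingRate;
  der(supportAngle) = -swingRate;
  der(swingRate) = 0; // placeholder dynamics
  when swingAngle <= contactAngle then
    // exchange the legs' roles at heel strike for the next step
    reinit(swingAngle, pre(supportAngle));
    reinit(supportAngle, pre(swingAngle));
    reinit(swingRate, -pre(swingRate));
  end when;
end StepReinit;
```

My question is how to express this kind of state reset inside a MapleSim custom component driven by the measured angles.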
Here is the file: RLHumanoidRobotFull3.msim
Thanks for your help...