I am trying to use the Reinforcement Learning functionality in FlexSim on a test problem. Attached is the FlexSim model. Essentially, the task is to find an optimal vessel shipping schedule from the Production Tank to the two customer tanks. I want to use reinforcement learning for this and followed the tutorial here.
In my setup, the actions are the destinations to which the vessels are shipped, and the states are the tank levels at Cylindric1, Cylindric2 and Cylindric3. For now I use a fixed reward just to get started, and a single episode lasts 100 days. I have attached images of what I have implemented.
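To make the setup concrete, here is a minimal sketch of the formulation in plain Python. It only mirrors the description above and is not taken from my model or from flexsim_env.py: every name in it (`NUM_DESTINATIONS`, `TANK_NAMES`, `build_observation`, `sample_random_action`) is hypothetical, the reward value is an arbitrary placeholder, and the example tank levels are made up.

```python
import random

# All names below are hypothetical and only mirror the description above;
# they are not FlexSim or flexsim_env.py identifiers.
NUM_DESTINATIONS = 2                                      # the two customer tanks
TANK_NAMES = ("Cylindric1", "Cylindric2", "Cylindric3")   # observed tank levels
EPISODE_LENGTH_DAYS = 100                                 # length of one episode
FIXED_REWARD = 1.0                                        # placeholder value (assumed)


def build_observation(levels):
    """Return the tank levels as a fixed-order list of floats."""
    return [float(levels[name]) for name in TANK_NAMES]


def sample_random_action(rng):
    """Pick a destination index (0 or 1) uniformly at random."""
    return rng.randrange(NUM_DESTINATIONS)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up tank levels, purely for illustration.
    obs = build_observation({"Cylindric1": 50, "Cylindric2": 20, "Cylindric3": 30})
    actions = [sample_random_action(rng) for _ in range(10)]
    print(obs, actions)
```

With random action selection I would expect both destination indices to show up over a run of decisions, which is the behavior I am looking for in the model.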
I don't understand what I am seeing. For some reason, I do not see random actions being selected, i.e. vessels being shipped randomly to the two destinations. Also, when I test the model with flexsim_env.py, I get the message "Waiting for Observation message" and the code does not run any further.
My guess is that the agent is not receiving any observations, and that is why it cannot proceed. Could anyone please take a look? Thanks in advance.