Reinforcement learning tutorial: state values and mismatch between messages

Question

sz asked Oct 14 2024 at 5:08 PM · Jeanette F commented Oct 23 2024 at 2:25 PM

Hello,

I’ve been following this tutorial: https://docs.flexsim.com/en/22.1/ModelLogic/ReinforcementLearning/Training/Training.html and have successfully implemented and run the two Python scripts, flexsim_env.py and flexsim_training.py. However, I have trouble understanding parts of the output. I've attached a screenshot for reference.

1) In the FlexSim model, the "action" and "observation" parameters ("LastItemType" and "ItemType") are defined to have values between 1 and 5. However, in the output, the state values range from 0 to 4. Why do the state values in the output differ from the range defined in the model?

2) At the beginning of each iteration, the "state" values in the Action and Observation messages don’t match. After a few simulation steps the values do align, but why are they inconsistent at first?

Thank you!

Software Version:

FlexSim 24.1.0

reinforcement learning reinforcement training

1728925547594.png (206.6 KiB)

model.fsm (32.6 KiB)

Answer 1 · 2024-10-15T06:31:52Z

Felix Möhlmann answered Oct 15 2024 at 6:31 AM

1) I believe a discrete parameter with N possible values is always mapped to the range [0, N-1]. For example, if the possible values were 3, 6, 9 and 12, the RL agent would "see" the values 0, 1, 2, 3.
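To illustrate the mapping described above, here is a minimal sketch in plain Python. The helper names `to_index` and `to_value` are hypothetical and not part of FlexSim or the tutorial scripts; the sketch only assumes that the agent works with a 0-based index into the list of allowed parameter values.

```python
# Hypothetical helpers, not FlexSim API: they only illustrate a
# 0-based index mapping for a discrete parameter with N allowed values.

def to_index(value, allowed_values):
    """Map a model-side parameter value to the 0-based index the agent sees."""
    return allowed_values.index(value)


def to_value(index, allowed_values):
    """Map the agent's 0-based index back to the model-side parameter value."""
    return allowed_values[index]


# Item types 1..5 in the model appear as 0..4 on the agent side.
item_types = [1, 2, 3, 4, 5]
agent_view = [to_index(v, item_types) for v in item_types]
print(agent_view)

# The same idea for non-contiguous values such as 3, 6, 9, 12.
sparse_values = [3, 6, 9, 12]
sparse_view = [to_index(v, sparse_values) for v in sparse_values]
print(sparse_view)
print(to_value(2, sparse_values))
```

Under this assumption, a state value of 0 in the Python output corresponds to item type 1 in the model, and a state value of 4 corresponds to item type 5.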

2) Not all types of items will be available to pull at the start of a run. When the requested type is not available, the demo model will instead pull the first item in the queue, so the observed type can differ from the requested one.
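The fallback behavior can be sketched in Python as follows. This is an illustration only, not the demo model's actual logic: the function `pull_item` and the queue contents are hypothetical, and the item types use the model-side values 1 to 5.

```python
from collections import deque


def pull_item(queue, requested_type):
    """Pull the requested item type if present, else the first item in the queue.

    Hypothetical stand-in for the demo model's pull logic.
    """
    if requested_type in queue:
        queue.remove(requested_type)  # removes the first matching item
        return requested_type
    return queue.popleft()  # fallback: requested type not available yet


# Early in a run the queue holds only a few item types.
early_queue = deque([2, 2, 4])
early_pull = pull_item(early_queue, 5)  # type 5 is requested but not in the queue
print(early_pull)

# Later the queue contains more types, so the request can be honored.
later_queue = deque([1, 3, 5, 2])
later_pull = pull_item(later_queue, 5)
print(later_pull)
```

In the early case the pulled type differs from the requested action, which would show up as a mismatch between the Action and Observation messages. Once the queue has filled with more item types, the request and the result agree.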
