Reinforcement learning tutorial: state values and mismatch between messages

Question


sz asked Oct 14, '24 Jeanette F commented Oct 23, '24


Hello,

I’ve been following this tutorial: https://docs.flexsim.com/en/22.1/ModelLogic/ReinforcementLearning/Training/Training.html and have successfully implemented and run the two Python scripts, flexsim_env.py and flexsim_training.py. However, I'm having trouble understanding parts of the output. I've attached a screenshot for reference.

1) In the FlexSim model, the "action" and "observation" parameters ("LastItemType" and "ItemType") are defined with values from 1 to 5. However, in the output the state values range from 0 to 4. Why is there a discrepancy between the range defined in the model and the state values shown in the output?

2) At the beginning of each iteration, the "state" values from the Action and Observation messages don’t match. After a few simulation steps, the values do align, but why are they initially inconsistent?

Thank you!


Software Version: FlexSim 24.1.0

reinforcement learning reinforcement training

Attachments: 1728925547594.png (206.6 KiB), model.fsm (32.6 KiB)


Jeanette F ♦♦ commented · Oct 23 at 02:25 PM

Hi @sz, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.


Answer 1 · 2024-10-15T06:31:52Z

Felix Möhlmann answered Oct 15, '24

1) I believe a discrete parameter with N possible values is always mapped to the range [0, N-1]. For example, if the possible values were 3, 6, 9 and 12, the RL agent would "see" the values 0, 1, 2 and 3. In the same way, item types 1 to 5 show up as 0 to 4 in the output.
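To illustrate point 1, here is a minimal Python sketch of mapping a discrete parameter's possible values to the indices an agent sees, and back again. `make_discrete_mapping` is a hypothetical helper, not part of FlexSim or the tutorial scripts, and it assumes the values are enumerated in ascending order.

```python
def make_discrete_mapping(possible_values):
    """Map each possible value to an index in [0, N-1], and build the reverse map.

    Assumption (for illustration only): values are enumerated in ascending order.
    """
    values = sorted(possible_values)
    to_index = {v: i for i, v in enumerate(values)}
    to_value = dict(enumerate(values))
    return to_index, to_value


# Item types 1..5 from the tutorial model, as the agent would see them
item_to_index, index_to_item = make_discrete_mapping([1, 2, 3, 4, 5])
print([item_to_index[v] for v in [1, 2, 3, 4, 5]])  # [0, 1, 2, 3, 4]

# The example from the answer: 3, 6, 9, 12 -> 0, 1, 2, 3
to_idx, to_val = make_discrete_mapping([3, 6, 9, 12])
print([to_idx[v] for v in [3, 6, 9, 12]])  # [0, 1, 2, 3]
print(to_val[2])  # 9
```

For consecutive item types 1 to 5, converting an index back to the model's value is just adding 1.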

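2) Not all item types are available to pull at the start of a run. When the requested type is not available, the demo model instead pulls the first item in the queue, so the item type reported in the Observation message can differ from the type requested in the Action message. Once the requested types are available, the two values match.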
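A minimal Python sketch of the pull logic described in point 2, assuming item types are held in a plain list in queue order. `pull_item` and the queue contents are hypothetical; the demo model implements this logic in FlexSim, not in Python.

```python
def pull_item(queue, requested_type):
    """Remove and return the first queued item of requested_type.

    If no item of that type is queued, fall back to the first item in the
    queue. Assumes the queue is non-empty (pop(0) raises IndexError otherwise).
    """
    for i, item_type in enumerate(queue):
        if item_type == requested_type:
            return queue.pop(i)  # return immediately, so mutating the list is safe
    return queue.pop(0)


# Hypothetical start of a run: only types 2 and 4 have arrived so far
early_queue = [2, 2, 4]
action = 5  # the agent requests item type 5
last_item_type = pull_item(early_queue, action)
print(action, last_item_type)  # 5 2 -> observation differs from action

# Later in the run the requested type is queued, so the values agree
full_queue = [1, 3, 5, 2]
matched = pull_item(full_queue, 3)
print(3, matched)  # 3 3
```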
