Hello,
I am using your RL training model. When I run flexsim_training.py, a reward of "0" appears after a few rows. It happens in my other similar models too. I added print commands to the FlexSim model for the rewards and the model reset, so I can see that the reward "0" in Python corresponds to a reset of the model.
In this situation, the rewards before the "0" (i.e. from the previous simulation run) are also counted into the overall reward of the 1st test run, so that reward ends up too high.
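To illustrate what I mean, here is a minimal sketch (the names and numbers are made up, this is not the actual flexsim_training.py code) of the behaviour I suspect: if the running reward total is not zeroed together with the model reset, the leftover rewards from the previous simulation run get added to the first test run's total.

```python
def run_episode(rewards, total_reward=0.0):
    """Accumulate one episode's rewards on top of whatever total_reward already holds."""
    for r in rewards:
        total_reward += r
    return total_reward

# Rewards printed before the "0" (previous simulation run, never finalized):
leftover_from_previous_run = [1.0, 2.0, 3.0]

# Rewards of the 1st test run after the reset:
first_test_run = [0.0, 4.0, 5.0]

# Suspected behaviour: the accumulator is carried over across the reset,
# so the first test run's total is inflated by the leftover rewards.
carried_over = run_episode(leftover_from_previous_run)
inflated_total = run_episode(first_test_run, total_reward=carried_over)
print(inflated_total)   # 15.0 instead of the expected 9.0

# Expected behaviour: the total starts from 0 again at the model reset.
correct_total = run_episode(first_test_run)
print(correct_total)    # 9.0
```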
Why does it behave like that? Is this a real problem or just a "cosmetic issue"?
(The algorithm seems to be trained well when I later use flexsim_inference.py.)
Thank you in advance :-)