
katerina-fratczak asked

Reinforcement learning: reward "0" in the first test run

Hello,

I am using your RL training model. When running flexsim_training.py, a "0" appears after a few rows. This happens in my other, similar models too. I added a print command to the FlexSim model for rewards and model resets, so we can see that the reward "0" in Python corresponds to a reset of the model.

1717417895572.png

In this situation, the rewards before the "0" (i.e. from the previous simulation run) are also counted into the overall reward of the 1st test run, so this reward is then too high.

Why does it behave like that? Is this a problem, or just a "cosmetic issue"?

(The algorithm seems to be trained well when later using flexsim_inference.py.)

Thank you in advance :-)


FlexSim 23.0.15
reinforcement learning

1 Answer

Felix Möhlmann answered

The RL agent is called, and receives a reward, each time the processor runs its Pull Strategy. This happens for the first time during the model reset, at which point the reward label on the sink is 0.

You could add a condition to the reward function that a different reward is given out if the model time is equal to 0.

It shouldn't make much of a difference, though, as long as each model run lasts long enough and/or the state in which this reward is received is not always the same.
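As a rough Python sketch of that idea (hypothetical helper name, not FlexSim's actual API; in the model itself the condition would live inside the reward function):

```python
def reward_with_reset_guard(raw_reward, model_time, reset_reward=0.0):
    """Hypothetical sketch: when the reward function fires during the
    model reset (model time 0), return a fixed substitute value instead
    of the raw reward; otherwise pass the raw reward through unchanged."""
    if model_time == 0:
        return reset_reward  # value handed out for the on-reset call
    return raw_reward
```

With this guard, the call made at reset no longer depends on whatever was left on the sink's reward label from the previous run.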


katerina-fratczak commented ·

Hello Felix,

thank you for your answer. I now understand why the 0 appears. But I still do not understand why the rewards before the 0 (in the picture, 0.5 and 1) are also counted into the overall reward of the 1st run. In other words, why does the 1st test run not start at 0, as the 2nd test run does?


Felix Möhlmann commented ·
I believe, and I might be completely wrong here, that what you see in the console are the latest rewards/states of the entire test run. So the first two rows you see would actually be the last two rows of a previous simulation run.
katerina-fratczak commented ·

Yes, I believe the same. The only concern I have is that the rewards for the items of the previous simulation run are counted into the overall reward of the new simulation run. Here I tried 90000 timesteps and recalculated the rewards in Excel, and I realized that the Python console also counted the rewards from the previous simulation run (except for the first item). The 2nd run always begins with 0, so there the overall reward is correct for that simulation run.
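To make the bookkeeping concrete, here is a small Python sketch (hypothetical helper, working only from the printed (model time, reward) rows, not from any FlexSim API) of how per-run totals could be summed so that leftover rows from the previous run are excluded:

```python
def episode_totals(steps):
    """Hypothetical sketch: sum rewards per simulation run from a list of
    (model_time, reward) pairs. A row with model time 0 marks a reset; its
    reward belongs to no run and is discarded, and any rows seen before the
    first reset (leftovers from a previous run) are ignored entirely."""
    totals = []
    current = None  # None until the first reset row is seen
    for model_time, reward in steps:
        if model_time == 0:
            if current is not None:
                totals.append(current)  # close the finished run
            current = 0.0               # open a new run; drop the reset reward
        elif current is not None:
            current += reward           # accumulate within the current run
    if current is not None:
        totals.append(current)          # close the last (possibly partial) run
    return totals
```

Counting this way, the 0.5 and 1 rows that precede the first "0" would not inflate the 1st run's total, matching the behaviour you see for the 2nd run onward.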

reward.png (520.5 KiB)