
katerina-fratczak asked

Reinforcement Learning: strange rewards in test runs

Hello,

I have worked with your training model:


FlexSim training model, 100,000 timesteps:

Flexsim_training.py: I am concerned about the reward of 0, the three strange rewards that follow it, and the overall reward, which is too high after the 1st test run. (Something similar happens with the modified FlexSim models I created from yours.) The 2nd test run also starts with a 0, but its overall reward seems OK.

However, the algorithm seems to be trained well: when I use flexsim_inference.py afterwards, it works fine. So are these strange rewards in the test runs a problem, or can we use the model as it is? What causes these strange rewards?

1716801838333.png


Thank you

FlexSim 23.0.15
reinforcement learning

1 Answer

Felix Möhlmann answered

The reward in the training model is calculated as 10/('time since last item exit'). Since the processing time of the processor is 10 s, this value can never exceed 1. Any additional time reduces the reward. This additional time can be either setup time, which the algorithm should learn to avoid, or idle time because no item was available to be pulled.
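To make the formula concrete, here is a minimal Python sketch of the reward as described above. It is not the code from the training model: the names `PROCESS_TIME`, `reward` and `reward_with_delay` are hypothetical, and the delay values are arbitrary illustrations.

```python
# Illustrative sketch only; names and delay values are assumptions,
# not taken from the FlexSim training model.
PROCESS_TIME = 10.0  # processing time of the processor in seconds


def reward(time_since_last_exit: float) -> float:
    """Reward as described above: 10 / (time since last item exit)."""
    if time_since_last_exit <= 0:
        raise ValueError("time_since_last_exit must be positive")
    return PROCESS_TIME / time_since_last_exit


def reward_with_delay(extra_delay: float) -> float:
    """Reward when setup or starvation adds extra_delay seconds to the cycle."""
    return reward(PROCESS_TIME + extra_delay)


if __name__ == "__main__":
    # No delay gives the maximum reward of 1; any delay pushes it below 1.
    for delay in (0.0, 3.7, 5.0, 10.0):
        print(f"extra delay {delay:5.1f} s -> reward {reward_with_delay(delay):.4f}")
```

An arbitrary waiting time such as 3.7 s gives a reward with many decimal places, which is the kind of "strange" value that can appear when the processor has to wait for an item.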

You can try changing the expression "getstream(current)" in the inter-arrival time of the source to any number between 1 and 100. This changes the random stream used to determine the inter-arrival time. When you then run more tests, you should see different 'non-rational' numbers appear (or none at all, if items are created fast enough at the start of the simulation that the processor is never starved).
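The effect of the random stream can be imitated outside FlexSim with a small Python sketch. Everything here is an assumption made for illustration: the exponential inter-arrival distribution, its mean of 8 s, measuring the first reward from simulation time 0, and the function name `simulate_rewards`. Setup time is ignored, so only starvation lowers the reward.

```python
import random

PROCESS_TIME = 10.0  # processing time of the processor in seconds


def simulate_rewards(seed, n_items=5, mean_interarrival=8.0):
    """Rewards for the first n_items exits of a single processor.

    Assumptions (not from the FlexSim model): exponential inter-arrival
    times, no setup time, and the first reward measured from time 0.
    """
    rng = random.Random(seed)  # the seed plays the role of the random stream
    arrival = 0.0
    last_exit = 0.0
    rewards = []
    for _ in range(n_items):
        arrival += rng.expovariate(1.0 / mean_interarrival)
        start = max(arrival, last_exit)  # processor waits if no item is there yet
        exit_time = start + PROCESS_TIME
        rewards.append(PROCESS_TIME / (exit_time - last_exit))
        last_exit = exit_time
    return rewards


if __name__ == "__main__":
    # Different seeds give different arrival patterns and different early rewards.
    for seed in (1, 2, 3):
        print(seed, [round(r, 3) for r in simulate_rewards(seed)])
```

In this sketch, a reward below 1 always corresponds to a moment when the processor was idle waiting for an arrival, and changing the seed changes which of the early rewards are affected.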


katerina-fratczak commented
Hello,

Thank you. I have checked it, and it is as you wrote: there were no items in the queue at the start of the simulation (due to the random stream), which is why the rewards were so small (and "strange").
