Reinforcement learning setting problem


Ryan_Wei asked Mar 5, '24 Felix Möhlmann commented Apr 15, '24


In the official example, does "int done = (Model.time > 1000)" mean that the model training time is 1000 seconds?

And does "model.learn(total_timesteps=10000)" in Python mean that there will be 10,000 training runs in total, i.e. 10,000 iterations?

Sorry, I just want to confirm if my understanding is correct.

Software Version:

FlexSim 23.0.0

reinforcement learning

1709621251317.png (15.2 KiB)

1709621705141.png (9.8 KiB)


Answer 1 · 2024-03-05T07:18:18Z

Felix Möhlmann answered Mar 5, '24 Felix Möhlmann commented Mar 5, '24

You are correct about the first part. When "done" becomes 1 (that is, the model time at the moment a decision is made is larger than 1000), the simulation is reset and a new run starts.

The "total_timesteps" value is the number of decisions the RL agent will get to make during the training. If, for example, there are 50 decisions in each 1000 s run, a training with total_timesteps=10000 would cover 200 simulation runs.
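The relationship between decisions and runs can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the helper name `runs_covered` is hypothetical and not part of FlexSim or any RL library, and the figure of 50 decisions per run is the example value from above, not something measured from the model.

```python
def runs_covered(total_timesteps: int, decisions_per_run: int) -> int:
    """Return how many complete simulation runs fit into a training budget.

    total_timesteps   -- number of decisions the agent makes during training
    decisions_per_run -- assumed number of decisions in one run before
                         the "done" condition resets the model
    """
    if decisions_per_run <= 0:
        raise ValueError("decisions_per_run must be positive")
    # Every decision consumes one timestep, so integer division gives
    # the number of runs that are completed in full.
    return total_timesteps // decisions_per_run


if __name__ == "__main__":
    # Example values from the answer: 10000 timesteps, 50 decisions per run.
    print(runs_covered(10_000, 50))
```

In a real model the number of decisions per run is usually not constant, so the result should be read as a rough estimate.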


Ryan_Wei commented · Mar 05 at 07:49 AM

Thanks for your immediate reply

The model I am setting up now updates the environment, actions and rewards every 43200 seconds. Does that mean that in FlexSim I need to write the "done" condition in the reward function as "int done = (Model.time > 43200)"?

As for total_timesteps, if I set its value to 432000, will there be a total of 432000/43200 = 10 simulation runs? Also, can I understand this as 10 iterations?


Felix Möhlmann Ryan_Wei commented · Mar 05 at 10:53 AM

To reiterate, "total_timesteps" is the number of times the RL agent will be called. In the example model this happens every time the processor pulls a new item. If you want this to happen only every 12 hours, then make the RL tool react to a time-based event, like the user event from your previous question.

Each "timestep" would then be one iteration. Though I don't really see how it makes sense to use the RL tool this way. Running a model with changing parameters is usually the purpose of the Experimenter.


1709635860998.png (8.6 KiB)

Answer 2 · 2024-04-15T11:17:26Z

sharan-nitin answered Apr 15, '24 Felix Möhlmann commented Apr 15, '24

@Felix Möhlmann, I have a model which has an interarrival time schedule:

first part entry = 1000 min

last part entry = 100000 min

How should I adjust the value of total_timesteps in Python and Model.time in the reward function?


Felix Möhlmann commented · Apr 15 at 03:02 PM

That depends on how many replications the RL agent should train on and how frequently decisions are made in the model. If, for example, there is a decision every 10 s on average and the model runs for 100,000 s, then 10,000 "timesteps" are one model run.
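As a rough sketch of that estimate in Python: the function name `timesteps_needed` and the replication count of 20 are assumptions made for this illustration, and the decision interval would have to be measured in your own model.

```python
def timesteps_needed(run_length: float,
                     mean_decision_interval: float,
                     replications: int) -> int:
    """Estimate a total_timesteps value for a desired number of model runs.

    run_length             -- model time after which "done" becomes 1
    mean_decision_interval -- average model time between two decisions,
                              in the same time unit as run_length
    replications           -- number of model runs the agent should train on
    """
    if mean_decision_interval <= 0 or replications < 0:
        raise ValueError("interval must be positive, replications non-negative")
    # Average number of decisions (timesteps) in a single run.
    timesteps_per_run = round(run_length / mean_decision_interval)
    return timesteps_per_run * replications


if __name__ == "__main__":
    # One decision every 10 s on average, run length 100000 s,
    # 20 replications (an assumed value for illustration).
    print(timesteps_needed(100_000, 10, 20))
```

The result is the value that would be passed as total_timesteps to model.learn(). If the run is ended by an output count instead of a time limit, the number of decisions per run has to be estimated some other way.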

You could also end the model run when the output of the source reaches a certain number.

int done = Model.find("Source1").as(Object).stats.output.value >= 1000; // Just an example
