Ryan_Wei asked Felix Möhlmann commented

Reinforcement learning setting problem

[screenshot: 1709621251317.png]

[screenshot: 1709621705141.png]

In the official example, does "int done = (Model.time > 1000)" mean that each training run lasts 1000 seconds of model time?

And does "model.learn(total_timesteps=10000)" in Python mean there will be a total of 10,000 training steps, i.e. 10,000 iterations?

Sorry, I just want to confirm whether my understanding is correct.

FlexSim 23.0.0
reinforcement learning

Felix Möhlmann answered Felix Möhlmann commented

You are correct about the first part. When "done" becomes 1 (i.e. the model time at the moment the decision is made is larger than 1000), the simulation is reset and a new run starts.

The "total_timesteps" is the number of decisions the RL agent gets to make during training. If, for example, there are 50 decisions in each 1000 s run, the training would cover 10000 / 50 = 200 simulation runs.
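To put numbers on that, here is a minimal Python sketch (the helper name `expected_runs` is mine for illustration, not part of FlexSim's or stable-baselines3's API) of how the training budget divides into simulation runs:

```python
# Sketch: how total_timesteps relates to full simulation runs.
# Assumes every run ends when "done" becomes 1 (Model.time > 1000)
# and that the agent makes roughly the same number of decisions
# in each run.

def expected_runs(total_timesteps: int, decisions_per_run: int) -> float:
    """Number of simulation runs covered by the training budget."""
    return total_timesteps / decisions_per_run

# Example from above: 50 decisions per 1000 s run, total_timesteps=10000.
print(expected_runs(10_000, 50))  # 200.0
```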


Ryan_Wei commented ·

Thanks for your quick reply.

The model I am setting up updates the environment, actions and rewards every 43,200 seconds (12 hours). So in FlexSim, should I write the first part of the reward function as "int done = (Model.time > 43200)"?

As for total_timesteps, if I set its value to 432000, will there be a total of 432000 / 43200 = 10 simulation runs? And can I understand that as 10 iterations?

Felix Möhlmann replied to Ryan_Wei ·

To reiterate, "total_timesteps" is the number of times the RL agent will be called. In the example model, this happens every time the processor pulls a new item. If you want it to happen only every 12 hours, make the RL tool react to a time-based event, such as the user event from your previous question.

[screenshot: 1709635860998.png]

Each "timestep" would then be one iteration. Though I don't really see how it makes sense to use the RL tool this way; running a model with changing parameters is usually the purpose of the Experimenter.
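To make "one timestep = one iteration" concrete, here is a toy counting loop (illustrative only; this is not how stable-baselines3 or FlexSim is actually implemented) that consumes the training budget the way training would:

```python
# Toy illustration: training stops after total_timesteps agent calls;
# each call is one "timestep"/iteration, and a run (episode) ends
# after a fixed number of decisions.

def train(total_timesteps: int, decisions_per_episode: int) -> int:
    """Return how many simulation runs the training budget covers."""
    timestep = 0
    episodes = 0
    while timestep < total_timesteps:
        for _ in range(decisions_per_episode):   # one model run
            timestep += 1                        # one agent call = one iteration
            if timestep >= total_timesteps:
                break
        episodes += 1
    return episodes

# One decision per 12-hour run and a budget of 432000/43200 = 10 timesteps:
print(train(10, 1))  # 10 runs
```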

sharan-nitin answered Felix Möhlmann commented

@Felix Möhlmann, I have a model with an interarrival time schedule:

first part entry = 1000 min

last part entry = 100000 min

How should I adjust the value of total_timesteps in Python and Model.time in the reward function?


Felix Möhlmann commented ·

That depends on how many replications the RL agent should train on and how frequently decisions are made in the model. If, for example, a decision is made every 10 s on average and the model runs for 100,000 s, then 10,000 "timesteps" correspond to one model run.

You could also end the model run when the output of the source reaches a certain number.

int done = Model.find("Source1").as(Object).stats.output.value >= 1000; // Just an example
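One way to pick the numbers (a back-of-envelope helper I made up for illustration, not part of FlexSim's API) is to derive total_timesteps from the run length, the average decision interval, and how many replications you want the agent to train on:

```python
# Illustrative helper: derive total_timesteps from the model's
# decision frequency and the desired number of replications.

def total_timesteps(run_length: float, mean_decision_interval: float,
                    replications: int) -> int:
    """Training budget covering `replications` runs of the model."""
    decisions_per_run = run_length / mean_decision_interval
    return int(decisions_per_run * replications)

# Example from above: a 100,000 s run with one decision every 10 s on
# average gives 10,000 timesteps per run; training on 20 replications:
print(total_timesteps(100_000, 10, 20))  # 200000
```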