
Ryan_Wei asked

I have a question about the reinforcement learning example

1694785042363.png

I want to understand this line from the example on the official website:

int done = (Model.time > 1000);

Does this mean that the reward is calculated every 1000 seconds?

FlexSim 23.0.0
reinforcement learning

Kavika F ♦ commented ·
Hey @Ryan_Wei, could you please reupload your image? I'm unable to see it. Thank you!
Ryan_Wei replied to Kavika F ♦ ·

I'm so sorry

Here is the image:

1694875374763.png

Jeanette F ♦♦ commented ·

Hi @Ryan_Wei, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.


1 Answer

Felix Möhlmann answered

The reward function passes an array with two elements to the reinforcement learning algorithm. The first value is the reward itself. The second value controls whether the algorithm continues the current simulation run (0) or ends the run and starts a new one (1).

(Model.time > 1000) evaluates to either 0 or 1, depending on the current simulation time. So this line does not set how often the reward is calculated; it only decides when the run ends. The first time the reward is sent after the simulation time passes 1000 s, a new replication is started.
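To illustrate the behavior described above, here is a rough Python sketch. It is not FlexSim code and does not use the FlexSim API: `reward_function`, `run_episode`, and the 300 s `decision_interval` are hypothetical names and values chosen only for illustration. It assumes the reward is requested at each decision point and that the run ends as soon as the second element is 1.

```python
def reward_function(model_time, reward_value):
    """Toy stand-in for the example's reward function: returns [reward, done]."""
    done = int(model_time > 1000)  # 0 while time <= 1000, 1 afterwards
    return [reward_value, done]


def run_episode(decision_interval):
    """Advance a toy clock and request the reward at every decision point."""
    model_time = 0.0
    reward_times = []
    while True:
        model_time += decision_interval  # hypothetical time until the next decision
        reward, done = reward_function(model_time, reward_value=1.0)
        reward_times.append(model_time)  # a reward is sent at every request
        if done:  # the algorithm would end this run and start a new one
            return reward_times


times = run_episode(decision_interval=300)
print(times)  # [300.0, 600.0, 900.0, 1200.0]
```

In this sketch the reward is sent four times, and the run ends at 1200 s (the first request after 1000 s), not at exactly 1000 s.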
