I want to understand the parameters of reinforcement learning in the tutorial

mark zhen asked Sep 13 2022 at 5:56 AM Kavika F commented Sep 19 2022 at 7:26 PM

Can you explain how the reward function shown in the tutorial example is designed, and where its parameters come from?

For example, what is the meaning and source of Model.time?

What is the meaning of the values accessed through current (such as current.LastTime), and where do they come from?

Software Version:

FlexSim 22.0.0

reward function


Andrew O commented · Sep 16 2022 at 6:39 PM

Hi @mark zhen, was Kavika F's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.


Answer 1 · 2022-09-13T18:30:52Z

Kavika F answered Sep 13 2022 at 6:30 PM Kavika F commented Sep 19 2022 at 7:26 PM

The example Reward Function in the tutorial is as follows:

It starts by creating a local variable called reward and setting it equal to the Label "Reward" that's found on the Sink1 object in the 3D model. If we look at the Reward label, it is set by the following On Entry Trigger:

This trigger tells the Sink to first increment the Reward label by

10 / (Model.time - current.LastTime)

This calculation takes the difference between Model.time (the current model time when the trigger fires) and the LastTime label. After the Reward label is incremented, LastTime is set to the current time. As a result, Model.time - current.LastTime is the time between consecutive items finishing, so the shorter that interval, the larger the increment to the reward.
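The trigger arithmetic can be sketched outside FlexSim in plain Python. This is only an illustration of the calculation, not FlexScript: the `SinkLabels` class and the `on_entry` function are hypothetical stand-ins for the Sink's labels and its On Entry trigger, and LastTime is assumed to start at 0.

```python
from dataclasses import dataclass


@dataclass
class SinkLabels:
    """Hypothetical stand-in for the Reward and LastTime labels on Sink1."""
    reward: float = 0.0
    last_time: float = 0.0  # assumed to start at 0


def on_entry(labels: SinkLabels, model_time: float) -> None:
    """Mimic the On Entry trigger: increment Reward, then update LastTime."""
    # Time elapsed since the previous item entered the sink.
    interval = model_time - labels.last_time
    # Shorter intervals between finished items earn a larger increment.
    labels.reward += 10 / interval
    # Record this entry so the next one measures its interval from here.
    labels.last_time = model_time


labels = SinkLabels()
on_entry(labels, 4.0)  # interval 4 -> increment 2.5
on_entry(labels, 6.0)  # interval 2 -> increment 5.0
print(labels.reward, labels.last_time)
```

Note that this sketch would divide by zero if two items entered at exactly the same model time; how the real model behaves in that case is not covered here.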

Back in the reward function, the second line resets the Reward label on the Sink to 0. If this weren't done, the reward would keep accumulating from one observation to the next, and no meaningful learning would take place because every action would be rewarded by an ever-increasing amount.

We then create a second local variable, done, which records whether the model has finished: it checks whether the current model time (Model.time) is greater than 1000 (in model units, in this case seconds).

Finally, we return a list with the first element as the reward given and the second element a 0 or 1 denoting whether or not the model is finished running.
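The read, reset, and return steps described in this answer can be sketched in plain Python. This is a hedged illustration rather than the tutorial's FlexScript: the `reward_function` name, the `end_time` parameter, and the dictionary standing in for the labels on Sink1 are all assumptions made for the example.

```python
def reward_function(sink_labels: dict, model_time: float,
                    end_time: float = 1000.0) -> list:
    """Mimic the described reward function: read, reset, check done, return."""
    # Read the reward accumulated on the sink since the last call.
    reward = sink_labels["Reward"]
    # Reset the label so the same reward is not counted again next time.
    sink_labels["Reward"] = 0.0
    # 1 once the model time has passed the end time, otherwise 0.
    done = int(model_time > end_time)
    return [reward, done]


sink1 = {"Reward": 7.5}  # hypothetical accumulated reward
first = reward_function(sink1, model_time=6.0)
second = reward_function(sink1, model_time=1001.0)
print(first, second)
```

Because of the reset, the second call reports only what accumulated after the first call (here nothing), which is the behaviour the reset line is there to guarantee.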

1663093428157.png (4.3 KiB)

1663093568802.png (12.4 KiB)


mark zhen commented · Sep 17 2022 at 5:40 AM

@Kavika F @Andrew O @Felix Möhlmann

So Model.time is the time at which an item first enters the sink?

What I don't understand is this: isn't the value of current.LastTime the same as Model.time? If so, why is there a difference between them?

There are other reward functions I want to explore using this model as an example. Do you have any suggestions?

I also want to ask how to write mathematical expressions in the value: a summation, a square such as 3^2, an absolute value, e^-x, NDerivative(), and Integral(x).


Kavika F ♦ mark zhen commented · Sep 19 2022 at 7:26 PM

Model.time simply returns the current model time whenever it is called. current.LastTime is initialized to 0 at the beginning of the model run. When the OnEntry trigger fires, it executes its actions in order from top to bottom. It starts with Increment, which calculates the reward. Then it updates LastTime to the time of the current entry. The next time OnEntry fires, LastTime still holds the time of the previous entry, which is why there is a difference between the current Model.time and the previously recorded LastTime.

Example: If OnEntry is fired at 10 seconds, my first calculation will be

10/(10-0) // == 1

Then we'll update LastTime to equal 10. If my next OnEntry is at 15, then the calculation will look like this

10/(15-10) // == 2

Then LastTime will be updated to be 15.

As for your mathematical rewards, you can take a look at the Math API for functions FlexSim supports. You can use things such as

Math.pow(3,2) // 3^2
Math.exp(5)   // e^5

