mark zhen asked · Jeanette F commented

FlexSim reinforcement learning question

I would like to understand some details about how the model relates to reinforcement learning.

For example, what does an episode correspond to in the model?

And what does a timestep correspond to in the model? I am asking because I want to understand why there was a spike in rewards at the beginning of training.

1697448272978.png

FlexSim 23.0.0
reinforcement learning, training
1697448272978.png (270.6 KiB)

Kavika F ♦ commented ·
@mark zhen, where did you get this graph? Did you plot it from Python or from FlexSim?
Joerg Vogel commented ·
Probably a division by zero or near zero.
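
To illustrate how a near-zero divisor could produce such a spike, here is a minimal sketch. It assumes, purely for illustration, that the reward is a rate (items finished divided by elapsed model time); the actual reward expression in the model is not shown in this thread, and `rate_reward`, `safe_rate_reward` and `min_time` are hypothetical names.

```python
def rate_reward(items_done, elapsed_time):
    """Hypothetical reward: throughput per unit of model time."""
    return items_done / elapsed_time


def safe_rate_reward(items_done, elapsed_time, min_time=10.0):
    """Same rate, but the divisor is never allowed below min_time (assumed value)."""
    return items_done / max(elapsed_time, min_time)


# One item finished very early in the run makes the raw rate explode.
early = rate_reward(1, 0.01)         # tiny divisor -> very large reward
later = rate_reward(50, 500.0)       # typical value later in the run
guarded = safe_rate_reward(1, 0.01)  # divisor clamped to min_time
```

If the reward in the model has this shape, the first few timesteps of every episode would report values far above the rest, which would look like a spike at the start of the plot.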
mark zhen, replying to Joerg Vogel, commented ·

So how do I solve this problem in the model?

Jason Lightfoot ♦♦ commented ·
I believe the term 'time-step' comes from the action/reward step in a Markov Decision Process, and the number of time-steps matches the number of action->simulate->observe->reward cycles within your episode.
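
The cycle described here can be sketched as a plain Python loop. This is only an illustration under stated assumptions: `ToyEnv` is a hypothetical stand-in, not the FlexSim API, and it assumes that an episode is one run from reset until a done condition and that each call to `step` is one timestep.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a simulation model; NOT the FlexSim API."""

    def __init__(self, decisions_per_episode=5):
        self.decisions_per_episode = decisions_per_episode
        self.decisions_made = 0

    def reset(self):
        # Start of an episode: the model is reset to its initial state.
        self.decisions_made = 0
        return 0  # initial observation

    def step(self, action):
        # One timestep: apply the action, simulate up to the next decision
        # point, then return the observation, the reward and the done flag.
        self.decisions_made += 1
        observation = self.decisions_made
        reward = 1.0 if action == observation % 2 else 0.0
        done = self.decisions_made >= self.decisions_per_episode
        return observation, reward, done


def run_episode(env, rng):
    """Run one episode and count its timesteps."""
    observation = env.reset()
    timesteps, total_reward, done = 0, 0.0, False
    while not done:
        action = rng.randint(0, 1)  # random policy, for illustration only
        observation, reward, done = env.step(action)
        timesteps += 1
        total_reward += reward
    return timesteps, total_reward


rng = random.Random(0)
env = ToyEnv(decisions_per_episode=5)
results = [run_episode(env, rng) for _ in range(3)]  # three episodes
```

Each entry of `results` holds the timestep count and the summed reward of one episode, so a reward plotted per timestep and a reward plotted per episode are different curves.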
Jeanette F ♦♦ commented ·

Hi @mark zhen ,

Were you able to solve your problem? If so, please add and accept an answer to let others know the solution. Or please respond to the previous comment so that we can continue to help you.

If we don't hear back in the next 3 business days, we'll assume you were able to solve your problem and we'll close this case in our tracker. You can always comment back at any time to reopen your question, or you can contact your local FlexSim distributor for phone or email help.


1 Answer

Joerg Vogel answered · mark zhen commented

A quick and dirty way would be to use a warmup time, or to start transmitting rewards a bit later in your model run.
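
The second option, delaying rewards, could look roughly like the sketch below. It is an assumption-laden illustration: `gated_reward` and the `warmup_time` value are hypothetical, the sample data is made up to show the effect, and this is not FlexSim's built-in warmup setting.

```python
def gated_reward(raw_reward, model_time, warmup_time=100.0):
    """Return 0 during the warmup period and the raw reward afterwards.

    warmup_time is an assumed value in model time units.
    """
    if model_time < warmup_time:
        return 0.0
    return raw_reward


# Made-up (model_time, raw_reward) samples with an early spike.
samples = [(0.5, 200.0), (40.0, 12.0), (150.0, 1.1), (300.0, 0.9)]
rewards = [gated_reward(raw, time) for time, raw in samples]
```

With these samples the two early values are suppressed and only the later ones reach the agent. The trade-off is that decisions made during the warmup period receive no feedback.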


mark zhen commented ·

I don't quite understand what warmup means or what impact it will have on the model.

1697540691717.png (11.4 KiB)
0926.fsm (50.6 KiB)
Jason Lightfoot ♦♦, replying to mark zhen, commented ·

You can find the warmup description in the online documentation. It could somehow influence the timesteps in question if your rewards are based on model statistics. I'm not sure if it would explain the spike in your graph.

mark zhen, replying to Jason Lightfoot ♦♦, commented ·

Thanks, but I have other questions.

For example, what does each timestep correspond to in the FlexSim model?

And what do an epoch and an episode each mean?
