FlexSim reinforcement learning question

Question

mark zhen asked Oct 16 2023 at 9:25 AM Jeanette F commented Oct 25 2023 at 4:46 PM

I would like to understand how reinforcement learning terms map onto a FlexSim model.

For example, what does an episode correspond to in the model?

And what does a timestep correspond to in the model? I am asking because I want to understand why there was a spike in rewards at the beginning of training.

Software Version:

FlexSim 23.0.0

reinforcement learning training

1697448272978.png (270.6 KiB)

· 5

Kavika F ♦ commented · Oct 18 2023 at 6:01 PM

@mark zhen, where did you get this graph? Did you plot this from python or FlexSim?

1 ·

Joerg Vogel commented · Oct 16 2023 at 9:49 AM

Probably a division by zero or near zero.
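To illustrate how that could produce a spike: the reward formula in the attached model is not shown in this thread, so the sketch below simply assumes a rate-style reward (items finished divided by elapsed model time). The function name `rate_reward` and the `min_time` floor are hypothetical, chosen only for the example.

```python
def rate_reward(items_done, elapsed_time, min_time=1.0):
    """Rate-style reward with a floor on the denominator.

    Assumption: the reward is items_done / elapsed_time. Early in a run
    elapsed_time is tiny, so the unguarded ratio can be very large.
    """
    return items_done / max(elapsed_time, min_time)


# Unguarded ratio shortly after the run starts: one item after 0.001 time units.
naive = 1 / 0.001

# Same situation with the floor applied to the denominator.
guarded = rate_reward(1, 0.001)

# Later in the run the floor no longer matters.
steady = rate_reward(10, 5.0)
```

If the reward in your model has this shape, the very first timesteps of each episode would score far higher than the rest, which might look like an early spike on a reward plot.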

0 ·

mark zhen Joerg Vogel commented · Oct 16 2023 at 10:10 AM

So how do I solve this problem in the model?

0 ·

Jason Lightfoot ♦♦ commented · Oct 17 2023 at 12:45 PM

I believe the term 'timestep' comes from the action/reward step in a Markov Decision Process. The number of timesteps matches the number of action -> simulate -> observe -> reward cycles within your episode.
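A minimal sketch of that cycle in plain Python may help. It assumes an episode is one run of the model from reset until its done condition, and a timestep is one decision point within that run. `ToyEnv` is a hypothetical stand-in, not the FlexSim API or any library's interface.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a simulation model (not the FlexSim API)."""

    def __init__(self, decisions_per_episode=5):
        self.decisions_per_episode = decisions_per_episode
        self.decisions = 0

    def reset(self):
        # Start of an episode: the model is reset to its initial state.
        self.decisions = 0
        return 0  # initial observation

    def step(self, action):
        # One timestep: apply the action, "simulate" to the next decision
        # point, then return the observation, reward, and done flag.
        self.decisions += 1
        observation = self.decisions
        reward = 1.0 if action == observation % 2 else 0.0
        done = self.decisions >= self.decisions_per_episode
        return observation, reward, done


def run(env, episodes, seed=0):
    """Run several episodes with random actions and count timesteps."""
    rng = random.Random(seed)
    total_timesteps = 0
    episode_returns = []
    for _ in range(episodes):
        env.reset()
        done = False
        episode_return = 0.0
        while not done:
            action = rng.randint(0, 1)
            _, reward, done = env.step(action)
            episode_return += reward
            total_timesteps += 1
        episode_returns.append(episode_return)
    return total_timesteps, episode_returns


total, returns = run(ToyEnv(decisions_per_episode=5), episodes=3)
```

With 5 decisions per episode and 3 episodes, `total` counts 15 timesteps. A training log that plots reward against timesteps is counting these `step` calls across all episodes, not simulation clock time.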

0 ·

Jeanette F ♦♦ commented · Oct 25 2023 at 4:46 PM

Hi @mark zhen ,

Were you able to solve your problem? If so, please add and accept an answer to let others know the solution. Or please respond to the previous comment so that we can continue to help you.

If we don't hear back in the next 3 business days, we'll assume you were able to solve your problem and we'll close this case in our tracker. You can always comment back at any time to reopen your question, or you can contact your local FlexSim distributor for phone or email help.

0 ·

Answer 1 · 2023-10-16T11:22:07Z

Joerg Vogel answered Oct 16 2023 at 11:22 AM mark zhen commented Oct 19 2023 at 2:26 PM

A quick and dirty fix would be to use a warmup time, or to start sending rewards a little later in the model run.
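As a rough sketch of the second option, assuming the reward is computed somewhere you can also read the current model time: the function `delayed_reward` and its parameters are hypothetical names for illustration, not FlexSim functions.

```python
def delayed_reward(raw_reward, model_time, warmup_time):
    """Return 0 until model_time reaches warmup_time, then pass the reward through."""
    return raw_reward if model_time >= warmup_time else 0.0


# Hypothetical (model_time, raw_reward) pairs; the first value is an early outlier.
samples = [(5.0, 900.0), (60.0, 3.0), (120.0, 2.5), (180.0, 2.8)]

# Rewards the agent would actually receive with a warmup of 100 time units.
filtered = [delayed_reward(r, t, warmup_time=100.0) for t, r in samples]
```

This only hides early rewards from the agent. If the spike comes from the reward formula itself, fixing the formula is the cleaner solution.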

· 6

mark zhen commented · Oct 17 2023 at 11:05 AM

I don't quite understand what warmup means or what impact it will have on the model. See the attached 0926.fsm.

0 ·

1697540691717.png (11.4 KiB)

0926.fsm (50.6 KiB)

Jason Lightfoot ♦♦ mark zhen commented · Oct 17 2023 at 12:08 PM

You can find the warmup description in the online documentation. It could somehow influence the timesteps in question if your rewards are based on model statistics. I'm not sure if it would explain the spike in your graph.

0 ·

mark zhen Jason Lightfoot ♦♦ commented · Oct 17 2023 at 12:52 PM

Thanks, but I have other questions.

For example, what does each timestep correspond to in the FlexSim model?

What do an epoch and an episode each mean?

0 ·
