
katerina-fratczak asked · Jordan Johnson answered

Removing actions in reinforcement learning

Hello,

I need to pull items 1 to 50, but each is available only once. How should I set the Action Parameters? Should I use an Integer from 1 to 50, Options, or something else?

1732266805649.png

How can I remove an already-chosen action from the Action Parameters, so that in the next round the RL algorithm can choose only from the remaining item numbers?

Thank you, Katerina

FlexSim 23.0.15
reinforcement learning

Ralf Gruber answered · katerina-fratczak commented

Hi Katerina,

The parameter type "Sequence" is designed to do what you are asking:

1732782855370.png

You choose a sequence length, and it creates an array of that length filled with consecutive integers.
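Conceptually (an illustration only, not FlexSim code), a Sequence parameter of length 5 starts out as:

```python
# A Sequence parameter of length 5 is an array of consecutive integers,
# which the optimizer is then allowed to reorder.
length = 5
sequence = list(range(1, length + 1))  # [1, 2, 3, 4, 5]
```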

@Jordan Johnson Can you please chip in about how this will work in an RL environment?

Thx

Ralf



katerina-fratczak commented:

Hi Ralf,

thank you for your answer. We have already tried Sequence, but there is no way to connect it with the RL tool's Parameters - see my question from yesterday: Sequence in reinforcement learning - FlexSim Community.

Since I wrote the question, we have tried using Options 1-50 and removing the chosen option after each round using Global Variables. With this approach, a random run in FlexSim works fine: each number is selected only once.

Here is a random run in FlexSim - the chosen numbers are removed from the Global Variables and the Options are updated accordingly:

1732890228338.png

But when we run RL, the Python script apparently reads the Action Parameters only at the beginning of training and chooses the same numbers repeatedly. Because those numbers are no longer available in the Options, FlexSim falls back to the last row of the Options instead, so the RL agent cannot learn properly. There is also confusion between the rows and the numbers they hold (the third row holds the number 6, which is then used in the model...).
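This behavior is consistent with how Gym-style environments work: the action space is declared once when the environment is constructed, and Python never re-reads it afterwards. A minimal sketch (illustrative only, not FlexSim's actual wrapper code):

```python
import numpy as np
import gymnasium as gym  # FlexSim examples may use the older "gym" package

class SketchEnv(gym.Env):
    """Illustrative only, not FlexSim's actual wrapper code."""

    def __init__(self):
        super().__init__()
        # The action space is declared once, here, at construction time.
        # Removing an Option later on the FlexSim side does not shrink
        # this space, so the agent can keep sampling numbers that are
        # no longer valid in the model.
        self.action_space = gym.spaces.Discrete(50)
        self.observation_space = gym.spaces.Box(
            low=0.0, high=50.0, shape=(1,), dtype=np.float32
        )
```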

1732890553301.png


1732890841360.png


Is there any way, please, to update the available Action Parameters in Python after each action?

Thank you, Katerina



Jordan Johnson commented:

Using Reinforcement Learning for scheduling purposes is tricky. A while back, I talked with some RL folks (Bonsai, since discontinued). They said that there are generally better tools available for scheduling than training an agent. They mentioned Gurobi as one possibility:
https://www.gurobi.com/

But that being said, maybe there is a way forward, especially because Gurobi isn't free.

As far as I can tell, the general idea would be to use a single action: which job should be started next. For that, I probably wouldn't use a sequence parameter, but instead a discrete parameter from 1 to N. Note also that if you train an AI on a certain number of jobs, you'll always need to supply that number of jobs.
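On the Python side, that would be a fixed discrete action space (a minimal sketch, assuming a Gym-style environment; `N_JOBS` is an illustrative name):

```python
import gymnasium as gym

N_JOBS = 50  # must match the number of jobs the agent was trained with

# One decision per step: "which job should be started next".
# Discrete(N_JOBS) yields actions 0..N_JOBS-1; map them to jobs 1..N.
action_space = gym.spaces.Discrete(N_JOBS)
```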

But then, when a job is chosen, you'll need some way to specify that the job isn't available anymore. For that, you'll need something called an action mask. It looks like you can do that with a Maskable PPO algorithm:
https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html
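Following that documentation, a rough sketch of the wiring (hedged: `flexsim_env` stands for your Gym-style FlexSim environment, and `valid_jobs` is a hypothetical attribute it would have to maintain):

```python
import numpy as np
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env):
    # One boolean per action: True means the job is still available.
    # "valid_jobs" is hypothetical bookkeeping your environment would
    # update, clearing an entry each time that job is chosen.
    return np.array(env.valid_jobs, dtype=bool)

env = ActionMasker(flexsim_env, mask_fn)  # flexsim_env: your Gym-style env
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```

With the mask in place, the agent can only ever pick from the remaining jobs, so there is no need to rewrite the Options table between rounds.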

In addition, you'd probably need to send some kind of state information about the current process as part of your observation, so the agent can learn to make good scheduling decisions.
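As a sketch of that last point (the fields here are only examples of the kind of state you might expose, not a prescription):

```python
import numpy as np
import gymnasium as gym

N_JOBS = 50

# Example observation layout: one availability flag per job plus a few
# process signals (queue length, elapsed time, ...). What you actually
# include depends entirely on your model.
observation_space = gym.spaces.Box(
    low=0.0, high=np.inf, shape=(N_JOBS + 2,), dtype=np.float32
)
```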

katerina-fratczak commented:
Hello Jordan, thank you very much for your answer. We will try to use the action mask, as you mentioned.


Jordan Johnson answered

One option is described in the article I wrote on this topic, complete with an example:

https://answers.flexsim.com/articles/173513/using-reinforcement-learning-for-job-sequencing.html
