
katerina-fratczak asked · Jordan Johnson answered

Removing actions in reinforcement learning

Hello,

I need to pull items 1 to 50, but each is available only once. How should I set the Action Parameters? Should I use an Integer from 1 to 50, Options, or something else?

1732266805649.png

How can I remove an already-chosen action from the Action Parameters, so that in the next round the RL algorithm can choose only from the remaining item numbers?

Thank you, Katerina

FlexSim 23.0.15
reinforcement learning

Ralf Gruber answered · katerina-fratczak commented

Hi Katerina,

The parameter type "Sequence" is designed to do what you are asking:

1732782855370.png

You choose a sequence length, and it creates an array of that length filled with consecutive integers.
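Conceptually (an illustration only, not FlexSim code), a Sequence parameter of length 5 starts out as:

```python
# A Sequence parameter of length 5 is an array of consecutive integers,
# which the optimizer is then allowed to reorder.
length = 5
sequence = list(range(1, length + 1))  # [1, 2, 3, 4, 5]
```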

@Jordan Johnson Can you please chip in about how this will work in an RL environment?

Thx

Ralf



katerina-fratczak commented:

Hi Ralf,

thank you for your answer. We have already tried Sequence, but there is no way to connect it with the RL tool's Parameters - see my question from yesterday: Sequence in reinforcement learning - FlexSim Community.

Since I wrote the question, we have tried using Options 1-50 and removing the chosen option after each round using Global Variables. With this approach, a random run in FlexSim works fine: each number is selected only once.

Here is a random run in FlexSim - the chosen numbers are removed from the Global Variables and the Options are updated accordingly:

1732890228338.png

But when we run RL, the Python script apparently reads the Action Parameters only at the beginning of training and chooses the same numbers repeatedly. Because those numbers are no longer available in the Options, FlexSim falls back to the last row of the Options instead, so the RL agent cannot learn properly. There is also confusion between the rows and the numbers they hold (the third row holds the number 6, which is then used in the model...).
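This behavior is consistent with how Gym-style environments work: the action space is declared once when the environment is constructed, and Python never re-reads it afterwards. A minimal sketch (illustrative only, not FlexSim's actual wrapper code):

```python
import numpy as np
import gymnasium as gym  # FlexSim examples may use the older "gym" package

class SketchEnv(gym.Env):
    """Illustrative only, not FlexSim's actual wrapper code."""

    def __init__(self):
        super().__init__()
        # The action space is declared once, here, at construction time.
        # Removing an Option later on the FlexSim side does not shrink
        # this space, so the agent can keep sampling numbers that are
        # no longer valid in the model.
        self.action_space = gym.spaces.Discrete(50)
        self.observation_space = gym.spaces.Box(
            low=0.0, high=50.0, shape=(1,), dtype=np.float32
        )
```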

1732890553301.png


1732890841360.png


Is there any way, please, to update the available Action Parameters in Python after each action?

Thank you, Katerina



Jordan Johnson commented:

Using Reinforcement Learning for scheduling purposes is tricky. A while back, I talked with some RL folks (Bonsai, since discontinued). They said that there are generally better tools available for scheduling than training an agent. They mentioned Gurobi as one possibility:
https://www.gurobi.com/

But that being said, maybe there is a way forward, especially because Gurobi isn't free.

As far as I can tell, the general idea would be to use a single action: which job should be started next. For that, I probably wouldn't use a sequence parameter, but instead a discrete parameter from 1 to N. Note also that if you train an AI on a certain number of jobs, you'll always need to supply that number of jobs.
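On the Python side, that would be a fixed discrete action space (a minimal sketch, assuming a Gym-style environment; `N_JOBS` is an illustrative name):

```python
import gymnasium as gym

N_JOBS = 50  # must match the number of jobs the agent was trained with

# One decision per step: "which job should be started next".
# Discrete(N_JOBS) yields actions 0..N_JOBS-1; map them to jobs 1..N.
action_space = gym.spaces.Discrete(N_JOBS)
```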

But then, when a job is chosen, you'll need some way to specify that the job isn't available anymore. For that, you'll need something called an action mask. It looks like you can do that with a Maskable PPO algorithm:
https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html
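Following that documentation, a rough sketch of the wiring (hedged: `flexsim_env` stands for your Gym-style FlexSim environment, and `valid_jobs` is a hypothetical attribute it would have to maintain):

```python
import numpy as np
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

def mask_fn(env):
    # One boolean per action: True means the job is still available.
    # "valid_jobs" is hypothetical bookkeeping your environment would
    # update, clearing an entry each time that job is chosen.
    return np.array(env.valid_jobs, dtype=bool)

env = ActionMasker(flexsim_env, mask_fn)  # flexsim_env: your Gym-style env
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```

With the mask in place, the agent can only ever pick from the remaining jobs, so there is no need to rewrite the Options table between rounds.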

In addition, you'd probably need to send some kind of state information about the current process as part of your observation, so the agent can learn to make good scheduling decisions.
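As a sketch of that last point (the fields here are only examples of the kind of state you might expose, not a prescription):

```python
import numpy as np
import gymnasium as gym

N_JOBS = 50

# Example observation layout: one availability flag per job plus a few
# process signals (queue length, elapsed time, ...). What you actually
# include depends entirely on your model.
observation_space = gym.spaces.Box(
    low=0.0, high=np.inf, shape=(N_JOBS + 2,), dtype=np.float32
)
```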

katerina-fratczak commented:
Hello Jordan, thank you very much for your answer. We will try to use the action mask, as you mentioned.


Jordan Johnson answered

One option is described in the article I wrote on this topic, complete with an example:

https://answers.flexsim.com/articles/173513/using-reinforcement-learning-for-job-sequencing.html
