Pull strategy over reinforcement learning


Gabri asked Feb 1, '24 Jason Lightfoot commented Feb 12, '24


Hi,

I'm working on a reinforcement learning project. The basis is that of the reinforcement learning tutorial.

In the model I added a second source and a second queue, so one source generates 5 types of pallets and the other generates 5 types of people. I want to apply reinforcement learning to maximize the occupancy of the pallets (25 places) by taking, from the second queue, people of the same type as the pallet that is being processed.

As in the reinforcement learning tutorial, I wrote an input pull strategy in the processor, because without it the provided Python code doesn't work. In the pull strategy I tell it to take the type that has more people in the queue than the others. The problem is that I think the model doesn't learn and only follows the pull strategy. I want the model to learn to do this by itself, not to be told what to do. Can you help me with this?

I probably also have problems with the parameter settings in the observation space.

I attach the model.

ChangeoverTimesRL_company.fsm

Software Version:

FlexSim 22.0.16

reinforcement learning pull strategy observation space

changeovertimesrl-company.fsm (286.5 KiB)


Jason Lightfoot ♦♦ commented · Feb 12 at 10:37 AM

Hi @Gabri, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.


Answer 1 · 2024-02-02T08:59:38Z

Felix Möhlmann answered Feb 2, '24 Felix Möhlmann commented Feb 6, '24

You'd just use the same logic as in the tutorial model. The RL algorithm sets an 'action' parameter, which controls what type is pulled.
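To illustrate the division of labour, here is a minimal Python sketch (not FlexScript, and not taken from the attached model). It assumes the pull logic only checks whether an item's type matches the current action; the names `pull_requirement` and `first_pullable` are hypothetical. The point is that the pull strategy merely applies the choice, while the agent makes it.

```python
def pull_requirement(item_type, action):
    """Hypothetical pull check: accept an item only if its type equals the action."""
    return item_type == action


def first_pullable(queue_types, action):
    """Return the index of the first queued item the action allows, or None."""
    for index, item_type in enumerate(queue_types):
        if pull_requirement(item_type, action):
            return index
    return None


# The agent picks the action; the pull logic contains no decision of its own.
queue = [3, 1, 4, 1]
chosen = first_pullable(queue, action=1)
```

With a pull check like this, nothing in the model tells the agent which type is best. That has to come from the reward.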

I do want to give one tip about the code you wrote. You put the for-loop that determines the most ubiquitous type of people inside the while-loop that runs through all available items. This means the most common type is checked again for each item. This is unnecessary and might well cost some performance when running the model as fast as possible. You can instead determine the "itemTypeValue" variable once before the while-loop.
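As a rough Python illustration of that restructuring (the attached model uses FlexScript, and the function names and list-based queues here are assumptions made for the sketch), the most common type is determined once and then reused inside the loop over the available items:

```python
from collections import Counter


def most_common_type(people_types):
    """Return the type with the most people waiting (lowest type wins ties)."""
    counts = Counter(people_types)
    return min(counts, key=lambda t: (-counts[t], t))


def pick_item(item_types, people_types):
    """Return the index of the first item matching the most common type, or None."""
    if not item_types or not people_types:
        return None
    # Computed once, before the loop, instead of once per item.
    item_type_value = most_common_type(people_types)
    index = 0
    while index < len(item_types):
        if item_types[index] == item_type_value:
            return index
        index += 1
    return None
```

The result is the same as recomputing inside the loop; only the amount of repeated work changes.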

And some notes based on my limited experience with Reinforcement Learning: You will get faster results by eliminating superfluous information from the observations. For example, the RL algorithm doesn't really need to know how many pallets of each type there are, just if at least one is available to be pulled.
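A small Python sketch of what that reduction could look like, assuming types are numbered 1 to 5 and the queue content is available as a list of type numbers (both are assumptions for illustration, as are the function names):

```python
def count_observation(queue_types, num_types=5):
    """Full counts per type: more information than the agent may need."""
    return [queue_types.count(t) for t in range(1, num_types + 1)]


def availability_observation(queue_types, num_types=5):
    """One flag per type: 1 if at least one item of that type can be pulled."""
    present = set(queue_types)
    return [1 if t in present else 0 for t in range(1, num_types + 1)]
```

The binary version has only 2^5 = 32 possible observations, whereas the count version grows with the queue capacity, so there are far fewer states for the agent to explore.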


Gabri commented · Feb 02 at 04:15 PM

Thank you for the answer and the tip. But can you help me with the pull strategy? In my opinion it is too specific. I was thinking of changing it to something more general, so that decisions would be made by reinforcement learning through the reward function. Can you suggest something?


Felix Möhlmann Gabri commented · Feb 02 at 05:17 PM

As I already mentioned, the decision that is made in your model is still the same as in the example model: which item type to pull. I see no reason why you would need to alter the pull strategy from the tutorial.

The pull strategy you created is already the optimum you want the RL agent to reach. So you can later use it to gauge how well the agent is doing.
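A toy Python sketch of using the hand-written strategy as a benchmark. Everything here is a stand-in: the random queue counts, the reward (number of people of the chosen type) and the `always_first` policy, which takes the place of an untrained agent. It is not the FlexSim model or its reward function.

```python
import random


def run_episode(policy, steps=50, num_types=5, seed=0):
    """Sum the toy reward collected by a policy over one episode."""
    rng = random.Random(seed)  # same seed gives the same queue states
    total = 0
    for _ in range(steps):
        counts = [rng.randint(0, 10) for _ in range(num_types)]
        action = policy(counts)    # index of the type to pull
        total += counts[action]    # toy reward: people of that type
    return total


def heuristic_policy(counts):
    """The hand-written rule: pull the type with the most people waiting."""
    return counts.index(max(counts))


def always_first(counts):
    """Stand-in for an untrained agent: always pulls the first type."""
    return 0


baseline_score = run_episode(heuristic_policy)
agent_score = run_episode(always_first)
gap = baseline_score - agent_score  # shrinks as the agent improves
```

In practice the trained agent would replace `always_first`, and the gap to the heuristic would show how close it has come to the intended behaviour.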


1706894192246.png (12.7 KiB)

Gabri Felix Möhlmann commented · Feb 06 at 11:09 AM

Yes, but if I'm not wrong, that pull strategy was used to take packages of the same type as the one just processed, if there are any; otherwise it takes the oldest one in the queue. This is not what I want. I don't understand how much the pull strategy affects learning: if it has priority over learning, then the model does not learn.
