Hi!
In the model I created there is a source that generates two types of items, identified by a label (and a color) defined on the item called 'Type': red items are Type 1 and green items are Type 2.
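The type assignment is done on the source; a minimal sketch of what I mean, assuming it happens in the On Creation trigger with a random 50/50 draw (the exact code in my model may differ):

```
// On Creation trigger of the source (sketch): draw the Type at random
// and give the item a matching color
Object item = param(1);
item.labels.assert("Type").value = duniform(1, 2);   // 1 = red, 2 = green
item.color = (item.Type == 1) ? Color.red : Color.green;
```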
My goal is to make robot2 intelligent using the Reinforcement Learning tool: it should decide whether to take a red or a green item depending on which queue is longer.
As Observation parameters I used the number of items in the two queues, and as Action parameter I used the ItemType.
To build the reward, I defined a label on robot2 named 'Reward' and added an On Load > Increment Value trigger; by modifying that code, I set the value of the 'Reward' label depending on the case (see the sketch after this list):
1. When ItemType = 1 (robot2 chose a red item), the reward must be 100 if the red queue holds more items than the green one, otherwise -100.
2. When ItemType = 2 (robot2 chose a green item), the reward must be 100 if the green queue holds more items than the red one, otherwise -100.
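In the On Load trigger the code looks roughly like this ("QueueRed" and "QueueGreen" are placeholder names, not necessarily the exact ones in my model):

```
// On Load trigger of robot2 (sketch)
Object current = ownerobject(c);
Object item = param(1);

// Current content of the two input queues (placeholder names)
int redCount = Model.find("QueueRed").subnodes.length;
int greenCount = Model.find("QueueGreen").subnodes.length;

if (item.Type == 1)
    current.Reward = (redCount > greenCount) ? 100 : -100;   // chose red
else
    current.Reward = (greenCount > redCount) ? 100 : -100;   // chose green
```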
Is the code for the reward label correct? Is the Reward Function in the RL tool correct (is the done criterion right)? Is it correct not to define a pull strategy on the robot (so that it learns only from rewards)? How many total_timesteps should I set in the training script? Is 1000 too little?
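For reference, the Reward Function in the RL tool currently looks more or less like this (the robot name and the done condition are placeholders here):

```
// Reward Function of the Reinforcement Learning tool (sketch)
Object robot = Model.find("Robot2");   // placeholder for robot2's name in the tree
double reward = robot.Reward;          // the label written in the On Load trigger
int done = Model.time >= 1000;         // placeholder done criterion: end the episode after a fixed time
return [reward, done];
```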
The main problem is that robot2 never becomes intelligent:
a) It keeps loading and unloading items randomly (it continually alternates between taking a red item and a green one).
b) In the scripts provided by FlexSim, the connection to localhost in the inference script is broken, so I'm basically stuck at step 4 of this guide: https://docs.flexsim.com/en/22.1/ModelLogic/ReinforcementLearning/UsingATrainedModel/UsingATrainedModel.html
I'm uploading my model here; please check it to get the full picture.
Can someone please help? I'm going crazy... there are very few examples of RL solutions out there.