Hi everyone, I'm having some problems with my model.
The aim of this model is to train the orange robot to correctly choose which good to pick, since there are 4 kinds of goods distinguished by color. If the robot is well trained, it should learn not to pick, for example, two yellow goods consecutively within a short time, because the yellow robot would not have time to pick both of them. So the queue at the end of the conveyor should stay empty if everything is working.
Now, I have created two parameters: LostItems (an observation parameter), which counts the items that enter the last queue, and ItemType (an action parameter), which represents the 4 kinds of goods.
I have created a label called Items on the last queue, and I have set an On Entry trigger with Increment Value, which increases Items each time a good enters the queue.
On the orange robot I have created a label called Reward, and I have set an On Entry trigger that increments current.labels["Reward"] by 10 / (10 + Model.find("Queue4").Items), so the more items that enter the last queue, the lower the reward.
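To illustrate the shaping (plain Python rather than FlexScript, purely for checking the math): the per-pick increment 10 / (10 + Items) starts at 1.0 with an empty last queue and shrinks toward 0 as lost items accumulate.

```python
# Illustrative only: the per-pick reward increment from the On Entry
# trigger, as a function of how many items have entered the last queue
# (the "Items" label on Queue4 in the model).
def reward_increment(items_in_queue: int) -> float:
    return 10 / (10 + items_in_queue)

for n in (0, 5, 10, 50):
    print(n, round(reward_increment(n), 3))
# 0 -> 1.0, 5 -> 0.667, 10 -> 0.5, 50 -> 0.167
```

So the penalty for losing items is gradual rather than sharp, which may make the learning signal quite weak when only a few items reach the last queue.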
In the RL tool, my observation function is:
Model.parameters["LostItems"].value = Model.find("Queue4").Items;
And my reward function is:
double reward = Model.find("Robot1").Reward; // accumulated by the robot's On Entry trigger
Model.find("Robot1").Reward = 0;             // reset so each step returns only the new reward
int done = Model.time > 1000;                // end the episode after 1000 time units
return [reward, done];
When I try to train the model, the rewards stay flat and don't increase. I think I am not configuring the initial parameters correctly.
What do you suggest I do? Here is my file: modello6_5.fsm