
Luca B6 asked · Luca B6 commented

RL rewards don't increase

Hi everyone, I am having some problems with my model.

immagine-2022-09-29-180547.png


The aim of this model is to train the orange robot to correctly choose which good it has to pick; there are 4 kinds of goods, distinguished by their color. If the robot is well trained, it will learn, for example, not to pick two yellow goods consecutively within a short time, because the yellow robot would not have time to pick both of them. So the queue at the end of the conveyor should be empty if everything is working.

Now, I have created two parameters: LostItems (an observation parameter), which counts the items that enter the last queue, and ItemType (an action parameter), which represents the 4 kinds of goods.
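
In FlexScript the two parameters are read and written like this (a simplified sketch using the names above, just to show how I intend to use them):

Model.parameters["LostItems"].value = 0;            // observation parameter, reset at the start of a run
int nextType = Model.parameters["ItemType"].value;  // action parameter (1-4), set by the RL algorithm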

I have created a label called Items on the last queue, and I have set an On Entry trigger with Increment Value, which increases Items when goods enter the queue.

On the orange robot I have created a label called Reward, and I have set an On Entry trigger with Increment Value by 10/(10 + (Model.find("Queue4").Items)) on current.labels["Reward"], so the more items enter the last queue, the lower the reward.
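
In FlexScript terms, the two triggers do roughly this (a simplified sketch of the picklist options, assuming both labels already exist on their objects):

// On Entry trigger of Queue4: count every item that reaches the last queue
current.Items += 1;

// On Entry trigger of Robot1: the fuller Queue4 gets, the smaller the increment
current.Reward += 10.0 / (10.0 + Model.find("Queue4").Items);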

In the RL tool, my observation function is:

Model.parameters["LostItems"].value = Model.find("Queue4").Items;

And my reward function is:

double reward = Model.find("Robot1").Reward; // reward accumulated by the robot since the last step
Model.find("Robot1").Reward = 0;             // reset the label for the next step
int done = (Model.time > 1000);              // end the episode after 1000 time units
return [reward, done];


When I try to train my model, the rewards remain stable and don't increase. I think I am not setting up the initial parameters correctly.

What do you suggest I do? Here is my file: modello6_5.fsm

FlexSim 22.1.1
robot · reinforcement learning


Felix Möhlmann answered · Luca B6 commented

I am not getting any errors when running the model either. However, there are some additional things I'd like to point out regarding the RL setup in the model.

1) I don't see where the action parameter has any impact on the model. The queue currently just sends out the items in FIFO order. There is no Pull Strategy, Process Flow or other logic that takes the parameter into account (see the sketch after this list).

2) The number of items in the queue is useful to calculate the reward. However, it is not useful as an observation parameter. "There are 4 items in the queue so the next item should be of type X" is not a valid inference in general. Sure, in theory the algorithm could learn how to pick items for one specific order of incoming items, but as soon as the incoming order deviates, the policy would be useless. The observation should instead probably be which type of item was previously put onto the conveyor.

2.1) The number of items is also not a good observation because the number can only ever increase, meaning the algorithm will constantly encounter new states without ever returning to a previous one. So comparing the impact of making a different decision for the same state can only really happen between separate runs, severely limiting the speed at which the algorithm can learn.

3) Since there is no information about which types are actually available to pick from, you might want to set the action parameter up so that it means "Do not pick this type of item".
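
To illustrate points 1 and 2, here is a rough sketch of what I mean (the object and parameter names are taken from your model; everything else is an untested assumption, not code from your file):

// Pull Requirement on the queue that feeds the conveyor:
// only pull items whose Type label matches the current action parameter.
// current = the pulling object, item = the candidate item
int wantedType = Model.parameters["ItemType"].value;
return item.Type == wantedType;

// On Exit of that queue: remember which type was just put onto the conveyor,
// e.g. in a (hypothetical) observation parameter "LastItemType",
// so the observation function can report it instead of the queue count.
// Model.parameters["LastItemType"].value = item.Type;

You would also need a decision event that fires whenever the queue is about to release an item, so that a new value for the action parameter is requested for each item.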



Jeanette F answered

Hello @Luca B6,

I am still not sure why the exceptions are showing for me but not for you, but I can now identify why you are not seeing any change in your rewards. Reinforcement learning doesn't work with the Express version of FlexSim because there are no random streams. In other words, there is no variation to allow learning; it's the same run over and over. You will need a license in order to have random streams. Please contact your local distributor for assistance with a license.

