question

katerina-fratczak asked · Jordan Johnson answered

Reinforcement learning - settings for multiple Decision Events

Hello,

I have my own model where I created 2 Decision Events, one for each queue. But there is no window to set the On Observation, Reward Function, or On Request Action trigger individually (differently) for each Decision Event, so I have set those parameters for both queues at once. RL training works this way. I also tried deleting the second Decision Event, and it works as well, because the action parameters for the 2nd queue are already set by running Decision Event 1.

[screenshot: 1720444358437.png]

2 Decision Events: [screenshot: 1720445855132.png]


1 Decision Event - seems to be more correct:

[screenshot: 1720446078656.png]


Is it correct to use only one Decision Event in this model? In which case is it necessary to use more Decision Events? And why can't the triggers be set individually for different Decision Events?

Thank you in advance :-)


15_our_3_model_push.fsm

FlexSim 23.0.15
reinforcement learning

Joerg Vogel commented:

@Katerina_Fratczak, probably you want to add more agents to your observation environment. Note that most Python versions have a restriction called the Global Interpreter Lock (GIL), so multiple observation agents have to be passed into a single-threaded Python main procedure.

katerina-fratczak replied to Joerg Vogel:
Hello Mr. Vogel,

no, I am just curious why FlexSim makes it possible to add more Decision Events when there seems to be no way to define the On Observation, Reward Function, etc. individually for each of them.


1 Answer

Jordan Johnson answered

In Reinforcement Learning, an agent takes in all observations and sets all action values. That's why there is only one "On Observation", "On Reward", etc.
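To picture that single-agent structure on the Python side, here is a minimal sketch in the style of a gymnasium environment. It is illustrative only, not FlexSim's generated connector code, and the space shapes and names are assumptions for this model: however many Decision Events fire, they all feed the same observation space, action space, and reward.

```python
# Minimal sketch (assumed shapes/names, not FlexSim's connector code):
# one agent, one observation space, one action space -- every decision
# event in the model feeds the same On Observation / On Reward logic.
import gymnasium as gym
import numpy as np

class SingleAgentEnv(gym.Env):
    def __init__(self):
        # One combined observation vector, regardless of which queue
        # triggered the decision event.
        self.observation_space = gym.spaces.Box(
            low=0.0, high=np.inf, shape=(2,), dtype=np.float32
        )
        # One shared action: send the box to port 1 or port 2.
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(2, dtype=np.float32), {}

    def step(self, action):
        # In a real setup the simulation supplies the next observation and
        # the reward; placeholders keep this sketch runnable.
        obs = np.zeros(2, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}
```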

It seems to me that you are using the correct decision events; you want the agent to decide which port a box should go to, whether that box is at QueueA or QueueB. That seems fine.

I don't think you need two actions, though. It's the same action: port 1 or port 2. The only difference is which queue is "asking". But I think if you put that in your observation space somehow, then the algorithm will be able to tell which queue is "asking" and make the correct choice.
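One way to picture that idea (hypothetical names, not code from the attached model): keep a single two-valued action, and let the observation carry a flag that tells the agent which queue made the request.

```python
import numpy as np

# Purely illustrative (names and encoding are assumptions): the observation
# includes a flag saying which queue is asking, and one shared two-valued
# action answers requests from either queue.
def build_observation(asking_queue_is_b, other_features):
    # asking_queue_is_b: 0.0 if QueueA triggered the request, 1.0 if QueueB did
    return np.concatenate(([asking_queue_is_b], other_features)).astype(np.float32)

def port_from_action(action):
    # Map the single action (0 or 1) to output port 1 or 2,
    # whichever queue made the request.
    return int(action) + 1
```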

For example, see the attached zip file:

rl-sample.zip

I modified your model. I changed the observation space to show how much time has elapsed since an item entered each of the objects. The value for TimeSinceEntryA will be zero if a box just arrived at QueueA. Similarly, the value for TimeSinceEntryB will be zero if a box just arrived at QueueB. In this sense, the RL algorithm can observe which queue is asking.
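In plain terms, an observation like TimeSinceEntryA is just "current model time minus the time the last box entered that object". A tiny sketch of that bookkeeping (hypothetical names, not the FlexScript used in the attached model):

```python
# Illustrative bookkeeping only: record the model time an item enters each
# object, then observe the elapsed time since that entry. A value of 0
# means a box just arrived at that object.
entry_times = {"QueueA": 0.0, "QueueB": 0.0, "Processor1": 0.0, "Processor2": 0.0}

def on_entry(object_name, model_time):
    # Called whenever an item enters the object.
    entry_times[object_name] = model_time

def time_since_entry(object_name, model_time):
    # Observation value: elapsed model time since the last entry.
    return model_time - entry_times[object_name]
```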

Also, by observing the time since boxes entered each processor, the RL algorithm can learn to make decisions based on what is happening in the model. This is an important point. In the model you posted, you were observing your previous actions. Instead, it's important to observe the system. That way, the agent can be trained to take the system from a bad state and move it to a good state and keep it there.

It's highly likely that there is an even better observation space. Maybe it would be good to observe how many items (1 or 0) are on each processor. Or maybe, using a custom space, you could observe both how many items and the time since a box arrived, or even some other key value. Or maybe the reward function should be better. I changed it to "1 point per box" since the previous reward sometimes showed infinity due to a divide-by-zero issue. It's difficult to provide more specific advice; whether the agent learns well is mostly an exercise in trial and error. Hopefully that makes some sense.
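As a footnote on the reward change above, here is a hedged sketch (the variable names are assumptions, not the expressions in the attached model) contrasting the "1 point per box" reward with a rate-style reward that guards against division by zero:

```python
# Illustrative only (assumed variable names): two reward styles discussed above.
def reward_per_box(boxes_finished_since_last_decision):
    # "1 point per box": simple, bounded, no division involved.
    return float(boxes_finished_since_last_decision)

def reward_throughput(boxes_finished, elapsed_time):
    # A rate-style reward must guard against elapsed_time == 0, which is
    # what made the original reward come out as infinity.
    if elapsed_time <= 0:
        return 0.0
    return boxes_finished / elapsed_time
```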

