Farah A asked · Farah A commented

Testing Reinforcement Learning functionality

Hello everyone,

I'm exploring the new FlexSim Reinforcement Learning functionality. I'm still not familiar with how to apply the concept, and I'm struggling a bit to make it work.

I want to generate a disturbance: when an item arrives at a turn, it is blocked for a few seconds. I want my agent to be aware of that blocked state and send the information to release the item.

How can I make the agent receive the information and send the action?

Here's the model I'm working on: testRL.fsm. Can you please help me?

Thank you in advance

FlexSim 22.0.0
reinforcement learning
testrl.fsm (282.0 KiB)


1 Answer

Jordan Johnson answered · Farah A commented

Let me clear up a concept first. The agent won't be watching the model, waiting for something to happen. Instead, something will happen in the model, and the model will ask the agent what to do. This means the agent doesn't have to detect the issue; the agent won't be good at that. The model (or, in real life, sensors) detects the issue.
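To make that direction of conversation concrete, here is a small Python sketch (not FlexSim code; all names are illustrative) of the pattern: the model detects the event, and only then asks the agent for an action.

```python
import random

class RandomAgent:
    """Stand-in decision-maker: it answers only when the model asks."""
    def act(self, observation):
        # A trained policy would map the observation to an action;
        # a random choice is a reasonable placeholder before training.
        return random.randrange(3)

def run_model(agent, steps=20, seed=0):
    """Toy model loop: the MODEL detects blocks and asks the agent."""
    rng = random.Random(seed)
    decisions = []
    for t in range(steps):
        blocked = rng.random() < 0.4   # the model/sensor detects the issue
        if blocked:
            observation = [rng.randrange(5) for _ in range(3)]  # e.g. queue contents
            action = agent.act(observation)   # model asks -> agent answers
            decisions.append((t, action))
    return decisions

decisions = run_model(RandomAgent())
```

Note that the agent never polls the model; every decision is initiated by the model's own event.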

In your case, it sounds like the conveyor system gets backed up for some reason, and you want the AI to decide what to do about it. So the first step is building a model that gets backed up, and then using the Reinforcement Learning tool to listen to something, such as the OnBlock of a photo eye. That way, when the model detects a block, you can ask the AI what to do.

In FlexSim, that means you'll set the parameters that are part of the observation space. My guess is you'll want to send values for how many items are at various parts of the model (such as in a given lane, or in a queue). Whatever info you send, it should be info the AI needs to make a decision. Note that it doesn't need to know "there's a problem", because you only ask in that situation, and reducing the number of observation space parameters is a good idea.
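As a toy illustration of trimming the observation space (plain Python with hypothetical names, nothing FlexSim-specific): a "blocked" flag carries no information when you only ever ask the agent at a block event, so it can be dropped.

```python
# Hypothetical model state at the moment of a decision event
state = {
    "lane1_items": 4,
    "lane2_items": 0,
    "queue_items": 7,
    "is_blocked": True,  # always True here: we only ask when a block occurs
}

def observation(state):
    """Keep only what varies and matters for the decision;
    drop 'is_blocked', which the decision event already implies."""
    return [state["lane1_items"], state["lane2_items"], state["queue_items"]]

print(observation(state))  # -> [4, 0, 7]
```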

Anyway, the AI will then express an action. In FlexSim, that means the AI will set the parameter values in your action space. You need to make sure that your model uses the values in the action space to change its behavior. To do that, first set the Reinforcement Learning tool to just pick random values. That way, you can test that your model detects the issue and responds to random actions correctly. Once you've done that, you're ready to train the AI as directed in the walkthrough material.
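The "pick random values first" step can be sketched like this (Python, with illustrative names): drive the model with random actions and confirm it consumes each one without errors before any training happens.

```python
import random

def toy_model_step(action):
    """Stand-in for one decision event in the model: the model must
    consume the action and keep running (here, route to a lane)."""
    assert action in (0, 1, 2), "model received an out-of-range action"
    return f"item sent to lane {action}"

def random_action_check(n=50, n_actions=3, seed=1):
    """Mimic the RL tool's random mode: exercise the code paths the
    action space can reach before training a real policy."""
    rng = random.Random(seed)
    return [toy_model_step(rng.randrange(n_actions)) for _ in range(n)]

results = random_action_check()
print(len(results))  # -> 50
```

If this loop runs cleanly, the model's response to actions is wired correctly, and training can only improve which action gets picked.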


Farah A commented:

Thank you for your answer, I understand better how it works now. I'm trying to build the model and I have some questions:

1- I put the blocked items in an intermediate Queue, and I increase the value of a label "count" OnEntry. I'm trying to send the value to the observation space by writing this in the RL tool: Model.parameters["count"].value==getlabelnum(Model.find("Queue2"),"count") but the value in the observation space never changes. What is wrong with my command?

2- I put random actions on Request Action and it seems to work well. However, in the decision event I want to listen to an event based on the send-to-port strategy of the queue, so that it sends the item back onto the conveyor, but I get an error: exception: FlexScript exception: Property "value" accessed on invalid node. at VIEW:/active/ReinforcementLearning2343796112/tabcontrol/Settings/DecisionEvents/SplitPanel/RightPanel/ObjectSampler>eventfunctions/onSample c: VIEW:/active/ReinforcementLearning2343796112/tabcontrol/Settings/DecisionEvents/SplitPanel/RightPanel/ObjectSampler

Can you please take a closer look at my model below?

Thank you very much


testrl.fsm (283.4 KiB)
Jordan Johnson ♦♦ commented:

I don't think I understand what you're trying to do. The process flow is set up so that items always block. Whenever an item arrives at the DP, the process flow creates a token, and that token stops the item. But previously, you described a system where items only block sometimes, rather than always.

Whether the item is stopped or not, the photo eye fires its OnBlock event, because the block time is set to zero. The process flow creates a token when this happens, and that token moves the item to the queue. This is incorrect; you should not use a Move Object activity to remove an item from a conveyor. To get an item off a conveyor, you need to send the item to an exit transfer. This is why the second item doesn't flow past the decision point.

There are a couple smaller issues as well. Here is a fix for your code in the RL tool:

// == compares the two values
Model.parameters["count"].value == getlabelnum(Model.find("Queue2"),"count")

// = assigns the value, which is what you want
Model.parameters["count"].value = getlabelnum(Model.find("Queue2"),"count")

Also, the queue's OnExit trigger is incrementing a label on the item, when I think it should be incrementing the label on current.

Okay, so all that being said, I made this model:


It creates a token at the Decision Point. That token flows through a Decide, which randomly chooses whether to stop the item or not.

If the decide stops the item, then after 3 seconds (the block time on the Photo Eye), the model creates a token for the On Block of the PE. That token resumes the item, and sends the item to an output queue, based on the Destination parameter.

The RL tool listens to the On Block of the photo eye, and creates an observation. That observation gets the content of the three buffer queues. Note that you don't need to increment/decrement a label; you can just get the object statistics. The RL tool also randomly picks a value for the Destination parameter, so the stopped items go to one of the three queues.

But this is unfinished. The actions taken by the RL don't currently affect the reward in any way. It's up to you to figure out what decision you want the AI to make, and when you want the AI to make that decision.
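For example (a toy sketch in Python, not a recommendation for this specific model), a reward could favor actions that route a stopped item to a least-full queue, so that the agent's choice actually influences the score the trainer sees:

```python
def reward(queue_contents, chosen):
    """Toy reward: 1.0 if the agent picked a least-full queue, else 0.0.
    'queue_contents' lists the current content of each buffer queue;
    'chosen' is the index the agent selected (the Destination action)."""
    return 1.0 if queue_contents[chosen] == min(queue_contents) else 0.0

print(reward([2, 0, 5], 1))  # -> 1.0 (queue 1 is least full)
print(reward([2, 0, 5], 2))  # -> 0.0
```

The key property is that different actions taken from the same observation can yield different rewards; without that, training has no gradient to follow.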

testrl-1.fsm (286.5 KiB)
Farah A commented:
Thank you for your answer, it was helpful!