Testing Reinforcement learning functionality

Farah A asked Dec 21 2021 at 3:51 PM

Hello everyone,

I'm exploring the new Reinforcement Learning functionality in FlexSim. I'm still not familiar with how to apply the concepts, and I'm struggling a bit to make it work.

I want to generate a disturbance: when an item arrives at a turn, it is blocked for a few seconds. I want my agent to be aware of that blocked state and send the information to release the item.

How do I make the agent receive the information and send the action?

Here's the model I'm working on, testRL.fsm. Can you please help me?

Thank you in advance

Software Version:

FlexSim 22.0.0

reinforcement learning

testrl.fsm (282.0 KiB)


Answer 1 · 2021-12-21T18:32:31Z

Jordan Johnson answered Dec 21 2021 at 6:32 PM

Let me clear up a concept first. The agent won't be watching the model, waiting for something to happen. Instead, something will happen in the model, and the model will ask the agent what to do. This means the agent doesn't have to detect the issue; the agent won't be good at that. The model (or in real life, sensors) can detect the issue.

In your case, it sounds like the conveyor system gets backed up for some reason, and you want the AI to decide what to do about it. So the first step is building a model that gets backed up, and then using the Reinforcement Learning tool to listen to something, such as the OnBlock of a photo eye. That way, when the model detects a block, you can ask the AI what to do. In FlexSim, that means you'll set the parameters that are part of the observation space. My guess is you'll want to send the values for how many items are at various parts of the model (such as in a given lane, or in a queue) or something like that. Whatever info you send, it should be info that the AI needs to make a decision. Note that it doesn't need to know "there's a problem", because you're only asking in that situation, and reducing the number of observation space parameters is a good idea.
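To make the "model asks the agent" idea concrete, here is a minimal Python sketch of the pattern. It is not FlexSim code and does not use the FlexSim API: the class `BlockingModel`, the three lanes, the block probability, and `shortest_lane_policy` (a placeholder for a trained agent) are all hypothetical names invented for illustration.

```python
import random


class BlockingModel:
    """Toy stand-in for a simulation model (hypothetical, not FlexSim)."""

    def __init__(self, policy, seed=0):
        self.policy = policy              # callable: observation -> action
        self.rng = random.Random(seed)
        self.lane_contents = [0, 0, 0]    # items waiting in each lane
        self.decisions = []               # (observation, action) pairs

    def on_block(self):
        """The model detected a block; only now is the agent consulted."""
        # The observation holds just what the agent needs to decide.
        # It carries no "there is a problem" flag, because the agent
        # is only asked when there is one.
        observation = tuple(self.lane_contents)
        action = self.policy(observation)
        self.decisions.append((observation, action))
        # The model, not the agent, applies the chosen action.
        self.lane_contents[action] += 1

    def run(self, steps, block_probability=0.3):
        """Advance the toy model; blocks occur at random steps."""
        for _ in range(steps):
            if self.rng.random() < block_probability:
                self.on_block()


def shortest_lane_policy(observation):
    """Placeholder for a trained agent: send the item to the emptiest lane."""
    return observation.index(min(observation))


model = BlockingModel(shortest_lane_policy)
model.run(steps=50)
```

The point of the sketch is the direction of the call: the agent never polls the model. The model raises the event, builds the observation, and asks for one action.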

Anyway, the AI will then express an action. In FlexSim that means the AI will set the parameter values in your action space. You need to make sure that your model uses the values in the action space to change its behavior. To do that, you'll first set the Reinforcement Learning tool to just pick random values. That way, you can test that your model detects the issue and responds to random actions correctly. Once you've done that, you're ready to train the AI as directed in the walkthrough material:

https://docs.flexsim.com/en/22.0/ModelLogic/ReinforcementLearning/KeyConcepts/

https://docs.flexsim.com/en/22.0/ModelLogic/ReinforcementLearning/Training/

https://docs.flexsim.com/en/22.0/ModelLogic/ReinforcementLearning/UsingATrainedModel/

Farah A commented · Dec 22 2021 at 10:21 AM

Thank you for your answer, I understand better how it works now. I'm trying to build the model and I have some questions:

1- I put the blocked items in an intermediate Queue and increase the value of a label "count" OnEntry. I'm trying to send the value to the observation space by writing this in the RL tool: Model.parameters["count"].value==getlabelnum(Model.find("Queue2"),"count") but the value of the observation space never changes. What is wrong with my command?

2- I set random actions on Request Action and it seems to work well. However, for the decision event I want to use an event based on the queue's Send To Port strategy, so that it sends the item onto the conveyor again, but I get an error: exception: FlexScript exception: Property "value" accessed on invalid node. at VIEW:/active/ReinforcementLearning2343796112/tabcontrol/Settings/DecisionEvents/SplitPanel/RightPanel/ObjectSampler>eventfunctions/onSample c: VIEW:/active/ReinforcementLearning2343796112/tabcontrol/Settings/DecisionEvents/SplitPanel/RightPanel/ObjectSampler

Can you please take a closer look at my model below?

Thank you very much

testRL.fsm

testrl.fsm (283.4 KiB)

Jordan Johnson ♦♦ Farah A commented · Dec 22 2021 at 5:04 PM

I don't think I understand what you're trying to do. The process flow is set up so that items always block. Whenever an item arrives at the DP, the process flow creates a token, and that token stops the item. But previously, you described a system where items only block sometimes, rather than always.

Whether the item is stopped or not, the photo eye fires its OnBlock event, because the block time is set to zero. The process flow creates a token when this happens, and that token moves the item to the queue. This is incorrect; you should not use a Move Object activity to remove an item from a conveyor. To get an item off a conveyor, you need to send the item to an exit transfer. This is why the second item doesn't flow past the decision point.

There are a couple smaller issues as well. Here is a fix for your code in the RL tool:

// == compares the two values
Model.parameters["count"].value == getlabelnum(Model.find("Queue2"),"count")
 
// = assigns the value, which is what you want
Model.parameters["count"].value = getlabelnum(Model.find("Queue2"),"count")

Also, the queue's OnExit trigger is incrementing a label on the item, when I think it should be incrementing the label on current.

Okay, so all that being said, I made this model:

testrl_1.fsm

It creates a token at the Decision Point. That token flows through a Decide, which randomly chooses whether to stop the item or not.

If the Decide stops the item, then after 3 seconds (the block time on the Photo Eye), the model creates a token for the On Block of the PE. That token resumes the item, and sends the item to an output queue, based on the Destination parameter.

The RL tool listens to the On Block of the photo eye, and creates an observation. That observation gets the content of the three buffer queues. Note that you don't need to increment/decrement a label; you can just get the object statistics. The RL tool also randomly picks a value for the Destination parameter, so the stopped items go to one of the three queues.
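As a rough illustration of that random-action test, the Python sketch below mimics the flow outside FlexSim. Everything in it is an assumption made for the example: the function names, the 1-based Destination values from 1 to 3, and the simplification that items never leave the queues. It only shows how logging each observation and random action lets you check that the wiring responds sensibly before any training.

```python
import random

NUM_QUEUES = 3  # the model described above has three buffer queues


def make_observation(queue_contents):
    """Observation: the current content of each buffer queue."""
    return list(queue_contents)


def random_destination(rng):
    """Random action: a Destination from 1 to NUM_QUEUES (assumed 1-based)."""
    return rng.randint(1, NUM_QUEUES)


def simulate_blocks(num_blocks, seed=42):
    """Simulate num_blocks block events with random destinations."""
    rng = random.Random(seed)
    queue_contents = [0] * NUM_QUEUES
    log = []
    for _ in range(num_blocks):
        # On each block event: observe, pick a random action, apply it.
        observation = make_observation(queue_contents)
        destination = random_destination(rng)
        queue_contents[destination - 1] += 1
        log.append((observation, destination))
    return queue_contents, log


final_contents, log = simulate_blocks(num_blocks=30)
```

Inspecting `log` shows whether every blocked item received a valid destination and whether the observations change as the queues fill, which is the same check the random mode gives you in the model.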

But this is unfinished. The actions taken by the RL don't currently affect the reward in any way. It's up to you to figure out what decision you want the AI to make, and when you want the AI to make that decision.

testrl-1.fsm (286.5 KiB)

Farah A Jordan Johnson ♦♦ commented · Jan 13 2022 at 10:43 AM

Thank you for your answer, it was helpful !
