
baraam asked · Jordan Johnson commented
0 Likes

custom observation space for RL

Hello,
I'm trying to figure out how to use RL with FlexSim. I'm able to run the changeover example with no problem. I'm working with a scheduling problem, so my observations cannot fit in a simple parameters table. I just need an example of a "custom observation space" to get an idea of how I can define my own observation space.

Thank you,

FlexSim 22.0.10
reinforcement learning

Jeanette F ♦♦ commented ·

Hi @baraam, was one of Jordan Johnson's or Joerg Vogel's answers helpful? If so, please click the "Accept" button at the bottom of the one that best answers your question. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.

0 Likes
Joerg Vogel commented ·

@BaraaM, I don't know how long you have been following this forum. Reinforcement Learning [RL] development in the scope of FlexSim started about 2 years ago, at a level where users can easily bind FlexSim to external sources.

Here is a search-tool URL, set up to find filtered threads on this topic:

https://answers.flexsim.com/search.html?c=7&includeChildren=false&f=author%3A%22phil.bobo%22&type=question+OR+idea+OR+kbentry+OR+answer+OR+topic+OR+user&redirect=search%2Fsearch&sort=relevance&q=Rl

There are about 70 threads on RL if you search just for a keyword like "RL". You will notice that some of the developers are involved in the discussions.

Edit: You might also look at the suggestions in the "RELATED QUESTIONS" section on this site.

-2 Likes
Jordan Johnson answered · Jordan Johnson commented
3 Likes

You'll need to do a few things.

First, in FlexSim, you'll probably want to use a Custom parameter. In FlexScript, you can then set that parameter to some value such as:

Model.parameters.MyCustomParameter = [[1, 2], [3, 4], [5, 6]]

Next, set the RL tool to use a custom observation space. Set the space string to whatever you want, including nothing. Use the Get Observation trigger to send whatever string you want, but I strongly recommend you use JSON. For example:

Object current = ownerobject(c);

Map observation;
// Assuming all your observations are in a table called "Observations"
Array names = Model.parameters.names("Observations");
for (int i = 1; i <= names.length; i++) {
    observation[names[i]] = Model.parameters[names[i]].value;
}

return JSON.stringify(observation);


Finally, you'll need to update flexsim_env.py. You'll need to modify the _get_observation_space() and _convert_to_observation() functions. The first function must return a valid space for whatever RL tool you are using. The second function must parse the text FlexSim sends back for the observation (which is easy if you use JSON) and then set the values in the observation space to whatever the model reports.
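
As a rough sketch (not code from the FlexSim module itself), those two methods of the environment class might look like the following for the JSON map produced by the trigger code above. The key names (NumJobs, AvgProcessTime), the bounds, and the bytes-vs-string handling are assumptions; flexsim_env.py uses gym-style spaces, so adapt this to the version that ships with your install.

import json
import numpy as np
from gym import spaces  # newer FlexSim versions may use gymnasium instead

# Hypothetical observation keys; replace with the parameter names your model sends.
OBS_KEYS = ["NumJobs", "AvgProcessTime"]

def _get_observation_space(self):
    # One Box entry per key in the JSON observation. Bounds are placeholders.
    return spaces.Box(low=0.0, high=np.inf, shape=(len(OBS_KEYS),), dtype=np.float32)

def _convert_to_observation(self, space_bytes):
    # FlexSim returns the observation string produced by JSON.stringify();
    # depending on the module version it may arrive as bytes or as str.
    text = space_bytes.decode("utf-8") if isinstance(space_bytes, bytes) else space_bytes
    values = json.loads(text)
    return np.array([values[key] for key in OBS_KEYS], dtype=np.float32)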


baraam commented ·

Thank you for your helpful answer, and sorry for my late comment (I was waiting to activate my licence).

I'm not sure I understood the use of MyCustomParameter, since you then say:

// Assuming all your observations are in a table called "Observations"


So do I have to put my observations in a table, in my custom parameter, or both?
Note that my state representation is a (J × 6) matrix, containing, for every row (i.e. every job), 6 attributes.

I hope my question is clear!

0 Likes
Jordan Johnson ♦♦ commented ·

Since you are doing a custom observation, you can keep the values anywhere you want. You can use Parameters if you want, or you can use a global table. It's up to you. The code I wrote as an example shows you how to get all of the values from a parameter table called "Observations". But if you are keeping your observations somewhere else, you can modify the code to return whatever you want.
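
For the (J × 6) state mentioned above, one hedged option is to expose it on the Python side as a fixed-size matrix space, assuming the Get Observation trigger returns the rows as a JSON array of arrays and you know an upper bound on J. The MAX_JOBS name, the bounds, and the zero-padding below are purely illustrative, not part of flexsim_env.py.

import json
import numpy as np
from gym import spaces  # newer FlexSim versions may use gymnasium instead

MAX_JOBS = 20  # hypothetical upper bound on J; shorter states are zero-padded

def _get_observation_space(self):
    # A fixed (MAX_JOBS x 6) matrix of job attributes; bounds are placeholders.
    return spaces.Box(low=0.0, high=np.inf, shape=(MAX_JOBS, 6), dtype=np.float32)

def _convert_to_observation(self, space_bytes):
    text = space_bytes.decode("utf-8") if isinstance(space_bytes, bytes) else space_bytes
    rows = json.loads(text)  # expects a JSON array of J rows with 6 values each
    obs = np.zeros((MAX_JOBS, 6), dtype=np.float32)
    if rows:
        obs[:len(rows), :] = np.asarray(rows, dtype=np.float32)
    return obs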

1 Like
baraam commented ·

Hello,

So I was able to get all of the information that I need to send to my RL model.

capture-decran-2023-03-10-095027.png

But when I use a Custom Observation, I'm unable to get this information in my Python model. In my model I receive the name of my observation space (which I put in the space definition), but without the parameter values.

Like this:
capture-decran-2023-03-10-095705.png
You can see that my "SpaceBytes" contains only the name of the observation space, so I get an error because I don't have the parameter values...

How can I solve this, please?


Thank you

Baraa

0 Likes
Joerg Vogel answered · Joerg Vogel commented
-2 Likes

You communicate with an AI through a reward value that correlates with the current set of input parameters in a FlexSim model. An observation space becomes relevant when a set of parameters has a strong impact on the reward value. This means that variations of those parameters must change the efficiency of the model and thus the reward value. If you want to find an optimized schedule, you will iterate over different schedule plan sets by varying the parameters. The reward value should favor the schedule set more than the efficiency of your model parameters, but model efficiency must still carry some weight in the reward. You must decide when to update rewards, and you must estimate the relationship between model-efficiency parameters and schedule sets.
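
As a purely illustrative sketch of that weighting idea (the function name, inputs, and weights below are hypothetical, not anything from FlexSim or its Python module), the reward could combine a schedule-quality term and a model-efficiency term, with the schedule term dominating:

def compute_reward(schedule_score, model_efficiency, w_schedule=0.8, w_efficiency=0.2):
    # Both inputs are assumed to be normalized to [0, 1] by the model.
    # The schedule term dominates, but efficiency still influences the reward.
    return w_schedule * schedule_score + w_efficiency * model_efficiency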


Joerg Vogel commented ·

A schedule can be a timetable of process times and breaks, or a sequence of product inputs.

A sequence of products results in a table of product order.

Set 1

  • 1st product = Type 1
  • 2nd product = Type 2
  • 3rd product = Type 1
  • ..
  • 10345th product = Type 2

..

Set 2

  • 1st product = Type 2
  • 2nd product = Type 2
  • 3rd product = Type 1

..

Set 3

  • 1st product = Type 2
  • 2nd product = Type 2
  • 3rd product = Type 2

..

Set 1057

..

Set 13927

..

A reward should not be reported for every item entering a sink, but rather at intervals of several entering items or at fixed time intervals. The reward must correlate with the input sequence table. If your model creates a delay between input and output, then the reward must be a result of the input you want to observe.

For example, you can define an observation interval between two milestone products being processed in a sequence of input products.
You look for two products, one of low priority and one of high priority. A reward value is returned when both have entered a sink. The input order between them, plus some products before the first and some after the last, defines your set of input variables, in addition to parameters for processing times and queue capacities. I think it won't matter if observation intervals overlap for different pairs of milestone products.

0 Likes
