baraam asked baraam commented

custom observation space for RL

I'm trying to figure out how to use RL with FlexSim. I can run the changeover example with no problem. I'm working on a scheduling problem, so my observations cannot fit in a simple parameters table. I just need an example of a "custom observation space" to get an idea of how to define my own observation space.

Thank you,

FlexSim 22.0.10
reinforcement learning

Jeanette F ♦ commented ·

Hi @baraam, was one of Jordan Johnson's or Joerg Vogel's answers helpful? If so, please click the "Accept" button at the bottom of the one that best answers your question. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.

0 Likes 0 ·
Joerg Vogel commented ·

@BaraaM, I don't know how long you have been following this forum. Development of Reinforcement Learning [RL] support in FlexSim started about 2 years ago, at a level where users could easily connect FlexSim to external sources.

I am attaching a preset search tool URL to find threads filtered on this topic.

There are about 70 threads on RL if you search for just a keyword like "RL". You will notice that some developers are involved in the discussions.

Edit: perhaps also look at the suggestions in the "RELATED QUESTIONS" section of this site.

-1 Like -1 ·
Jordan Johnson answered baraam commented

You'll need to do a few things.

First, in FlexSim, you'll probably want to use a Custom parameter. In FlexScript, you can then set that parameter to some value such as:

Model.parameters.MyCustomParameter = [[1, 2], [3, 4], [5, 6]]

Next, set the RL tool to use a custom parameter space. Set the space string to whatever you want, including nothing. Use the Get Observation trigger to send whatever string you want, but I strongly recommend you use JSON. For example:

Object current = ownerobject(c);

Map observation;
// Assuming all your observations are in a table called "Observations"
Array names = Model.parameters.names("Observations");
// Copy each parameter's current value into the map, keyed by its name
for (int i = 1; i <= names.length; i++) {
    observation[names[i]] = Model.parameters[names[i]].value;
}

// Send the whole map to the RL side as a JSON string
return JSON.stringify(observation);

Finally, on the Python side, you'll need to modify the _get_observation_space() and _convert_to_observation() functions. The first function must return a valid space for whatever RL tool you are using. The second function must parse the text FlexSim sent back for the observation (which is easy if you use JSON) and then set the values in the observation space to whatever the model reports.
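As a rough illustration of the second step, here is a minimal sketch of the parsing logic, assuming FlexSim sends a flat JSON object as in the trigger code above. The parameter names and the helper names are hypothetical, and the sketch uses only the standard library, so it is not the code of the example environment itself.

```python
import json

# Hypothetical parameter names; their order fixes the layout of the observation vector.
OBSERVATION_NAMES = ["QueueLength", "MachineState", "LastItemType"]


def get_observation_shape():
    """Shape of the flat observation vector: one entry per named value."""
    return (len(OBSERVATION_NAMES),)


def convert_to_observation(text):
    """Parse the JSON object sent by FlexSim and return the values in a fixed order."""
    values = json.loads(text)
    missing = [name for name in OBSERVATION_NAMES if name not in values]
    if missing:
        raise KeyError(f"observation is missing: {missing}")
    # JSON object key order is not relied upon; the fixed name list decides the layout.
    return [float(values[name]) for name in OBSERVATION_NAMES]
```

In a gym-style environment, _get_observation_space() would then return a space with this shape (for example a Box with bounds you choose), and _convert_to_observation() would wrap the returned list in the array type that space expects.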


baraam commented ·

Thank you for your helpful answer, and sorry for my late comment (I was waiting to activate my licence).

I'm not sure I understood the use of MyCustomParameter, since you then say:

  // Assuming all your observations are in a table called "Observations"

So do I have to put my observations in a table, in my custom parameter, or in both? Note that my state representation is a (J × 6) matrix containing 6 attributes for every row (i.e. every job).

I hope my question is clear!

0 Likes 0 ·
Jordan Johnson ♦♦ baraam commented ·

Since you are doing a custom observation, you can keep the values anywhere you want. You can use Parameters if you want, or you can use a global table. It's up to you. The code I wrote as an example shows you how to get all of the values from a parameter table called "Observations". But if you are keeping your observations somewhere else, you can modify the code to return whatever you want.
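For the (J × 6) state mentioned in the question, one possible approach is to send the matrix as a nested JSON array and pad it to a fixed number of rows on the Python side, since most RL libraries expect a fixed observation shape. The sketch below assumes the model sends something like [[...6 values...], ...]; MAX_JOBS and PAD_VALUE are hypothetical choices, not anything FlexSim prescribes.

```python
import json

ATTRIBUTES_PER_JOB = 6  # from the question: 6 attributes per job
MAX_JOBS = 4            # hypothetical upper bound on J, needed for a fixed shape
PAD_VALUE = 0.0         # hypothetical filler for unused job rows


def matrix_from_json(text):
    """Parse a nested JSON array into a MAX_JOBS x ATTRIBUTES_PER_JOB list of floats."""
    rows = json.loads(text)
    if len(rows) > MAX_JOBS:
        raise ValueError(f"got {len(rows)} jobs, but MAX_JOBS is {MAX_JOBS}")
    matrix = []
    for row in rows:
        if len(row) != ATTRIBUTES_PER_JOB:
            raise ValueError(f"expected {ATTRIBUTES_PER_JOB} attributes, got {len(row)}")
        matrix.append([float(value) for value in row])
    # Pad with filler rows so the observation always has the same shape.
    while len(matrix) < MAX_JOBS:
        matrix.append([PAD_VALUE] * ATTRIBUTES_PER_JOB)
    return matrix
```

The matching observation space would then have shape (MAX_JOBS, 6), whatever the actual number of jobs in a given step.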

1 Like 1 ·
baraam Jordan Johnson ♦♦ commented ·


So I was able to gather all the information that I need to send to my RL model.

But when I use Custom Observation, I'm unable to get this information in my Python model. In my model I get the name of my observation space (which I put in the space definition), but without the parameters.

Like this:
you can see that my "SpaceBytes" contains only the name of the observation space, so I get an error because I don't have the parameters ...

How can I solve that, please?

Thank you
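One way to read the situation described above: with a custom observation the space string is free text, so the Python side cannot rely on it carrying parameters and has to define the space itself. The sketch below assumes the reply is either a bare name or has the form Name(params) with JSON parameters; that format, the space name "JobMatrix", and the bounds are all assumptions for illustration.

```python
import json

# Hypothetical local definition of the observation space, used when the
# model only sends a name without parameters.
LOCAL_SPACES = {"JobMatrix": {"shape": (4, 6), "low": 0.0, "high": 1000.0}}


def split_space_reply(space_bytes):
    """Split b'Name(params)' into (name, params); params is None if absent."""
    text = space_bytes.decode("utf-8").strip()
    start = text.find("(")
    if start == -1 or not text.endswith(")"):
        return text, None
    return text[:start], json.loads(text[start + 1:-1])


def resolve_space(space_bytes):
    """Use the parameters from the reply if present, else the local definition."""
    name, params = split_space_reply(space_bytes)
    if params is not None:
        return {"name": name, "params": params}
    if name not in LOCAL_SPACES:
        raise KeyError(f"no local definition for custom space '{name}'")
    return {"name": name, **LOCAL_SPACES[name]}
```

The returned description would then be turned into whatever space object the RL library expects inside _get_observation_space().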


0 Likes 0 ·
Joerg Vogel answered Joerg Vogel commented

You communicate with the AI through a reward value that corresponds to the current set of input parameters in the FlexSim model. An observation space is relevant only if its parameters have a strong impact on the reward: varying them must change the efficiency of the model and therefore the reward value. If you want to find an optimized schedule, you will iterate over different schedule sets by varying parameters. The reward should weight the schedule set more heavily than the efficiency of your model parameters, but model efficiency must still contribute to the reward. You must decide when to update rewards, and you must estimate how the model efficiency parameters relate to the schedule sets.


Joerg Vogel commented ·

A schedule can be a time table of process times and breaks or a sequence of product input.

A sequence of products results in a table of product order.

Set 1

  • 1st product = Type 1
  • 2nd product = Type 2
  • 3rd product = Type 1
  • ..
  • 10345th product = Type 2


Set 2

  • 1st product = Type 2
  • 2nd product = Type 2
  • 3rd product = Type 1


Set 3

  • 1st product = Type 2
  • 2nd product = Type 2
  • 3rd product = Type 2


Set 1057


Set 13927


A reward should not be reported for every item entering a sink, but rather at intervals of several entering items or at fixed time intervals. The reward must correlate with the input sequence table. If your model creates a delay between input and output, then the reward must be a result of the input you want to observe.

For example, you can define an observation interval between two milestone products processed in the model within a sequence of input products.
You pick two products, one of low and one of high priority. A reward value is returned when both have entered a sink. The input order between them, plus some products before the first and some after the last, defines your set of input variables, in addition to the parameters for processing times and queue capacities. I think it won't matter if the observation intervals of different milestone pairs overlap.
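The milestone idea can be sketched in a few lines. This is only an illustration of the bookkeeping, written in Python for readability; in practice the logic would live in the FlexSim model. The class name, the item identifiers, and the reward formula (negative elapsed time until the last milestone arrives) are assumptions, not something the thread prescribes.

```python
class MilestoneReward:
    """Report one reward once all milestone items have entered the sink."""

    def __init__(self, milestone_ids, start_time=0.0):
        self.pending = set(milestone_ids)  # milestones not yet seen at the sink
        self.start_time = start_time
        self.last_arrival = start_time
        self.reported = False

    def on_sink_entry(self, item_id, time):
        """Call for every item entering the sink; returns a reward or None."""
        if item_id in self.pending:
            self.pending.remove(item_id)
            self.last_arrival = max(self.last_arrival, time)
        if self.pending or self.reported:
            return None  # still waiting, or the reward was already reported
        self.reported = True
        # Assumed reward: the sooner both milestones finish, the better.
        return -(self.last_arrival - self.start_time)
```

Items that are not milestones pass through without producing a reward, which matches the advice above not to report a reward for every entering item.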

0 Likes 0 ·
