mark zhen asked · Jason Lightfoot edited

Rewriting Problems for Reinforcement Learning

What I want to accomplish in the model is to find the final scheduling order within a fixed time, but I don't understand why the observation space always causes problems. Also, the example in the case uses SETUP TIME; if I want to do the most similar thing with PROCESS TIME instead, is that OK? (Model attached: 0312test.fsm)

FlexSim 22.0.0
reinforcement learning
0312test.fsm (336.7 KiB)

Jacob W2 ♦ commented ·

@mark zhen,

Currently it looks like your observation space is throwing an error because of your code on line 15.

int stationNum = mostRecentFinishProcessor.name.slice(-1).toNum();

This is always returning 0 because none of your processors actually have a number at the end of their name. The slice function is pulling the letter "e" or "g".
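For illustration, with a processor name like "cut to length line", the last character is a letter, and converting it to a number gives 0:

string last = mostRecentFinishProcessor.name.slice(-1); // e.g. "e"
int stationNum = last.toNum(); // a non-numeric string converts to 0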


Since I am not sure which case you are referring to, it is hard to help with the rest of your question. I do, however, recommend looking at other forums on AI training, specifically those dealing with OpenAI Gym (which is what FlexSim uses), because users on those forums will generally have more experience with AI training than users on this forum.

Kavika F ♦ commented ·

Hey @mark zhen, you're getting an error in your "On Observation" because of your stationNum code.

int stationNum = mostRecentFinishProcessor.name.slice(-1).toNum();
Model.parameters["LastItemType"+stationNum].value = getvarnum(mostRecentFinishProcessor, "f_lastlabelval");

This would be valid if your stations had names that ended in numbers (i.e., Processor1, Processor2, etc.). However, you changed the names of your processors to be more descriptive of their job (like "cut to length line"), which I think is good design. But if you'd like to grab a number from them, you can instead set a label on each processor and read that to get the "stationNum", as sketched below.
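A minimal sketch of that approach, assuming each processor is given a number label named "stationNum" in its Labels panel (the label name is an assumption):

// Read the label instead of parsing the processor's name.
int stationNum = mostRecentFinishProcessor.stationNum; // dot syntax reads the label
Model.parameters["LastItemType" + stationNum].value = getvarnum(mostRecentFinishProcessor, "f_lastlabelval");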

mark zhen Kavika F ♦ commented ·

1678779272067.png

I can't figure out how to accomplish this.

1678779272067.png (286.6 KiB)
mark zhen mark zhen commented ·

@Jacob W2 @Kavika F @Felix Möhlmann @Jason Lightfoot

I have a question now.

My model does not need setup time, so do I need to rewrite my observation space?

I noticed that the statement treenode curvar = assertvariable(current, "f_lastlabelval"); does not appear in the process time code.

Jeanette F ♦♦ commented ·

Hi @mark zhen, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.


1 Answer

Felix Möhlmann answered · Jason Lightfoot edited

If you are going to remove the type-dependent setup time, the "LastItemType" value is probably not going to be a useful observation anymore.

I've implemented alternative observations in the attached model (and fixed up the model, for example by properly integrating the fourth processor). The observations provide information about which processor is currently querying and how many items of each type/step combination are available in the queue. They don't show the actual quantity, but instead a very rough estimate of none/few/many (relative to the other types), to keep the number of different states low so the RL algorithm can hopefully learn faster.

I also changed the reward. It now gives out a value between 0 and 1 that reflects the utilization across all processors since the last query (sketched below).
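As a rough sketch of that idea (the label, variable, and object names here are assumptions, not taken from the model): each processor accumulates its busy time on a label via its triggers, and the reward is the mean utilization since the last query.

// Reward sketch: average utilization across the processors since the last query.
double elapsed = Model.time - lastQueryTime; // lastQueryTime: a global variable
double reward = 0;
for (int i = 1; i <= 4; i++) {
    Object proc = Model.find("Processor" + i); // assumed object names
    reward += proc.busyTime / elapsed; // "busyTime" label maintained by triggers
    proc.busyTime = 0;                 // reset for the next interval
}
reward = reward / 4; // mean over four processors, a value between 0 and 1
lastQueryTime = Model.time;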

Finally, I deleted most of the previous code/triggers/variables that are not used in the model to hopefully make it easier to understand how the model now works.

RL_productionorder_fm.fsm



mark zhen commented ·

I want to know about the settings you made in the observation space, and the logic of the code.

mark zhen mark zhen commented ·

In addition, there are three things that must be defined for reinforcement learning:

s: state (observation space)

a: action

r: reward

Can you explain, by the way, how you defined these three?

Felix Möhlmann mark zhen commented ·

The model keeps track of how many items of each type/step combination are in the queue by incrementing/decrementing entries in a global variable array in the OnEntry/OnExit triggers of the queue, roughly as sketched below.
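A minimal sketch of that bookkeeping, assuming each item carries "type" and "step" labels and "itemCounts" is a global array variable with one entry per type/step combination (all names are assumptions):

// OnEntry trigger of the queue ("item" is provided by the trigger).
int index = (item.type - 1) * 2 + item.step; // e.g. with two steps per type
itemCounts[index]++;
// The OnExit trigger mirrors this with itemCounts[index]--;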

The "OnObservation" code then uses a simple heuristic to translate these numbers into one of three possible values. If there are no items of the respective combination the parameter is set to 0. Otherwise the code compares the ratio of items to the ratio you would have if each combination had the same amount. With sixteen possible values (currently only eight are used due to how the steps are defined) you would on average expect 6.67% of items to be of a given type/step combination. If the actual ratio is lower than that, the parameter is set to 1, otherwise to 2.

This is only a rough example and could probably be improved (for example, by not counting "inactive" combinations when determining the expected ratio).

To get an idea of what observations I want to implement, I just ask myself: What information would I, as a person, want in order to make an informed decision about what item to process next?

Similarly for the reward function: How would I measure how well I am doing at any given moment? This could, for example, also be the achieved throughput over the last X minutes.
