question

mark zhen asked:

Machine random action

What I want to do is make the strategy my machine uses to pull items random. For example, I have four dispatching rules (SPT, LPT, FIFO, LIFO). Each time the processor finishes an item, it should randomly apply one of these four rules as its action. See the attached random-action.fsm.
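[Editor's sketch] The "pick one of four rules at random" idea can be expressed in a few lines. This is plain Python for illustration; inside FlexSim itself you would use a FlexScript snippet or a process-flow decision instead, and `RULES`/`pick_rule` are made-up names, not FlexSim built-ins:

```python
import random

# The four dispatching rules named in the question.
RULES = ["SPT", "LPT", "FIFO", "LIFO"]

def pick_rule(rng=random):
    """Pick one of the four rules uniformly at random, e.g. each
    time the processor finishes an item."""
    return rng.choice(RULES)

# Example: ten independent random picks.
picks = [pick_rule() for _ in range(10)]
```

Passing a seeded `random.Random` instance as `rng` makes the sequence reproducible, which is useful when comparing runs.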

What I want to do next is use reinforcement learning to find an optimal schedule, but I don't know how to set up my observation space and action space.
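[Editor's sketch] There is no single correct answer for the spaces; as a purely hypothetical Gym-style layout (none of these feature or function names come from FlexSim), the action space could be a discrete index over the four rules, with the observation flattened into a fixed-length numeric vector:

```python
# Hypothetical sketch only: the action space indexes the four rules,
# and the observation flattens a few candidate state features.
# Deciding which features actually predict the best rule is the hard
# part, as the accepted answer below explains.

ACTIONS = ["SPT", "LPT", "FIFO", "LIFO"]  # action space: Discrete(4)

def make_observation(queue_times, last_item_type, sim_time):
    """Build a fixed-length observation vector.

    queue_times    -- processing times of items waiting in the queue
    last_item_type -- type of the item the processor last pulled
    sim_time       -- current simulation clock

    All three inputs are assumed features, not FlexSim built-ins.
    """
    n = len(queue_times)
    total = float(sum(queue_times))
    mean = total / n if n else 0.0
    return [n, total, mean, float(last_item_type), float(sim_time)]

def apply_action(action_index):
    """Map a Discrete(4) action back to a rule name."""
    return ACTIONS[action_index]
```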


FlexSim 22.0.16
processor input
1692723650962.png (6.9 KiB)
random-action.fsm (30.7 KiB)

Natalie White commented:

Hi @mark zhen,

The tutorial on our site is similar and should be helpful.

Can you clarify your goal for this model? Are you saying that you can change methods every single time the processor pulls a flow item from the queue? This would be similar to our tutorial. Our tutorial, however, pulls an item based on type, and there is a clear pattern as to which type is best based on the observation of which type was last pulled. For reinforcement learning to work well, there needs to be a learnable pattern between the observation and the best action.

If you simply want to know which of the four sequencing methods is best for your model, perhaps using the experimenter would be better.

mark zhen replied to Natalie White:

When the processor needs to pull the next item, it can act using one of four scheduling rules (such as SPT, LPT, and so on), and that action is chosen at random. I have read a paper whose reward function is set up so that the agent tries all four actions and learns which gives the best result (for example, the shortest time). What I want now is for my agent's actions to be the scheduling rules I give it, and for it to learn, at each time step, which rule is best.

mark zhen commented:

I have read and understood the tutorial, but what I want to do is have my agent learn from these traditional scheduling methods to explore new possibilities, or integrate the four scheduling methods to get a better scheduling result.

Jeanette F ♦♦ commented:

Hi @mark zhen , was Natalie White's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.


1 Answer

Natalie White answered:

This application of reinforcement learning is probably not realistic. Let me explain:

First, how will your model learn which rule is best? It needs some metric for "best," so you'd need to determine what that means (likely you'll want to minimize time or maximize throughput) and design a reward system that will promote your objective.

Additionally, "best" is going to depend on the current state of your model. What exactly is it, in your model, that determines which rule is optimal? You need to be able to identify what that is and have your model observe it. This is your main problem. I don't know if there is a clear answer to this question, and if you aren't able to answer this question, then you can't successfully use reinforcement learning for your model.

In the tutorial, the best action to take (which type of item to pull next) is directly tied to the observation (which type of item was last pulled). Reinforcement learning requires a connection between the observation and the best action to take.


mark zhen commented:

I think I can use a completion rate. For example, I hope that my orders can be completed within a certain time. If an order finishes in time, I give the agent a +1 reward; if not, a -1 reward. That is my reward function. Then, if all three orders finish within the time limit, I compare the total completion times and give an extra +1 to the smallest one.
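[Editor's sketch] Taken literally, that reward scheme could be written as follows. The function names, deadline, and times are illustrative inputs, not anything from the model:

```python
def step_reward(completion_time, deadline):
    """+1 if the order finishes within its deadline, else -1,
    per the commenter's description."""
    return 1 if completion_time <= deadline else -1

def fastest_bonus(total_times):
    """If several runs all meet the deadline, give an extra +1 to
    the one with the smallest total completion time."""
    best = total_times.index(min(total_times))
    return [1 if i == best else 0 for i in range(len(total_times))]
```

Note that this rewards outcomes, not rules: as the answer points out, the agent still needs an observation that actually predicts which rule leads to the +1.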

Natalie White replied to mark zhen:
The problem is that you cannot know which "rule" optimizes your completion rate. Your completion rate is affected by these two things: the item's type, and the type of the previous item. That's the point of the example in the tutorial.

You can't know which rule is best at each time step. You CAN know which item type is best to pull, but you don't know which rule will have you pull that item. At various points in your model run, a certain rule will pull different items.

mark zhen replied to Natalie White:

But similar methods are mentioned in the literature I have read. For now, though, I want to get the first step working: how should I write the random action?

https://www.sciencedirect.com/science/article/pii/S0921889000000877
