Machine random action

Question

mark zhen asked Aug 22, '23 mark zhen commented Aug 30, '23

What I want to do now is have my machine pull items using a randomly chosen strategy. For example, I have four options (SPT, LPT, FIFO, LIFO). Each time the machine finishes processing an item, it should randomly pick one of these four rules to pull the next one. Model: onerandom action.fsm

Ultimately, I want to use reinforcement learning to find the optimal schedule, but I don't know how to set up my observation space and action space.

Software Version:

FlexSim 22.0.16

processor input

1692723650962.png (6.9 KiB)

random-action.fsm (30.7 KiB)

Natalie White commented · Aug 23, 2023 at 03:09 PM

Hi @mark zhen,

The tutorial on our site is similar and should be helpful.

Can you clarify your goal for this model? Are you saying that you can change methods every single time the processor pulls a flow item from the queue? This would be similar to our tutorial. Our tutorial, however, pulls an item based on type, and there is a clear pattern as to which type is best based on the observation of which type was last pulled. For reinforcement learning to work well, there needs to be a learnable pattern between the observation and the best action.

If you simply want to know which of the four sequencing methods is best for your model, perhaps using the experimenter would be better.

mark zhen Natalie White commented · Aug 23, 2023 at 04:41 PM

When I need to process the next item, I can apply one of four scheduling rules (SPT, LPT, and so on), and that choice is random. I have read a paper in which the reward function is set up so that the agent tries all four actions and learns which one gives the best result (for example, the shortest time). What I want now is for my agent's actions to be the scheduling rules I give it, so that at each time step it learns which rule is best.

mark zhen mark zhen commented · Aug 23, 2023 at 04:43 PM

I have read and understood the tutorial, but what I want now is for my agent to learn from these traditional scheduling methods and explore new possibilities, or to combine the four scheduling methods to get a better scheduling result.

Jeanette F ♦♦ commented · Aug 30, 2023 at 02:23 PM

Hi @mark zhen , was Natalie White's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.

Answer 1 · 2023-08-23T18:07:23Z

Natalie White answered Aug 23, '23 mark zhen commented Aug 30, '23

This application of reinforcement learning is probably not realistic. Let me explain:

First, how will your model learn which rule is best? It needs some metric for "best," so you'd need to determine what that means (likely you'll want to minimize time or maximize throughput) and design a rewards system that will promote your objective.

Additionally, "best" is going to depend on the current state of your model. What exactly is it, in your model, that determines which rule is optimal? You need to be able to identify what that is and have your model observe it. This is your main problem. I don't know if there is a clear answer to this question, and if you aren't able to answer this question, then you can't successfully use reinforcement learning for your model.

In the tutorial, the best action to take (which type of item to pull next) is directly tied to the observation (which type of item was last pulled). Reinforcement learning requires a connection between the observation and the best action to take.
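For the four-rule setup in the question, the action and observation could be sketched as follows. This is plain Python, not FlexScript, and it is only an illustration: the action is an index into the four rules, and the observation is a fixed-length tuple of numbers. The features used here (queue length, shortest and longest processing time in the queue, type of the last pulled item) and the names `QueueItem`, `observe`, and `decode_action` are hypothetical. Whether such features actually carry a learnable connection to the best rule is exactly the open question described above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Action space: one discrete value per scheduling rule (index 0..3).
ACTIONS: Tuple[str, ...] = ("SPT", "LPT", "FIFO", "LIFO")


@dataclass(frozen=True)
class QueueItem:
    item_type: int          # hypothetical item type label
    processing_time: float  # hypothetical processing time


def observe(queue: List[QueueItem], last_pulled_type: int) -> Tuple[float, ...]:
    """Build a fixed-length observation from the current queue state.

    The chosen features are assumptions for illustration only.
    """
    times = [item.processing_time for item in queue]
    return (
        float(len(queue)),             # how many items are waiting
        min(times) if times else 0.0,  # shortest waiting job
        max(times) if times else 0.0,  # longest waiting job
        float(last_pulled_type),       # what the processor handled last
    )


def decode_action(action_index: int) -> str:
    """Map the agent's discrete action back to a rule name."""
    return ACTIONS[action_index]


obs = observe([QueueItem(1, 5.0), QueueItem(2, 2.0)], last_pulled_type=1)
print(obs, decode_action(2))
```

In a Gym-style environment this corresponds to a discrete action space with four values and a numeric observation of fixed length.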

mark zhen commented · Aug 23, 2023 at 06:28 PM

I think I can use a completion rate. For example, I want each order to be completed within a certain time. If it is, I give the agent a reward of +1; if not, I give it -1. This is my reward function. Then, if all three are finished within the time, I compare the total completion times and give 1 to the smallest one.
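The reward scheme described above could be sketched like this in plain Python (not FlexScript). The function names and the `due_times` input are hypothetical, and the tie-break is only one possible reading of the description: a bonus of +1 is given when every order is on time and the total completion time is lower than the best total seen so far.

```python
from typing import Sequence, Tuple


def order_reward(completion_time: float, due_time: float) -> int:
    """+1 if the order finished within its allowed time, otherwise -1."""
    return 1 if completion_time <= due_time else -1


def episode_reward(
    completion_times: Sequence[float],
    due_times: Sequence[float],
    best_total_so_far: float,
) -> Tuple[int, float]:
    """Return (reward, updated best total completion time).

    Assumption: "total completion time" is the sum of the completion times,
    and the bonus compares it with the best on-time total seen so far.
    """
    rewards = [order_reward(c, d) for c, d in zip(completion_times, due_times)]
    total = sum(completion_times)
    all_on_time = all(r == 1 for r in rewards)

    bonus = 1 if all_on_time and total < best_total_so_far else 0
    if all_on_time:
        best_total_so_far = min(best_total_so_far, total)
    return sum(rewards) + bonus, best_total_so_far


# First run: all three orders on time, so the bonus applies.
reward, best = episode_reward([3, 5, 8], [4, 6, 9], float("inf"))
print(reward, best)
```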

Natalie White mark zhen commented · Aug 23, 2023 at 07:40 PM

The problem is that you cannot know which "rule" optimizes your completion rate. Your completion rate is affected by these two things: the item's type, and the type of the previous item. That's the point of the example in the tutorial.

You can't know which rule is best at each time step. You CAN know which item type is best to pull, but you don't know which rule will have you pull that item. At various points in your model run, a certain rule will pull different items.
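A minimal sketch in plain Python (not FlexScript) may make this concrete. The queue contents, processing times, and the `pick` helper are hypothetical. It shows that the same rule selects a different item depending on what is in the queue at that moment, and how a rule could be chosen at random with `random.choice`.

```python
import random

RULES = ("SPT", "LPT", "FIFO", "LIFO")


def pick(rule, queue):
    """Return the (item_id, processing_time) pair the rule would pull next.

    The queue is a list in arrival order: index 0 arrived first.
    """
    if rule == "SPT":   # shortest processing time
        return min(queue, key=lambda item: item[1])
    if rule == "LPT":   # longest processing time
        return max(queue, key=lambda item: item[1])
    if rule == "FIFO":  # first in, first out
        return queue[0]
    if rule == "LIFO":  # last in, first out
        return queue[-1]
    raise ValueError(f"unknown rule: {rule}")


# Two hypothetical snapshots of the same queue at different times.
queue_early = [("A", 5.0), ("B", 2.0), ("C", 9.0)]
queue_later = [("C", 9.0), ("D", 1.0), ("E", 4.0)]

for rule in RULES:
    early_id = pick(rule, queue_early)[0]
    later_id = pick(rule, queue_later)[0]
    print(f"{rule}: early -> {early_id}, later -> {later_id}")

# Random action: choose one of the four rules each time the processor is free.
rng = random.Random()
chosen_rule = rng.choice(RULES)
print(chosen_rule, pick(chosen_rule, queue_early)[0])
```

Because the rule only indirectly determines which item is pulled, the agent would have to learn the mapping from queue state to rule, not just from rule to outcome.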

mark zhen Natalie White commented · Aug 24, 2023 at 06:30 AM

But the literature I have read mentions similar methods. For now, though, I want to get the first step working: how should I write random actions?

https://www.sciencedirect.com/science/article/pii/S0921889000000877
