Right now, I want my machine to choose its pull strategy at random. For example, I have four options (SPT, LPT, FIFO, LIFO), and each time the machine finishes processing, it randomly applies one of these four actions. onerandom action.fsm
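To make the random behaviour concrete, here is a minimal Python sketch of the idea, written outside the simulation model. The job tuples, the rule functions, and the `pull_next` helper are all hypothetical names used only for illustration; this is not the logic stored in the .fsm file, just one way the same selection could look.

```python
import random

# Hypothetical job records: (job_id, processing_time), listed in arrival order.
queue = [("A", 5.0), ("B", 2.0), ("C", 8.0), ("D", 3.0)]

# Each rule returns the index of the job to pull from the queue.
def spt(q):
    # Shortest processing time first.
    return min(range(len(q)), key=lambda i: q[i][1])

def lpt(q):
    # Longest processing time first.
    return max(range(len(q)), key=lambda i: q[i][1])

def fifo(q):
    # First in, first out: the oldest job sits at the front.
    return 0

def lifo(q):
    # Last in, first out: the newest job sits at the back.
    return len(q) - 1

RULES = {"spt": spt, "lpt": lpt, "fifo": fifo, "lifo": lifo}

def pull_next(q, rng):
    """Pick one of the four rules uniformly at random and pull a job with it."""
    name = rng.choice(list(RULES))
    job = q.pop(RULES[name](q))
    return name, job

if __name__ == "__main__":
    rng = random.Random()
    work = list(queue)
    # Each loop iteration stands for "the machine finished processing".
    while work:
        rule, job = pull_next(work, rng)
        print(rule, job)
```

In this sketch the four rule names form a small discrete set of choices, and the random policy simply samples one of them each time the machine becomes free.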
Next, I want to use reinforcement learning to find the optimal schedule, but I don't know how to set up my observation space and action space.