This article describes an example of Reinforcement Learning being used to solve scheduling problems. See the model and python files in the attached zip file.
Problem Description
This model represents a generic sheet metal processing plant. There are four machines in series. Each job requires time on all four machines. Jobs come in batches of 10. A poor sequence of jobs will cause blocking between items, lowering throughput.
If the time between batches is long, such as a shift or a day, you could use the optimizer to determine the best sequence. If the time between batches is short, however, the optimizer may not be feasible. For real sequencing problems, the time to find a good sequence can be anywhere from 5 minutes to an hour, or even longer. This makes it impractical for high-velocity situations.
The attached model requests a decision every time the first machine in the series is available. The only action is an index for the Nth available job. So the decision can be interpreted as "which job should I do next?"
Solution
The general solution is to use reinforcement learning. However, this problem required customized python scripts:
- The model uses custom parameters for observations. This allows arbitrary values for observations.
- The model uses a custom observation space. The observations include a table of the required times at each station for the remaining jobs. They also include an array of the in-progress jobs and their predicted remaining times. By using a Dict space, the python scripts can combine all the observations into a single space.
- The model uses an Action Mask. An Action Mask is a binary array with one value per value of the action. This tells the RL algorithm about invalid options.
- The python scripts require the sb3-contrib package. Use
pip install sb3-contrib
to install it.
Results
After training for 500k time-steps, the agent learns to choose jobs moderately well. If you run the inference script, you can use the experimenter to compare a random policy to a trained agent: