Using Reinforcement Learning for Job Sequencing

This article walks through an example of using reinforcement learning (RL) to solve a job-sequencing problem. The model and Python scripts are in the attached zip file.

SchedulingRL.zip

Problem Description

This model represents a generic sheet metal processing plant. There are four machines in series. Each job requires time on all four machines. Jobs come in batches of 10. A poor sequence of jobs will cause blocking between items, lowering throughput.
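To make the effect of blocking concrete, here is a minimal sketch that computes the finish time of a batch for a given sequence. It is not taken from the attached model: the function name `blocking_makespan`, the three sample jobs, and the assumption that there is no buffer between machines are all illustrative.

```python
from typing import Sequence


def blocking_makespan(times: Sequence[Sequence[float]], order: Sequence[int]) -> float:
    """Time at which the last job in `order` leaves the last machine.

    times[j][k] is the processing time of job j on machine k.
    Assumption (not from the model): there is no buffer between machines,
    so a finished job stays on its machine until the next machine is empty.
    """
    n_machines = len(times[0])
    # prev_depart[k] = time the previous job left machine k
    prev_depart = [0.0] * n_machines
    for j in order:
        depart = [0.0] * n_machines
        # The job enters machine 0 once the previous job has left it.
        enter = prev_depart[0]
        for k in range(n_machines):
            finish = enter + times[j][k]
            if k + 1 < n_machines:
                # Blocked until the previous job has left the next machine.
                leave = max(finish, prev_depart[k + 1])
            else:
                leave = finish
            depart[k] = leave
            enter = leave  # moves straight onto the next machine
        prev_depart = depart
    return prev_depart[-1]


# Hypothetical processing times: three jobs, four machines in series.
sample_times = [
    [1, 5, 1, 1],  # job 0
    [5, 1, 1, 1],  # job 1
    [1, 1, 5, 1],  # job 2
]

print(blocking_makespan(sample_times, [1, 0, 2]))  # 18.0
print(blocking_makespan(sample_times, [2, 0, 1]))  # 10.0
```

With these made-up numbers, the same three jobs finish in 10 time units in one order and 18 in another, purely because of where jobs wait on a machine they have already finished with.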

If the time between batches is long, such as a shift or a day, you could use the optimizer to determine the best sequence. If the time between batches is short, however, the optimizer may not be feasible. For real sequencing problems, the time to find a good sequence can be anywhere from 5 minutes to an hour, or even longer. That search time makes the optimizer impractical when batches arrive in quick succession.

The attached model requests a decision every time the first machine in the series is available. The only action is an index for the Nth available job. So the decision can be interpreted as "which job should I do next?"
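As a rough illustration of that decision, the sketch below treats the action as a position in the list of jobs that are still available. The function name `pick_job`, the job labels, and the 0-based indexing are assumptions made for the example; the attached model may index jobs differently.

```python
from typing import List


def pick_job(available_jobs: List[str], action: int) -> str:
    """Remove and return the job at position `action` among the available jobs.

    Assumption: the action is a 0-based index into the jobs still waiting,
    not into the original batch.
    """
    if not 0 <= action < len(available_jobs):
        raise ValueError(f"action {action} does not point at an available job")
    return available_jobs.pop(action)


# A batch of 10 jobs, as in the article; the labels are hypothetical.
batch = [f"job_{i}" for i in range(10)]

first = pick_job(batch, 3)   # "job_3"
second = pick_job(batch, 3)  # "job_4": the index refers to what is left
print(first, second, len(batch))
```

Because the list shrinks after every decision, the same action value can refer to a different job each time, and some action values stop being valid as the batch empties.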

Solution

The general solution is to use reinforcement learning. However, this problem required customized Python scripts:

The model uses custom parameters for observations. This allows the observations to take arbitrary values.
The model uses a custom observation space. The observations include a table of the required times at each station for the remaining jobs. They also include an array of the in-progress jobs and their predicted remaining times. By using a Dict space, the Python scripts can combine all the observations into a single space.
The model uses an action mask. An action mask is a binary array with one entry per possible action value. It tells the RL algorithm which actions are currently invalid.
The Python scripts require the sb3-contrib package. Use
```
pip install sb3-contrib
```
to install it.
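The observation and mask described above can be sketched in plain Python as follows. This is not the code from the attached scripts: the key names `job_times` and `in_progress`, the zero-padding of finished jobs, and the rule that only the first N action values are valid are all assumptions for illustration. In the real scripts, a dictionary like this would be described by a Dict space, and the mask would be handed to the maskable algorithm from sb3-contrib.

```python
from typing import Dict, List, Sequence

BATCH_SIZE = 10  # jobs per batch (from the article)
N_MACHINES = 4   # machines in series (from the article)


def build_observation(
    remaining_times: Sequence[Sequence[float]],
    in_progress_remaining: Sequence[float],
) -> Dict[str, list]:
    """Combine both observations into one dictionary with fixed shapes.

    Rows for jobs that have already been started are padded with zeros
    (an assumption) so the table always has BATCH_SIZE rows.
    """
    table = [list(map(float, row)) for row in remaining_times]
    table += [[0.0] * N_MACHINES for _ in range(BATCH_SIZE - len(table))]
    return {
        "job_times": table,                          # hypothetical key name
        "in_progress": list(in_progress_remaining),  # hypothetical key name
    }


def action_mask(n_available: int) -> List[int]:
    """One entry per action value: 1 if it points at an available job, else 0."""
    return [1 if i < n_available else 0 for i in range(BATCH_SIZE)]


# Two jobs left in the batch; predicted remaining time on each machine.
obs = build_observation([[1, 2, 3, 4], [2, 2, 2, 2]], [0.0, 3.5, 0.0, 1.0])
mask = action_mask(2)
print(len(obs["job_times"]), mask)
```

Keeping the shapes fixed is what lets a single space describe every decision point, while the mask carries the information about how many of the action values are meaningful right now.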

Results

After training for 500,000 time steps, the agent learns to choose jobs moderately well. If you run the inference script, you can use the experimenter to compare a random policy to a trained agent.
