Hello!
I am working on a project that uses Reinforcement Learning with FlexSim to do the following:
Four sources feed four distinct products ("Product 1", "Product 2", "Product 3", and "Product 4") into the production line. The products are collected in Queue 1 and released to Processor 1, which has a varying setup time when switching between product types and a different processing time for each of the four types (the same setup as the FlexSim Reinforcement Learning tutorial). After Processor 1 finishes a product, it proceeds to Queue 2.
From Queue 2, products are routed to four "specialized" processors: Processor 2, Processor 3, Processor 4, and Processor 5. Each processor is specialized in one product type, which it processes faster than the other types: for example, Processor 2 processes "Product 1" faster than "Product 2", while Processor 3 processes "Product 2" faster than "Product 1". After processing, the product enters Sink 1 and the process is complete.
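To make the specialization concrete, the timing structure I have in mind looks roughly like this (illustrative numbers only, not my actual model parameters):

```python
# Illustrative processing times (model time units); each processor is
# fastest on exactly one product type. My actual model values differ.
PROCESS_TIMES = {
    #               P1  P2  P3  P4   (product types 1-4)
    "Processor 2": [10, 20, 20, 20],  # specialized in Product 1
    "Processor 3": [20, 10, 20, 20],  # specialized in Product 2
    "Processor 4": [20, 20, 10, 20],  # specialized in Product 3
    "Processor 5": [20, 20, 20, 10],  # specialized in Product 4
}
```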
To optimize the system's efficiency through product scheduling and routing, reinforcement learning (RL) will be implemented at two key points:
1. Processor 1 – The agent decides which product type to pull from Queue 1 into Processor 1, aiming for the processing sequence with the shortest total elapsed time (setup time plus processing time).
2. Queue 2 – The agent optimizes the routing of products to Processor 2, 3, 4, or 5. The goal is to send each product to its "specialized" processor, or to the next best available processor if the specialized one is currently busy (see the sketch after this list).
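Here is how I currently picture the two decision points as RL interfaces (a sketch using gymnasium; the names and shapes are my own assumptions, not taken from the tutorial code):

```python
from gymnasium import spaces

# Agent 1 (Processor 1): observe the product type Processor 1 is
# currently set up for, then choose which of the 4 types to pull
# from Queue 1 next.
agent1_observation_space = spaces.Discrete(5)  # 0 = no setup yet, 1-4 = last product type
agent1_action_space = spaces.Discrete(4)       # product type to pull next

# Agent 2 (Queue 2): observe the arriving product's type plus the
# busy/idle state of Processors 2-5, then choose a destination.
agent2_observation_space = spaces.MultiDiscrete([4, 2, 2, 2, 2])
agent2_action_space = spaces.Discrete(4)       # 0-3 -> Processor 2-5
```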
From my understanding, the scripts provided in the FlexSim Reinforcement Learning tutorial (flexsim_env.py and flexsim_training.py) only support training a single RL agent. I therefore built two otherwise identical models: one with the RL agent implemented at Processor 1 only, and one with it at Queue 2 only. The scripts can train the Processor 1 agent but fail to train the Queue 2 agent, so I would like to check whether I have done something wrong.
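To narrow down where the Queue 2 model fails, this is the kind of sanity check I plan to run: open the environment, print the spaces FlexSim reports, and take one random step. The constructor arguments follow my copy of the tutorial's flexsim_env.py (adjust if yours differs), and both paths are placeholders for my machine:

```python
from flexsim_env import FlexSimEnv

env = FlexSimEnv(
    flexsimPath="C:/Program Files/FlexSim 2024/program/flexsim.exe",  # placeholder path
    modelPath="C:/models/Queue2RoutingModel.fsm",                     # placeholder path
    verbose=True,
)
# Print the spaces the model's Reinforcement Learning tool exposes, so I
# can confirm they match what the Queue 2 decision event is sending.
print("observation space:", env.observation_space)
print("action space:", env.action_space)
# Avoid unpacking the return values, since older gym and newer gymnasium
# versions of the tutorial return different tuple shapes.
print("reset ->", env.reset())
print("step  ->", env.step(env.action_space.sample()))
env.close()
```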
Additionally, once both single-agent models are validated and working, I would like to combine them into one model with both agents. Is this possible?
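If combining them is possible, my current idea is to fold both decisions into a single policy, since stable-baselines3 (which flexsim_training.py uses, as far as I can tell) trains one policy per environment: tag each observation with which decision point is asking, and reuse one action space for both. A sketch of what I mean (all names here are hypothetical, not FlexSim's):

```python
from gymnasium import spaces

# One agent serving both decision points: the decision_point flag tells
# the policy which question is being asked, and a single Discrete(4)
# action doubles as "product type to pull" (decision 0) or "processor
# index to route to" (decision 1).
combined_observation_space = spaces.Dict({
    "decision_point": spaces.Discrete(2),      # 0 = Processor 1 pull, 1 = Queue 2 routing
    "last_setup_type": spaces.Discrete(5),     # used when decision_point == 0
    "item_type": spaces.Discrete(4),           # used when decision_point == 1
    "processors_busy": spaces.MultiBinary(4),  # Processors 2-5, used when decision_point == 1
})
combined_action_space = spaces.Discrete(4)
```

As far as I know, a Dict observation space would also mean switching the stable-baselines3 policy from "MlpPolicy" to "MultiInputPolicy". Would this approach work with the tutorial scripts, or is there a supported way to train two agents in one FlexSim model?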