Hi! I need to apply reinforcement learning to the robot at the top of the figure. It has to pick up boxes from the queue on the right and decide which combiner to leave each box on, based on the number of boxes already on the pallet sitting on the bottom combiner. I plan to use the number of boxes on the pallet as the observation and the index of the combiner to choose as the action. How can I proceed?
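
To make the setup concrete, here is a rough sketch of the discrete observation and action spaces I have in mind, with a Q-table sized to match. It is plain Python and not tied to any simulation tool. The pallet capacity (`MAX_BOXES`), the number of combiners (`NUM_COMBINERS`), and all function names are placeholders I made up for illustration, not values from my model.

```python
import random

# Assumed placeholders, not taken from the actual model:
MAX_BOXES = 4      # hypothetical pallet capacity
NUM_COMBINERS = 2  # hypothetical number of combiners the robot can choose from


def encode_observation(boxes_on_pallet: int) -> int:
    """Observation: number of boxes on the pallet, clipped to [0, MAX_BOXES]."""
    return max(0, min(boxes_on_pallet, MAX_BOXES))


# One row per possible observation, one column per possible action.
q_table = [[0.0] * NUM_COMBINERS for _ in range(MAX_BOXES + 1)]


def choose_action(obs: int, epsilon: float, rng: random.Random) -> int:
    """Action: index of the combiner, picked epsilon-greedily from the Q-table."""
    if rng.random() < epsilon:
        # Explore: pick any combiner uniformly at random.
        return rng.randrange(NUM_COMBINERS)
    # Exploit: pick the combiner with the highest estimated value.
    row = q_table[obs]
    return row.index(max(row))


if __name__ == "__main__":
    rng = random.Random(0)
    obs = encode_observation(3)
    action = choose_action(obs, epsilon=0.1, rng=rng)
    print(f"observation={obs}, chosen combiner index={action}")
```

With this encoding the observation takes one of `MAX_BOXES + 1` integer values and the action is an integer in `[0, NUM_COMBINERS)`. The reward signal and the connection to the simulation model are still open, and that is the part I am unsure about.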