I want my agent to learn to increase total throughput within a fixed period of time. My current reward function is below, but the agent's learning is not very smooth:
Model.find("Sink2").as(Object).inObjects.length
Are there other ideas that would make learning with my reward function more effective? I would also like to know how to write a penalty mechanism. For example, if throughput is below a target I set, points are deducted; if it is above that target, bonus points are awarded.
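The penalty/bonus rule described above can be sketched as follows. This is a minimal Python sketch of the logic only, not code for the simulation tool itself. It assumes the reward is evaluated once per decision step and that the caller passes in the cumulative number of items that have reached Sink2 so far (however that value is read from the model). `ThroughputReward`, `target_per_step`, `bonus`, `penalty` and `graded` are hypothetical names. It rewards the increase since the previous step rather than the running total, and it also offers an optional proportional ("graded") variant in place of the fixed bonus/penalty.

```python
class ThroughputReward:
    """Hypothetical helper: turns a cumulative sink count into a per-step reward."""

    def __init__(self, target_per_step: float, bonus: float = 1.0,
                 penalty: float = 1.0, graded: bool = False) -> None:
        self.target = target_per_step  # assumed per-step throughput target
        self.bonus = bonus             # points added when above the target
        self.penalty = penalty         # points deducted when below the target
        self.graded = graded           # use a proportional reward instead
        self.prev_count = 0            # cumulative count at the previous step

    def reset(self) -> None:
        # Call at the start of each episode (simulation run).
        self.prev_count = 0

    def __call__(self, cumulative_count: int) -> float:
        # Items that reached the sink since the previous decision step.
        delta = cumulative_count - self.prev_count
        self.prev_count = cumulative_count
        if self.graded:
            # Proportional variant: the further from the target, the larger
            # the reward or penalty, normalized by the target.
            return (delta - self.target) / max(self.target, 1e-9)
        # Threshold variant: fixed bonus above, fixed penalty below.
        if delta > self.target:
            return self.bonus
        if delta < self.target:
            return -self.penalty
        return 0.0


reward = ThroughputReward(target_per_step=5)
print(reward(3), reward(10), reward(15))
```

Usage: with a target of 5 items per step, cumulative counts of 3, 10 and 15 give increments of 3, 7 and 5, so the threshold variant returns -1.0, 1.0 and 0.0. The graded variant trades the fixed step for a signal that grows with the size of the shortfall or surplus, which may be easier to learn from than an all-or-nothing bonus; which works better for this model would have to be tested.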