mark zhen asked:

Throughput Reinforcement Learning


I want to use reinforcement learning to increase my total throughput within a fixed period of time. My current reward function is as follows, but the agent's learning is not very smooth:

1670822008630.png

Model.find("Sink2").as(Object).inObjects.length

Are there other ideas that would make learning with my reward function more effective? I would also like to know how to write a penalty mechanism. For example, if the throughput is below a standard I set, points are deducted; if it is above that standard, extra points are awarded.

FlexSim 22.0.0
reinforcement learning

1 Answer

Joerg Vogel answered:
Currently this code evaluates the static value of inObjects, that is, the number of objects connected to the sink's input ports.

Maybe you want a value that progresses over time, such as the number of items received. You can get this through the stats property, using input.value, for example Model.find("Sink2").stats.input.value.
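
As a rough illustration of the difference, and of the bonus/penalty idea from the question, here is a minimal sketch in plain Python rather than FlexScript. It assumes the sink's cumulative input count is sampled at each decision point and rewards only the items that arrived since the previous sample. The function name, the target of 5 items per step, and the bonus and penalty sizes are all hypothetical and would need tuning for a real model.

```python
def step_reward(prev_input, curr_input, target=5, bonus=1.0, penalty=1.0):
    """Reward for one decision step, based on items received since the last step.

    prev_input / curr_input: cumulative sink input at the previous and current step.
    target, bonus, penalty: hypothetical tuning values, not FlexSim defaults.
    """
    delta = curr_input - prev_input  # items received during this step only
    reward = float(delta)            # base reward grows with throughput
    if delta > target:
        reward += bonus              # extra points above the standard
    elif delta < target:
        reward -= penalty            # deduct points below the standard
    return reward


# Cumulative input counts sampled at consecutive decision points (made-up data)
samples = [0, 7, 10, 15]
rewards = [step_reward(a, b) for a, b in zip(samples, samples[1:])]
print(rewards)  # [8.0, 2.0, 5.0]
```

Rewarding the per-step change instead of the running total keeps each reward tied to the most recent action, which may help if the learning curve is noisy.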


mark zhen commented:

If I instead want the reward to be the total processing time of a single machine minus its total idle time, how should I write it?

Jason Lightfoot commented (in reply to mark zhen):

To access those values, use:

<object>.stats.state().getTotalTimeAt(STATE_IDLE)

replacing STATE_IDLE with the appropriate macro.

mark zhen commented (in reply to Jason Lightfoot):

1671345753064.png

Should I write it like this?

Joerg Vogel commented (in reply to mark zhen):
@mark zhen, we were at this point already. Please post this request as a new question, so that more users, distributors, and developers can participate. Thank you.

Joerg Vogel commented (in reply to mark zhen):
Direct hint: you shouldn't use this as a reward, because a situation may occur in which the total idle time is larger than the total processing time. I am not sure how your reward system reacts to negative values.
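
To make the concern concrete, the sketch below (plain Python, not FlexScript) compares the raw difference with a bounded alternative: the share of time spent processing. The ratio is swapped in here purely for illustration and was not proposed in this thread. The function names and sample times are hypothetical; in a model, the two inputs would come from the state statistics mentioned above.

```python
def difference_reward(processing_time, idle_time):
    """Processing time minus idle time; negative whenever idle time dominates."""
    return processing_time - idle_time


def utilization_reward(processing_time, idle_time):
    """Share of the observed time spent processing, always between 0 and 1."""
    total = processing_time + idle_time
    if total == 0:
        return 0.0  # nothing observed yet, so give a neutral reward
    return processing_time / total


# A machine that was idle for most of the run (made-up numbers)
print(difference_reward(30.0, 70.0))   # -40.0
print(utilization_reward(30.0, 70.0))  # 0.3
```

Note that this ratio only looks at the two states named in the question; time spent blocked or in setup would need to be handled separately.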

mark zhen commented:

I don't know why it reports an error. Can you help me with it? allcombos-22-0-1.fsm

Joerg Vogel commented (in reply to mark zhen):
I have created a new question from this comment.

mark zhen commented (in reply to Joerg Vogel):

Why was mine closed?
