Question

mark zhen asked · Kavika F commented

Have a question about reinforcement learning

I want to complete my reinforcement learning model (attached: rltest-2.fsm) based on this literature. Here is what it says about my action space:

[Image: action space excerpt (1669557699904.png)]

There is also a section on reward functions.

[Image: reward function excerpt (1669561257325.png)]

The definition of s is as follows:

[Image: definition of s (1669561298933.png)]

FlexSim 22.0.0
reinforcement learning
1669557699904.png (32.8 KiB)
1669561257325.png (39.1 KiB)
1669561298933.png (67.5 KiB)
rltest-2.fsm (280.5 KiB)

Jeanette F ♦♦ commented ·

Hi @mark zhen ,

We haven't heard back from you. Were you able to solve your problem? If so, please add and accept an answer to let others know the solution. Or please respond to the previous comment so that we can continue to help you.

If we don't hear back in the next 3 business days, we'll assume you were able to solve your problem and we'll close this case in our tracker. You can always comment back at any time to reopen your question, or you can contact your local FlexSim distributor for phone or email help.

Kavika F ♦ commented ·
Could you post the link to this paper?
mark zhen commented in reply to Kavika F ♦ ·


https://hackmd.io/@shaoeChen/SyjI6W2zB/https%3A%2F%2Fhackmd.io%2F%40shaoeChen%2FS1UmWvfN9#Reward-Function

Kavika F ♦ commented ·

Hi @mark zhen , was Joerg Vogel's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.


1 Answer

Joerg Vogel answered

In the case where there is no next operation after "a", you need to define a maximum waiting time in order to compute a reward for the last items processed in a model run, or you must be absolutely sure that the items still being processed when the run ends do not take part in the reward system. You should only count items whose rewards can be calculated completely.
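
A minimal sketch of that rule in Python (the record layout, field names, and numbers below are illustrative assumptions, not part of the FlexSim API): collect a record per item and, at the end of the run, keep only the items whose operation actually finished.

# Hypothetical per-item records collected during a model run.
# finish_time is None for items still waiting or in process when the run
# ended, so their reward cannot be computed completely.
items = [
    {"wait_start": 10.0, "process_time": 30.0, "finish_time": 55.0},
    {"wait_start": 900.0, "process_time": 30.0, "finish_time": None},
]

def completed_rewards(items):
    """Return waiting time + process time only for fully finished items."""
    rewards = []
    for it in items:
        if it["finish_time"] is None:
            continue  # skip items whose reward cannot be calculated completely
        waiting_time = it["finish_time"] - it["wait_start"] - it["process_time"]
        rewards.append(waiting_time + it["process_time"])
    return rewards

print(completed_rewards(items))  # -> [45.0]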

Edit: If you don't find any suitable approach, you can set the maximum waiting time to the model run time (Edit II: reward = waiting time + process time). This prevents any situation with an undefined reward and also gives you a lower boundary for your system. I would use this method to pre-store a reward and then update that value whenever the state of the involved processor changes in the model.
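
A rough sketch of that pre-store-and-update idea (the callback names and the dictionary are placeholders I am assuming, not FlexSim functions):

RUN_LENGTH = 1000.0  # model run time, used as the maximum waiting time

# Reward pre-stored for every item as soon as it enters the system.
rewards = {}

def on_item_enter(item_id):
    # Worst case: the item waits for the whole run, so the pre-stored
    # reward (waiting time + process time) is bounded by the run length.
    rewards[item_id] = RUN_LENGTH

def on_processor_state_change(item_id, waiting_time, process_time):
    # The real values are known now; overwrite the pre-stored fallback.
    rewards[item_id] = waiting_time + process_time

on_item_enter("item_1")
on_processor_state_change("item_1", waiting_time=50.0, process_time=30.0)
print(rewards)  # {'item_1': 80.0}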

Edit II: The description above normalizes your rewards against the longest processing time, so values greater than 100% can occur. If you instead set all pre-stored rewards to a self-defined lower boundary value and later update them normalized against the run time length, then you can allocate all rewards:
lowest boundary reward = process time(a) / model run time length
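
For example, with made-up numbers (a process time of 30 time units for action a and a run length of 1000 time units):

def normalized_reward(waiting_time, process_time, run_length):
    # Reward normalized against the model run time length, so it stays in [0, 1].
    return (waiting_time + process_time) / run_length

run_length = 1000.0
process_time_a = 30.0

lower_bound = process_time_a / run_length                      # 0.03, the pre-stored value
updated = normalized_reward(50.0, process_time_a, run_length)  # 0.08 after 50 time units of waiting
print(lower_bound, updated)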

You scale all experiments by a factor equal to the run time length of the current experiment divided by the maximum run time length across all experiments.
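
A small sketch of that scaling step, assuming you have recorded the run time length of every experiment (the values are illustrative):

run_lengths = [1000.0, 1500.0, 2000.0]  # run time length of each experiment
max_length = max(run_lengths)

def scale_rewards(rewards, run_length, max_length):
    # Make rewards comparable across experiments of different run lengths.
    factor = run_length / max_length
    return [r * factor for r in rewards]

print(scale_rewards([0.03, 0.08], run_lengths[0], max_length))  # factor 0.5 -> [0.015, 0.04]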


mark zhen commented ·

Could you give me a little example?
