rltest-2.fsm

I want to complete my reinforcement learning model based on this literature. Here is the part that describes my action space. There is also a section on reward functions. The definition of s is as follows:
Hi @mark zhen,
We haven't heard back from you. Were you able to solve your problem? If so, please add and accept an answer to let others know the solution. Or please respond to the previous comment so that we can continue to help you.
If we don't hear back in the next 3 business days, we'll assume you were able to solve your problem and we'll close this case in our tracker. You can always comment back at any time to reopen your question, or you can contact your local FlexSim distributor for phone or email help.
Hi @mark zhen, was Joerg Vogel's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.
If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.
In a case where there is no next operation after "a", you need to define a maximum waiting time to compute a reward for the last items processed in a model run, or you must be absolutely sure that items still being processed when the run ends take no part in the reward system. Count items only if their rewards can be calculated completely.

Edit: If you don't find a suitable approach, you can set the maximum waiting time (Edit II: waiting time + process time = reward) to the model run time. This prevents any situation with an undefined reward and also gives you the lower boundary of your system. I would use this method to pre-store a reward and then update that value whenever the state of the involved processor changes in the model.

Edit II: The description above normalizes your rewards against the longest processing time, so percentage values greater than 100% can occur. If you instead set all pre-stored rewards to a self-defined lower boundary value and update them later, normalized against the run time length, then you can allocate all rewards.
lowest boundary reward = process time(a) / model run time length

To compare different experiments, scale each one by a factor of the run time length of the current experiment divided by the maximum run time length over all experiments.
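To make the pre-store-and-update scheme concrete, here is a minimal sketch in Python (FlexSim's Reinforcement Learning tool is typically driven from an external Python script; the names RewardTracker, start_item, finish_item, and scale_experiment are hypothetical illustrations, not FlexSim API calls):

```python
class RewardTracker:
    """Pre-stores a lower-boundary reward per item and updates it once
    the item's true waiting time is known, as described above."""

    def __init__(self, run_length):
        self.run_length = run_length   # total model run time length
        self.process_times = {}        # item id -> process time
        self.rewards = {}              # item id -> normalized reward

    def start_item(self, item_id, process_time):
        # Pre-store the lower boundary (waiting time assumed 0) so the
        # reward is defined even if the run ends before the item finishes.
        self.process_times[item_id] = process_time
        self.rewards[item_id] = process_time / self.run_length

    def finish_item(self, item_id, waiting_time):
        # Overwrite with the full reward: waiting time + process time,
        # normalized against the run time length so it cannot exceed 100%.
        total = waiting_time + self.process_times[item_id]
        self.rewards[item_id] = total / self.run_length


def scale_experiment(rewards, run_length, max_run_length):
    # Make experiments of different lengths comparable by scaling each
    # one by its run length relative to the longest run of all experiments.
    factor = run_length / max_run_length
    return [r * factor for r in rewards]


# Example: a 10 s process in a 1000 s run pre-stores a reward of 0.01;
# after 40 s of waiting, the reward becomes (40 + 10) / 1000 = 0.05.
tracker = RewardTracker(run_length=1000.0)
tracker.start_item("item_1", process_time=10.0)
tracker.finish_item("item_1", waiting_time=40.0)
print(tracker.rewards["item_1"])  # 0.05
```

In FlexSim itself, start_item and finish_item would correspond to updating the pre-stored value whenever the involved processor's state changes, as suggested above.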