mark zhen asked · Jeanette F commented

The Problem with Reinforcement Learning

I'm currently having a few problems (model attached: allcombos-22-0-1.fsm).

The first problem: I want to add a penalty mechanism to my current reward function, but I don't see any negative values in my VS Code output, and the reward I set only ever appears as an integer. I don't understand why this is the case.

1674891279251.png

1674891304726.png


Also, I want to plot my training progress as a graph, like the picture below (the picture comes from the Internet). Is there a way to do that?

1674891385651.png


FlexSim 22.0.0
reinforcement learning · python
1674891279251.png (3.5 KiB)
1674891304726.png (9.7 KiB)
1674891385651.png (220.2 KiB)
allcombos-22-0-1.fsm (335.1 KiB)

Jeanette F ♦♦ commented:

Hi @mark zhen, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.


1 Answer

Felix Möhlmann answered · Felix Möhlmann commented

By writing double in front of the variable name, you are declaring a new variable within the scope of the if/else block (if you did this in the same scope as the original "rewardA" variable, the compiler would throw an error about a duplicate variable). So the code currently creates a new variable, sets its value, and as soon as execution leaves the if/else block, that variable vanishes; the original "rewardA" never changes.

To assign to the original "rewardA", use only the name, without the type in front of it.

1675066646780.png
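
To make the scoping issue concrete, here is a minimal standalone C++ sketch of the same behaviour (FlexScript follows the C++-style block scoping described above); the main() scaffolding and values are purely illustrative and not taken from the model:

    #include <iostream>

    int main() {
        double rewardA = 0;   // original variable, declared in the outer scope
        bool done = true;

        if (done) {
            // BUG: the leading "double" declares a brand-new rewardA that only
            // exists inside this block; the outer rewardA is never touched.
            double rewardA = -100;
            (void)rewardA;    // silence the unused-variable warning
        }
        std::cout << rewardA << "\n";   // prints 0, not -100

        if (done) {
            rewardA = -100;   // FIX: a plain assignment updates the existing variable
        }
        std::cout << rewardA << "\n";   // prints -100
    }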


You might be able to grab the data directly from the Python code, though since this wouldn't directly involve FlexSim, that question might be better asked in a general programming forum.

You could also write data to an Excel file whenever a run finishes (done == 1) during training.

1675066913317.png

(The file path would obviously be different in your case)
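
As a rough, FlexSim-independent illustration of the same idea, the C++ sketch below appends one row per finished run to a plain CSV file instead of an Excel sheet; the file name, column layout and values are placeholders rather than anything from the screenshot:

    #include <fstream>

    // Hypothetical helper: append one row per finished run so the training
    // curve can be plotted later (in Excel, Python, or anything else).
    void logEpisode(int episode, double totalReward, double lastEntryTime) {
        std::ofstream out("training_log.csv", std::ios::app);  // placeholder path
        out << episode << "," << totalReward << "," << lastEntryTime << "\n";
    }

    int main() {
        int done = 1;   // "done" is 1 at the end of a replication
        if (done == 1) {
            logEpisode(1, -99.0, 2534.7);   // values are illustrative only
        }
    }

Collected this way, the per-run numbers can then be charted in Excel or any plotting tool to get a training curve like the one shown in the question.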


1675066646780.png (3.3 KiB)
1675066913317.png (17.7 KiB)

mark zhen commented:

What I want to do is: if my reward (rewardA) is less than N, it should be punished. How do I express that with an if statement? Also, do I write the Excel-writing part in my VS Code? In that case my reward should have a negative part, but I don't see it in the picture above.


Felix Möhlmann replied to mark zhen:

That is what you are currently doing: If rewardA is less than (or equal to) 300, it is set to -100, otherwise to 1.

The code is placed in the reward function, after the value of the "done" variable is determined but before the reward is returned. It only runs if "done" is not equal to zero, i.e. at the end of each replication. As an example, I chose to write "LastEntryTime" (which you use as a performance measure) to the Excel file. You can of course also sum up all rewards in a global variable over the course of the run and then write that value to Excel.

1675105741739.png

1675105741739.png (8.7 KiB)
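
Purely as an illustration of this comment, here is a hedged standalone C++ sketch combining the threshold logic (rewardA at or below 300 is punished with -100, otherwise the reward is 1) with a running total that could be written out once "done" equals 1; the function and variable names are invented for the example and are not FlexSim API:

    #include <iostream>

    double cumulativeReward = 0;   // running total over one run (stand-in for a global variable)

    // Threshold penalty described in the comment above.
    double computeReward(double rewardA) {
        double reward;
        if (rewardA <= 300) {
            reward = -100;          // penalty
        } else {
            reward = 1;
        }
        cumulativeReward += reward; // summed so the total can be logged when done == 1
        return reward;
    }

    int main() {
        std::cout << computeReward(250) << "\n";                  // -100 (penalised)
        std::cout << computeReward(420) << "\n";                  // 1
        std::cout << "run total: " << cumulativeReward << "\n";   // -99
    }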
mark zhen replied to Felix Möhlmann:

1675185014406.png

I want to know how your approach is implemented. Can you attach a file for my reference? Also, if I want to know the value of each variable, how should I read it? For example, I want to know the values of reward 1, 2 and 3. (Model attached: allcombos-22-0-1.fsm)

1675185014406.png (51.1 KiB)
allcombos-22-0-1.fsm (334.9 KiB)