mark zhen asked · Jeanette F commented

The Problem with Reinforcement Learning

I'm currently having a few problems (model attached: allcombos-22-0-1.fsm).

The first problem: I want to add a penalty mechanism to my current reward function, but I don't see any negative values in my VS Code output, and the reward I set only ever appears as an integer. I don't understand why this is the case.

1674891279251.png

1674891304726.png


Also, I want to plot my training progress as a graph, like the picture below (the picture comes from the Internet). Is there a way to do that?

1674891385651.png


FlexSim 22.0.0
reinforcement learning · python
1674891279251.png (3.5 KiB)
1674891304726.png (9.7 KiB)
1674891385651.png (220.2 KiB)
allcombos-22-0-1.fsm (335.1 KiB)

Jeanette F ♦♦ commented:

Hi @mark zhen, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.


1 Answer

Felix Möhlmann answered · Felix Möhlmann commented

By writing double in front of the variable name, you are declaring a new variable within the scope of the if/else block (if you did this in the same scope as the original "rewardA" variable, the compiler would throw an error about a duplicate variable). So the code currently creates a new variable, sets its value, and as soon as execution leaves the if/else block, that variable vanishes; the original "rewardA" never changes.

To assign to the original "rewardA", use only the name, without the type in front of it.

1675066646780.png
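
To make the scoping issue concrete, here is a minimal standalone C++ sketch of the same behaviour (FlexScript follows the C++-style block scoping described above); the main() scaffolding and values are purely illustrative and not taken from the model:

    #include <iostream>

    int main() {
        double rewardA = 0;   // original variable, declared in the outer scope
        bool done = true;

        if (done) {
            // BUG: the leading "double" declares a brand-new rewardA that only
            // exists inside this block; the outer rewardA is never touched.
            double rewardA = -100;
            (void)rewardA;    // silence the unused-variable warning
        }
        std::cout << rewardA << "\n";   // prints 0, not -100

        if (done) {
            rewardA = -100;   // FIX: a plain assignment updates the existing variable
        }
        std::cout << rewardA << "\n";   // prints -100
    }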


You might be able to grab the data directly from the Python code, though since this wouldn't directly involve FlexSim, that question might be better asked in a general programming forum.

You could also write data to an Excel file whenever a run finishes (done == 1) during training.

1675066913317.png

(The file path would obviously be different in your case)
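
As a rough, FlexSim-independent illustration of the same idea, the C++ sketch below appends one row per finished run to a plain CSV file instead of an Excel sheet; the file name, column layout and values are placeholders rather than anything from the screenshot:

    #include <fstream>

    // Hypothetical helper: append one row per finished run so the training
    // curve can be plotted later (in Excel, Python, or anything else).
    void logEpisode(int episode, double totalReward, double lastEntryTime) {
        std::ofstream out("training_log.csv", std::ios::app);  // placeholder path
        out << episode << "," << totalReward << "," << lastEntryTime << "\n";
    }

    int main() {
        int done = 1;   // "done" is 1 at the end of a replication
        if (done == 1) {
            logEpisode(1, -99.0, 2534.7);   // values are illustrative only
        }
    }

Collected this way, the per-run numbers can then be charted in Excel or any plotting tool to get a training curve like the one shown in the question.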


1675066646780.png (3.3 KiB)
1675066913317.png (17.7 KiB)

mark zhen commented:

What I want to do is: if my reward (rewardA) is less than N, it should be punished. How do I express that with an if statement? Also, do I write the Excel-writing part in my VS Code? In that case my reward should have a negative part, but I don't see it in the picture above.


Felix Möhlmann replied to mark zhen:

That is what you are currently doing: If rewardA is less than (or equal to) 300, it is set to -100, otherwise to 1.

The code is placed in the reward function, after the value of the "done" variable is determined but before the reward is returned. It only runs if "done" is not equal to zero, i.e. at the end of each replication. As an example, I chose to write "LastEntryTime" (which you use as a performance measure) to the Excel file. You can of course also sum up all rewards in a global variable over the course of the run and then write that value to Excel.

1675105741739.png

1675105741739.png (8.7 KiB)
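
Purely as an illustration of this comment, here is a hedged standalone C++ sketch combining the threshold logic (rewardA at or below 300 is punished with -100, otherwise the reward is 1) with a running total that could be written out once "done" equals 1; the function and variable names are invented for the example and are not FlexSim API:

    #include <iostream>

    double cumulativeReward = 0;   // running total over one run (stand-in for a global variable)

    // Threshold penalty described in the comment above.
    double computeReward(double rewardA) {
        double reward;
        if (rewardA <= 300) {
            reward = -100;          // penalty
        } else {
            reward = 1;
        }
        cumulativeReward += reward; // summed so the total can be logged when done == 1
        return reward;
    }

    int main() {
        std::cout << computeReward(250) << "\n";                  // -100 (penalised)
        std::cout << computeReward(420) << "\n";                  // 1
        std::cout << "run total: " << cumulativeReward << "\n";   // -99
    }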
mark zhen replied to Felix Möhlmann:

1675185014406.png

I want to know how your approach is implemented. Can you attach a file for my reference? Also, if I want to know the value of each variable, how should I read it? For example, I want to know the values of reward 1, 2 and 3. (Model attached: allcombos-22-0-1.fsm)

1675185014406.png (51.1 KiB)
allcombos-22-0-1.fsm (334.9 KiB)