Maryam H2 asked:

reward function

For a reward function in the RL tool, how can I set up the reward so that it refers to a row in the Performance Measures table?

For example:

Reward = Quantity_i - StayTime_i

where Quantity_i is the number of items in a queue for each item type i (set in the Performance Measures table) and StayTime_i is the stay time of item type i in a rack (also set in the Performance Measures table). StayTime_i enters with a negative sign because I want to penalize the time items remain in the racks.



FlexSim 24.1.0

Jeanette F commented:

Hi @Maryam H2, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.


1 Answer

Felix Möhlmann answered:

Performance measure values are FlexScript nodes. You need to evaluate the node (passing in the stored reference, if there is one) to get the actual value.

(screenshot: capture1.png)

// Find the Value node of the first performance measure ("1" is its rank in the table)
treenode pfmValue = Model.find("/Tools/PerformanceMeasureTables/PerformanceMeasures>variables/performanceMeasures/1/Value");
// Evaluate the FlexScript node, passing in the stored object reference as a parameter
Variant value = pfmValue.subnodes[1].evaluate(pfmValue.subnodes[2].value);
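
For the reward from the original question you could, for example, evaluate two performance measures and combine them. A minimal sketch, assuming the quantity and stay time measures sit at ranks 1 and 2 of the table (adjust the ranks to match your model):

// Reward = Quantity - StayTime, reading both values from the Performance Measures table
treenode qtyValue = Model.find("/Tools/PerformanceMeasureTables/PerformanceMeasures>variables/performanceMeasures/1/Value");
treenode stayValue = Model.find("/Tools/PerformanceMeasureTables/PerformanceMeasures>variables/performanceMeasures/2/Value");
double quantity = qtyValue.subnodes[1].evaluate(qtyValue.subnodes[2].value);
double stayTime = stayValue.subnodes[1].evaluate(stayValue.subnodes[2].value);
double reward = quantity - stayTime;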


Maryam H2 commented:

Hi @Felix Möhlmann @Jeanette F

The code above does not return any value. I have the target inventory levels in a Parameters Table and the current inventory levels in a Performance Measures table. If I want to penalize deviation from the target inventory levels and encourage the agent to take actions that minimize this penalty, how should I structure the reward function? Do you have an example I could reference?

I was thinking of starting with a reward function defined like this:

def reward_function(current_inventory, target_inventory):
    # Calculate the absolute difference between current and target inventory levels
    deviation = abs(current_inventory - target_inventory)
    # Penalize the deviation (negative reward), scaled by an adjustable factor
    penalty_factor = 0.1  # Adjustable
    reward = -penalty_factor * deviation
    return reward

Also, is there a way to instruct the agent to minimize the frequency of actions (such as placing orders for item types and receiving them in queues) in order to reduce ordering costs and extend the time interval between orders as much as possible? If so, how can I do this?


Maryam H2 commented:
Hi @Felix Möhlmann, any idea about my question?
Felix Möhlmann commented:

The code from my original answer does return a value for me; I just tested it again in version 24.0.2. You might have to adjust the path to get the value of the correct PFM, though. The "1" in the path is the rank of the performance measure.

The fundamental logic of your code makes sense. It's just not FlexScript. I have heard and read that clamping the reward to lie between -1 and 1 works best for many RL algorithms, so that might be worth trying.

If you want to define a function in FlexSim, have a look at user commands. Your logic in a user command (plus clamping the value to the [-1, 1] interval) would look something like this:

// Parameters passed into the user command
double current_inventory = param(1);
double target_inventory = param(2);
// Penalize the absolute deviation from the target, scaled by a factor
double reward = -Math.fabs(current_inventory - target_inventory);
reward *= 0.1;
// Clamp the reward to the [-1, 1] interval
reward = Math.min(1, Math.max(reward, -1));
return reward;
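
To call this from the RL tool's reward function, you could pull the current inventory from the performance measure and the target from the Parameters Table. A sketch, where the user command name "inventoryReward" and the parameter name "TargetInventory" are hypothetical:

// Current inventory from the performance measure, as in the original answer
treenode pfmValue = Model.find("/Tools/PerformanceMeasureTables/PerformanceMeasures>variables/performanceMeasures/1/Value");
double current = pfmValue.subnodes[1].evaluate(pfmValue.subnodes[2].value);
// "TargetInventory" is an assumed model parameter holding the target level
double target = Model.parameters["TargetInventory"].value;
// "inventoryReward" is the hypothetical user command defined above
double reward = inventoryReward(current, target);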

You determine when a decision is made by setting up the decision events. If the agent can influence this (for example, by ordering a larger quantity when a decision is only made once stock falls below a certain level), then it should learn to do so, provided the reward function takes the amount of time since the last decision into account.
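
As a sketch of that idea: store the time of the previous decision in a label and reward longer intervals. The label name "lastDecisionTime" and the scale factor 0.001 are assumptions:

// Time elapsed since the previous decision (the label is created on first use)
double lastTime = model().labels.assert("lastDecisionTime", 0).value;
double elapsed = time() - lastTime;
model().labels["lastDecisionTime"].value = time();
// Reward longer intervals between orders, clamped to at most 1
double reward = Math.min(1, elapsed * 0.001);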
