question

janders1 asked

RL not enough values to unpack (expected 5, got 4)

While working through the RL tutorial here, I am getting the error below. I am guessing my step function should be returning "observation, reward, done, info", but the caller is trying to unpack an extra value called "truncated".
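For context, the mismatch can be reproduced without FlexSim at all. This is an illustrative sketch, not the tutorial's actual code: an old Gym-style step returns 4 values, while Stable-Baselines3's wrappers unpack 5 (the Gymnasium API).

```python
# Illustrative repro of the error; old_step is a stand-in for a
# step() method written against the legacy 4-tuple Gym API.
def old_step():
    return "obs", 0.0, False, {}  # observation, reward, done, info

try:
    # Stable-Baselines3 (via Gymnasium) unpacks five values here.
    obs, reward, terminated, truncated, info = old_step()
except ValueError as e:
    print(e)  # not enough values to unpack (expected 5, got 4)
```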


Waiting for input to close FlexSim...& "C:/Program Files/Python311/python.exe" "//fs-caedm.et.byu.edu/homes/.caedm/My Documents/FlexSim 2024 Projects/RL Tutorial/flexsim_training.py"
Traceback (most recent call last):
  File "\\fs-caedm.et.byu.edu\homes\.caedm\My Documents\FlexSim 2024 Projects\RL Tutorial\flexsim_training.py", line 60, in <module>        
    main()
  File "\\fs-caedm.et.byu.edu\homes\.caedm\My Documents\FlexSim 2024 Projects\RL Tutorial\flexsim_training.py", line 29, in main
    model.learn(total_timesteps=10)
  File "C:\Program Files\Python311\Lib\site-packages\stable_baselines3\ppo\ppo.py", line 315, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 277, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 194, in collect_rollouts        
    new_obs, rewards, dones, infos = env.step(clipped_actions)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 206, in step
    return self.step_wait()
           ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 58, in step_wait
    obs, self.buf_rews[env_idx], terminated, truncated, self.buf_infos[env_idx] = self.envs[env_idx].step(
                                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\stable_baselines3\common\monitor.py", line 94, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
FlexSim 24.1.0
reinforcement training


1 Answer

janders1 answered

Fixed it. In flexsim_env.py, the last line of the step function does not include "truncated". Add "truncated" before "info" in the return statement, and it works.
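For anyone else hitting this, here is a minimal sketch of the corrected return. The class and values are placeholders (the real flexsim_env.py talks to FlexSim); the point is the five-value Gymnasium-style return that Stable-Baselines3 expects.

```python
# Hypothetical sketch of the fix; names and values are illustrative only.
class FlexSimEnvSketch:
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        observation = [self.t]    # placeholder observation
        reward = 1.0              # placeholder reward
        terminated = self.t >= 3  # episode reached a terminal state
        truncated = False         # episode cut short (e.g. time limit)
        info = {}
        # Old Gym API returned 4 values: observation, reward, done, info
        # Gymnasium returns 5: observation, reward, terminated, truncated, info
        return observation, reward, terminated, truncated, info
```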

