I would like to understand a few details about how reinforcement learning terms relate to the model.
For example, what does an episode correspond to in the model?
And what does a timestep correspond to? I am asking because I saw a spike in rewards at the beginning of training and want to understand why.
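
To make the question concrete, here is a minimal sketch of how I currently picture the two terms. It assumes the common convention that a timestep is one call to the environment's step function and an episode is the run of timesteps from a reset until termination. `ToyEnv` and `run` are hypothetical names made up for illustration; they are not my actual training setup or any library's API, and the rewards are just random numbers.

```python
import random


class ToyEnv:
    """Hypothetical environment: each episode ends after max_steps timesteps."""

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Start of a new episode: the timestep counter goes back to zero.
        self.t = 0
        return 0  # dummy initial observation

    def step(self, action):
        # One timestep: one action in, one (observation, reward, done) out.
        self.t += 1
        reward = self.rng.random()  # placeholder reward in [0, 1)
        done = self.t >= self.max_steps
        return self.t, reward, done


def run(env, num_episodes):
    """Run num_episodes episodes; return per-episode returns and total timesteps."""
    episode_returns = []
    total_timesteps = 0
    for _ in range(num_episodes):
        env.reset()
        done = False
        ep_return = 0.0
        while not done:
            _, reward, done = env.step(action=0)  # fixed dummy action
            ep_return += reward
            total_timesteps += 1
        episode_returns.append(ep_return)
    return episode_returns, total_timesteps


returns, total_timesteps = run(ToyEnv(max_steps=5), num_episodes=3)
print(len(returns), total_timesteps)
```

If this picture is right, a reward curve plotted per timestep and one plotted per episode have different x-axes, which may matter when reading an early spike.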