Mathematics in Reinforcement Learning: Geometric Series

Calculating goals from rewards

Brandon Walker
3 min read · Dec 28, 2021

Rewards and Goals

In supervised machine learning, your algorithm receives instructive feedback: you tell your model the best answer, and it updates itself to make more accurate predictions. Reinforcement learning is instead powered by evaluative feedback, which tells a model how well it did, but not what the perfect decisions would have been. In reinforcement learning we call this feedback a reward. At each time step, the decision our RL model makes receives a reward. The goal (also called the objective) of a reinforcement learning system is to maximize reward over the long run.
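To make the reward loop concrete, here is a minimal sketch in Python. The two-action environment and its payoffs are invented purely for illustration; nothing here comes from a specific RL library.

```python
import random

# A toy "environment" invented for this sketch: action 0 pays a small
# reward every step, action 1 occasionally pays a larger one.
def step(action):
    """Return evaluative feedback: a reward score, not the correct answer."""
    if action == 0:
        return 1.0
    return 3.0 if random.random() < 0.5 else 0.0

total_reward = 0.0
for t in range(100):                 # one reward per time step
    action = random.choice([0, 1])   # a stand-in for the model's decision
    total_reward += step(action)     # the objective: maximize this over the long run

print(f"cumulative reward after 100 steps: {total_reward:.1f}")
```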

If we are to compare decisions, we must know the expected value of the goal under each action we might take. The simplest candidate goal is just the sum of every future reward:

G_t = R_{t+1} + R_{t+2} + R_{t+3} + …

We would then tell our model to pick the action whose goal has the largest value. The trouble with the equation above is that if the number of time steps on which we receive reward is infinite, the goal will be infinite. Infinite goals are impossible to compare, so we need to adjust this equation slightly.
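To see the problem numerically, consider two hypothetical reward streams (invented for this sketch): one action pays a reward of 1 every step, another pays 2 every step. Both partial sums grow without bound, so in the infinite-horizon limit both goals are "infinity" and the comparison collapses.

```python
# Both undiscounted goals diverge: the longer we run, the larger each
# partial sum gets, and neither settles at a finite, comparable value.
for steps in (10, 1_000, 100_000):
    goal_a = sum(1.0 for _ in range(steps))  # partial sum of stream A's rewards
    goal_b = sum(2.0 for _ in range(steps))  # partial sum of stream B's rewards
    print(f"after {steps:>7} steps: goal A = {goal_a:>9,.0f}, goal B = {goal_b:>9,.0f}")
```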

Thus we now attach a term that decays our rewards over time:

G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + … = Σ_{k=0}^∞ γᵏ R_{t+k+1}, where 0 ≤ γ < 1.

The decay factor γ is called the discount rate, and a sum whose terms are scaled by successive powers of a constant factor like this is a geometric series. There are a few reasons to make your goal a geometric series, chief among them that the sum now converges: as long as the rewards are bounded, the discounted goal is a finite number, so the goals of different actions can actually be compared, as the sketch below shows.
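Here is the same pair of constant reward streams from the earlier example with the decay term applied, using an arbitrary γ = 0.9. Each partial sum now converges, and the limits match the geometric-series closed form r / (1 − γ).

```python
GAMMA = 0.9  # an arbitrary discount rate chosen for illustration

def discounted_return(reward, steps, gamma=GAMMA):
    """Partial sum of the geometric series: sum over k of gamma**k * reward."""
    return sum(gamma**k * reward for k in range(steps))

for steps in (10, 100, 1_000):
    print(f"after {steps:>5} steps: "
          f"goal A = {discounted_return(1.0, steps):.4f}, "
          f"goal B = {discounted_return(2.0, steps):.4f}")

# Closed-form limits of the geometric series: finite and directly comparable.
print(f"limits: goal A -> {1.0 / (1 - GAMMA):.4f}, goal B -> {2.0 / (1 - GAMMA):.4f}")
```

With γ = 0.9 the two goals settle at 10 and 20 rather than racing off to infinity, which is exactly what makes the decisions comparable.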
