The Art of Learning Through Trial and Error
Reinforcement Learning (RL) is a type of machine learning in which an agent learns to make decisions by acting in an environment so as to maximize cumulative reward. Unlike supervised learning, RL does not require labeled input/output pairs; unlike unsupervised learning, its objective is to maximize a reward signal rather than to find structure in unlabeled data.
The agent learns through trial and error. At each step, it observes the current state, chooses an action, receives a reward, and moves to a new state.
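The interaction loop can be sketched in a few lines of Python. This is only an illustration: the `Corridor` environment, the `run_episode` helper, and the two policies are hypothetical names invented for this example, not part of any particular library.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk along cells 0..length-1 to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, anything else moves left; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0   # reward only on reaching the last cell
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the visited (state, action, reward) steps."""
    state = env.reset()                               # observe the initial state
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                        # choose an action
        next_state, reward, done = env.step(action)   # receive reward, move on
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def random_policy(state):
    return random.choice([0, 1])


def always_right(state):
    return 1
```

Calling `run_episode(Corridor(), always_right)` reaches the goal in four steps, while `random_policy` typically needs more; that gap is what learning is meant to close.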
The reward hypothesis holds that all goals can be described as the maximization of expected cumulative reward.
R = r₁ + γr₂ + γ²r₃ + ...
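where γ (gamma) is the discount factor (0 ≤ γ ≤ 1). Values near 0 make the agent favor immediate rewards; values near 1 weight future rewards almost as heavily as immediate ones.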
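As a small worked example, the discounted sum can be computed for a finite list of rewards. The function name `discounted_return` is an assumption made for this sketch.

```python
def discounted_return(rewards, gamma):
    """Compute R = r1 + gamma*r2 + gamma^2*r3 + ... for a finite reward list."""
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


print(discounted_return([1, 1, 1], 0.5))   # 1 + 0.5 + 0.25 = 1.75
```

With γ = 1 the result is the plain sum of rewards, and with γ = 0 only the first reward counts.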
The agent must balance trying new things (exploration) with using known good actions (exploitation).
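One common way to strike this balance is an epsilon-greedy rule, shown here as a minimal sketch. The text above does not name a specific strategy, so treat the choice of epsilon-greedy, the function name, and the `rng` parameter as assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-known one.

    q_values: list of estimated values, one per action (hypothetical estimates).
    rng: the random module or a random.Random instance, for reproducibility.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])    # exploit
```

With `epsilon=0.0` the agent always exploits; with `epsilon=1.0` it always explores. In practice epsilon is often started high and reduced over time.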
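A Markov Decision Process (MDP) is the mathematical framework for modeling RL problems. It consists of a set of states, a set of actions, transition probabilities, a reward function, and a discount factor.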
The Markov Property: given the present state and action, the future is independent of the past.
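To make this concrete, here is a sketch of a tiny two-state process written as a lookup table. The states, actions, probabilities, and rewards are invented for illustration. Note that `sample_step` needs only the current state and action, never the history, which is the Markov Property in code form.

```python
import random

# Hypothetical two-state process: transitions[state][action] is a list of
# (probability, next_state, reward) outcomes.
transitions = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -10.0)],
    },
}


def sample_step(state, action, rng=random):
    """Sample (next_state, reward) using only the current state and action."""
    outcomes = transitions[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward
```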
Value-based methods learn a value function V(s) or Q(s,a) and derive the policy from it.
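As one example of learning Q(s,a), a tabular Q-learning update can be sketched as follows. Q-learning is chosen here for illustration and is not named in the text above; the function name, the dictionary layout, and the default `alpha` and `gamma` values are assumptions.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Value of the best action in the next state (zero if the episode ended).
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)   # unseen (state, action) pairs start at 0.0
```

A greedy policy is then read off the table by picking, in each state, the action with the highest Q value.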
Policy-based methods learn the policy π(a|s) directly.
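A minimal sketch of direct policy learning, assuming a softmax policy over per-action preferences in a single state and a REINFORCE-style update without a baseline. Both choices and all names are assumptions made for illustration; the text above does not prescribe a particular algorithm.

```python
import math


def softmax(preferences):
    """Turn action preferences into probabilities pi(a|s)."""
    peak = max(preferences)                        # subtract max for stability
    exps = [math.exp(p - peak) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(preferences, action, reward, lr=0.1):
    """Return updated preferences after one REINFORCE-style step.

    For a softmax policy, the gradient of log pi(action) with respect to
    preference a is (1 if a == action else 0) - pi(a).
    """
    probs = softmax(preferences)
    return [
        p + lr * reward * ((1.0 if a == action else 0.0) - probs[a])
        for a, p in enumerate(preferences)
    ]
```

After a positive reward, the probability of the chosen action goes up and the others go down, with no value function involved.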
Model-based methods learn a model of the environment and use it to plan.
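One simple way to learn a model, assuming small discrete state and action sets, is to count observed transitions. The `CountingModel` class below is a hypothetical sketch of that idea, not a reference implementation.

```python
from collections import Counter, defaultdict


class CountingModel:
    """Hypothetical tabular model: estimate P(s'|s,a) and mean reward from counts."""

    def __init__(self):
        self.next_counts = defaultdict(Counter)   # (s, a) -> Counter of s'
        self.reward_sums = defaultdict(float)     # (s, a) -> total reward seen
        self.visits = Counter()                   # (s, a) -> number of samples

    def record(self, state, action, reward, next_state):
        key = (state, action)
        self.next_counts[key][next_state] += 1
        self.reward_sums[key] += reward
        self.visits[key] += 1

    def transition_prob(self, state, action, next_state):
        key = (state, action)
        if self.visits[key] == 0:
            return 0.0                            # no data yet for this pair
        return self.next_counts[key][next_state] / self.visits[key]

    def expected_reward(self, state, action):
        key = (state, action)
        if self.visits[key] == 0:
            return 0.0
        return self.reward_sums[key] / self.visits[key]
```

Once enough transitions are recorded, the estimates can stand in for the real environment when simulating or planning.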
Credit assignment: determining which actions in a long sequence led to a reward.
Sparse rewards: rewards may be rare, which makes learning difficult.
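For the credit assignment problem noted above, one simple and widely used heuristic is to credit each step with the discounted reward that followed it. The sketch below assumes a finite episode; the function name `rewards_to_go` is an assumption.

```python
def rewards_to_go(rewards, gamma):
    """Return, for every step t, the discounted reward collected from t onward."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so later rewards flow to the earlier steps.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(rewards_to_go([0, 0, 0, 1], 0.5))   # [0.125, 0.25, 0.5, 1.0]
```

Here a single reward at the end of the episode still gives earlier actions some credit, shrinking with distance. This only spreads credit by time; it does not tell which earlier action actually mattered.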