6.2 Interactive Tutorial: Backpropagation Step by Step

Introduction

Backpropagation is the fundamental algorithm that allows neural networks to learn. It's how errors at the output propagate backward through the network, telling each weight exactly how much it contributed to the mistake and how to adjust.

๐Ÿข

Activity

Backpropagation Tutorial

Backpropagation combines the chain rule from calculus with efficient computation to train neural networks. This tutorial breaks it down into manageable steps.

How to Explore It

  1. 🔢 You'll see a simple network with random initial weights.
  2. 📥 An input-output training example is presented.
  3. ➡️ You compute the forward pass step by step.
  4. 📊 You calculate the error at the output.
  5. ⬅️ You propagate gradients backward using the chain rule.
  6. 🔄 You update the weights using gradient descent.
  7. 🎯 See how the network improves with each pass!

What to watch for: Each training example flows through the network (forward pass), produces an error, and that error flows backward (backward pass) to update every weight. You will perform these calculations yourself.

Interactive Demonstration

Backpropagation Step-by-Step Trainer

[Interactive widget: walks through one training example in four phases (1 Forward → 2 Error → 3 Backward → 4 Update), showing the network, its current weights, the training pair (x₁, y₁), and the learning rate η = 0.5. A Calculation Notebook logs each correct expression and result, organized by phase.]

The trainer's forward-pass formulas:
  • $z_j^{(k)} = \sum_i a_i^{(k-1)} w_{ij}^{(k)}$
  • $a_j^{(k)} = \sigma(z_j^{(k)})$
  • $\sigma(z) = \frac{1}{1 + e^{-z}}$
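As a sketch, the forward-pass formulas above translate directly into code; the two-input, one-unit network and its weights below are illustrative, not the tutorial's exact network:

```python
import math

def sigmoid(z):
    """Logistic activation: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward_layer(prev_activations, weights):
    """One layer of the forward pass.

    z_j = sum_i a_i * w_ij, then a_j = sigma(z_j).
    weights[i][j] connects input unit i to output unit j.
    """
    n_out = len(weights[0])
    zs = [sum(a * row[j] for a, row in zip(prev_activations, weights))
          for j in range(n_out)]
    return [sigmoid(z) for z in zs]

# Illustrative numbers: two inputs feeding a single output unit.
print(forward_layer([1.0, 0.5], [[0.4], [0.6]]))  # sigma(0.4*1.0 + 0.6*0.5) = sigma(0.7)
```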

Core Concepts

The Two Phases of Backpropagation

Training a neural network alternates the two passes, followed by a weight update:

  1. Forward Pass: Input flows through the network, layer by layer, producing an output
  2. Backward Pass: The error at the output flows backward, computing gradients for each weight
  3. Weight Update: Each weight is adjusted in the direction that reduces the error

This process repeats for many training examples until the network learns the desired behavior.
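A minimal sketch of this repeated cycle, using a single sigmoid neuron, squared error E = (o − y)²/2, and one training example; all numbers are illustrative rather than taken from the tutorial:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative setup: one sigmoid neuron, two inputs, one training pair.
w = [0.5, -0.3]
x, y = [1.0, 2.0], 1.0
eta = 0.5
errors = []

for epoch in range(20):
    # 1. Forward pass: weighted sum, then activation.
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = sigmoid(net)
    # 2. Error at the output: E = (o - y)^2 / 2.
    errors.append(0.5 * (o - y) ** 2)
    # 3. Backward pass: chain rule gives dE/dw_i = (o - y) * o * (1 - o) * x_i.
    delta = (o - y) * o * (1 - o)
    # 4. Weight update: step each weight against its gradient.
    w = [wi - eta * delta * xi for wi, xi in zip(w, x)]

print(errors[0] > errors[-1])  # the error shrinks across epochs
```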

The Chain Rule is the Key

The magic of backpropagation is the chain rule from calculus:

$$\frac{\partial E}{\partial w} = \frac{\partial E}{\partial o} \cdot \frac{\partial o}{\partial net} \cdot \frac{\partial net}{\partial w}$$

This answers the question "How does the error change when we change this weight?" by breaking it into simpler factors.

  • $\frac{\partial E}{\partial o}$: How error changes with output
  • $\frac{\partial o}{\partial net}$: How output changes with weighted sum (activation derivative)
  • $\frac{\partial net}{\partial w}$: How weighted sum changes with weight (just the input!)
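With squared error E = (o − y)²/2, the three factors can be evaluated for a single output weight; the input activation, weight, and target below are made-up values for illustration:

```python
import math

# Illustrative values: input activation a, weight w, target y.
# E = (o - y)^2 / 2, o = sigmoid(net), net = a * w.
a, w, y = 0.8, 0.4, 1.0

net = a * w
o = 1.0 / (1.0 + math.exp(-net))

dE_do = o - y          # how error changes with output
do_dnet = o * (1 - o)  # how output changes with the weighted sum (sigmoid derivative)
dnet_dw = a            # how the weighted sum changes with the weight: just the input

dE_dw = dE_do * do_dnet * dnet_dw  # chain rule: multiply the three factors
print(dE_dw < 0)  # negative gradient here: increasing w would reduce the error
```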
The Activation Function

In this tutorial, we use the sigmoid activation function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Its derivative has a beautiful property:

$$\sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))$$

This means if you know the output, you can easily compute the derivative!
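As a quick sanity check (not part of the tutorial), the identity can be verified numerically against a finite-difference estimate of the derivative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.7  # any point works
s = sigmoid(x)
analytic = s * (1 - s)  # sigma'(x) computed from the output alone

# Central finite difference as an independent estimate of the derivative.
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(abs(analytic - numeric) < 1e-8)  # the two estimates agree closely
```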

Learning Rate

The learning rate ($\eta$) controls how big each weight update is:

$$w_{new} = w_{old} - \eta \cdot \frac{\partial E}{\partial w}$$

  • Too large: The network may overshoot and never converge
  • Too small: Learning is very slow
  • Just right: Smooth convergence to good solutions

In this tutorial, we use $\eta = 0.5$ for clear, visible updates.
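The three regimes show up even on a toy one-dimensional problem. For E = (w − w*)²/2 the gradient is (w − w*), so each update scales the remaining gap by (1 − η); this is an illustrative sketch, not the tutorial's network:

```python
def descend(eta, steps=20, w=0.0, target=1.0):
    """Gradient descent on E = (w - target)^2 / 2, whose gradient is (w - target)."""
    for _ in range(steps):
        w = w - eta * (w - target)  # gap shrinks (or grows) by a factor (1 - eta)
    return w

print(descend(0.5))   # close to 1.0: smooth convergence
print(descend(0.01))  # still far from 1.0: learning is very slow
print(descend(2.5))   # huge magnitude: overshoots more each step and diverges
```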

Jan 23, 2024