The Matchbox Learning Machine
Image generated by Nano Banana (Gemini)

1.4 Interactive Game: Mouse and Cheese (Reinforcement Learning)

Introduction

Reinforcement Learning is a type of AI that learns to make decisions through trial and error, similar to how an animal learns to navigate a maze to find food. The "agent" (our mouse) explores an environment and receives "rewards" for actions that bring it closer to its goal.

Activity

Mouse and Cheese: Reinforcement Learning

Scenario: The mouse must learn to find the cheese on a board. The mouse explores different strategies, focusing on actions that lead to getting cheese. When a sequence leads to success, that route is positively reinforced. Over time, it learns the most efficient strategy.

How to Explore It

  1. The agent must choose a sequence of actions to reach an objective, navigating through a space of possible states.
  2. Actions that lead to success are positively reinforced, becoming more likely in the future.
  3. With extensive training, the agent finds strategies that are consistently effective and robust.

What to watch for: Reinforcement Learning lets an AI discover good strategies on its own, without pre-labeled examples, using only experience and feedback from the environment.

(Interactive game panels: Controls and Configuration, Mouse and Cheese Game, Action Values per Square)

Fundamental Theoretical Concepts

Elements of Reinforcement Learning

Basic Components
  • Agent: The mouse that makes decisions
  • Environment: The board with squares, cheese and traps
  • States: Each position (row, column) on the board
  • Actions: Possible movements (↑↓←→)
  • Rewards: Positive feedback (cheese) or negative (trap)
  • Policy: The learned strategy for choosing actions
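
The components above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the interactive game: the 3x3 board, the positions of the cheese and the trap, the reward values (+1 and -1), and the names `step` and `policy` are all assumptions made for the example.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # States: a (row, column) position on the board

# Environment: a hypothetical 3x3 board (layout chosen only for illustration).
ROWS, COLS = 3, 3
START: State = (0, 0)
CHEESE: State = (2, 2)
TRAP: State = (1, 1)

# Actions: the four movements, stored as (row change, column change).
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step(state: State, action: str) -> Tuple[State, float, bool]:
    """Apply one action and return (next_state, reward, game_over)."""
    d_row, d_col = ACTIONS[action]
    # Moving into a wall leaves the mouse on the edge of the board.
    row = min(max(state[0] + d_row, 0), ROWS - 1)
    col = min(max(state[1] + d_col, 0), COLS - 1)
    next_state = (row, col)
    # Rewards: positive for the cheese, negative for the trap (assumed values).
    if next_state == CHEESE:
        return next_state, 1.0, True
    if next_state == TRAP:
        return next_state, -1.0, True
    return next_state, 0.0, False


# Policy: for every square, a probability for each action.
# Before any learning, all four actions are equally likely.
policy = {
    (r, c): {a: 1 / len(ACTIONS) for a in ACTIONS}
    for r in range(ROWS)
    for c in range(COLS)
}
```

The agent (the mouse) is then whatever code reads `policy`, picks an action, and calls `step` to see what happens.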

Learning Methodology

Reinforcement Process
  1. Initial exploration: The agent starts by choosing actions at random, with every action equally likely
  2. Experience: Each trajectory generates a state-action-reward sequence
  3. Update: Successful actions increase their selection probability
  4. Convergence: Gradually, an optimal policy emerges
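
The four steps above can be sketched as a small training loop. This is a simplified, matchbox-style weighting scheme written for illustration, not necessarily the update rule the interactive game uses. The board layout, the function names (`play_game`, `update`, `train`), and the `bonus`, `penalty`, and `floor` values are all assumptions.

```python
import random

# Hypothetical 3x3 board, chosen only for illustration.
ROWS, COLS = 3, 3
START, CHEESE, TRAP = (0, 0), (2, 2), (1, 1)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(state, action):
    """Return the next square; walls keep the mouse on the board."""
    d_row, d_col = MOVES[action]
    return (min(max(state[0] + d_row, 0), ROWS - 1),
            min(max(state[1] + d_col, 0), COLS - 1))


def new_weights():
    # Step 1: equal weights, so every action is equally likely at first.
    return {(r, c): {a: 1.0 for a in MOVES}
            for r in range(ROWS) for c in range(COLS)}


def play_game(weights, rng, max_steps=20):
    """Step 2: play one game, recording each (state, action) pair."""
    state, trajectory = START, []
    for _ in range(max_steps):
        actions = list(weights[state])
        action = rng.choices(actions, [weights[state][a] for a in actions])[0]
        trajectory.append((state, action))
        state = move(state, action)
        if state == CHEESE:
            return trajectory, 1.0
        if state == TRAP:
            return trajectory, -1.0
    return trajectory, 0.0  # ran out of moves: no reward either way


def update(weights, trajectory, reward, bonus=1.0, penalty=0.5, floor=0.1):
    """Step 3: make actions on winning routes more likely, losing ones less."""
    for state, action in trajectory:
        if reward > 0:
            weights[state][action] += bonus
        elif reward < 0:
            # The floor keeps every action possible, so exploration never stops.
            weights[state][action] = max(floor, weights[state][action] - penalty)


def train(n_games, seed=0):
    """Step 4: repeat play and update; return final weights and win history."""
    rng = random.Random(seed)
    weights = new_weights()
    wins = []
    for _ in range(n_games):
        trajectory, reward = play_game(weights, rng)
        update(weights, trajectory, reward)
        wins.append(reward > 0)
    return weights, wins


weights, wins = train(300)
print("wins in the first 50 games:", sum(wins[:50]))
print("wins in the last 50 games:", sum(wins[-50:]))
```

Comparing the two printed counts gives a rough picture of convergence: if learning is working, the mouse would be expected to win more often late in training than at the start.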

Block Training: Statistical Robustness

Why Train in Independent Blocks?

Block training (10 experiments × 100 games) simulates a rigorous scientific process:

  • Replication: Each block is an independent experiment that should reach similar conclusions, in the spirit of cross-validation
  • Variance reduction: Multiple experiments minimize the effect of initial randomness
  • Robust convergence: Ensures learning doesn't depend on specific initial conditions
  • Knowledge aggregation: The final result combines learning from multiple "virtual agents"
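
One way to picture block training is the sketch below: each block starts from scratch with its own random seed, and the results are combined at the end. It is a minimal illustration under stated assumptions, not the game's actual implementation. The 3x3 board, the weight update (+1.0 on a win, -0.5 on a trap, with a floor of 0.1), the names `run_block` and `run_blocks`, and combining blocks by averaging their action probabilities are all choices made for the example.

```python
import random
from statistics import mean, pstdev

# Hypothetical 3x3 board, chosen only for illustration.
SIZE = 3
START, CHEESE, TRAP = (0, 0), (2, 2), (1, 1)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SQUARES = [(r, c) for r in range(SIZE) for c in range(SIZE)]


def run_block(seed, n_games=100, max_steps=20):
    """One independent experiment: fresh weights and its own random seed."""
    rng = random.Random(seed)
    weights = {s: {a: 1.0 for a in MOVES} for s in SQUARES}
    wins = 0
    for _ in range(n_games):
        state, path = START, []
        for _ in range(max_steps):
            action = rng.choices(list(MOVES),
                                 [weights[state][a] for a in MOVES])[0]
            path.append((state, action))
            d_row, d_col = MOVES[action]
            state = (min(max(state[0] + d_row, 0), SIZE - 1),
                     min(max(state[1] + d_col, 0), SIZE - 1))
            if state in (CHEESE, TRAP):
                break
        if state == CHEESE:      # reinforce every action on the winning route
            wins += 1
            for s, a in path:
                weights[s][a] += 1.0
        elif state == TRAP:      # weaken actions on the losing route
            for s, a in path:
                weights[s][a] = max(0.1, weights[s][a] - 0.5)
    # Turn the weights of each square into action probabilities.
    probs = {s: {a: w / sum(ws.values()) for a, w in ws.items()}
             for s, ws in weights.items()}
    return probs, wins / n_games


def run_blocks(n_blocks=10, n_games=100):
    """Run every block, then average the learned probabilities per square."""
    results = [run_block(seed, n_games) for seed in range(n_blocks)]
    rates = [rate for _, rate in results]
    combined = {s: {a: mean(p[s][a] for p, _ in results) for a in MOVES}
                for s in SQUARES}
    return combined, rates


combined, rates = run_blocks()
print(f"win rate per block: mean={mean(rates):.2f}, spread={pstdev(rates):.2f}")
print("combined action probabilities at the start square:", combined[START])
```

A small spread between blocks suggests the learned strategy does not depend on one lucky start, while a large spread is a hint that more games per block may be needed.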

Real-World Applications

This type of reinforcement learning has direct applications in:

  • Personalized medicine: Optimization of treatment protocols
  • Robotics: Autonomous navigation in complex environments
  • Finance: Adaptive trading strategies
  • Games: Development of AI that surpasses human players (AlphaGo, OpenAI Five)

Printable Activity Materials

Download PDF Version

📄 Download The Mouse and the Cheese Game (PDF)

This printable version contains all the materials you need to conduct this reinforcement learning activity offline. Perfect for workshops, classrooms, or hands-on demonstrations where participants can physically experience how AI agents learn through trial and error.

Oct 27, 2023