The Matchbox Learning Machine — *Image generated by Nano Banana (Gemini)*

1.4 Interactive Game: Mouse and Cheese (Reinforcement Learning)

Introduction

Reinforcement Learning is a type of AI that learns to make decisions through trial and error, similar to how an animal learns to navigate a maze to find food. The "agent" (our mouse) explores an environment and receives "rewards" for actions that bring it closer to its goal.


Activity

Mouse and Cheese: Reinforcement Learning

Scenario: The mouse must learn to find the cheese on a board. The mouse explores different strategies, focusing on actions that lead to getting cheese. When a sequence leads to success, that route is positively reinforced. Over time, it learns the most efficient strategy.

How to Explore It

The agent must choose a sequence of actions to reach an objective, navigating through a space of possible states.
Actions that lead to success are positively reinforced, becoming more likely in the future.
With enough training games, the agent settles on strategies that succeed consistently.

What to watch for: Reinforcement learning lets an AI discover good strategies on its own, without pre-labeled examples, using only experience and feedback from the environment.

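Controls and Configuration

The interactive version lets you set the number of board rows, the number of board columns, and the number of games to play. It shows two panels: the Mouse and Cheese game board, and the action values per square.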

Fundamental Theoretical Concepts

Elements of Reinforcement Learning

Basic Components

Agent: The mouse that makes decisions
Environment: The board with squares, cheese and traps
States: Each position (row, column) on the board
Actions: Possible movements (↑↓←→)
Rewards: Positive feedback (cheese) or negative (trap)
Policy: The learned strategy for choosing actions
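
The components above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the interactive game: the `step` function, the reward values (+1 for cheese, -1 for a trap, 0 otherwise), the rule that a trap ends the game, and the rule that walking into a wall leaves the mouse in place are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

State = Tuple[int, int]  # a state is a (row, column) position on the board

# The four actions from the list above, as (row change, column change).
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step(state: State, action: str, rows: int, cols: int,
         cheese: State, traps: Set[State]) -> Tuple[State, float, bool]:
    """Environment: apply one action and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    # Assumption: clamp to the board, so a move into a wall changes nothing.
    row = min(max(state[0] + d_row, 0), rows - 1)
    col = min(max(state[1] + d_col, 0), cols - 1)
    next_state = (row, col)
    if next_state == cheese:
        return next_state, 1.0, True    # positive reward, game over
    if next_state in traps:
        return next_state, -1.0, True   # negative reward, game over (assumed)
    return next_state, 0.0, False       # ordinary square, keep playing
```

For example, on a 3 × 3 board with the cheese at (2, 2) and no traps, `step((2, 1), "right", 3, 3, (2, 2), set())` returns `((2, 2), 1.0, True)`.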

Learning Methodology

Reinforcement Process

Initial exploration: The agent chooses actions at random, with every action equally likely at the start
Experience: Each trajectory generates a state-action-reward sequence
Update: Successful actions increase their selection probability
Convergence: Over many games, the policy gradually settles on an efficient route to the cheese
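
The four steps above can be sketched as a small weight-based learner. This is one simple scheme chosen for illustration and is not necessarily the update rule the interactive game uses. The names `play_game` and `train`, the start square (0, 0), the 50-step limit, and the choice to split a bonus of 1.0 across the steps of a successful route (so shorter routes earn more per step) are all assumptions. The board here has cheese but no traps, to keep the example short.

```python
import random

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def play_game(weights, rows, cols, cheese, rng, max_steps=50):
    """Experience: play one game and return (trajectory, success)."""
    state, trajectory = (0, 0), []  # assumed start square: top-left
    names = list(ACTIONS)
    for _step in range(max_steps):
        # Selection probability is proportional to the current weight.
        action = rng.choices(names, weights=[weights[state][a] for a in names])[0]
        trajectory.append((state, action))
        d_row, d_col = ACTIONS[action]
        # Moves into a wall leave the mouse where it is.
        state = (min(max(state[0] + d_row, 0), rows - 1),
                 min(max(state[1] + d_col, 0), cols - 1))
        if state == cheese:
            return trajectory, True
    return trajectory, False


def train(rows, cols, cheese, games, seed=0, bonus=1.0):
    """Return a table of action weights for every square after `games` games."""
    rng = random.Random(seed)
    # Initial exploration: equal weights mean equally likely actions.
    weights = {(r, c): {a: 1.0 for a in ACTIONS}
               for r in range(rows) for c in range(cols)}
    for _game in range(games):
        trajectory, success = play_game(weights, rows, cols, cheese, rng)
        if success:
            # Update: reinforce each distinct state-action pair on the route.
            for state, action in set(trajectory):
                weights[state][action] += bonus / len(trajectory)
    return weights
```

After training, for instance with `weights = train(3, 3, (2, 2), games=500)`, a greedy policy can be read off by taking the highest-weight action in each square, e.g. `max(weights[(0, 0)], key=weights[(0, 0)].get)`. A scheme this simple tends to favor short routes but does not guarantee the optimal one.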

Block Training: Statistical Robustness

Why Train in Independent Blocks?

Block training (10 experiments × 100 games) treats learning as an experiment that is repeated several times:

Replication: Each block is an independent experiment, and the blocks should reach similar conclusions
Variance reduction: Averaging over multiple experiments reduces the effect of initial randomness
Robust convergence: Agreement across blocks shows that learning doesn't depend on specific initial conditions
Knowledge aggregation: The final result combines learning from multiple "virtual agents"
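
A rough sketch of block training follows. It assumes a simple weight-based learner (equal starting weights, successful routes reinforced) and assumes that aggregation means averaging each action weight across blocks; the source does not say how the game combines blocks. The function names, the use of the block index as the random seed, and the 50-step limit are hypothetical.

```python
import random
from statistics import mean, pstdev

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def run_block(rows, cols, cheese, games, seed, max_steps=50):
    """One independent experiment: fresh weights and its own random seed."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    weights = {(r, c): {a: 1.0 for a in ACTIONS}
               for r in range(rows) for c in range(cols)}
    for _game in range(games):
        state, visited = (0, 0), []
        for _step in range(max_steps):
            action = rng.choices(names, weights=[weights[state][a] for a in names])[0]
            visited.append((state, action))
            d_row, d_col = ACTIONS[action]
            state = (min(max(state[0] + d_row, 0), rows - 1),
                     min(max(state[1] + d_col, 0), cols - 1))
            if state == cheese:
                # Reinforce the successful route; shorter routes earn more.
                for s, a in set(visited):
                    weights[s][a] += 1.0 / len(visited)
                break
    return weights


def train_in_blocks(rows, cols, cheese, blocks=10, games=100):
    """Run independent blocks, then report (mean, spread) per square and action."""
    results = [run_block(rows, cols, cheese, games, seed=b) for b in range(blocks)]
    summary = {}
    for state in results[0]:
        for action in ACTIONS:
            values = [w[state][action] for w in results]
            # The mean aggregates the blocks; the standard deviation shows
            # how much they disagree.
            summary[(state, action)] = (mean(values), pstdev(values))
    return summary
```

A small standard deviation relative to the mean suggests that the blocks reached similar conclusions; a large one suggests the result still depends on the luck of early games.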

Real-World Applications

The same principles, at much larger scale, are applied in:

Personalized medicine: Optimization of treatment protocols
Robotics: Autonomous navigation in complex environments
Finance: Adaptive trading strategies
Games: Development of AI that surpasses human players (AlphaGo, OpenAI Five)

Printable Activity Materials

Download PDF Version

📄 Download The Mouse and the Cheese Game (PDF)

This printable version contains all the materials you need to conduct this reinforcement learning activity offline. Perfect for workshops, classrooms, or hands-on demonstrations where participants can physically experience how AI agents learn through trial and error.