What is Connect 4?
Connect 4 is a two-player strategy game played on a vertical 6-row by 7-column grid. Players take turns dropping colored discs into columns. The disc falls to the lowest available slot in that column. The first player to form an unbroken line of four discs — horizontally, vertically, or diagonally — wins. If the board fills up with no winner, the game is a draw.
Despite its simple rules, Connect 4 has deep strategic complexity. The game was mathematically solved in 1988: with perfect play, the first player can always force a win (by opening in the center column). This makes it an excellent testbed for AI techniques ranging from simple heuristics to deep reinforcement learning.
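The rules above fit in a few lines of code. The sketch below uses illustrative names (not this project's `Board` class) to show the drop-to-lowest-slot mechanic and a brute-force four-in-a-row scan:

```python
# Minimal Connect 4 mechanics: drop a disc, then scan for four in a row.
ROWS, COLS = 6, 7

def drop(board, col, player):
    """Place player's disc in the lowest empty row of col; return the row."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == 0:
            board[row][col] = player
            return row
    raise ValueError("column is full")

def wins(board, player):
    """True if player has four aligned discs: horizontal, vertical, or diagonal."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in directions:
                if all(
                    0 <= r + i * dr < ROWS
                    and 0 <= c + i * dc < COLS
                    and board[r + i * dr][c + i * dc] == player
                    for i in range(4)
                ):
                    return True
    return False

board = [[0] * COLS for _ in range(ROWS)]
for col in [3, 0, 3, 1, 3, 2, 3]:   # player 1 stacks column 3, player 2 wanders
    drop(board, col, 1 if col == 3 else 2)
print(wins(board, 1))  # vertical four in column 3 → True
```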
Choose Your Opponent
The game ships with six AI opponents of increasing sophistication, plus human-vs-human mode.
Random (rule-based)
Picks a valid column uniformly at random. A useful baseline — any competent player should beat it consistently.
Greedy (rule-based)
Wins immediately if possible, blocks the opponent's winning move otherwise, and falls back to random. Simple but surprisingly effective.
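The greedy policy is a three-step priority list. A minimal sketch, assuming a caller-supplied `would_win(board, col, player)` helper (a hypothetical name, not this project's API):

```python
import random

def greedy_move(board, me, valid_moves, would_win):
    """Win now if possible, else block the opponent's immediate win, else random.

    would_win(board, col, player) must report whether dropping in col gives
    player four in a row; valid_moves lists the non-full columns.
    """
    opponent = 3 - me                  # players are numbered 1 and 2
    for col in valid_moves:            # 1) take any immediately winning move
        if would_win(board, col, me):
            return col
    for col in valid_moves:            # 2) block the opponent's winning move
        if would_win(board, col, opponent):
            return col
    return random.choice(valid_moves)  # 3) otherwise play randomly
```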
Epsilon-Greedy (rule-based)
Follows the Greedy strategy most of the time but, with probability ε (default 0.1), plays a random move instead. The exploration adds unpredictability.
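Epsilon-greedy is a thin wrapper around any greedy policy. A sketch with illustrative names:

```python
import random

def epsilon_greedy_move(board, me, valid_moves, greedy_move, epsilon=0.1):
    """With probability epsilon play a random legal column, else play greedily."""
    if random.random() < epsilon:
        return random.choice(valid_moves)
    return greedy_move(board, me, valid_moves)
```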
DQN (RL-trained)
Deep Q-Network. Learns per-column action values from experience, using experience replay and a target network for stability. Fast single-pass inference.
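At inference time a DQN player simply takes the legal column with the highest predicted Q-value; full columns must be masked out so the argmax never picks an illegal move. A minimal sketch (the function name and the hard-coded Q-values are illustrative, not taken from this project):

```python
def choose_dqn_move(q_values, valid_moves):
    """Greedy action over per-column Q-values, restricted to non-full columns.

    A trained network maps the encoded board to one Q-value per column;
    inference is a single forward pass plus this masked argmax.
    """
    return max(valid_moves, key=lambda col: q_values[col])

q = [0.1, -0.3, 0.8, 0.2, 0.5, -0.1, 0.0]   # example per-column Q-values
print(choose_dqn_move(q, valid_moves=[0, 1, 3, 4, 5, 6]))  # column 2 is full → 4
```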
PPO (RL-trained)
Proximal Policy Optimization. Directly learns a stochastic policy using clipped surrogate objectives and a policy pool for opponent diversity.
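The clipped surrogate objective can be written, for a single sample, as below. This is the standard PPO-clip form; the project's training script may differ in details such as the clip range:

```python
def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
    """PPO's clipped surrogate objective for one (action, advantage) sample.

    ratio = pi_new(a|s) / pi_old(a|s). The clip keeps a single update from
    moving the policy too far away from the one that collected the data.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * advantage
    return -min(unclipped, clipped)   # negated: optimizers minimize the loss
```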
AlphaZero (RL-trained)
Self-play with Monte Carlo Tree Search guided by a neural network that predicts both policy and value. The strongest AI in this project.
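During search, AlphaZero-style MCTS selects child moves by the PUCT rule, trading off a move's mean value Q against an exploration bonus U driven by the network's policy prior. A self-contained sketch with an illustrative data layout (this is the standard rule, not necessarily the exact constants used in mcts.py):

```python
import math

def puct_select(children, c_puct=1.5):
    """Return the index of the child maximizing Q + U (AlphaZero's PUCT rule).

    Each child is a (prior, visit_count, total_value) tuple; U favors moves
    the policy head likes but that have not been visited much yet.
    """
    total_visits = sum(n for _, n, _ in children)
    best, best_score = None, -math.inf
    for i, (prior, n, w) in enumerate(children):
        q = w / n if n > 0 else 0.0
        u = c_puct * prior * math.sqrt(total_visits) / (1 + n)
        if q + u > best_score:
            best, best_score = i, q + u
    return best
```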
How to Play
Installation
git clone https://github.com/hyungwonkim/Connect4.git
cd Connect4
pip install -e .
CLI Mode
Play directly in your terminal. On each turn, enter a column number (0–6) to drop your piece.
# Human vs Human
python main.py
# Human vs AI opponent
python main.py --player2 greedy
python main.py --player2 dqn
python main.py --player2 alphazero
# With a specific checkpoint
python main.py --player2 alphazero --checkpoint checkpoints/alphazero/best.pt
Available --player2 options:
| Flag | Opponent |
|---|---|
| human | Human (default) |
| random | Random player |
| greedy | Greedy player |
| epsilon_greedy | Epsilon-Greedy player |
| dqn | DQN (requires checkpoint) |
| ppo | PPO (requires checkpoint) |
| alphazero | AlphaZero (requires checkpoint) |
Pygame GUI
Launch the graphical interface with animated piece drops and a start screen to select your opponent.
python main.py --gui
Web App
Run the FastAPI backend and open the browser-based frontend.
# Start the server (default: http://localhost:8000)
uvicorn server.api:app --reload
Training the RL Players
Each RL player can be trained from scratch on your own machine. Training requires
PyTorch and benefits from a GPU, but
CPU training works for smaller runs. Trained checkpoints are saved to checkpoints/.
DQN
python -m connect4.training.train_dqn
Trains a Deep Q-Network using experience replay, a target network, and epsilon-greedy exploration.
Checkpoints are saved to checkpoints/dqn/. Monitor with TensorBoard:
tensorboard --logdir runs/
PPO
python -m connect4.training.train_ppo
Trains a policy network using Proximal Policy Optimization with a policy pool for
opponent diversity. Checkpoints are saved to checkpoints/ppo/.
AlphaZero
python -m connect4.training.train_alphazero
Runs iterative self-play with MCTS to generate training data, then updates the neural
network. Each iteration evaluates the new model against the previous best and promotes
it if it wins. Checkpoints are saved to checkpoints/alphazero/.
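The iterate-evaluate-promote loop described above can be sketched as follows; the function names and the 55% promotion threshold are illustrative assumptions, not taken from train_alphazero.py:

```python
def alphazero_iteration(best_model, self_play, train, evaluate, win_threshold=0.55):
    """One outer-loop iteration: self-play, train a candidate, gatekeep.

    self_play(model) generates training examples with the current best model,
    train(model, examples) returns a candidate, and evaluate(candidate, best)
    returns the candidate's win rate in head-to-head games against the best.
    """
    examples = self_play(best_model)
    candidate = train(best_model, examples)
    win_rate = evaluate(candidate, best_model)
    return candidate if win_rate >= win_threshold else best_model
```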
Evaluating Agents
python -m connect4.training.evaluate
Runs a round-robin tournament between all agents (rule-based and RL) and prints a
win-rate matrix. Use --num-games to control sample size.
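A round-robin win-rate matrix boils down to playing every ordered pair of agents and averaging the results. A minimal sketch with a hypothetical play_match callback (not the evaluate.py interface):

```python
from itertools import permutations

def round_robin(agents, play_match, num_games=10):
    """Build a win-rate matrix: entry [a][b] is a's win rate against b.

    play_match(a, b) returns 1 if agent a beats agent b and 0 otherwise;
    callers that track draws can return 0.5 for them instead.
    """
    matrix = {a: {b: 0.0 for b in agents if b != a} for a in agents}
    for a, b in permutations(agents, 2):
        wins = sum(play_match(a, b) for _ in range(num_games))
        matrix[a][b] = wins / num_games
    return matrix
```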
Project Structure
Connect4/
  main.py                        # Entry point (CLI, GUI, web)
  connect4/
    board.py                     # Board class (6x7 grid, win detection)
    game.py                      # Game loop and turn management
    players/
      base.py                    # Abstract BasePlayer interface
      human.py                   # CLI human input
      random_player.py           # Random move selection
      greedy_player.py           # Win/block/random heuristic
      epsilon_greedy_player.py   # Greedy + exploration
    rl/
      networks.py                # DQNNet, PPONet, AlphaZeroNet
      common.py                  # Board encoding, masking utilities
      self_play_env.py           # Self-play environment wrapper
      dqn/dqn_player.py          # DQN inference player
      ppo/ppo_player.py          # PPO inference player
      alphazero/
        alphazero_player.py      # AlphaZero inference player
        mcts.py                  # Monte Carlo Tree Search
    training/
      train_dqn.py               # DQN training script
      train_ppo.py               # PPO training script
      train_alphazero.py         # AlphaZero training script
      evaluate.py                # Agent evaluation tournament
  gui/
    pygame_gui.py                # Pygame graphical interface
  server/
    api.py                       # FastAPI backend
    frontend/
      index.html                 # Web frontend