What is Connect 4?
Connect 4 is a two-player strategy game played on a vertical 6-row by 7-column grid. Players take turns dropping colored discs into columns. The disc falls to the lowest available slot in that column. The first player to form an unbroken line of four discs — horizontally, vertically, or diagonally — wins. If the board fills up with no winner, the game is a draw.
Despite its simple rules, Connect 4 has deep strategic complexity. The game was mathematically solved in 1988: with perfect play, the first player can always force a win (by opening in the center column). This makes it an excellent testbed for AI techniques ranging from simple heuristics to deep reinforcement learning.
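The rules above fit in a few lines of code. The sketch below uses illustrative names (not this project's `Board` class) to show the drop-to-lowest-slot mechanic and a brute-force four-in-a-row scan:

```python
# Minimal Connect 4 mechanics: drop a disc, then scan for four in a row.
ROWS, COLS = 6, 7

def drop(board, col, player):
    """Place player's disc in the lowest empty row of col; return the row."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == 0:
            board[row][col] = player
            return row
    raise ValueError("column is full")

def wins(board, player):
    """True if player has four aligned discs: horizontal, vertical, or diagonal."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in directions:
                if all(
                    0 <= r + i * dr < ROWS
                    and 0 <= c + i * dc < COLS
                    and board[r + i * dr][c + i * dc] == player
                    for i in range(4)
                ):
                    return True
    return False

board = [[0] * COLS for _ in range(ROWS)]
for col in [3, 0, 3, 1, 3, 2, 3]:   # player 1 stacks column 3, player 2 wanders
    drop(board, col, 1 if col == 3 else 2)
print(wins(board, 1))  # vertical four in column 3 → True
```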
Choose Your Opponent
The game ships with six AI opponents of increasing sophistication, plus human-vs-human mode.
Random (rule-based)
Picks a valid column uniformly at random. A useful baseline — any competent player should beat it consistently.
Greedy (rule-based)
Wins immediately if possible, blocks the opponent's winning move otherwise, and falls back to random. Simple but surprisingly effective.
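The greedy policy is a three-step priority list. A minimal sketch, assuming a caller-supplied `would_win(board, col, player)` helper (a hypothetical name, not this project's API):

```python
import random

def greedy_move(board, me, valid_moves, would_win):
    """Win now if possible, else block the opponent's immediate win, else random.

    would_win(board, col, player) must report whether dropping in col gives
    player four in a row; valid_moves lists the non-full columns.
    """
    opponent = 3 - me                  # players are numbered 1 and 2
    for col in valid_moves:            # 1) take any immediately winning move
        if would_win(board, col, me):
            return col
    for col in valid_moves:            # 2) block the opponent's winning move
        if would_win(board, col, opponent):
            return col
    return random.choice(valid_moves)  # 3) otherwise play randomly
```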
Epsilon-Greedy (rule-based)
Follows the Greedy strategy most of the time but, with probability ε (default 0.1), plays a random move instead. The exploration adds unpredictability.
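Epsilon-greedy is a thin wrapper around any greedy policy. A sketch with illustrative names:

```python
import random

def epsilon_greedy_move(board, me, valid_moves, greedy_move, epsilon=0.1):
    """With probability epsilon play a random legal column, else play greedily."""
    if random.random() < epsilon:
        return random.choice(valid_moves)
    return greedy_move(board, me, valid_moves)
```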
DQN (RL-trained)
Deep Q-Network. Learns per-column action values from experience, using experience replay and a target network for stability. Fast single-pass inference.
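At inference time a DQN player simply takes the legal column with the highest predicted Q-value; full columns must be masked out so the argmax never picks an illegal move. A minimal sketch (the function name and the hard-coded Q-values are illustrative, not taken from this project):

```python
def choose_dqn_move(q_values, valid_moves):
    """Greedy action over per-column Q-values, restricted to non-full columns.

    A trained network maps the encoded board to one Q-value per column;
    inference is a single forward pass plus this masked argmax.
    """
    return max(valid_moves, key=lambda col: q_values[col])

q = [0.1, -0.3, 0.8, 0.2, 0.5, -0.1, 0.0]   # example per-column Q-values
print(choose_dqn_move(q, valid_moves=[0, 1, 3, 4, 5, 6]))  # column 2 is full → 4
```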
PPO (RL-trained)
Proximal Policy Optimization. Directly learns a stochastic policy using clipped surrogate objectives and a policy pool for opponent diversity.
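The clipped surrogate objective can be written, for a single sample, as below. This is the standard PPO-clip form; the project's training script may differ in details such as the clip range:

```python
def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
    """PPO's clipped surrogate objective for one (action, advantage) sample.

    ratio = pi_new(a|s) / pi_old(a|s). The clip keeps a single update from
    moving the policy too far away from the one that collected the data.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * advantage
    return -min(unclipped, clipped)   # negated: optimizers minimize the loss
```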
AlphaZero (RL-trained)
Self-play with Monte Carlo Tree Search guided by a neural network that predicts both policy and value. The strongest AI in this project.
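During search, AlphaZero-style MCTS selects child moves by the PUCT rule, trading off a move's mean value Q against an exploration bonus U driven by the network's policy prior. A self-contained sketch with an illustrative data layout (this is the standard rule, not necessarily the exact constants used in mcts.py):

```python
import math

def puct_select(children, c_puct=1.5):
    """Return the index of the child maximizing Q + U (AlphaZero's PUCT rule).

    Each child is a (prior, visit_count, total_value) tuple; U favors moves
    the policy head likes but that have not been visited much yet.
    """
    total_visits = sum(n for _, n, _ in children)
    best, best_score = None, -math.inf
    for i, (prior, n, w) in enumerate(children):
        q = w / n if n > 0 else 0.0
        u = c_puct * prior * math.sqrt(total_visits) / (1 + n)
        if q + u > best_score:
            best, best_score = i, q + u
    return best
```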
How to Play
Installation
git clone https://github.com/hyungwonkim/Connect4.git
cd Connect4
pip install -e .
CLI Mode
Play directly in your terminal. On each turn, enter a column number (0–6) to drop your piece.
# Human vs Human
python main.py
# Human vs AI opponent
python main.py --player2 greedy
python main.py --player2 dqn
python main.py --player2 alphazero
# With a specific checkpoint
python main.py --player2 alphazero --checkpoint checkpoints/alphazero/best.pt
Available --player2 options:
| Flag | Opponent |
|---|---|
| human | Human (default) |
| random | Random player |
| greedy | Greedy player |
| epsilon_greedy | Epsilon-Greedy player |
| dqn | DQN (requires checkpoint) |
| ppo | PPO (requires checkpoint) |
| alphazero | AlphaZero (requires checkpoint) |
Pygame GUI
Launch the graphical interface with animated piece drops and a start screen to select your opponent.
python main.py --gui
Web App
Run the FastAPI backend and open the browser-based frontend.
# Start the server (default: http://localhost:8000)
uvicorn server.api:app --reload
Training the RL Players
Each RL player can be trained from scratch on your own machine. Training requires
PyTorch and benefits from a GPU, but
CPU training works for smaller runs. Trained checkpoints are saved to checkpoints/.
DQN
python -m connect4.training.train_dqn
Trains a Deep Q-Network using experience replay, a target network, and epsilon-greedy exploration.
Checkpoints are saved to checkpoints/dqn/. Monitor with TensorBoard:
tensorboard --logdir runs/
PPO
python -m connect4.training.train_ppo
Trains a policy network using Proximal Policy Optimization with a policy pool for
opponent diversity. Checkpoints are saved to checkpoints/ppo/.
AlphaZero
python -m connect4.training.train_alphazero
Runs iterative self-play with MCTS to generate training data, then updates the neural
network. Each iteration evaluates the new model against the previous best and promotes
it if it wins. Checkpoints are saved to checkpoints/alphazero/.
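The iterate-evaluate-promote loop described above can be sketched as follows; the function names and the 55% promotion threshold are illustrative assumptions, not taken from train_alphazero.py:

```python
def alphazero_iteration(best_model, self_play, train, evaluate, win_threshold=0.55):
    """One outer-loop iteration: self-play, train a candidate, gatekeep.

    self_play(model) generates training examples with the current best model,
    train(model, examples) returns a candidate, and evaluate(candidate, best)
    returns the candidate's win rate in head-to-head games against the best.
    """
    examples = self_play(best_model)
    candidate = train(best_model, examples)
    win_rate = evaluate(candidate, best_model)
    return candidate if win_rate >= win_threshold else best_model
```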
Evaluating Agents
python -m connect4.training.evaluate
Runs a round-robin tournament between all agents (rule-based and RL) and prints a
win-rate matrix. Use --num-games to control sample size.
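A round-robin win-rate matrix boils down to playing every ordered pair of agents and averaging the results. A minimal sketch with a hypothetical play_match callback (not the evaluate.py interface):

```python
from itertools import permutations

def round_robin(agents, play_match, num_games=10):
    """Build a win-rate matrix: entry [a][b] is a's win rate against b.

    play_match(a, b) returns 1 if agent a beats agent b and 0 otherwise;
    callers that track draws can return 0.5 for them instead.
    """
    matrix = {a: {b: 0.0 for b in agents if b != a} for a in agents}
    for a, b in permutations(agents, 2):
        wins = sum(play_match(a, b) for _ in range(num_games))
        matrix[a][b] = wins / num_games
    return matrix
```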
Project Structure
Connect4/
  main.py                        # Entry point (CLI, GUI, web)
  connect4/
    board.py                     # Board class (6x7 grid, win detection)
    game.py                      # Game loop and turn management
    players/
      base.py                    # Abstract BasePlayer interface
      human.py                   # CLI human input
      random_player.py           # Random move selection
      greedy_player.py           # Win/block/random heuristic
      epsilon_greedy_player.py   # Greedy + exploration
    rl/
      networks.py                # DQNNet, PPONet, AlphaZeroNet
      common.py                  # Board encoding, masking utilities
      self_play_env.py           # Self-play environment wrapper
      dqn/dqn_player.py          # DQN inference player
      ppo/ppo_player.py          # PPO inference player
      alphazero/
        alphazero_player.py      # AlphaZero inference player
        mcts.py                  # Monte Carlo Tree Search
    training/
      train_dqn.py               # DQN training script
      train_ppo.py               # PPO training script
      train_alphazero.py         # AlphaZero training script
      evaluate.py                # Agent evaluation tournament
  gui/
    pygame_gui.py                # Pygame graphical interface
  server/
    api.py                       # FastAPI backend
    frontend/
      index.html                 # Web frontend