Building a World Cup Monte Carlo Simulation in Python

Today is the start of the 2026 FIFA World Cup, the largest sporting competition in the world, held once every four years. As a fun project, I decided to build a model to predict the tournament. With sports, you can never truly predict the winner of anything because of real-world chaos, but you can use data to simulate what happens if teams maintain their current form.

In cases like this, traditional machine learning models typically struggle because they are not updated as new results come in during the tournament. So, I built a different kind of predictive engine. This article goes through the decisions I made while building it and exactly what I did to simulate the 2026 World Cup. The full code is on my GitHub: github.com/AAsteriskz7/WC26-MonteCarlo

The system has four core layers: Ingestion, Feature Engineering, Simulation, and Presentation. I will go through each layer, the code, and the architecture behind it.

Why Monte Carlo?

For the core engine, I used a Monte Carlo simulation. This is a common mathematical technique that uses repeated random sampling to map out unpredictable systems. So instead of trying to predict a single winner, the engine simulates the tournament 10,000 times every day, using probability distributions to generate the outcome of every match. By compiling the results of those 10,000 runs, we get estimated probabilities for each team, with statements like "Argentina has a 14.2% chance of winning, but a 4.1% chance of crashing out in the group stage."
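To make the idea concrete, here is a minimal sketch of Monte Carlo estimation on a single fixture. The 60% win probability, the function name, and the seed are illustrative assumptions, not values taken from the engine.

```python
import random


def estimate_win_rate(true_win_prob, n_runs=10_000, seed=42):
    """Estimate a win probability by repeated random sampling."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    wins = sum(1 for _ in range(n_runs) if rng.random() < true_win_prob)
    return wins / n_runs


if __name__ == "__main__":
    # Hypothetical fixture where the favorite wins 60% of the time
    print(f"Estimated win rate: {estimate_win_rate(0.60):.3f}")
```

The estimate lands close to the underlying 60% but not exactly on it. More runs shrink that sampling noise, which is why the engine uses thousands of runs rather than a handful.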

Before we look at the math, we need to understand why standard ML methods fail for the World Cup. The table below outlines each method, how it works, and why it fails for our scenario:

Predictive Approach	How It Works	Why It Fails for a Live World Cup
Pure Machine Learning (XGBoost / Neural Networks)	Trains on historical match features to predict a direct winner.	Overfits on small datasets. Fails to understand bracket structures.
Static Elo / Regression	Calculates a single team rating and predicts the higher-rated team wins.	Assumes the better team wins 100% of the time. The highest Elo team always wins the simulation, which isn't how football works.
Markov Chains	Calculates state transitions from group stages to knockouts.	Tries to manually map every permutation, creating an intractable state-space explosion.

Monte Carlo works because it bypasses all of these issues. Here is exactly why it’s a good fit for a live tournament:

It embraces the chaos: Because every game is a digital "coin flip," the engine naturally simulates real-world upsets.
It solves the bracket problem: You can’t predict who wins knockouts unless you know who won the group stages. If a strong team accidentally finishes second in their group or even last, Monte Carlo maps out every pathway for us, showing the newer, harder path they face.
It is adaptive: If France loses 2-0 in their first game, the simulation instantly adapts, and the next 10,000 runs will show a much harder reality for them.

The Mathematical Core

The engine relies on three layers to simulate the tournament accurately:

Historical Elo: Calculates team strength using every international match since 1872.
Poisson Goal Distributions: Converts Elo ratings into actual game results (e.g., 2-0, 0-0).
Market Value Index (MVI): Adjusts Elo based on the team's current financial value, which closely reflects the strength of the starting squad.

Phase 1: Rolling Historical Elo Model

To calculate the baseline Elo, the engine uses this formula:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

Where R_A is Team A's current rating, and R_B is Team B's current rating. The number 400 is a scaling factor, and in this case, it indicates that if Team A is rated 400 points higher, they have around a 91% chance of winning.

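Once the match finishes, the engine calculates a new rating R'_A based on whether the team overperformed or underperformed its expected score. Here S_A is the actual result (1 for a win, 0.5 for a draw, 0 for a loss), and K controls how far a single match can move the rating: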

$$R'_A = R_A + K \cdot (S_A - E_A)$$
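The two Elo formulas above translate into a few lines of Python. This is a minimal sketch: the article does not state the K value the engine uses, so `k=20` and the ratings in the example are assumptions for illustration only.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score for team A against team B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_rating(rating_a, rating_b, actual_score, k=20):
    """Return team A's new rating; actual_score is 1 (win), 0.5 (draw), or 0 (loss).

    k=20 is an assumed update factor, not the engine's real setting.
    """
    return rating_a + k * (actual_score - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    print(round(expected_score(1900, 1500), 3))    # 400-point favorite -> 0.909
    print(round(update_rating(1500, 1900, 1), 1))  # underdog wins -> 1518.2
```

The second line shows why upsets matter: a 400-point underdog that wins gains about 18 of the 20 available points, while a favorite winning the same match would gain fewer than 2.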

Phase 2: Poisson Goal-Distribution Model

In a tournament like the World Cup, just knowing who is better isn't enough. We need exact scores to figure out which teams move on in each stage and to resolve any tiebreakers.

To convert Elo into a scoreline, I used the classic Poisson distribution, which calculates the probability of a given number of random events occurring, given their average rate.

The probability of a team scoring exactly k goals in a match is calculated as:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Where k is the number of goals we are testing for (0, 1, 2, 3, etc.).
Lambda is the expected number of goals that the team is projected to score.
e is Euler's number (approximately 2.718).
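As a quick check on the formula, here is a small sketch that evaluates it directly. The expected-goals value of 1.5 is an illustrative number, not a fitted one.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


if __name__ == "__main__":
    lam = 1.5  # illustrative expected goals, not a value from the engine
    for k in range(5):
        print(k, round(poisson_pmf(k, lam), 3))
```

Summing the probabilities over every possible k gives 1, so the values for 0, 1, 2, ... goals form a complete distribution for one team's score.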

To get the expected number of goals (lambda), we map the Elo difference against the global scoring average:

$$\lambda_H = e^{\beta_0 + \beta_1 \cdot (Elo_H - Elo_A)}$$

$$\lambda_A = e^{\beta_0 - \beta_1 \cdot (Elo_H - Elo_A)}$$

beta_0 sets the baseline scoring rate: e^{beta_0} is the average number of goals per team per game when the two teams are evenly matched.
beta_1 is a scaling factor that determines how much an Elo advantage increases a team’s goal output.

By running these two independent distributions against each other, the engine can generate a probability grid for every possible scoreline.
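The grid can be sketched as the product of the two teams' Poisson probabilities. The beta values, the Elo ratings, and the 10-goal cap below are placeholder assumptions chosen for illustration; the engine's fitted parameters are not given in this article.

```python
import math


def expected_goals(elo_home, elo_away, beta_0=0.3, beta_1=0.002):
    """Map an Elo gap to (lambda_home, lambda_away). Betas are placeholder values."""
    diff = elo_home - elo_away
    return math.exp(beta_0 + beta_1 * diff), math.exp(beta_0 - beta_1 * diff)


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


def score_grid(lam_home, lam_away, max_goals=10):
    """grid[h][a] = P(home scores h and away scores a), assuming independence."""
    return [
        [poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away) for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def outcome_probabilities(grid):
    """Collapse the grid into (home win, draw, away win) probabilities."""
    home = sum(p for h, row in enumerate(grid) for a, p in enumerate(row) if h > a)
    draw = sum(row[i] for i, row in enumerate(grid))
    away = sum(p for h, row in enumerate(grid) for a, p in enumerate(row) if h < a)
    return home, draw, away


if __name__ == "__main__":
    lam_h, lam_a = expected_goals(1900, 1700)  # hypothetical ratings
    win, draw, loss = outcome_probabilities(score_grid(lam_h, lam_a))
    print(f"home {win:.3f}  draw {draw:.3f}  away {loss:.3f}")
```

Capping the grid at 10 goals per side drops only a negligible sliver of probability for realistic scoring rates, so the three outcomes still sum to almost exactly 1.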

Phase 3: The Market Value Index (MVI)

Historical Elo works well, but it has a critical flaw: a team may have a poor historical record yet suddenly field many strong players. To account for teams like this, I built the MVI (Market Value Index).

I went ahead and compared all the financial market valuations of the players. To focus purely on the core strength of the team, the engine only accounts for the 11 most valuable players:

$$MVI_{\text{team}} = \frac{\sum_{i=1}^{11} \text{Value}_i}{\mu_{\text{tournament}}}$$

Where the numerator is the total financial value of a country's top 11 most valuable players.
The denominator is the average Starting XI value across all 48 teams in the tournament.

An MVI of 1.0 means the squad is perfectly average. An MVI of 3.5 means their starting roster is worth 3.5 times the tournament average. We can then adjust the historical Elo based on this information before determining the scoreline using the Poisson distribution.

$$Elo_{\text{adjusted}} = Elo_{\text{base}} + (W \times \ln(MVI_{\text{team}}))$$

The Tuning Weight (W = 50): This adjusts a team's Elo by roughly 15–20%. History remains the anchor, but current talent gets a heavy vote.
The Logarithmic Dampener (ln): Financial value in football is non-linear; a €150M player isn't automatically three times better than a €50M player. So (ln) squashes the curve and prevents multi-billion-euro squads from inflating their Elo exponentially.
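A short sketch of the adjustment formula shows how the logarithm behaves at different MVI values. The base rating of 1800 is a hypothetical number used only for illustration; `weight=50` matches the W stated above.

```python
import math


def adjusted_elo(base_elo, mvi, weight=50):
    """Shift a base Elo rating by weight * ln(MVI)."""
    return base_elo + weight * math.log(mvi)


if __name__ == "__main__":
    # Hypothetical base rating of 1800 at three different squad values
    for mvi in (0.5, 1.0, 3.5):
        print(mvi, round(adjusted_elo(1800, mvi), 1))
```

An average squad (MVI of 1.0) is left untouched because ln(1) is 0, a squad worth half the average loses about 35 points, and a squad worth 3.5 times the average gains about 63.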

Data Engineering & Decoupling the Ingestion Loop

To calculate the MVI, the engine processes a dataset of 1,248 players (48 squads of 26 players each). Re-parsing and re-sorting all of that inside the simulation loop would be wasted work, so I isolated the data preparation into a decoupled staging script.

The script takes raw JSON batches for each nation, sorts each roster by market value, and keeps the 11 most valuable players. This creates a reasonable proxy for the strength of the starting lineup.

import os
import json
import pandas as pd

def process_all_squads(raw_dir, output_path):
    processed_teams = []
    
    for file_name in os.listdir(raw_dir):
        if not file_name.endswith('.json'):
            continue
            
        with open(os.path.join(raw_dir, file_name), 'r') as f:
            data = json.load(f)
            
        team_name = data['team']
        # Sort players by financial valuation descending
        sorted_roster = sorted(data['players'], key=lambda x: x['value'], reverse=True)
        
        # Isolate top 11 players for the starting XI proxy
        starting_xi = sorted_roster[:11]
        starting_xi_value = sum(p['value'] for p in starting_xi)
        star_player = starting_xi[0]['name']
        
        processed_teams.append({
            'team': team_name,
            'starting_xi_value': starting_xi_value,
            'star_player': star_player
        })
        
    df = pd.DataFrame(processed_teams)
    # Calculate tournament global mean for the denominator
    global_mean = df['starting_xi_value'].mean()
    df['mvi'] = df['starting_xi_value'] / global_mean
    
    df.to_csv(output_path, index=False)

By separating this step, the staging script produces a single CSV with just these details (squad_features.csv), and the core engine runs entirely on pre-calculated features.

The Monte Carlo Simulation Engine

The engine is in a separate file, and it runs the entire tournament structure 10,000 times. To replicate the real-world constraints, it enforces two major rules: Group Stage Sorting Matrix and Knockout Extra Time Logic.

Group Stage Sorting Matrix

When teams finish with the same number of points, we have to replicate official tie-breaking protocols. For this, I used a multi-level pandas sort that evaluates points, goal difference, and goals scored, in that exact order.

group_df = group_df.sort_values(
    by=['Points', 'GoalDifference', 'GoalsFor'], 
    ascending=False
)
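For readers who want to see where those three columns come from, here is a dependency-free sketch that builds a group table from match results and applies the same priority order. The tuple format and the team names are assumptions for illustration; the engine itself uses the pandas sort shown above.

```python
from collections import defaultdict


def build_group_table(results):
    """results: list of (home, away, home_goals, away_goals) tuples."""
    table = defaultdict(lambda: {"Points": 0, "GoalDifference": 0, "GoalsFor": 0})
    for home, away, hg, ag in results:
        table[home]["GoalsFor"] += hg
        table[away]["GoalsFor"] += ag
        table[home]["GoalDifference"] += hg - ag
        table[away]["GoalDifference"] += ag - hg
        if hg > ag:
            table[home]["Points"] += 3
        elif hg < ag:
            table[away]["Points"] += 3
        else:
            table[home]["Points"] += 1
            table[away]["Points"] += 1
    # Same priority as the pandas sort: points, then goal difference, then goals scored
    return sorted(
        table.items(),
        key=lambda item: (item[1]["Points"], item[1]["GoalDifference"], item[1]["GoalsFor"]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical results: X and Y both win, but X wins by more
    for team, row in build_group_table([("X", "Z", 2, 0), ("Y", "Z", 1, 0)]):
        print(team, row)
```

In the example, X and Y both finish on 3 points, so goal difference decides that X ranks first.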

Knockout Extra Time Logic

Unlike the group stage, knockout games require a winner. If the Poisson distribution generates a draw, the engine has to break the tie. To do this, I implemented an asymmetric coin flip weighted by the teams' adjusted Elo ratings.

import random

def resolve_knockout(team_a_elo, team_b_elo):
    # Team A's Elo expected score doubles as its chance of winning the tiebreak
    prob_a = 1 / (1 + 10 ** ((team_b_elo - team_a_elo) / 400))
    return "team_a" if random.random() < prob_a else "team_b"
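To show how the tiebreak fits into a full knockout match, here is a hedged sketch that samples both scores and only falls back to the weighted coin flip on a draw. The `sample_poisson` helper (Knuth's multiplication method), the function names, and the example ratings and lambdas are assumptions; the engine may draw its Poisson samples differently.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def simulate_knockout(elo_a, elo_b, lam_a, lam_b, rng):
    """Return 'team_a' or 'team_b'; a winner is always produced."""
    goals_a = sample_poisson(lam_a, rng)
    goals_b = sample_poisson(lam_b, rng)
    if goals_a != goals_b:
        return "team_a" if goals_a > goals_b else "team_b"
    # Level after sampling: fall back to the Elo-weighted coin flip
    prob_a = 1 / (1 + 10 ** ((elo_b - elo_a) / 400))
    return "team_a" if rng.random() < prob_a else "team_b"


if __name__ == "__main__":
    rng = random.Random(7)  # seeded for reproducibility
    # Hypothetical ratings and expected goals
    winners = [simulate_knockout(1900, 1800, 1.6, 1.2, rng) for _ in range(10_000)]
    print("team_a advances in", winners.count("team_a") / len(winners))
```

Passing in an explicit `rng` keeps each run reproducible, which makes it easier to debug a surprising bracket.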

The Bracket Logger

To advance through the tournament, the script needs to save scores and update the bracket so that only the winners of each stage are evaluated in the next one. To make one full run inspectable, the script records the match pathways and scorelines of the first iteration in a single file: sample_bracket.json.
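One way to log only the first iteration is sketched below. The `simulate_tournament` callable and the shape of the dictionary it returns (including the `champion` key) are hypothetical; the actual structure of sample_bracket.json is not described in this article.

```python
import json


def run_simulations(n_runs, simulate_tournament, log_path="sample_bracket.json"):
    """Run n_runs tournaments, saving only the first run's bracket to disk."""
    champions = []
    for i in range(n_runs):
        # Hypothetical: returns a dict describing one full run
        bracket = simulate_tournament()
        if i == 0:
            with open(log_path, "w", encoding="utf-8") as f:
                json.dump(bracket, f, indent=2)
        champions.append(bracket["champion"])
    return champions
```

Writing a single run keeps the file small while still giving the dashboard one complete, consistent bracket to display.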

Live State Management & Engineering Fault Tolerance

What makes this project stand out even more is its live state management and updates during the World Cup. So once the tournament begins (today), the system shifts from being just a predictive model to a tracker by handling two major risks: The Elimination Trap and Timezone Offsets.

The Elimination Trap

At the start of each run, the engine reads elo_results.csv and checks whether a match already has a real-world score recorded. If it does, it locks that score in for all 10,000 runs. This instantly forces any eliminated team down to a 0% probability, so the remaining predictions stay consistent with what has actually happened.
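The lock-in logic can be sketched as a lookup that runs before any sampling. The column names (`home_team`, `away_team`, `home_score`, `away_score`) and the function names are assumptions, since the real layout of elo_results.csv is not shown here.

```python
def load_recorded_scores(rows):
    """Build {(home, away): (home_goals, away_goals)} from CSV-style dict rows.

    Rows with a blank score are treated as not yet played.
    Column names are assumed, not taken from the real file.
    """
    recorded = {}
    for row in rows:
        if row.get("home_score") in (None, "") or row.get("away_score") in (None, ""):
            continue
        key = (row["home_team"], row["away_team"])
        recorded[key] = (int(row["home_score"]), int(row["away_score"]))
    return recorded


def resolve_match(home, away, recorded_scores, simulate_score):
    """Use the real score when one is on record; otherwise simulate."""
    key = (home, away)
    if key in recorded_scores:
        return recorded_scores[key]
    return simulate_score(home, away)
```

Rows like these could come from `csv.DictReader`. Because the recorded score is returned on every run, a team that has really been knocked out can never advance in any simulated bracket.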

Timezone Offsets

Matches across North America have many late-night finishes that spill into the next day in UTC. I set up a cron job to pull the latest scores and results every day, but a standard UTC cloud cron job will miss these late results. To fix this, I anchored the parameters to the West Coast timezone.

params = {
    'league': '1',
    'season': '2026',
    'timezone': 'America/Los_Angeles'
}
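The date rollover is easy to verify with the standard library. In this sketch the kickoff time is hypothetical, and a fixed UTC-7 offset stands in for Pacific Daylight Time rather than a full timezone database lookup.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset standing in for Pacific Daylight Time during the tournament
PDT = timezone(timedelta(hours=-7))

# Hypothetical late kickoff on the West Coast
kickoff_local = datetime(2026, 6, 12, 21, 0, tzinfo=PDT)
kickoff_utc = kickoff_local.astimezone(timezone.utc)

print(kickoff_local.date())  # 2026-06-12
print(kickoff_utc.date())    # 2026-06-13
```

The same match belongs to June 12 locally but June 13 in UTC, so a query keyed to the UTC date can file it under the wrong day.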

The script filters data by match status, accepting only completed games, so the pipeline does not fail on corrupted or partial data.
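A status filter along these lines might look like the sketch below. The response shape (`fixture.status.short`, `goals.home`, `goals.away`) and the status codes are assumptions about the API payload, so check them against the provider's documentation before relying on them.

```python
FINISHED_STATUSES = {"FT", "AET", "PEN"}  # assumed codes for completed matches


def completed_fixtures(fixtures):
    """Keep only fixtures marked as finished that also carry both scores."""
    done = []
    for item in fixtures:
        status = item.get("fixture", {}).get("status", {}).get("short")
        goals = item.get("goals", {})
        has_scores = goals.get("home") is not None and goals.get("away") is not None
        if status in FINISHED_STATUSES and has_scores:
            done.append(item)
    return done
```

Using `.get()` at every level means a malformed or half-populated record is skipped quietly instead of raising a `KeyError` in the middle of the nightly run.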

Autonomous CI/CD Pipeline

To pull live data, I configured a GitHub Actions workflow. It handles the live data ingestion, runs the 10,000 simulations, and saves the new states fully autonomously.

name: Daily World Cup Data Update

on:
  schedule:
    # Runs at 06:00 UTC every day to ensure all matches have concluded
    - cron: '0 6 * * *'
  # Allows you to trigger the run manually from the GitHub Actions tab
  workflow_dispatch: 

permissions:
  contents: write # Needed so the bot can push changes back to the repo

jobs:
  update-data:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run live update pipeline
        env:
          # This pulls your API key from GitHub Secrets
          API_SPORTS_KEY: ${{ secrets.API_SPORTS_KEY }}
        run: python src/update_live_data.py

      - name: Commit and push updated data
        run: |
          git config --local user.email "github-actions[bot]@users.noreply.github.com"
          git config --local user.name "github-actions[bot]"
          
          # Stage the updated data files
          git add data/processed/elo_results.csv
          git add data/processed/simulation_results.csv
          
          # Commit and push only if the staged files actually changed
          git diff --staged --quiet || (git commit -m "Auto-update World Cup live data & simulations" && git push)

Streamlit Integration

The frontend is a simple Streamlit dashboard directly linked to the repository. Whenever the GitHub Action finishes, it pushes the fresh simulation_results.csv and sample_bracket.json files. Streamlit then updates the dashboard and displays it live.

Conclusion & Source Code

Building this project was lots of fun and came with a challenge. It took me a few weeks of working on it to learn all the concepts and figure out the optimal way to do this. There’s no guarantee the World Cup plays out exactly like this. But if it does, having the receipts to prove I predicted it early will be pretty cool. Unfortunately, the engine didn’t pick my favorite team to win. So, I’m hoping the math is right about everything else, and completely wrong about that.

Check out the full code on my GitHub: https://github.com/AAsteriskz7/WC26-MonteCarlo, and make sure to follow my GitHub and Hashnode to get updates on everything I do.

Check out the live dashboard on Streamlit Cloud: https://aasteriskz-wc26-montecarlo.streamlit.app/ and watch the progress throughout the World Cup!
