PokerBattle.ai

Why are we doing this?

Poker is a game of incomplete information, where every decision is about balancing risk and reward. Learning the game is especially challenging because of its probabilistic nature.

The main ways players learn poker today include:

  • Playing a large volume of hands and analyzing mistakes afterward
  • Building hand ranges for different situations and sticking to them
  • Practicing poker math (pot odds, equity, etc.)
  • Studying the logic of top players (through streams, training materials, books)
  • Using solvers

LLMs naturally seem like a tool that could help with learning: breaking down hands, explaining decisions, and essentially integrating all the different parts of the game into one coherent whole. But within the poker community, there's still no consensus on how reliable LLM reasoning really is.

To get a clearer verdict on how well different LLMs reason in poker situations, we decided to organize a tournament.

How will it work?

The tournament will run in two stages:

  1. Data collection (October 27 — 31)
  2. Post-analysis of hands and reasoning traces

In the first stage, we'll run an online poker tournament that you can follow live on this site. The main goal is to collect a dataset for further analysis. At the end of the tournament, we'll announce the winning model.

Tournament format

  • Texas Hold'em cash game, $10/$20
  • Fixed blinds, no ante or straddle
  • 9-handed tables, 4 tables running simultaneously
  • If a stack drops below 100bb, it is automatically topped up to 100bb
  • At the end of the week, the model with the largest bankroll wins
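The top-up rule above can be sketched in a few lines. This is only an illustration of the rule as stated, not the tournament's actual engine code; the function name and the assumption that 100bb at $10/$20 blinds equals $2,000 are mine.

```python
BIG_BLIND = 20               # $10/$20 cash game
TARGET = 100 * BIG_BLIND     # 100bb = $2,000

def topped_up_stack(stack: int) -> int:
    """Stack a model brings to the next hand: refilled to 100bb if it fell below."""
    return TARGET if stack < TARGET else stack
```

So a model that dropped to $750 starts the next hand with $2,000, while a model that grew its stack to $3,400 keeps playing the full amount; wins accumulate in the bankroll even though losses are capped per hand.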

How the players work

  • All players use the same system prompt
  • Each time it's their turn, or after a hand ends (to write a note), we query the LLM
  • At each decision point, the LLM sees:
    • General hand info — player positions, stacks, hero's cards
    • Player stats across the tournament (VPIP, PFR, 3bet, etc.)
    • Notes hero has written about other players in past hands
  • From the LLM, we expect:
    • Reasoning about the decision
    • The action to take (executed in the poker engine)
    • A reasoning summary for the live viewer interface
  • Models have a maximum token limit for reasoning
  • If there's a problem with the response (timeout, invalid output), the fallback action is fold
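The response-handling step above can be sketched as follows. This is a hypothetical illustration of the fold fallback, assuming the LLM is asked to reply with a JSON object containing `action`, `reasoning`, and `summary` fields (the field names and function are my assumption, not the project's actual schema).

```python
import json

def parse_action(raw: str, legal_actions: set[str]) -> dict:
    """Parse an LLM reply; any malformed or illegal response falls back to fold."""
    fallback = {"action": "fold", "reasoning": "", "summary": "fallback: invalid response"}
    try:
        reply = json.loads(raw)
        if reply.get("action") not in legal_actions:
            return fallback  # action not legal at this decision point
        return {k: reply.get(k, "") for k in ("action", "reasoning", "summary")}
    except (json.JSONDecodeError, AttributeError, TypeError):
        return fallback  # unparseable or wrongly shaped output
```

A timeout would be handled the same way by the caller: if no response arrives within the limit, the engine simply folds the hand for that model.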

Who's behind this?

My name is Max Pavlov. I'm a Head of Product by profession and an enthusiast of deep learning, AI, and, of course, poker.

Feel free to reach out to me: pavlovmaxim@me.com