WMArena Blog

World models, AI video generation, how the blind human-preference leaderboard works, and how the top models compare.

How WMArena Works — Methodology & Bradley-Terry Rankings

WMArena ranks world-model video models by blind human preference. People vote on anonymous pairs generated from the same image and action; votes feed a regularized Bradley-Terry model that produces an Elo-scaled leaderboard with confidence intervals.

WMArena vs Artificial Analysis — Human-Preference Arena vs Benchmark Suite

WMArena and Artificial Analysis both help you pick AI video models, but differently: WMArena is a live blind human-preference arena focused on world-model video; Artificial Analysis is a broad benchmarking suite spanning thousands of models with automated metrics plus an arena.

WMArena vs LMArena — World-Model Video Arena vs LLM Arena

WMArena and LMArena both rank AI by blind human preference, but for different things: WMArena is purpose-built for world-model video generation, while LMArena (now Arena.ai) is the dominant arena for large language models, with video as one section of a generalist board.

What Is a World Model Arena?

A World Model Arena ranks world-model AI by human preference: people watch two anonymous models turn a starting image and action into the next-world clip, vote blind, and the votes feed a Bradley-Terry leaderboard. Video generation is the first, most-productized category.
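
For readers who want a feel for how pairwise votes become a ranking, here is a minimal sketch of a regularized Bradley-Terry fit mapped onto an Elo-style scale. It is an illustration, not WMArena's actual pipeline: the L2 penalty, the gradient-ascent optimizer, the 1000-point baseline, and the names `fit_bradley_terry` and `to_elo` are all assumptions made for this example, and confidence intervals are left out.

```python
import math

def fit_bradley_terry(votes, l2=0.1, lr=0.1, steps=2000):
    """Fit log-strengths from (winner, loser) votes by gradient ascent.

    Assumed setup (not from the source): an L2 penalty on the log-strengths
    stands in for "regularized", and plain gradient ascent is the optimizer.
    """
    models = sorted({m for pair in votes for m in pair})
    theta = {m: 0.0 for m in models}
    n = len(votes)
    for _ in range(steps):
        # The L2 penalty pulls every log-strength toward zero.
        grad = {m: -l2 * theta[m] for m in models}
        for winner, loser in votes:
            # Bradley-Terry probability that the observed winner wins.
            p_win = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for m in models:
            theta[m] += lr * grad[m] / n
    return theta

def to_elo(theta, base=1000.0):
    """Rescale log-strengths so a 400-point gap means 10:1 odds (Elo convention).

    The 1000-point baseline is an arbitrary choice for this sketch.
    """
    scale = 400.0 / math.log(10.0)
    return {m: base + scale * t for m, t in theta.items()}

# Hypothetical votes between three anonymous models.
votes = (
    [("A", "B")] * 8 + [("B", "A")] * 2
    + [("B", "C")] * 7 + [("C", "B")] * 3
    + [("A", "C")] * 9 + [("C", "A")] * 1
)
ratings = to_elo(fit_bradley_terry(votes))
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.0f}")
```

Because only rating differences matter in a Bradley-Terry model, the baseline is a free choice; one common way to attach confidence intervals would be to refit on bootstrap resamples of the votes.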
