
WMArena vs LMArena

Both are blind, human-preference arenas that rank AI models with Bradley-Terry ratings, but they rank different things. WMArena is purpose-built for world-model video generation: people vote on which model better renders the next moment of a scene. LMArena (now Arena.ai) is the dominant arena for large language models, and it includes video as one section of a generalist, multi-modality board.

Short version: for video / world-model rankings, use WMArena. For LLM rankings, use LMArena. They are independent projects built on the same human-preference method.

What is LMArena?

LMArena began as Chatbot Arena, built by UC Berkeley's Sky Computing Lab and the LMSYS group, and became the reference leaderboard for large language models. Users compare two anonymous model responses and vote; the votes feed a Bradley-Terry (Elo-scaled) ranking. It later productized as lmarena.ai and, in early 2026, rebranded to Arena.ai, expanding into many modalities — text, code, web development, image, and a video section. Its authority is anchored by the widely cited "Chatbot Arena" paper. In short: a broad, generalist arena whose center of gravity is LLMs.
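To make "Bradley-Terry (Elo-scaled)" concrete, here is a minimal sketch of the win probability such a rating implies. It assumes the conventional Elo scaling (base 10, with a 400-point gap meaning 10:1 odds); the function name and the default scale are illustrative assumptions, not taken from either arena's code.

```python
def win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B.

    Assumes Elo-style scaling of a Bradley-Terry model: a gap of
    `scale` rating points corresponds to 10:1 odds. The value 400 is
    the usual Elo convention, used here as an assumption.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Equal ratings mean a coin flip; a 400-point lead means 10:1 odds.
even_match = win_probability(1000.0, 1000.0)
big_lead = win_probability(1400.0, 1000.0)
```

Under this scaling only the rating difference matters, so adding the same constant to every model's rating leaves every predicted matchup unchanged.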

What is WMArena?

WMArena is a World Model Arena — a human-preference benchmark for world models, starting with the most productized category: video generation (the "renderer" type of world model). You pick a starting image and an action, two anonymous image-to-video models each render the next-world clip, you vote blind, identities are revealed, and a Bradley-Terry leaderboard updates. WMArena is not a generalist board with video bolted on; video and world models are the whole point.
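The step from blind votes to a leaderboard can be sketched as follows. This is an illustration only: the page does not say how WMArena fits its ratings, so the code uses a standard minorization-maximization (MM) iteration for Bradley-Terry. The function names, the `prior` pseudo-vote smoothing, the 1000-point baseline, and the model names are all hypothetical.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iterations=500, prior=0.5):
    """Estimate Bradley-Terry strengths from (winner, loser) votes.

    `prior` adds that many virtual wins in each direction for every
    pair that actually met, so a model with zero wins still gets a
    finite rating. This smoothing is an assumption of the sketch.
    """
    wins = defaultdict(float)  # (winner, loser) -> number of votes
    for winner, loser in votes:
        wins[(winner, loser)] += 1.0

    models = sorted({m for pair in wins for m in pair})
    opponents = {m: set() for m in models}
    for a, b in wins:
        opponents[a].add(b)
        opponents[b].add(a)

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            total_wins = sum(wins.get((m, o), 0.0) + prior for o in opponents[m])
            denom = sum(
                (wins.get((m, o), 0.0) + wins.get((o, m), 0.0) + 2.0 * prior)
                / (strength[m] + strength[o])
                for o in opponents[m]
            )
            updated[m] = total_wins / denom
        # Strengths are only defined up to a constant factor, so pin
        # the geometric mean to 1 after every pass.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_elo_scale(strength, scale=400.0, base_rating=1000.0):
    """Map strengths to Elo-like ratings (scale and baseline are assumptions)."""
    return {m: base_rating + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical votes: each tuple is (winner, loser) from one blind comparison.
sample_votes = (
    [("model_a", "model_b")] * 8 + [("model_b", "model_a")] * 2
    + [("model_b", "model_c")] * 7 + [("model_c", "model_b")] * 3
)
ratings = to_elo_scale(fit_bradley_terry(sample_votes))
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Because the fit uses every vote jointly, `model_a` and `model_c` get comparable ratings even though they never met directly; their shared matchups against `model_b` link them.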

Side by side

| | WMArena | LMArena / Arena.ai |
| --- | --- | --- |
| Primary focus | World-model video generation (image-to-video) | Large language models, across many modalities |
| Video coverage | The core product, purpose-built | One section of a generalist board |
| Task | Starting image + action → next-world clip | Prompt → model response (text, code, image, video, …) |
| Method | Blind, pairwise, crowdsourced human preference (Bradley-Terry) | Blind, pairwise, crowdsourced human preference (Bradley-Terry) |
| Best for | Ranking AI video / world models | Ranking LLMs |
| Origin | Independent, world-model focused | UC Berkeley / LMSYS (Chatbot Arena) |

Which one should you use?

Are they the same thing?

No, they are independent. What they share is a method: blind, pairwise, human-preference ranking with Bradley-Terry, which Chatbot Arena popularized for LLMs and which WMArena applies to world models. The reason this method spread is simple: automated metrics often correlate only weakly with what people actually prefer, whereas a blind crowdsourced arena measures perceived quality directly and, because voters do not know which model produced which output, is harder to game.
