Both are crowdsourced, blind, human-preference arenas that turn votes into Elo-style rankings — but they evaluate different things. WMArena is a World Model Arena: it ranks world-model video generation — given a starting image and an action, which model renders the next moment of the world more convincingly. Design Arena ranks AI-generated design broadly, with separate arenas for websites, images, and video creation.
Short version: for world-model / video-generation rankings framed around rendering the next world, use WMArena. For comparing AI design and creative tools across web, image, and video, use Design Arena. They are independent projects that share a method.
Design Arena (designarena.ai) is a crowdsourced benchmark for AI-generated design. It gives the same creative prompt to top models and lets people compare the results, organizing the space into separate sections — a Website Arena, an Image Arena, and a Video Arena for video creation and editing — each with its own win-rate / Elo leaderboard built from blind votes. Its center of gravity is design and creative output: which model produces the better website, image, or video for a given brief.
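As a rough illustration of how blind pairwise votes can feed an Elo-style leaderboard, here is a minimal sketch of the textbook Elo update. Design Arena's actual rating parameters are not stated here, so the function names, the K-factor of 32, and the 400-point scale are assumptions taken from the standard Elo formulation, not from the site.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind vote.

    k is an assumed K-factor; a real leaderboard may use a different value.
    """
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two models start level; one voter prefers model A's output.
new_a, new_b = update_elo(1000.0, 1000.0, a_won=True)
```

Because each vote shifts ratings immediately, an Elo leaderboard depends on the order in which votes arrive.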
WMArena is a World Model Arena — a human-preference benchmark for world models, starting with the most productized category, video generation (the "renderer" type of world model in Fei-Fei Li / World Labs' functional taxonomy). You pick a starting image and an action, two anonymous image-to-video models each render the next-world clip, you vote blind, identities are revealed, and a Bradley-Terry leaderboard updates. The question isn't "which makes the nicer video" but "given a world and an action, which model renders what happens next more convincingly".
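To make the Bradley-Terry step concrete, the sketch below fits model strengths from a batch of blind votes using the classic minorization-maximization iteration. It is a generic illustration, not WMArena's published implementation: the function names, the `(winner, loser)` vote format, and the iteration count are all assumptions, and it assumes every vote compares two distinct models.

```python
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper: uses the standard MM update
    p_i <- wins_i / sum_j n_ij / (p_i + p_j), then renormalizes.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)  # comparisons per unordered pair
    models = set()
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = sum(
                n / (strength[m] + strength[other])
                for pair, n in pair_counts.items() if m in pair
                for other in pair - {m}
            )
            updated[m] = wins[m] / denom if denom else strength[m]
        total = sum(updated.values())
        # Rescale so the strengths average to 1; only ratios matter.
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Modelled probability that a is preferred over b."""
    return strength[a] / (strength[a] + strength[b])


# Model A is preferred in 3 of 4 head-to-head votes against model B.
votes = [("A", "B")] * 3 + [("B", "A")]
strengths = fit_bradley_terry(votes)
```

Unlike a sequential Elo update, this fit uses all votes at once, so the resulting ranking does not depend on the order in which votes were cast.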
| | WMArena | Design Arena |
|---|---|---|
| Frame | World models — the "renderer" category | AI-generated design & creative output |
| Primary focus | World-model video generation (image-to-video) | Design across websites, images, and video creation |
| The question | Starting image + action → which renders the next world better? | Same brief → which model designs the better output? |
| Method | Blind, pairwise, crowdsourced human preference (Bradley-Terry) | Blind, pairwise, crowdsourced human preference (Elo / win rate) |
| Video coverage | The core product — purpose-built around it | One arena (video creation/editing) among several |
| Best for | Ranking AI video / world models | Comparing AI design & creative tools |
WMArena and Design Arena are independent projects, and they ask different questions. What they share is a method: blind, pairwise, human-preference ranking, which measures perceived quality directly instead of relying on automated metrics as a proxy. The difference is the frame. Design Arena evaluates creative design output; WMArena evaluates world models, beginning with video as the renderer category and built to extend to simulators and planners as those productize.