Front End Competition
I gave Claude Code and Codex the exact same prompt and deployed whatever came out. No edits, no re-rolls.
There’s this question that keeps showing up in every group chat, every Discord, every standup that’s gone slightly off the rails: which AI coding tool is actually better at frontend?
I got tired of the vibes-based discourse. Benchmarks don’t tell you much about whether a tool can ship a landing page that doesn’t look terrible. Twitter threads are just people posting their best outputs after six re-rolls and a prayer.
So I set up a proper head-to-head.
The Setup
Both tools got the exact same prompt. I wrote a deliberately complex brief—a space tourism landing page for a fictional company called “Astraea Orbital.” The prompt demanded a lot:
- A Bento Grid layout for destination cards
- A booking engine with real React state management
- A parallax starfield background using Framer Motion
- Kinetic typography that responds to scroll
- Tailwind v4 with OKLCH colors—CSS-first, no config file
- “Brutalist Luxury” aesthetic (sharp edges, high contrast, expensive glassmorphism)
- Fully responsive with mobile stack layout
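To make one of those bullets concrete: the parallax-starfield requirement boils down to an interpolation problem, mapping scroll position to a per-layer offset, which is essentially what Framer Motion's useTransform does. Here's a minimal TypeScript sketch of that mapping; the ranges and the depth model are my own illustration, not code from either tool's output:

```typescript
// Map a value from an input range to an output range, clamped to the ends.
// This is the core interpolation behind Framer Motion's useTransform.
function transform(
  value: number,
  input: [number, number],
  output: [number, number]
): number {
  const [i0, i1] = input;
  const [o0, o1] = output;
  const t = Math.min(Math.max((value - i0) / (i1 - i0), 0), 1);
  return o0 + t * (o1 - o0);
}

// Deeper star layers move less as you scroll, which is what sells the
// parallax illusion: depth 0 = foreground (full travel), depth 1 = far
// background (no travel). Ranges here are arbitrary example values.
function parallaxOffset(scrollY: number, depth: number): number {
  return transform(scrollY, [0, 1000], [0, -200 * (1 - depth)]);
}
```

In a real component you'd feed useScroll's scrollY through useTransform with ranges like these, one pair per star layer, instead of computing offsets by hand.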
One shot. No follow-ups. No “actually, can you also…”—whatever comes out gets deployed as-is. Typos, bugs, and all.
The Contenders
Claude Code running Opus 4.6 against Codex running GPT 5.4. Both cranked to their highest reasoning settings. Both working in the same Next.js 16 repo with identical dependencies installed.
Same environment. Same constraints. Same single prompt.
Why Bother?
If you’re building with AI right now, you’ve probably gone back and forth between these tools. The benchmarks are noisy, the vibes are unreliable, and everyone has their favorite that they’ll defend to the death on Twitter.
I wanted a concrete, apples-to-apples comparison on a task that actually matters to me: can it build a good-looking, functional frontend from a single prompt?
No cherry-picking. No re-rolls. Just paste the same brief into both tools and ship whatever comes out.
See For Yourself
Both results are deployed live on this site, running on the same infrastructure, with zero manual edits. Go look:
What I’ll say is this: the gap between these tools is smaller than most people think, and larger than you’d expect in the places that actually matter. More details on the verdict coming soon.