Experiment · March 2026

Claude Code vs Codex

Same prompt, same repo, zero manual edits. I wanted to see which tool ships the stronger frontend when you only give it one shot.

Setup

Framework: Next.js 16
Styling: Tailwind v4
Animation: Framer Motion
Constraint: Single prompt

Contender A

Claude Code

Model: Opus 4.6
Reasoning: High
Mode: Single prompt
View live result

Contender B

Codex

Model: GPT 5.4
Reasoning: High
Mode: Single prompt
View live result

Why this exists

Benchmarks do not tell you whether a tool can ship a landing page that actually looks finished. I wanted a more useful comparison: same brief, same environment, deploy the raw result.

No re-rolls, no clean-up pass, no extra prompting. One prompt went into both tools, and whatever came back got published.

Ground rules

  01. Same prompt for both tools. No follow-up instructions.

  02. Zero manual edits before deployment.

  03. Same Next.js repo, same dependencies, same frontend target.

  04. Top-tier model and high reasoning for each tool.

Prompt

What both tools had to build

Role

Senior Design Engineer for Astraea Orbital, a fictional luxury commercial spaceline.

Task

Build a production-grade single-page React landing page with Next.js 16 and Tailwind CSS v4.

Requirements

  • Bento-grid destination section with a featured Mars card.
  • Booking engine with real React state.
  • Parallax starfield using Framer Motion.
  • Kinetic hero typography that tightens on scroll.
  • Tailwind v4 CSS-first theme definitions.
  • Responsive mobile stack without breaking the featured card.
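
To make "real React state" concrete, here is a minimal sketch of what I would count as passing that requirement: booking logic kept in a pure reducer rather than hard-coded markup. It is not taken from either tool's output. The destination names, the passenger limit, and every identifier below are hypothetical and chosen only for illustration.

```typescript
// Hypothetical booking state. None of these names come from the prompt or
// from either tool's generated code.
type Destination = "Mars" | "Luna" | "Europa";

interface BookingState {
  destination: Destination;
  passengers: number;
  confirmed: boolean;
}

type BookingAction =
  | { type: "setDestination"; destination: Destination }
  | { type: "setPassengers"; passengers: number }
  | { type: "confirm" }
  | { type: "reset" };

// Assumed cabin limit, purely illustrative.
const MAX_PASSENGERS = 6;

const initialBooking: BookingState = {
  destination: "Mars",
  passengers: 1,
  confirmed: false,
};

// Pure reducer: returns a new state object and never mutates the old one.
function bookingReducer(state: BookingState, action: BookingAction): BookingState {
  switch (action.type) {
    case "setDestination":
      // Any edit after confirmation invalidates the confirmation.
      return { ...state, destination: action.destination, confirmed: false };
    case "setPassengers": {
      // Fall back to 1 for non-finite input, then clamp to [1, MAX_PASSENGERS].
      const requested = Number.isFinite(action.passengers)
        ? Math.trunc(action.passengers)
        : 1;
      const clamped = Math.min(MAX_PASSENGERS, Math.max(1, requested));
      return { ...state, passengers: clamped, confirmed: false };
    }
    case "confirm":
      return { ...state, confirmed: true };
    case "reset":
      return initialBooking;
    default:
      return state;
  }
}
```

Inside a component this would typically be wired up with React's `useReducer(bookingReducer, initialBooking)`, with form controls dispatching the actions. Keeping the reducer pure also means it can be checked without rendering anything.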

Aesthetic

“Brutalist Luxury” with sharp edges, high contrast, and restrained glassmorphism.

Judging

What I am looking at

Design quality

Does it actually look good and feel intentional?

Prompt adherence

Did it build what was asked without dropping key requirements?

Responsiveness

Does the layout hold together on mobile?

Code quality

Clean components, real state management, no obvious hacks?

Attention to detail

Spacing, type, interactions, and polish.

Wow factor

Did it do something unexpectedly strong?

Verdict

Judging in progress. The results are live now; the full breakdown comes next.