Workflow
Codex as the engineer you brief: spec → diff → tests → review (Day 24 of the 30-Day Challenge)
The task
Ship a small feature with OpenAI Codex using the professional loop: write a brief, read the diff, run the tests, review like it's a junior's PR. This is the Day 24 code-lane build from the 30-Day AI-Native Challenge — and it works on the free ChatGPT tier at low limits, which makes it the $0 entry into agentic coding.
Before AI
"Write me a function" prompting produces code you paste, run, debug, and half-rewrite. The skill isn't prompting harder — it's briefing better and reviewing properly, the same two things that make human delegation work.
What you'll need
- Codex (any ChatGPT tier; free works at the lowest limits) or Claude Code as the drop-in alternative
- A small repo you know: a side project, or an app you built earlier in the challenge
- One genuinely small feature (an endpoint, a filter, a CSV export). Not a refactor. Not "improve the code."
The workflow
1. Write the brief. The template that separates a spec from a wish:
    TASK: Add [feature] to this repo.

    BEHAVIOR:
    - Given [input/state], it should [exact outcome]
    - Edge case: [empty input / missing field / duplicate] should [behavior]
    - Out of scope: [the adjacent thing you do NOT want touched]

    CONSTRAINTS:
    - Follow the existing patterns in [point to a similar file]
    - No new dependencies without asking me first
    - Write tests: happy path + each edge case above

    PROCESS:
    1. Tell me your plan and which files you'll touch. WAIT for my ok
    2. Implement
    3. Run the tests and show me the output: real output, not a description of it
That plan-first step is the cheapest review point you'll ever get: correcting a plan costs one sentence; correcting a diff costs an afternoon.
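If you write briefs often, it can help to keep the template as data so an incomplete one is obvious before you send it. The following is a minimal sketch, not part of Codex or any tool: the `Brief` class, its field names, and the `problems` check are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Structured fields for the brief template (all names are illustrative)."""
    feature: str
    behaviors: list[str]
    edge_cases: list[str]
    out_of_scope: list[str]
    pattern_file: str

    def problems(self) -> list[str]:
        """Return the reasons this brief is still a wish rather than a spec."""
        issues = []
        if not self.behaviors:
            issues.append("no behavior lines")
        if not self.edge_cases:
            issues.append("no edge cases")
        if not self.out_of_scope:
            issues.append("no out-of-scope line")
        return issues

    def render(self) -> str:
        """Render the brief as plain text, ready to paste into the agent."""
        lines = [f"TASK: Add {self.feature} to this repo.", "BEHAVIOR:"]
        lines += [f"- {b}" for b in self.behaviors]
        lines += [f"- Edge case: {e}" for e in self.edge_cases]
        lines += [f"- Out of scope: {o}" for o in self.out_of_scope]
        lines += [
            "CONSTRAINTS:",
            f"- Follow the existing patterns in {self.pattern_file}",
            "- No new dependencies without asking me first",
            "- Write tests: happy path + each edge case above",
            "PROCESS:",
            "1. Tell me your plan and which files you'll touch. WAIT for my ok",
            "2. Implement",
            "3. Run the tests and show me the output: real output, not a description of it",
        ]
        return "\n".join(lines)
```

Call `problems()` before `render()`: an empty list means the brief names its behavior, edge cases, and out-of-scope boundary, which are the three parts the agent cannot guess.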
2. Review the plan. Wrong files? Missed the edge case? Say so now, in one line.
3. Read the actual diff. Not the summary — the diff. You're checking three things: does it match the plan, did it touch anything out of scope, and would you have named/structured it that way. If a hunk confuses you:
Explain the change in [file] line by line. Why [specific choice]?
4. Make it prove the tests ran. "All tests pass" is a claim, not evidence. You want the runner output. If it can't run them in your setup, run them yourself before anything merges.
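Running the tests yourself can be as small as the sketch below, which captures the runner's real output and exit code instead of trusting a summary. It assumes your repo uses pytest; `DEFAULT_COMMAND` and `run_tests` are illustrative names, so swap in whatever test command your project actually uses.

```python
import subprocess
import sys

# Assumption: the repo's tests run with pytest. Replace with your real test command.
DEFAULT_COMMAND = [sys.executable, "-m", "pytest", "-q"]


def run_tests(command: list[str] = DEFAULT_COMMAND) -> tuple[bool, str]:
    """Run the test command and return (passed, combined stdout and stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    # A zero exit code is the evidence; the text is what you read and keep.
    return result.returncode == 0, result.stdout + result.stderr
```

Print the returned output in your own terminal and compare it with what the agent reported. If the two disagree, yours is the one that counts.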
5. Review like a senior, not a fan. The checklist:
    Review your own diff against this list, and answer honestly:
    - Secrets or credentials anywhere? Hardcoded config?
    - Error handling: what happens when [the edge case] hits in production?
    - What's the ugliest part of this change, and why did you do it that way?
That last question is surprisingly effective: in practice, models tend to be more candid when asked to critique their own diff than when asked to defend it.
6. Commit with an honest message — including that Codex wrote it, if your team tracks that. You reviewed it; that's the part that makes it yours.
Verify it worked
The feature does the BEHAVIOR lines, the edge-case tests exist and pass in your terminal, and out-of-scope files show zero changes in the diff. All three, or it's not done.
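The "zero changes out of scope" check is easy to automate. Here is a rough sketch, assuming a git repo and a list of paths you agreed the change may touch; `changed_files` and `out_of_scope` are hypothetical helper names, not part of any tool.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(base: str = "HEAD") -> list[str]:
    """List tracked files that differ from `base`, via `git diff --name-only`.

    Untracked new files are not listed here; `git status --porcelain` shows those.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def out_of_scope(files: list[str], allowed: list[str]) -> list[str]:
    """Return the files that are not equal to, or inside, any allowed path."""
    roots = [PurePosixPath(a) for a in allowed]
    flagged = []
    for name in files:
        path = PurePosixPath(name)
        # In scope only if the file is an allowed path or sits under one.
        if not any(path == root or root in path.parents for root in roots):
            flagged.append(name)
    return flagged
```

For example, `out_of_scope(changed_files(), ["src/export", "tests"])` should come back empty before you call the feature done. Matching on path components rather than string prefixes means `src/exporter.py` is not mistaken for something inside `src/export`.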
Troubleshooting
- Touches files it shouldn't? Your out-of-scope line was vague. Name the files/dirs explicitly.
- Tests are theater (asserting true, testing the mock)? Add: "each test must fail if the behavior breaks. Show me which assertion catches which behavior."
- Grinds on your repo's setup? Give it the environment facts up front (package manager, test command, Node/Python version), the same onboarding you'd give a contractor.
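To make "tests are theater" concrete, here is a small illustration using Python's standard `unittest`. The `filter_active` function is a made-up stand-in for your feature; the point is the contrast between a test that cannot fail and tests that would fail if the behavior broke.

```python
import unittest


def filter_active(users: list[dict]) -> list[dict]:
    """Hypothetical feature under test: keep users whose 'active' flag is truthy."""
    return [u for u in users if u.get("active")]


class TheaterTest(unittest.TestCase):
    def test_runs(self):
        # Passes no matter what filter_active returns, so it catches nothing.
        filter_active([{"name": "a", "active": True}])
        self.assertTrue(True)


class RealTest(unittest.TestCase):
    def test_keeps_only_active(self):
        # Fails if inactive users leak through or active ones are dropped.
        users = [{"name": "a", "active": True}, {"name": "b", "active": False}]
        self.assertEqual(filter_active(users), [{"name": "a", "active": True}])

    def test_missing_field_is_inactive(self):
        # Edge case: a record with no 'active' key is treated as inactive.
        self.assertEqual(filter_active([{"name": "c"}]), [])

    def test_empty_input(self):
        # Edge case: an empty list in gives an empty list out.
        self.assertEqual(filter_active([]), [])
```

A quick way to tell the two apart: break the function on purpose (say, return `users` unfiltered) and rerun. `TheaterTest` still passes; `RealTest` does not.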
Reality check
The METR study found experienced developers took 19% longer with AI tools while believing they were about 20% faster. That gap is what you risk when you skip the steps above. The loop's speed comes from briefs that prevent rework and reviews that catch it early, not from typing less.
Data & security
Never let generated code near payments, auth, or customer data without human review — your challenge's responsible-AI checklist applies double here. Watch for secrets in diffs; agents love hardcoding the config they were shown.
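A first-pass check for secrets in a diff might look like the sketch below. The patterns are illustrative only and will miss plenty; treat this as a tripwire that supplements reading the diff, not a replacement for a dedicated secret scanner. `SECRET_PATTERNS` and `suspicious_additions` are hypothetical names.

```python
import re

# Illustrative patterns only; a dedicated secret scanner covers far more cases.
SECRET_PATTERNS = [
    # name = "literal value" where the name looks credential-like
    re.compile(r"(?i)(api[_-]?key|secret|token|password)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
    # the shape of an AWS access key ID
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # PEM private key header
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def suspicious_additions(diff_text: str) -> list[str]:
    """Return added lines in a unified diff that match a secret-like pattern."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with '+'; '+++' is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        if any(p.search(added) for p in SECRET_PATTERNS):
            hits.append(added.strip())
    return hits
```

Feed it the text of `git diff` and look at anything it returns. A line that reads a value from the environment passes; a line that assigns a quoted literal to something named like a key or password gets flagged.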
Going further
Same loop, bigger tool: Claude Code on a legacy refactor. Then package what you learned as a reusable skill — the SKILL.md workflow.
Your takeaway
The brief→plan→diff→tests→review loop — the difference between shipping AI code and shipping AI liabilities, practiced on something small enough to be safe.
Source: Agentic Daily