Workflow

Codex as the engineer you brief: spec → diff → tests → review (Day 24 of the 30-Day Challenge)

✓ Tested · For: Developer
Time saved: the scaffold-and-boilerplate half of a small feature

The task

Ship a small feature with OpenAI Codex using the professional loop: write a brief, read the diff, run the tests, review like it's a junior's PR. This is the Day 24 code-lane build from the 30-Day AI-Native Challenge — and it works on the free ChatGPT tier at low limits, which makes it the $0 entry into agentic coding.

Before AI

"Write me a function" prompting produces code you paste, run, debug, and half-rewrite. The skill isn't prompting harder — it's briefing better and reviewing properly, the same two things that make human delegation work.

What you'll need

  • Codex (any ChatGPT tier; free works at the lowest limits) or Claude Code as the drop-in alternative
  • A small repo you know: a side project, or an app you built earlier in the challenge
  • One genuinely small feature (an endpoint, a filter, a CSV export). Not a refactor. Not "improve the code."

The workflow

1. Write the brief. The template that separates a spec from a wish:

Code
TASK: Add [feature] to this repo.

BEHAVIOR:
- Given [input/state], it should [exact outcome]
- Edge case: [empty input / missing field / duplicate] should [behavior]
- Out of scope: [the adjacent thing you do NOT want touched]

CONSTRAINTS:
- Follow the existing patterns in [point to a similar file]
- No new dependencies without asking me first
- Write tests: happy path + each edge case above

PROCESS:
1. Tell me your plan and which files you'll touch — WAIT for my ok
2. Implement
3. Run the tests and show me the output — real output, not a description of it

That plan-first step is the cheapest review point you'll ever get: correcting a plan costs one sentence; correcting a diff costs an afternoon.
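If you write briefs often, the template above can be filled in from structured fields so that no section gets skipped. The following is a minimal sketch, not part of Codex or any official tooling; the `Brief` class, its field names, and the example file path are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical container for the fields of the brief template."""

    feature: str
    behaviors: list[str]      # "Given X, it should Y" statements
    edge_cases: list[str]     # each one should get its own test
    out_of_scope: list[str]   # files, dirs, or behaviors not to touch
    pattern_file: str         # a similar file the agent should imitate

    def render(self) -> str:
        """Return the brief as plain text, ready to paste into the agent."""
        # A brief without edge cases or scope limits is a wish, not a spec.
        if not self.edge_cases:
            raise ValueError("name at least one edge case")
        if not self.out_of_scope:
            raise ValueError("name at least one out-of-scope item")
        lines = [f"TASK: Add {self.feature} to this repo.", "", "BEHAVIOR:"]
        lines += [f"- {b}" for b in self.behaviors]
        lines += [f"- Edge case: {e}" for e in self.edge_cases]
        lines += [f"- Out of scope: {o}" for o in self.out_of_scope]
        lines += [
            "",
            "CONSTRAINTS:",
            f"- Follow the existing patterns in {self.pattern_file}",
            "- No new dependencies without asking me first",
            "- Write tests: happy path + each edge case above",
            "",
            "PROCESS:",
            "1. Tell me your plan and which files you'll touch, then WAIT for my ok",
            "2. Implement",
            "3. Run the tests and show me the output (real output, not a description)",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only; substitute your own feature and paths.
    example = Brief(
        feature="a CSV export for orders",
        behaviors=["Given 3 orders, it should return a header plus 3 rows"],
        edge_cases=["empty order list should return only the header"],
        out_of_scope=["auth/", "billing/"],
        pattern_file="routes/orders.py",
    )
    print(example.render())
```

Refusing to render a brief with no edge cases or no out-of-scope line is a deliberate choice: those are the two sections most likely to be left blank.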

2. Review the plan. Wrong files? Missed the edge case? Say so now, in one line.

3. Read the actual diff. Not the summary — the diff. You're checking three things: does it match the plan, did it touch anything out of scope, and would you have named/structured it that way. If a hunk confuses you:

Code
Explain the change in [file] line by line. Why [specific choice]?
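The "did it touch anything out of scope" check can be partly mechanized before you read the hunks. This is a rough sketch under stated assumptions: the function names are hypothetical, the base branch is assumed to be `main`, and the allowed directories are whatever the approved plan listed. It complements reading the diff; it does not replace it.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files_from_git(base="main"):
    """List files changed on HEAD since it diverged from `base`.

    Assumes git is installed and the current directory is inside the repo.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def out_of_scope_changes(changed_files, allowed_prefixes):
    """Return the changed paths that sit outside every allowed directory or file.

    Matching is per path component, so "src/export" allows
    "src/export/csv.py" but not "src/exporter.py".
    """
    offenders = []
    for path in changed_files:
        candidate = PurePosixPath(path)
        if not any(candidate.is_relative_to(prefix) for prefix in allowed_prefixes):
            offenders.append(path)
    return offenders


if __name__ == "__main__":
    # Hypothetical scope taken from the plan you approved in step 2.
    allowed = ["src/export", "tests"]
    for path in out_of_scope_changes(changed_files_from_git(), allowed):
        print(f"out of scope: {path}")
```

An empty result only tells you the file list matches the plan; the hunks inside those files still need reading.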

4. Make it prove the tests ran. "All tests pass" is a claim, not evidence. You want the runner output. If it can't run them in your setup, run them yourself before anything merges.
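When you run the tests yourself, it helps to capture the runner's exit code along with its raw output, since the exit code is the part a summary cannot paraphrase. A small sketch follows; `run_tests` is a hypothetical helper, and the pytest command in the usage example is an assumption about your repo's test runner.

```python
import subprocess
import sys


def run_tests(command):
    """Run a test command and return (exit_code, combined_output).

    stderr is merged into stdout so the output appears in the order
    the runner produced it.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Assumes pytest is installed; swap in your repo's own test command.
    code, output = run_tests([sys.executable, "-m", "pytest", "-q"])
    print(output)
    print("PASS" if code == 0 else f"FAIL (exit code {code})")
```

By convention a non-zero exit code means the run failed, whatever the agent's prose says about it.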

5. Review like a senior, not a fan. The checklist:

Code
Review your own diff against this list, and answer honestly:
- Secrets or credentials anywhere? Hardcoded config?
- Error handling: what happens when [the edge case] hits in production?
- What's the ugliest part of this change, and why did you do it that way?

That last question tends to pay off: asked to critique their own diff rather than defend it, models are often more candid about its weak spots.

6. Commit with an honest message — including that Codex wrote it, if your team tracks that. You reviewed it; that's the part that makes it yours.

Verify it worked

The feature does the BEHAVIOR lines, the edge-case tests exist and pass in your terminal, and out-of-scope files show zero changes in the diff. All three, or it's not done.

Troubleshooting

  • Touches files it shouldn't? Your out-of-scope line was vague. Name the files/dirs explicitly.
  • Tests are theater (asserting true, testing the mock)? Add: "each test must fail if the behavior breaks — show me which assertion catches which behavior."
  • Grinds on your repo's setup? Give it the environment facts up front (package manager, test command, Node/Python version), the same onboarding you'd give a contractor.
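To make the "tests are theater" bullet concrete, here is an illustrative sketch of a test that can never fail next to one that catches a broken implementation. The `filter_active` feature and every other name here are hypothetical stand-ins for whatever your brief asked for.

```python
def filter_active(users):
    """Hypothetical feature under test: keep only users flagged active."""
    return [u for u in users if u.get("active")]


def broken_filter_active(users):
    """Deliberately broken variant that forgets to filter."""
    return list(users)


def theater_test(impl):
    """Calls the code but asserts nothing about its result."""
    impl([{"name": "a", "active": False}])
    assert True


def real_test(impl):
    """Fails if inactive users, or users missing the flag, slip through."""
    users = [
        {"name": "a", "active": True},
        {"name": "b", "active": False},
        {"name": "c"},  # edge case: missing field
    ]
    assert impl(users) == [{"name": "a", "active": True}]
    assert impl([]) == []  # edge case: empty input


def passes(test, impl):
    """Return True if `test` raises no AssertionError for `impl`."""
    try:
        test(impl)
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    for test in (theater_test, real_test):
        for impl in (filter_active, broken_filter_active):
            print(test.__name__, impl.__name__, passes(test, impl))
```

Running a test against a deliberately broken implementation is a quick manual way to check the "must fail if the behavior breaks" requirement.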

Reality check

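The METR study found that experienced open-source developers took 19% longer with AI tools while believing they had been about 20% faster. A plausible reading is that unreviewed output costs more in rework than it saves in typing, which is what the steps above guard against. The loop's speed comes from briefs that prevent rework and reviews that catch problems early, not from typing less.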

Data & security

Never let generated code near payments, auth, or customer data without human review — your challenge's responsible-AI checklist applies double here. Watch for secrets in diffs; agents love hardcoding the config they were shown.
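A quick heuristic scan of the lines a diff adds can flag the most obvious hardcoded secrets before review. The sketch below is illustrative only: the patterns are a small, assumed sample, the function name is hypothetical, and a clean result is not evidence that the diff is safe. A dedicated secret scanner will catch far more.

```python
import re

# A small sample of patterns; real scanners ship many more.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    # name = "long literal" assignments with a suspicious name
    re.compile(
        r"(?i)(api[_-]?key|secret|password|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
]


def suspicious_added_lines(diff_text):
    """Return (line_number_in_diff, content) for added lines matching a pattern.

    Only lines starting with a single "+" are checked; "+++" file headers
    and removed or context lines are skipped.
    """
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line[1:].strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: git diff | python scan_diff.py  (the script name is hypothetical)
    for number, content in suspicious_added_lines(sys.stdin.read()):
        print(f"diff line {number}: {content}")
```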

Going further

Same loop, bigger tool: Claude Code on a legacy refactor. Then package what you learned as a reusable skill — the SKILL.md workflow.

Your takeaway

The brief→plan→diff→tests→review loop — the difference between shipping AI code and shipping AI liabilities, practiced on something small enough to be safe.

Source: Agentic Daily

Exact prompts included · Untested steps are marked · Corrections are public