Our Take
The time savings are substantial and measured on real projects, but they come from a single company working on CRUD applications, and there is no independent verification.
Why it matters
This provides concrete data points for development teams evaluating AI agent adoption, particularly the cost-accuracy tradeoffs and organizational changes needed for production deployment.
Do this week
Development leads: benchmark your current screen design and development times this week so you can measure any AI productivity gains against actual baselines.
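A minimal sketch of what that baseline could look like in practice: log hours per screen by phase now, then compute percent reductions once AI-assisted runs are recorded. All variable names and the sample figures are illustrative assumptions, not Simplex's data (the sample numbers are chosen to echo the 40% and 70% figures reported below).

```python
# Hypothetical sketch: record per-phase hours now so later AI-assisted
# runs can be compared against a real baseline. All names and numbers
# here are illustrative assumptions.
from statistics import mean

# Baseline hours logged per screen, keyed by development phase.
baseline_hours = {
    "design": [6.0, 8.0, 7.0],
    "development": [20.0, 24.0, 22.0],
}

def percent_reduction(baseline: list[float], measured: list[float]) -> float:
    """Percent time saved relative to the baseline average."""
    base, now = mean(baseline), mean(measured)
    return round(100 * (base - now) / base, 1)

# Later, after AI-assisted runs are logged:
ai_hours = {
    "design": [4.0, 4.5, 4.1],
    "development": [6.5, 7.0, 6.3],
}
for phase, hours in ai_hours.items():
    print(phase, percent_reduction(baseline_hours[phase], hours), "%")
```

The point is less the arithmetic than the discipline: without per-phase hour logs captured before adoption, any later "time saved" claim is a guess.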
Simplex measures 40-70% time cuts across development phases
Simplex, a Japanese technology consulting firm, has deployed ChatGPT Enterprise and OpenAI's Codex coding agent across its software development operations, measuring specific time reductions: 40% fewer hours to design each screen, 70% fewer hours to develop each screen, and 17% fewer hours for internal integration testing (company-reported).
The company focuses on CRUD-based web applications as its initial use case. Codex handles front-end and back-end code generation from design documents, creates unit tests, reviews code against nonfunctional requirements, and fixes issues found during integration testing. Simplex also runs automated workflows that execute Python scripts via the Codex CLI, moving continuously from server implementation through end-to-end test fixes.
Beyond code generation, Executive Principal Kazuya Ujihiro reports that Codex has enabled smaller teams to handle design work and improved specification review accuracy across multiple files. The company selected Codex after internal evaluation showed "the best balance of cost, accuracy, and functionality" compared to alternatives.
Real measurements from production deployment
Most AI coding tool reports rely on vendor benchmarks or synthetic tasks. Simplex provides actual time measurements from live projects, though limited to one company's workflow and application type. The 70% development time reduction is particularly notable given it covers full screen implementation, not just code completion.
The operational model matters as much as the tool choice. Simplex established a center of excellence in 2023, chose a single primary agent to accumulate shared expertise, and separated validation from enablement so experimentation could run parallel to rollout. This suggests successful adoption requires treating AI as an operating model change, not just a productivity tool.
The company is redesigning its development process around AI rather than replacing existing steps one-for-one, moving from a linear requirements-design-implementation-testing sequence toward upfront rule definition with repeated integration cycles and automated evaluation.
Baseline your current metrics before agent adoption
Simplex's approach offers a replicable framework: quantitative validation before expansion, single-agent standardization for knowledge sharing, and parallel experimentation with production rollout. The time savings appear most pronounced in routine implementation work rather than architectural decisions.
For teams considering similar adoption, the CRUD application constraint matters. These results may not transfer to complex system integration, real-time applications, or domains with strict regulatory requirements. The company notes that AI-generated results vary depending on system settings and input data.
The organizational changes are as significant as the technical ones. Ujihiro describes a shift where "people focus on final decisions and accountability for quality, while AI handles implementation, review, and fixes." This division of labor requires clear governance around where human judgment remains essential versus where AI can execute independently.