OpenAI launches new MacOS app for agentic coding

10:19 AM PST · February 2, 2026

AI is already having a seismic impact on how software is written, with much of the grunt work of programming now performed by swarms of agents and subagents. But as developers experiment with new interfaces and form factors for human-AI collaboration, it’s become hard for even the most advanced AI labs to keep up.

The current trend is for agentic software development (systems where AI agents can work independently on coding tasks), epitomized by the Claude Code and Cowork apps. In the meantime, OpenAI has been gradually building out its Codex tool, which launched as a command-line tool last April and expanded to a web interface one month later.

Now, OpenAI is taking a big step towards catching up. On Monday, the company launched a new MacOS app for Codex, integrating many of the agentic practices that have become popular in the past year. The new app is designed to work with multiple agents in parallel, integrating agent skills and other state-of-the-art workflows. The launch also comes less than two months after the launch of GPT-5.2-Codex, OpenAI’s most powerful coding model, which the company hopes will be able to tempt over Claude Code users.

“If you really want to do sophisticated work on something complex, 5.2 is the strongest model by far,” CEO Sam Altman told reporters on a press call. “However, it’s been harder to use, so taking that level of model capability and putting it in a more flexible interface, we think is going to matter quite a bit.”

While Altman’s confidence in GPT-5.2 is understandable, coding benchmarks tell a more complicated story. GPT-5.2 does hold the top spot on TerminalBench (a test measuring how well AI handles command-line programming tasks), at least as of press time. But agents from Gemini 3 and Claude Opus have logged nearly equivalent scores: lower, but within the benchmark’s margin of error. Results from SWE-bench, another coding benchmark that tests AI’s ability to fix real-world software bugs, are similar, showing no clear advantage for GPT-5.2. However, agentic use cases have been difficult to benchmark effectively, and state-of-the-art models can vary significantly in user experience.

The Codex app also comes with a range of new features that OpenAI says will help it achieve parity or, in some cases, outpace the various Claude apps. The Codex app will allow for automations that can be set to run in the background on an automatic schedule, with results placed in a queue to be reviewed when the user returns. Users can also select different personalities for the agent, from pragmatic to empathetic, depending on their working style.

But for the company, the biggest selling point is the sheer speed of development that’s made possible by AI. “You can use this from a clean sheet of paper, brand new, to make a really quite sophisticated piece of software in a few hours,” Altman said. “As fast as I can type in new ideas, that is the limit of what can get built.”

Russell Brandom has been covering the tech industry since 2012, with a focus on platform policy and emerging technologies. He previously worked at The Verge and Rest of World, and has written for Wired, The Awl and MIT’s Technology Review. He can be reached at russell.brandom@techcrunch.com or on Signal at 412-401-5489.
