This startup is betting tokenmaxxing will create the next compute giant


“Give me tokens. Just give me tokens. I want them fast. I want them cheap. I want them now.”

That’s the mantra for developers building software on generative AI models, or at least what Parasail CEO Mike Henry hears. Parasail provides a cloud computing service to companies running AI models for inference, and Henry told TechCrunch it generates 500 billion tokens a day. How’s that for tokenmaxxing?

Henry was an executive at Groq, the LLM-focused chipmaker, where he built the company’s cloud offering, an early recognition that developers building software on AI models would want cloud processing specialized to their needs. Now, after coming out of stealth a year ago, Parasail has raised a $32 million Series A to do that at scale.

Henry has a background in physical chip design, but Parasail isn’t committed to owning its own chips. While some of its GPUs are its own, the company primarily rents processing time at 40 data centers in 15 countries around the globe, and buys more from liquidity markets, orchestrating it all behind the scenes to drive down the cost of inference requests.

By allocating workloads cleverly and avoiding demand peaks, the company aims to compete with firms that own their own silicon and might be constrained by existing customer commitments and workloads.

The company’s potential relies on the continued proliferation of open-source models and agents outside of frontier labs. Parasail’s executives and investors say this is driven by the rising cost and friction of using offerings from companies like Anthropic and OpenAI.

Instead, a hybrid architecture is emerging, according to Andreas Stuhlmüller, the CEO of Elicit, a startup that has raised a $22 million Series A to develop a research assistant for scientific literature. His customers at top pharmaceutical companies use the LLM-based tool to review and analyze data from tens of thousands of scientific papers.


“We’ve moved more toward open models because it’s pretty rough sending 100,000s of requests to an API endpoint,” Stuhlmüller told TechCrunch, especially now that the company is relying on agents to improve its offering, splitting up tasks and working more strategically over longer time horizons. Open models handle the initial screening to drive down the cost of the service, before a more capable frontier model provides a final answer.

The proliferation of model queries, as agents become an increasingly common part of software development, is driving the investment in companies like Parasail that provide the infrastructure for cheap inference. Samir Kumar, a partner at Touring Capital who co-led this round, told TechCrunch he expects inference to be at least 20% of the cost of building software in the future.

How much of that market could be Parasail’s? In the crowded cloud compute space, Henry argues that his firm’s focus on inference (no training allowed) and willingness to take on startup customers without long-term commitments set his offering apart from larger cloud-computing companies focused on enterprise business, and even from better-funded competitors in the cloud inference space, like Fireworks AI and Baseten.

Of course, there’s a different kind of risk when all of your customers are seed and Series B startups in the unpredictable AI sector.

Steve Jang, a partner at Kindred Ventures, the other co-lead in this fundraising, says the economics of deploying models will demand the kind of compute brokerage Parasail provides. And that’s before widespread use of models for content generation and robotics.

“Everyone thought there was an AI bubble. There’s no AI bubble,” he told TechCrunch. “Inference demand is far outstripping supply.”

Tim Fernholz is a reporter who writes about technology, business and public policy. He has closely covered the rise of the private space industry and is the author of Rocket Billionaires: Elon Musk, Jeff Bezos and the New Space Race. Formerly, he was a senior reporter at Quartz, the global business news site, for more than a decade, and began his career as a political reporter in Washington, D.C. You can contact or verify outreach from Tim by emailing tim.fernholz@techcrunch.com or via an encrypted message to tim_fernholz.21 on Signal.
