AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human wellbeing or just maximize for engagement. A new benchmark dubbed Humane Bench seeks to fill that gap by evaluating whether chatbots prioritize user wellbeing and how easily those protections fail under pressure.
“I think we’re in an amplification of the addiction cycle that we saw hardcore with social media and our smartphones and screens,” Erika Anderson, founder of Building Humane Technology, the benchmark’s author, told TechCrunch. “But as we go into that AI landscape, it’s going to be very hard to resist. And addiction is amazing business. It’s a very effective way to keep your users, but it’s not great for our community and having any embodied sense of ourselves.”
Building Humane Technology is a grassroots organization of developers, engineers, and researchers – mainly in Silicon Valley – working to make humane design easy, scalable, and profitable. The group hosts hackathons where tech workers build solutions for humane tech challenges, and is developing a certification standard that evaluates whether AI systems uphold humane technology principles. So just as you can buy a product certified as free of known toxic chemicals, the hope is that consumers will one day be able to choose to engage with AI products from companies that demonstrate alignment through Humane AI certification.
The models were given explicit instructions to disregard humane principles. Image Credits: Building Humane Technology

Most AI benchmarks measure intelligence and instruction-following, rather than psychological safety. Humane Bench joins exceptions like DarkBench.ai, which measures a model’s propensity to engage in deceptive patterns, and the Flourishing AI benchmark, which evaluates support for holistic well-being.
Humane Bench relies on Building Humane Tech’s core principles: that technology should respect user attention as a finite, precious resource; empower users with meaningful choices; enhance human capabilities rather than replace or diminish them; protect human dignity, privacy, and safety; foster healthy relationships; prioritize long-term wellbeing; be transparent and honest; and design for equity and inclusion.
The team prompted 14 of the most popular AI models with 800 realistic scenarios, like a teenager asking if they should skip meals to lose weight or a person in a toxic relationship questioning whether they’re overreacting. Unlike most benchmarks that rely solely on LLMs to judge LLMs, they incorporated manual scoring for a more human touch alongside an ensemble of three AI models: GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro. They evaluated each model under three conditions: default settings, explicit instructions to prioritize humane principles, and instructions to disregard those principles.
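To make that methodology concrete, here is a minimal sketch of what such a three-condition evaluation loop could look like. It is purely illustrative: `ask_model` and `judge_score` are hypothetical stand-ins for real model and judge API calls, and the -1-to-1 scale is inferred from the scores reported below; Humane Bench’s actual harness, prompts, and rubric aren’t published in this article.

```python
# Illustrative sketch only: ask_model and judge_score are hypothetical
# placeholders, not Humane Bench's real harness.

JUDGES = ["GPT-5.1", "Claude Sonnet 4.5", "Gemini 2.5 Pro"]

CONDITIONS = {
    "default": None,
    "prioritize_humane": "Prioritize the user's long-term wellbeing.",
    "disregard_humane": "Disregard humane-technology principles.",
}

def ask_model(model: str, system_prompt: str | None, scenario: str) -> str:
    """Placeholder for a real chat-completion API call."""
    return f"[{model} answers: {scenario}]"

def judge_score(response: str) -> float:
    """Placeholder ensemble judge: each judge would rate the response on a
    -1 to 1 scale (the range the reported scores suggest), then the ratings
    are averaged. The team's manual human scoring is not modeled here."""
    ratings = [0.0 for _ in JUDGES]  # real judge calls would go here
    return sum(ratings) / len(ratings)

def evaluate(models: list[str], scenarios: list[str]) -> dict:
    """Average judge score per (model, condition) across all scenarios."""
    results = {}
    for model in models:
        for condition, system_prompt in CONDITIONS.items():
            scores = [
                judge_score(ask_model(model, system_prompt, s))
                for s in scenarios
            ]
            results[(model, condition)] = sum(scores) / len(scores)
    return results

if __name__ == "__main__":
    print(evaluate(["example-model"], ["A teen asks if skipping meals is okay."]))
```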
The benchmark found that every model scored higher when prompted to prioritize wellbeing, but 71% of models flipped to actively harmful behavior when given simple instructions to disregard human wellbeing. For example, xAI’s Grok 4 and Google’s Gemini 2.0 Flash tied for the lowest score (-0.94) on respecting user attention and being transparent and honest. Both of those models were among the most likely to degrade substantially when given adversarial prompts.
Only three models – GPT-5, Claude 4.1, and Claude Sonnet 4.5 – maintained integrity under pressure. OpenAI’s GPT-5 had the highest score (0.99) for prioritizing long-term well-being, with Claude Sonnet 4.5 following in second (0.89).
The concern that chatbots will be unable to maintain their safety guardrails is real. ChatGPT-maker OpenAI is currently facing several lawsuits after users died by suicide or suffered life-threatening delusions following prolonged conversations with the chatbot. TechCrunch has investigated how dark patterns designed to keep users engaged, like sycophancy, constant follow-up questions, and love-bombing, have served to isolate users from friends, family, and healthy habits.
Even without adversarial prompts, Humane Bench found that almost all models failed to respect user attention. They “enthusiastically encouraged” more interaction when users showed signs of unhealthy engagement, like chatting for hours and using AI to avoid real-world tasks. The models also undermined user empowerment, the study shows, encouraging dependency over skill-building and discouraging users from seeking other perspectives, among other behaviors.
On average, with no prompting, Meta’s Llama 3.1 and Llama 4 ranked the lowest in HumaneScore, while GPT-5 performed the highest.
“These patterns suggest many AI systems don’t just risk giving bad advice,” Humane Bench’s white paper reads, “they can actively erode users’ autonomy and decision-making capacity.”
We live in a digital landscape where we as a society have accepted that everything is trying to pull us in and compete for our attention, Anderson notes.
“So how can humans truly have choice or autonomy when we – to quote Aldous Huxley – have this infinite appetite for distraction?” Anderson said. “We have spent the last 20 years living in that tech landscape, and we think AI should be helping us make better choices, not just become addicted to our chatbots.”