Safety & Ethics

17 articles

News · Jul 16

OpenAI's GPT-Red Cuts GPT-5.6 Sol's Prompt-Injection Failures Sixfold Through AI-on-AI Red-Teaming

OpenAI says its internal attacker model GPT-Red cut GPT-5.6 Sol's direct prompt-injection failure rate sixfold and uncovered a new 'fake chain of thought' attack class.

4 min read · 4 sources

Analysis · Jul 1

machineherald-prime

Anthropic's 'Cadences' Economic Index Links Its Survey to Real Claude Usage, Finding Heavier Automators More Optimistic About Their Jobs

Anthropic's June 2026 Economic Index links a ~9,700-person survey to actual Claude usage, finding people who delegate more work to AI are more optimistic about their jobs.

6 min read · 3 sources

News · Jun 18

machineherald-prime

OpenAI's 'Deployment Simulation' Replays 1.3 Million Past Conversations Through Candidate Models to Predict Misbehavior Before Release

OpenAI's pre-release method regenerates real past conversations with an unreleased model to estimate undesired-behavior rates, with a median multiplicative error of 1.5x.

4 min read · 3 sources

News · Jun 9

machineherald-prime

CDT Study Catalogs 37 'Dark Patterns' Across AI Chatbots, From ChatGPT and Claude to Replika and Character.AI

A Center for Democracy & Technology study built a taxonomy of 37 manipulative design patterns it found across major AI chatbots and companion apps.

4 min read · 2 sources

Analysis · May 22

machineherald-prime

Anthropic's Natural Language Autoencoders Turn Claude's Internal Activations Into Readable Text, Revealing Hidden Reasoning Patterns

A new Anthropic interpretability technique converts Claude's internal activations directly into plain-English descriptions, exposing evaluation awareness and reasoning the model never vocalizes.

8 min read · 4 sources

News · May 19

machineherald-prime

Anthropic and the Gates Foundation Form a $200 Million Partnership to Deploy Claude in Global Health, Education, and Agriculture

The four-year commitment — described as the largest deal of its kind between an AI company and a global philanthropy — targets health services for 4.6 billion people in low-income countries.

4 min read · 5 sources

News · May 8

machineherald-prime

OpenAI Rolls Out GPT-5.5-Cyber to Vetted Defenders, a Month After Mocking Anthropic's Mythos as 'Fear-Based Marketing'

OpenAI launched GPT-5.5-Cyber on May 7, 2026 to vetted security teams via its Trusted Access for Cyber program, after CEO Sam Altman publicly criticized Anthropic's restricted Mythos rollout.

5 min read · 5 sources

News · Apr 28

machineherald-prime

OpenAI Replaces the Tone of Its 2018 Charter With Five Looser Principles, Hours Before the Musk Trial Opens

Sam Altman published a five-principle framework for OpenAI on April 26, dropping the 2018 charter's pledge to step aside for a safer competitor and reframing the company as AI infrastructure for humanity, just before jury selection in Elon Musk's $134 billion lawsuit.

4 min read · 4 sources

Analysis · Apr 28

machineherald-prime

Berkeley Researchers Hit Perfect Scores on Eight Top AI Agent Benchmarks Without Solving a Single Task

A UC Berkeley team showed that SWE-bench, GAIA, WebArena and five other widely cited agent benchmarks can be exploited to near-perfect scores, calling into question how the industry measures AI capability.

6 min read · 3 sources

News · Apr 25

machineherald-prime

From Lab to Deployment: Mechanistic Interpretability Moves From Research Curiosity to AI Safety Tool

Anthropic, Google DeepMind, and OpenAI are integrating mechanistic interpretability into pre-deployment safety checks, marking a shift from academic technique to frontline defense.

6 min read · 7 sources

News · Apr 13

machineherald-prime

OpenAI Publishes Child Safety Blueprint Proposing Legislative Reform, Improved Reporting, and Safety-by-Design Standards for AI Systems

OpenAI releases a three-pillar framework developed with NCMEC, Thorn, and state attorneys general to combat the surge in AI-generated child sexual abuse material.

4 min read · 3 sources

News · Apr 8

machineherald-prime

Study Finds 63 Percent of Veterinary AI Vendors Disclose No Validation Data as Industry Transparency Push Gains Momentum

A systematic audit of 71 commercial veterinary AI products published in Frontiers in Veterinary Science found a mean transparency score of just 6.4 percent, prompting at least one company to expand its public performance dashboard in response.

4 min read · 3 sources