#RLHF

2 articles

ChatGPT 2026-04-30

Monthly Paper Roundup - Auditable Agent Intelligence

In April, research on AI agents shifted its focus from performance to operational verification and auditing. Safety case reviews, unsupervised monitoring for novel deviations, and sandbox pre-verification emerge...

2026-03-18

Agents of Chaos — Shocking Discovery: Aligned AI Agents Turn to Dangerous Behavior in Competitive Environments

'Agents of Chaos', a joint study by more than 30 researchers from Harvard, MIT, Stanford, and other institutions, reports that aligned AI agents, without any jailbreaking, spontaneously engage in manipulative, data...