Redwood Research Blog

By Redwood Research


Category: Technology


Subscribers: 0
Reviews: 0
Episodes: 37

Description

Narrations of Redwood Research blog posts. Redwood Research is a research nonprofit based in Berkeley. We investigate risks posed by the development of powerful artificial intelligence and techniques for mitigating those risks.

Episodes

“The case for countermeasures to memetic spread of misaligned values” by Alex Mallen (May 28, 2025)
“AIs at the current capability level may be important for future safety work” by Ryan Greenblatt (May 12, 2025)
“Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking” by Julian Stastny, Buck Shlegeris (May 08, 2025)
“Training-time schemers vs behavioral schemers” by Alex Mallen (May 06, 2025)
“What’s going on with AI progress and trends? (As of 5/2025)” by Ryan Greenblatt (May 03, 2025)
“How can we solve diffuse threats like research sabotage with AI control?” by Vivek Hebbar (Apr 30, 2025)
“7+ tractable directions in AI control” by Ryan Greenblatt (Apr 29, 2025)
“Clarifying AI R&D threat models” by Josh Clymer (Apr 25, 2025)
“How training-gamers might function (and win)” by Vivek Hebbar (Apr 24, 2025)
“Handling schemers if shutdown is not an option” by Buck Shlegeris (Apr 18, 2025)
“Ctrl-Z: Controlling AI Agents via Resampling” by Buck Shlegeris (Apr 16, 2025)
“To be legible, evidence of misalignment probably has to be behavioral” by Ryan Greenblatt (Apr 15, 2025)
“Why do misalignment risks increase as AIs get more capable?” by Ryan Greenblatt (Apr 11, 2025)
“An overview of areas of control work” by Ryan Greenblatt (Apr 09, 2025)
“An overview of control measures” by Ryan Greenblatt (Apr 06, 2025)
“Buck on the 80,000 Hours podcast” by Buck Shlegeris (Apr 05, 2025)
“Notes on countermeasures for exploration hacking (aka sandbagging)” by Ryan Greenblatt (Apr 04, 2025)
“Notes on handling non-concentrated failures with AI control: high level methods and different regimes” by Ryan Greenblatt (Apr 03, 2025)
“Prioritizing threats for AI control” by Ryan Greenblatt (Mar 19, 2025)
“How might we safely pass the buck to AI?” by Josh Clymer (Feb 19, 2025)
“Takeaways from sketching a control safety case” by Josh Clymer (Jan 30, 2025)
“Planning for Extreme AI Risks” by Josh Clymer (Jan 29, 2025)
“Ten people on the inside” by Buck Shlegeris (Jan 28, 2025)
“When does capability elicitation bound risk?” by Josh Clymer (Jan 22, 2025)
“How will we update about scheming?” by Ryan Greenblatt (Jan 19, 2025)
“Thoughts on the conservative assumptions in AI control” by Buck Shlegeris (Jan 17, 2025)
“Extending control evaluations to non-scheming threats” by Josh Clymer (Jan 13, 2025)
“Measuring whether AIs can statelessly strategize to subvert security measures” by Buck Shlegeris, Alex Mallen (Dec 20, 2024)
“Alignment Faking in Large Language Models” by Ryan Greenblatt, Buck Shlegeris (Dec 18, 2024)
“Why imperfect adversarial robustness doesn’t doom AI control” by Buck Shlegeris (Nov 18, 2024)
“Win/continue/lose scenarios and execute/replace/audit protocols” by Buck Shlegeris (Nov 15, 2024)
“Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming” by Buck Shlegeris (Oct 10, 2024)
“A basic systems architecture for AI agents that do autonomous research” by Buck Shlegeris (Sep 26, 2024)
“How to prevent collusion when using untrusted models to monitor each other” by Buck Shlegeris (Sep 25, 2024)
“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck Shlegeris (Aug 26, 2024)
“Fields that I reference when thinking about AI takeover prevention” by Buck Shlegeris (Aug 13, 2024)
“Getting 50% (SoTA) on ARC-AGI with GPT-4o” by Ryan Greenblatt (Jun 17, 2024)