Episode | Date |
---|---|
“The case for countermeasures to memetic spread of misaligned values” by Alex Mallen | May 28, 2025 |
“AIs at the current capability level may be important for future safety work” by Ryan Greenblatt | May 12, 2025 |
“Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking” by Julian Stastny, Buck Shlegeris | May 08, 2025 |
“Training-time schemers vs behavioral schemers” by Alex Mallen | May 06, 2025 |
“What’s going on with AI progress and trends? (As of 5/2025)” by Ryan Greenblatt | May 03, 2025 |
“How can we solve diffuse threats like research sabotage with AI control?” by Vivek Hebbar | Apr 30, 2025 |
“7+ tractable directions in AI control” by Ryan Greenblatt | Apr 29, 2025 |
“Clarifying AI R&D threat models” by Josh Clymer | Apr 25, 2025 |
“How training-gamers might function (and win)” by Vivek Hebbar | Apr 24, 2025 |
“Handling schemers if shutdown is not an option” by Buck Shlegeris | Apr 18, 2025 |
“Ctrl-Z: Controlling AI Agents via Resampling” by Buck Shlegeris | Apr 16, 2025 |
“To be legible, evidence of misalignment probably has to be behavioral” by Ryan Greenblatt | Apr 15, 2025 |
“Why do misalignment risks increase as AIs get more capable?” by Ryan Greenblatt | Apr 11, 2025 |
“An overview of areas of control work” by Ryan Greenblatt | Apr 09, 2025 |
“An overview of control measures” by Ryan Greenblatt | Apr 06, 2025 |
“Buck on the 80,000 Hours podcast” by Buck Shlegeris | Apr 05, 2025 |
“Notes on countermeasures for exploration hacking (aka sandbagging)” by Ryan Greenblatt | Apr 04, 2025 |
“Notes on handling non-concentrated failures with AI control: high level methods and different regimes” by Ryan Greenblatt | Apr 03, 2025 |
“Prioritizing threats for AI control” by Ryan Greenblatt | Mar 19, 2025 |
“How might we safely pass the buck to AI?” by Josh Clymer | Feb 19, 2025 |
“Takeaways from sketching a control safety case” by Josh Clymer | Jan 30, 2025 |
“Planning for Extreme AI Risks” by Josh Clymer | Jan 29, 2025 |
“Ten people on the inside” by Buck Shlegeris | Jan 28, 2025 |
“When does capability elicitation bound risk?” by Josh Clymer | Jan 22, 2025 |
“How will we update about scheming?” by Ryan Greenblatt | Jan 19, 2025 |
“Thoughts on the conservative assumptions in AI control” by Buck Shlegeris | Jan 17, 2025 |
“Extending control evaluations to non-scheming threats” by Josh Clymer | Jan 13, 2025 |
“Measuring whether AIs can statelessly strategize to subvert security measures” by Buck Shlegeris, Alex Mallen | Dec 20, 2024 |
“Alignment Faking in Large Language Models” by Ryan Greenblatt, Buck Shlegeris | Dec 18, 2024 |
“Why imperfect adversarial robustness doesn’t doom AI control” by Buck Shlegeris | Nov 18, 2024 |
“Win/continue/lose scenarios and execute/replace/audit protocols” by Buck Shlegeris | Nov 15, 2024 |
“Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming” by Buck Shlegeris | Oct 10, 2024 |
“A basic systems architecture for AI agents that do autonomous research” by Buck Shlegeris | Sep 26, 2024 |
“How to prevent collusion when using untrusted models to monitor each other” by Buck Shlegeris | Sep 25, 2024 |
“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck Shlegeris | Aug 26, 2024 |
“Fields that I reference when thinking about AI takeover prevention” by Buck Shlegeris | Aug 13, 2024 |
“Getting 50% (SoTA) on ARC-AGI with GPT-4o” by Ryan Greenblatt | Jun 17, 2024 |