LessWrong (30+ Karma)

By LessWrong

Category: Technology

Subscribers: 3
Reviews: 0
Episodes: 250

Description

Audio narrations of LessWrong posts.

Episode Date
“The 80/20 playbook for mitigating AI scheming risks in 2025” by Charbel-Raphaël
Jun 01, 2025
“The best approaches for mitigating ‘the intelligence curse’ (or gradual disempowerment); my quick guesses at the best object-level interventions” by ryan_greenblatt
May 31, 2025
“‘GiveWell for AI Safety’: Lessons learned in a week” by Lydia Nottingham
May 31, 2025
“Letting Kids Be Kids” by Zvi
May 30, 2025
“Orphaned Policies (Post 5 of 6 on AI Governance)” by Mass_Driver
May 30, 2025
“Do you even have a system prompt? (PSA)” by Croissanthology
May 30, 2025
[Linkpost] “Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims” by shash42
May 30, 2025
“CFAR is running an experimental mini-workshop (June 2-6, Berkeley CA)!” by Davis_Kingsley
May 30, 2025
“AI #118: Claude Ascendant” by Zvi
May 29, 2025
“Gradual Disempowerment: Concrete Research Projects” by Raymond Douglas
May 29, 2025
“Truth or Dare” by Duncan Sabien (Inactive)
May 29, 2025
“The Best Way to Align an LLM: Inner Alignment is Now a Solved Problem?” by RogerDearnaley
May 29, 2025
“LessWrong Feed [new, now in beta]” by Ruby
May 28, 2025
“Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)” by Mass_Driver
May 28, 2025
[Linkpost] “If you’re not sure how to sort a list or grid—seriate it!” by gwern
May 28, 2025
“Beware the Moral Homophone” by ymeskhout
May 28, 2025
“Briefly analyzing the 10-year moratorium amendment” by RobertM
May 28, 2025
“What We Learned from Briefing 70+ Lawmakers on the Threat from AI” by leticiagarcia
May 28, 2025
“Requiem for the hopes of a pre-AI world” by Mitchell_Porter
May 27, 2025
“Season Recap of the Village: Agents raise $2,000” by Shoshannah Tekofsky
May 27, 2025
“Association taxes are collusion subsidies” by KatjaGrace
May 27, 2025
“Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice” by Neel Nanda
May 27, 2025
[Linkpost] “Formalizing Embeddedness Failures in Universal Artificial Intelligence” by Cole Wyeth
May 27, 2025
“Claude 4 You: The Quest for Mundane Utility” by Zvi
May 27, 2025
“New website analyzing AI companies’ model evals” by Zach Stein-Perlman
May 26, 2025
“New scorecard evaluating AI companies on safety” by Zach Stein-Perlman
May 26, 2025
“Alignment Proposal: Adversarially Robust Augmentation and Distillation” by Cole Wyeth, abramdemski
May 26, 2025
[Linkpost] “Priming effects are fake, but framing effects are real” by Matrice Jacobine
May 25, 2025
“Meditations on Doge” by Martin Sustrik
May 25, 2025
“Claude 4 You: Safety and Alignment” by Zvi
May 25, 2025
“It’s hard to make scheming evals look realistic” by Igor Ivanov, dan_moken
May 25, 2025
“That’s Not How Epigenetic Modifications Work” by johnswentworth
May 24, 2025
“AI #117: OpenAI Buys Device Maker IO” by Zvi
May 24, 2025
“Learning (more) from horse employment history” by Tim H
May 23, 2025
“Reward button alignment” by Steven Byrnes
May 23, 2025
“Mirror Organisms Are Not Immune to Predation” by Matthias Dellago
May 23, 2025
“Anthropic is Quietly Backpedalling on its Safety Commitments” by garrison
May 23, 2025
“We’re Not Advertising Enough (Post 3 of 6 on AI Governance)” by Mass_Driver
May 22, 2025
“Units have more depth than I thought” by Morpheus
May 22, 2025
“Policy recommendations regarding reproductive technology” by TsviBT
May 22, 2025
[Linkpost] “Claude 4” by Zach Stein-Perlman
May 22, 2025
“Can We Naturalize Moral Epistemology?” by tylermjohn
May 22, 2025
[Linkpost] “President of European Commission expects human-level AI by 2026” by sanyer
May 22, 2025
“Google I/O Day” by Zvi
May 22, 2025
“The Need for Political Advertising (Post 2 of 6 on AI Governance)” by Mass_Driver
May 22, 2025
“Unexploitable search: blocking malicious use of free parameters” by Benjamin Hilton, Jacob Pfau, Geoffrey Irving
May 22, 2025
“Sleep need reduction therapies” by harsimony
May 21, 2025
“The stakes of AI moral status” by Joe Carlsmith
May 21, 2025
“The Codex of Ultimate Vibing” by Zvi
May 21, 2025
“Off-ramps of the Geopolitical Singularity” by Nikola Jurkovic
May 20, 2025
[Linkpost] “Gemini Diffusion: watch this space” by Yair Halberstadt
May 20, 2025
“Winning the power to lose” by KatjaGrace
May 20, 2025
“Semen and Semantics: Understanding Porn with Language Embeddings” by future_detective
May 20, 2025
“America Makes AI Chip Diffusion Deal with UAE and KSA” by Zvi
May 20, 2025
“Thoughts on ‘Antiqua et nova’ (Catholic Church’s AI statement)” by jchan
May 20, 2025
[Linkpost] “One Year in DC” by tlevin
May 19, 2025
[Linkpost] “[Funded Fellowship] AI for Human Reasoning Fellowship, with the Future of Life Foundation” by Oliver Sourbut, Ben Goldhaber
May 19, 2025
“A widely shared AI productivity paper was retracted, is possibly fraudulent” by titotal
May 19, 2025
“Dreams of Ideas” by Joseph Miller
May 19, 2025
“Google Logo Ligature Bug” by jefftk
May 18, 2025
“D&D.Sci: The Choosing Ones” by abstractapplic
May 18, 2025
“Book Review: The Art of Happiness” by Screwtape
May 18, 2025
“What OpenAI Told California’s Attorney General” by garrison
May 18, 2025
“time is event based” by thiccythot
May 17, 2025
“Events: Debate & Fiction Project” by abramdemski
May 17, 2025
“How Fast Can Algorithms Advance Capabilities? | Epoch Gradient Update” by henryj
May 17, 2025
“Management is the Near Future” by jefftk
May 17, 2025
[Linkpost] “Social Anxiety Isn’t About Being Liked” by Chipmonk
May 17, 2025
“Problems with Instruction Following as an Alignment Target” by Seth Herd
May 16, 2025
“Regarding South Africa” by Zvi
May 16, 2025
“Generating the Funniest Joke with RL (according to GPT-4.1)” by agg
May 16, 2025
“AI #116: If Anyone Builds It, Everyone Dies” by Zvi
May 16, 2025
“What does it mean to ‘write like you talk’?” by Arjun Panickssery
May 15, 2025
“Re SMTM: negative feedback on negative feedback” by Steven Byrnes
May 15, 2025
“Moral Obligation and Moral Opportunity” by Alice Blair
May 15, 2025
“Fighting Obvious Nonsense About AI Diffusion” by Zvi
May 14, 2025
“Dodging systematic human errors in scalable oversight” by Benjamin Hilton, Geoffrey Irving
May 14, 2025
“Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies” by So8res
May 14, 2025
“The Best Reference Works for Every Subject” by Parker Conley
May 14, 2025
“LessWrong Community Weekend - Applications are open” by jt
May 14, 2025
“Working through a small tiling result” by James Payor
May 14, 2025
[Linkpost] “October The First Is Too Late” by gwern
May 14, 2025
“Too Soon” by Gordon Seidoh Worley
May 13, 2025
“No-self as an alignment target” by Milan W
May 13, 2025
“AI Doomerism in 1879” by David Gross
May 13, 2025
“A Live Look at the Senate AI Hearing” by Zvi
May 13, 2025
“Political sycophancy as a model organism of scheming” by Alex Mallen, Vivek Hebbar
May 12, 2025
[Linkpost] “Main Insights From The SB-1047 Documentary” by Michaël Trazzi
May 12, 2025
“PSA: The LessWrong Feedback Service” by JustisMills
May 12, 2025
“AIs at the current capability level may be important for future safety work” by ryan_greenblatt
May 12, 2025
“Highly Opinionated Advice on How to Write ML Papers” by Neel Nanda
May 12, 2025
“a confusion about preference orderings” by nostalgebraist
May 11, 2025
“Better Air Purifiers” by jefftk
May 11, 2025
“Glass box learners want to be black box” by Cole Wyeth
May 11, 2025
“It’s Okay to Feel Bad for a Bit” by moridinamael
May 11, 2025
“Consider not donating under $100 to political candidates” by DanielFilan
May 11, 2025
“Attend the 2025 Reproductive Frontiers Summit, June 10-12” by TsviBT, Rachel Reid
May 10, 2025
“Cheaters Gonna Cheat Cheat Cheat Cheat Cheat” by Zvi
May 09, 2025
“Slow corporations as an intuition pump for AI R&D automation” by ryan_greenblatt, elifland
May 09, 2025
“An alignment safety case sketch based on debate” by Benjamin Hilton, Marie_DB, Jacob Pfau, Geoffrey Irving
May 09, 2025
“Interest In Conflict Is Instrumentally Convergent” by Screwtape
May 09, 2025
“Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking” by Buck, Julian Stastny
May 08, 2025
“Orienting Toward Wizard Power” by johnswentworth
May 08, 2025
“OpenAI Claims Nonprofit Will Retain Nominal Control” by Zvi
May 07, 2025
[Linkpost] “It’s ‘Well, actually...’ all the way down” by benwr
May 07, 2025
“Please Donate to CAIP (Post 1 of 3 on AI Governance)” by Mass_Driver
May 07, 2025
“UK AISI’s Alignment Team: Research Agenda” by Benjamin Hilton, Jacob Pfau, Marie_DB, Geoffrey Irving
May 07, 2025
“Will protein design tools solve the snake antivenom shortage?” by Abhishaike Mahajan
May 07, 2025
“Global Risks Weekly Roundup #18/2025: US tariff shortages, military policing, Gaza famine.” by NunoSempere
May 07, 2025
“Negative Results on Group SAEs” by Josh Engels
May 07, 2025
“Nonprofit to retain control of OpenAI” by Archimedes
May 06, 2025
“Zuckerberg’s Dystopian AI Vision” by Zvi
May 06, 2025
“$500 + $500 Bounty Problem: An (Approximately) Deterministic Maximal Redund Always Exists” by johnswentworth, David Lorell
May 06, 2025
“Five Hinge‑Questions That Decide Whether AGI Is Five Years Away or Twenty” by charlieoneill
May 06, 2025
“GPT-4o Sycophancy Post Mortem” by Zvi
May 06, 2025
[Linkpost] “Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?” by Thomas Kwa
May 05, 2025
“The Sweet Lesson: AI Safety Should Scale With Compute” by Jesse Hoogland
May 05, 2025
“Interim Research Report: Mechanisms of Awareness” by Josh Engels, Neel Nanda, Senthooran Rajamanoharan
May 05, 2025
“Overview: AI Safety Outreach Grassroots Orgs” by Severin T. Seehrich
May 05, 2025
“Notes on the Long Tasks METR paper, from a HCAST task contributor” by abstractapplic
May 05, 2025
“Why I am not a successionist” by Nina Panickssery
May 04, 2025
“Updates from Comments on ‘AI 2027 is a Bet Against Amdahl’s Law’” by snewman
May 04, 2025
“‘Superhuman’ Isn’t Well Specified” by JustisMills
May 04, 2025
“Interpretability Will Not Reliably Find Deceptive AI” by Neel Nanda
May 04, 2025
“PSA: Before May 21 is a good time to sign up for cryonics” by AlexMennen
May 04, 2025
“The Ukraine War and the Kill Market” by Martin Sustrik
May 04, 2025
“Navigating burnout” by gw
May 04, 2025
“Obstacles in ARC’s agenda: Low Probability Estimation” by David Matolcsi
May 03, 2025
“Obstacles in ARC’s agenda: Mechanistic Anomaly Detection” by David Matolcsi
May 03, 2025
“AI #114: Liars, Sycophants and Cheaters” by Zvi
May 03, 2025
“What’s going on with AI progress and trends? (As of 5/2025)” by ryan_greenblatt
May 02, 2025
“OpenAI Preparedness Framework 2.0” by Zvi
May 02, 2025
“RA x ControlAI video: What if AI just keeps getting smarter?” by Writer
May 02, 2025
“AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions” by peterbarnett, Aaron_Scher
May 02, 2025
[Linkpost] “Don’t accuse your interlocutor of being insufficiently truth-seeking” by TFD
May 02, 2025
“Anthropomorphizing AI might be good, actually” by Seth Herd
May 02, 2025
“Superhuman Coders in AI 2027 - Not So Fast” by dschwarz, FutureSearch
May 01, 2025
“Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall” by Vladimir_Nesov
May 01, 2025
“Prioritizing Work” by jefftk
May 01, 2025
“How can we solve diffuse threats like research sabotage with AI control?” by Vivek Hebbar
May 01, 2025
“GPT-4o Responds to Negative Feedback” by Zvi
May 01, 2025
“Obstacles in ARC’s agenda: Finding explanations” by David Matolcsi
May 01, 2025
“Can we safely automate alignment research?” by Joe Carlsmith
Apr 30, 2025
“Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis” by jeanne_, eeeee
Apr 30, 2025
“Interpreting the METR Time Horizons Post” by snewman
Apr 30, 2025
“Bandwidth Rules Everything Around Me: Oliver Habryka on OpenPhil and GoodVentures” by Elizabeth
Apr 29, 2025
“Misrepresentation as a Barrier for Interp” by johnswentworth, Steve Petersen
Apr 29, 2025
“How to Build a Third Place on Focusmate” by Parker Conley
Apr 29, 2025
“GPT-4o Is An Absurd Sycophant” by Zvi
Apr 28, 2025
“Proceedings of ILIAD: Lessons and Progress” by Alexander Gietelink Oldenziel
Apr 28, 2025
“7+ tractable directions in AI control” by Julian Stastny, ryan_greenblatt
Apr 28, 2025
“My Research Process: Key Mindsets - Truth-Seeking, Prioritisation, Moving Fast” by Neel Nanda
Apr 28, 2025
“Our Reality: A Simulation Run by a Paperclip Maximizer” by James_Miller, avturchin
Apr 28, 2025
[Linkpost] “How people use LLMs” by Elizabeth
Apr 28, 2025
[Linkpost] “The case for multi-decade AI timelines” by Noosphere89
Apr 27, 2025
[Linkpost] “Untitled Draft” by RobertM
Apr 27, 2025
“AI Self Portraits Aren’t Accurate” by JustisMills
Apr 27, 2025
“How I Think About My Research Process: Explore, Understand, Distill” by Neel Nanda
Apr 27, 2025
“What are important UI-shaped problems that Lightcone could tackle?” by Raemon
Apr 27, 2025
“We should try to automate AI safety work asap” by Marius Hobbhahn
Apr 26, 2025
“Worries About AI Are Usually Complements Not Substitutes” by Zvi
Apr 26, 2025
“AI #113: The o3 Era Begins” by Zvi
Apr 25, 2025
“Token and Taboo” by Guive
Apr 25, 2025
“Reward hacking is becoming more sophisticated and deliberate in frontier LLMs” by Kei
Apr 25, 2025
“The Intelligence Curse: an essay series” by L Rudolf L, lukedrago
Apr 24, 2025
[Linkpost] “Modifying LLM Beliefs with Synthetic Document Finetuning” by RowanWang, Johannes Treutlein, Ethan Perez, Fabien Roger, Sam Marks
Apr 24, 2025
“‘The Era of Experience’ has an unsolved technical alignment problem” by Steven Byrnes
Apr 24, 2025
[Linkpost] “My Favorite Productivity Blog Posts” by Parker Conley
Apr 24, 2025
“OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure” by garrison
Apr 24, 2025
“o3 Is a Lying Liar” by Zvi
Apr 23, 2025
“Putting up Bumpers” by Sam Bowman
Apr 23, 2025
[Linkpost] “Jaan Tallinn’s 2024 Philanthropy Overview” by jaan
Apr 23, 2025
[Linkpost] “To Understand History, Keep Former Population Distributions In Mind” by Arjun Panickssery
Apr 23, 2025
“The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety” by Katalina Hernandez
Apr 23, 2025
“Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt” by Joel Z. Leibo, Wilcunningham, Seb Krier, Manfred Diaz
Apr 22, 2025
“You Better Mechanize” by Zvi
Apr 22, 2025
“The US Executive vs Supreme Court Deportations Clash” by NunoSempere
Apr 22, 2025
“Accountability Sinks” by Martin Sustrik
Apr 22, 2025
“The Uses of Complacency” by sarahconstantin
Apr 22, 2025
“$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?” by johnswentworth, David Lorell
Apr 21, 2025
“Crime and Punishment #1” by Zvi
Apr 21, 2025
“AI 2027 is a Bet Against Amdahl’s Law” by snewman
Apr 21, 2025
“Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red” by Julian Bradshaw
Apr 21, 2025
“How Close We Are to a Complete List of Imprinted Genes” by Morpheus
Apr 20, 2025
“Impact, agency, and taste” by benkuhn
Apr 20, 2025
“Is Gemini now better than Claude at Pokémon?” by Julian Bradshaw
Apr 20, 2025
“Why Should I Assume CCP AGI is Worse Than USG AGI?” by Tomás B.
Apr 19, 2025
“o3 Will Use Its Tools For You” by Zvi
Apr 19, 2025
“Scaffolding Skills” by Screwtape
Apr 19, 2025
“What Makes an AI Startup ‘Net Positive’ for Safety?” by jacquesthibs
Apr 19, 2025
“The Russell Conjugation Illuminator” by TimmyM
Apr 18, 2025
“Handling schemers if shutdown is not an option” by Buck
Apr 18, 2025
“Training AGI in Secret would be Unsafe and Unethical” by Daniel Kokotajlo
Apr 18, 2025
“Three Months In, Evaluating Three Rationalist Cases for Trump” by Arjun Panickssery
Apr 18, 2025
“D&D.Sci Tax Day: Adventurers and Assessments” by aphyer
Apr 17, 2025
“Can SAE steering reveal sandbagging?” by jordine, Hoang Khiem, Felix Hofstätter
Apr 17, 2025
“ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs” by denkenberger
Apr 17, 2025
“AI-enabled coups: a small group could use AI to seize power” by Tom Davidson, Lukas Finnveden, rosehadshar
Apr 16, 2025
“Ctrl-Z: Controlling AI Agents via Resampling” by abhatt349, Buck, Adam Kaufman, Cody Rushing, Tyler Tracy
Apr 16, 2025
“OpenAI rewrote its Preparedness Framework” by Zach Stein-Perlman
Apr 16, 2025
“OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing” by Zvi
Apr 16, 2025
“ASI existential risk: Reconsidering Alignment as a Goal” by habryka
Apr 16, 2025
“Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI” by Kaj_Sotala
Apr 15, 2025
“A Dissent on Honesty” by eva_
Apr 15, 2025
“Map of AI Safety v2” by Bryce Robertson, Søren Elverlin, Melissa Samworth
Apr 15, 2025
“To be legible, evidence of misalignment probably has to be behavioral” by ryan_greenblatt
Apr 15, 2025
“The Bell Curve of Bad Behavior” by Screwtape
Apr 15, 2025
[Linkpost] “The 4-Minute Mile Effect” by Parker Conley
Apr 15, 2025
[Linkpost] “Sentinel’s Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.” by NunoSempere
Apr 15, 2025
“Try training token-level probes” by StefanHex
Apr 14, 2025
“Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study” by Adam Karvonen
Apr 14, 2025
“Four Types of Disagreement” by silentbob
Apr 14, 2025
“One-shot steering vectors cause emergent misalignment, too” by Jacob Dunefsky
Apr 14, 2025
“Vestigial reasoning in RL” by Caleb Biddulph
Apr 14, 2025
“How I switched careers from software engineer to AI policy operations” by Lucie Philippon
Apr 13, 2025
“College Advice For People Like Me” by henryj
Apr 13, 2025
“Steelmanning heuristic arguments” by Dmitry Vaintrob
Apr 13, 2025
“Why does LW not put much more focus on AI governance and outreach?” by Severin T. Seehrich
Apr 12, 2025
“How training-gamers might function (and win)” by Vivek Hebbar
Apr 12, 2025
“Youth Lockout” by Xavi CF
Apr 12, 2025
“Paper” by dynomight
Apr 12, 2025
“OpenAI Responses API changes models’ behavior” by Jan Betley, James Chua
Apr 11, 2025
“Why do misalignment risks increase as AIs get more capable?” by ryan_greenblatt
Apr 11, 2025
“On Google’s Safety Plan” by Zvi
Apr 11, 2025
“Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast]” by elifland, Nikola Jurkovic
Apr 11, 2025
“Reactions to METR task length paper are insane” by Cole Wyeth
Apr 11, 2025
[Linkpost] “Playing in the Creek” by Hastings
Apr 10, 2025
“Disempowerment spirals as a likely mechanism for existential catastrophe” by Raymond D, owencb
Apr 10, 2025
“The case for AGI by 2030” by Benjamin_Todd
Apr 10, 2025
[Linkpost] “New Paper: Infra-Bayesian Decision-Estimation Theory” by Vanessa Kosoy, Diffractor
Apr 10, 2025
“Thoughts on AI 2027” by Max Harms
Apr 09, 2025
“Austin Chen on Winning, Risk-Taking, and FTX” by Elizabeth
Apr 09, 2025
“Short Timelines don’t Devalue Long Horizon Research” by Vladimir_Nesov
Apr 09, 2025
[Linkpost] “birds and mammals independently evolved intelligence” by bhauth
Apr 09, 2025
“The first AI war will be in your computer” by Viliam
Apr 09, 2025
“Who wants to bet me $25k at 1:7 odds that there won’t be an AI market crash in the next year?” by Remmelt
Apr 08, 2025
“Alignment Faking Revisited: Improved Classifiers and Open Source Extensions” by John Hughes, abhayesian, Akbir Khan, Fabien Roger
Apr 08, 2025
“AI 2027: Responses” by Zvi
Apr 08, 2025
“American College Admissions Doesn’t Need to Be So Competitive” by Arjun Panickssery
Apr 08, 2025
“Most Questionable Details in ‘AI 2027’” by scarcegreengrass
Apr 07, 2025
“AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander” by Zvi
Apr 07, 2025
“How Gay is the Vatican?” by rba
Apr 07, 2025
“The Lizardman and the Black Hat Bobcat” by Screwtape
Apr 06, 2025
“A collection of approaches to confronting doom, and my thoughts on them” by Ruby
Apr 06, 2025
“A Slow Guide to Confronting Doom, v1” by Ruby
Apr 06, 2025
“How much progress actually happens in theoretical physics?” by ChristianKl
Apr 06, 2025
“DeepMind: An Approach to Technical AGI Safety and Security” by Zach Stein-Perlman
Apr 06, 2025
“Among Us: A Sandbox for Agentic Deception” by 7vik, Adrià Garriga-alonso
Apr 05, 2025
“Alignment faking CTFs: Apply to my MATS stream” by joshc
Apr 05, 2025
“Meditation and Reduced Sleep Need” by niplav
Apr 05, 2025