| Episode | Date |
|---|---|
| Discussion with Eliezer Yudkowsky on AGI interventions by Rob Bensinger, Eliezer Yudkowsky | Dec 10, 2021 |
| What failure looks like by Paul Christiano | Dec 10, 2021 |
| The Parable of Predict-O-Matic by Abram Demski | Dec 10, 2021 |
| What 2026 looks like by Daniel Kokotajlo | Dec 10, 2021 |
| Are we in an AI overhang? by Andy Jones | Dec 10, 2021 |
| DeepMind: Generally capable agents emerge from open-ended play by Daniel Kokotajlo | Dec 10, 2021 |
| Alignment Research Field Guide by Abram Demski | Dec 10, 2021 |
| Hiring engineers and researchers to help align GPT-3 by Paul Christiano | Dec 10, 2021 |
| 2018 AI Alignment Literature Review and Charity Comparison by Larks | Dec 10, 2021 |
| Another (outer) alignment failure story by Paul Christiano | Dec 10, 2021 |
| Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More by Ben Pace | Dec 10, 2021 |
| Some AI research areas and their relevance to existential safety by Andrew Critch | Dec 10, 2021 |
| Announcing the Alignment Research Center by Paul Christiano | Dec 10, 2021 |
| The Rocket Alignment Problem by Eliezer Yudkowsky | Dec 10, 2021 |
| The case for aligning narrowly superhuman models by Ajeya Cotra | Dec 10, 2021 |
| Realism about rationality by Richard Ngo | Dec 10, 2021 |
| Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain by Daniel Kokotajlo | Dec 10, 2021 |
| Goodhart Taxonomy by Scott Garrabrant | Dec 10, 2021 |
| The ground of optimization by Alex Flint | Dec 10, 2021 |
| An overview of 11 proposals for building safe advanced AI by Evan Hubinger | Dec 10, 2021 |
| Chris Olah’s views on AGI safety by Evan Hubinger | Dec 10, 2021 |
| Draft report on AI timelines by Ajeya Cotra | Dec 10, 2021 |
| An Untrollable Mathematician Illustrated by Abram Demski | Dec 10, 2021 |
| Radical Probabilism by Abram Demski | Dec 10, 2021 |
| What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) by Andrew Critch | Dec 10, 2021 |
| Utility Maximization = Description Length Minimization by johnswentworth | Dec 10, 2021 |
| Risks from Learned Optimization: Introduction by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant | Dec 10, 2021 |
| Matt Botvinick on the spontaneous emergence of learning algorithms by Adam Scholl | Dec 10, 2021 |
| the scaling "inconsistency": openAI’s new insight by nostalgebraist | Dec 10, 2021 |
| Introduction to Cartesian Frames by Scott Garrabrant | Dec 10, 2021 |
| My research methodology by Paul Christiano | Dec 10, 2021 |
| Fun with +12 OOMs of Compute by Daniel Kokotajlo | Dec 10, 2021 |
| Seeking Power is Often Convergently Instrumental in MDPs by Alex Turner | Dec 10, 2021 |
| The Solomonoff Prior is Malign by Mark Xu | Dec 10, 2021 |
| 2020 AI Alignment Literature Review and Charity Comparison by Larks | Dec 10, 2021 |
| Inner Alignment: Explain like I'm 12 Edition by Rafael Harth | Dec 10, 2021 |
| Evolution of Modularity by johnswentworth | Dec 10, 2021 |
| MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" by Rob Bensinger | Dec 10, 2021 |
| EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised by gwern | Dec 10, 2021 |
| Understanding “Deep Double Descent” by Evan Hubinger | Dec 10, 2021 |
| Can you control the past? by Joe Carlsmith | Dec 10, 2021 |
| Developmental Stages of GPTs by orthonormal | Dec 10, 2021 |
| My computational framework for the brain by Steve Byrnes | Dec 10, 2021 |
| Redwood Research’s current project by Buck Shlegeris | Dec 10, 2021 |
| 2019 AI Alignment Literature Review and Charity Comparison by Larks | Dec 10, 2021 |
| Testing The Natural Abstraction Hypothesis: Project Intro by johnswentworth | Dec 10, 2021 |
| The theory-practice gap by Buck Shlegeris | Dec 10, 2021 |
| Selection vs Control by Abram Demski | Dec 10, 2021 |
| Why Subagents? by johnswentworth | Dec 10, 2021 |
| Possible takeaways from the coronavirus pandemic for slow AI takeoff by Vika | Dec 10, 2021 |
| Embedded Agency (full-text version) by Scott Garrabrant, Abram Demski | Dec 10, 2021 |
| Cortés, Pizarro, and Afonso as Precedents for Takeover by Daniel Kokotajlo | Dec 10, 2021 |
| Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers by lifelonglearner, Peter Hase | Dec 10, 2021 |
| Disentangling arguments for the importance of AI safety by Richard Ngo | Dec 10, 2021 |
| A Semitechnical Introductory Dialogue on Solomonoff Induction by Eliezer Yudkowsky | Dec 10, 2021 |
| Thoughts on Human Models by Ramana Kumar, Scott Garrabrant | Dec 06, 2021 |
| AI Alignment 2018-19 Review by Rohin Shah | Dec 06, 2021 |
| Paul's research agenda FAQ by Alex Zhu | Dec 06, 2021 |
| Forecasting Thread: AI Timelines by Amanda Ngo, Daniel Kokotajlo, Ben Pace | Dec 06, 2021 |
| An Orthodox Case Against Utility Functions by Abram Demski | Dec 06, 2021 |
| Saving Time by Scott Garrabrant | Dec 06, 2021 |
| Beyond Astronomical Waste by Wei Dai | Dec 06, 2021 |
| interpreting GPT: the logit lens by nostalgebraist | Dec 06, 2021 |
| Full-time AGI Safety! by Steve Byrnes | Dec 06, 2021 |
| AMA: Paul Christiano, alignment researcher by Paul Christiano | Dec 06, 2021 |
| Inner Alignment in Salt-Starved Rats by Steve Byrnes | Dec 06, 2021 |
| Against GDP as a metric for timelines and takeoff speeds by Daniel Kokotajlo | Dec 06, 2021 |
| Soft takeoff can still lead to decisive strategic advantage by Daniel Kokotajlo | Dec 06, 2021 |
| My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda by Chi Nguyen | Dec 06, 2021 |
| An Intuitive Guide to Garrabrant Induction by Mark Xu | Dec 06, 2021 |
| Prisoners' Dilemma with Costs to Modeling by Scott Garrabrant | Dec 06, 2021 |
| How much chess engine progress is about adapting to bigger computers? by Paul Christiano | Dec 06, 2021 |
| Debate update: Obfuscated arguments problem by Beth Barnes | Dec 06, 2021 |
| The Fusion Power Generator Scenario by johnswentworth | Dec 05, 2021 |
| The Alignment Problem: Machine Learning and Human Values by Rohin Shah | Dec 05, 2021 |
| Robust Delegation by Abram Demski, Scott Garrabrant | Dec 05, 2021 |
| The Commitment Races problem by Daniel Kokotajlo | Dec 05, 2021 |
| What I’ll be doing at MIRI by Evan Hubinger | Dec 05, 2021 |
| Problem relaxation as a tactic by Alex Turner | Dec 05, 2021 |
| Zero Sum is a misnomer by Abram Demski | Dec 05, 2021 |
| Challenges to Christiano’s capability amplification proposal by Eliezer Yudkowsky | Dec 05, 2021 |
| AI Safety Success Stories by Wei Dai | Dec 05, 2021 |
| My current framework for thinking about AGI timelines by Alex Zhu | Dec 05, 2021 |
Book review: "A Thousand Brains" by Jeff Hawkins, Steve Byrnes
|
Dec 05, 2021 |
| The date of AI Takeover is not the day the AI takes over by Daniel Kokotajlo | Dec 05, 2021 |
| How do we prepare for final crunch time? by Eli Tyre | Dec 05, 2021 |
| Alignment By Default by johnswentworth | Dec 05, 2021 |
| Fixing The Good Regulator Theorem by johnswentworth | Dec 05, 2021 |
| Call for research on evaluating alignment (funding + advice available) by Beth Barnes | Dec 05, 2021 |
| Can you get AGI from a Transformer? by Steve Byrnes | Dec 05, 2021 |
| Measuring hardware overhang by hippke | Dec 05, 2021 |
| Welcome & FAQ! by Ruben Bloom, Oliver Habryka | Dec 05, 2021 |
| Robustness to Scale by Scott Garrabrant | Dec 05, 2021 |
| What can the principal-agent literature tell us about AI risk? | Dec 05, 2021 |
| Our take on CHAI’s research agenda in under 1500 words by Alex Flint | Dec 05, 2021 |
| Alignment As A Bottleneck To Usefulness Of GPT-3 by johnswentworth | Dec 05, 2021 |
| AGI safety from first principles: Introduction by Richard Ngo | Dec 05, 2021 |
| Less Realistic Tales of Doom by Mark Xu | Dec 05, 2021 |
| AI and Compute trend isn't predictive of what is happening by alexlyzhov | Dec 05, 2021 |
| Towards a New Impact Measure by Alex Turner | Dec 05, 2021 |
| Utility ≠ Reward by Vladimir Mikulik | Dec 05, 2021 |
| Knowledge Neurons in Pretrained Transformers by Evan Hubinger | Dec 05, 2021 |
| Comprehensive AI Services as General Intelligence by Rohin Shah | Dec 05, 2021 |
| List of resolved confusions about IDA by Wei Dai | Dec 05, 2021 |
| Announcement: AI alignment prize round 3 winners and next round by Vladimir Slepnev | Dec 05, 2021 |
| Frequent arguments about alignment by John Schulman | Dec 05, 2021 |
| Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22] by Oliver Habryka, Buck Shlegeris | Dec 05, 2021 |
| Alignment Newsletter One Year Retrospective by Rohin Shah | Dec 05, 2021 |
| Formal Inner Alignment, Prospectus by Abram Demski | Dec 05, 2021 |
| Writeup: Progress on AI Safety via Debate by Beth Barnes, Paul Christiano | Dec 05, 2021 |
| Clarifying inner alignment terminology by Evan Hubinger | Dec 05, 2021 |
| A Critique of Functional Decision Theory by wdmacaskill | Dec 05, 2021 |
| Experimentally evaluating whether honesty generalizes by Paul Christiano | Dec 05, 2021 |
| History of the Development of Logical Induction by Scott Garrabrant | Dec 05, 2021 |
| Optimization Amplifies by Scott Garrabrant | Dec 05, 2021 |
| Introducing the AI Alignment Forum (FAQ) by Oliver Habryka, Ben Pace, Raymond Arnold, Jim Babcock | Dec 05, 2021 |
| Ought: why it matters and ways to help by Paul Christiano | Dec 05, 2021 |
| Coherence arguments imply a force for goal-directed behavior by KatjaGrace | Dec 05, 2021 |
| Request for proposals for projects in AI alignment that work with deep learning systems by abergal, Nick_Beckstead | Dec 05, 2021 |
| A very crude deception eval is already passed by Beth Barnes | Dec 05, 2021 |
| Comments on Carlsmith's “Is power-seeking AI an existential risk?” by Nate Soares | Dec 05, 2021 |
| Counterfactual Mugging Poker Game by Scott Garrabrant | Dec 05, 2021 |
| Embedded Curiosities by Scott Garrabrant, Abram Demski | Dec 05, 2021 |
| Collection of GPT-3 results by Kaj Sotala | Dec 05, 2021 |
| The alignment problem in different capability regimes by Buck Shlegeris | Dec 05, 2021 |
| Thinking About Filtered Evidence Is (Very!) Hard by Abram Demski | Dec 04, 2021 |
| Demons in Imperfect Search by johnswentworth | Dec 04, 2021 |
| Homogeneity vs. heterogeneity in AI takeoff scenarios by Evan Hubinger | Dec 04, 2021 |
| Extrapolating GPT-N performance by Lukas Finnveden | Dec 04, 2021 |
| Agency in Conway’s Game of Life by Alex Flint | Dec 04, 2021 |
| Coherence arguments do not imply goal-directed behavior by Rohin Shah | Dec 04, 2021 |
| Search versus design by Alex Flint | Dec 04, 2021 |
| Clarifying “What failure looks like” by Sam Clarke | Dec 04, 2021 |
| Reward Is Not Enough by Steve Byrnes | Dec 04, 2021 |
| Toward a New Technical Explanation of Technical Explanation by Abram Demski | Dec 04, 2021 |
| Gradient hacking by Evan Hubinger | Dec 04, 2021 |
| Inaccessible information by Paul Christiano | Dec 04, 2021 |
| Zoom In: An Introduction to Circuits by Evan Hubinger | Dec 04, 2021 |
| Plausible cases for HRAD work, and locating the crux in the realism about rationality debate by Issa Rice | Dec 04, 2021 |
| Introduction To The Infra-Bayesianism Sequence by Diffractor | Dec 04, 2021 |
| Jitters No Evidence of Stupidity in RL by 1a3orn | Dec 04, 2021 |
| Testing The Natural Abstraction Hypothesis: Project Update by johnswentworth | Dec 04, 2021 |
| Sources of intuitions and data on AGI by Scott Garrabrant | Dec 04, 2021 |
| What counts as defection? by Alex Turner | Dec 04, 2021 |
| Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns by Andreas Stuhlmüller | Dec 04, 2021 |
| Modelling Transformative AI Risks (MTAIR) Project: Introduction by David Manheim, Aryeh Englander | Dec 04, 2021 |
| How DeepMind's Generally Capable Agents Were Trained by 1a3orn | Dec 04, 2021 |
| Open question: are minimal circuits daemon-free? by Paul Christiano | Dec 04, 2021 |
| In Logical Time, All Games are Iterated Games by Abram Demski | Dec 04, 2021 |
| Gradations of Inner Alignment Obstacles by Abram Demski | Dec 04, 2021 |
| Truthful AI: Developing and governing AI that does not lie by Owain Evans, Owen Cotton-Barratt, Lukas Finnveden | Dec 04, 2021 |
| Learning the prior by Paul Christiano | Dec 04, 2021 |
| Thoughts on the Alignment Implications of Scaling Language Models by leogao | Dec 04, 2021 |
| Bayesian Probability is for things that are Space-like Separated from You by Scott Garrabrant | Dec 04, 2021 |
| The Inner Alignment Problem by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant | Dec 04, 2021 |
| Clarifying some key hypotheses in AI alignment by Ben Cottier, Rohin Shah | Dec 04, 2021 |
| Reflections on Larks’ 2020 AI alignment literature review by Alex Flint | Dec 04, 2021 |
| Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress by Mark Xu | Dec 04, 2021 |
| Humans Are Embedded Agents Too by johnswentworth | Dec 04, 2021 |
| Reply to Paul Christiano on Inaccessible Information by Alex Flint | Dec 04, 2021 |
| The Main Sources of AI Risk? by Daniel Kokotajlo, Wei Dai | Dec 04, 2021 |
| And the AI would have got away with it too, if... by Stuart Armstrong | Dec 04, 2021 |
| Inner alignment in the brain by Steve Byrnes | Dec 04, 2021 |
| Review of Soft Takeoff Can Still Lead to DSA by Daniel Kokotajlo | Dec 04, 2021 |
| Two Neglected Problems in Human-AI Safety by Wei Dai | Dec 04, 2021 |
| Announcement: AI alignment prize round 4 winners by Vladimir Slepnev | Dec 04, 2021 |
| The Credit Assignment Problem by Abram Demski | Dec 04, 2021 |
| Recent Progress in the Theory of Neural Networks by interstice | Dec 04, 2021 |
| Tessellating Hills: a toy model for demons in imperfect search by DaemonicSigil | Dec 04, 2021 |
| What Failure Looks Like: Distilling the Discussion by Ben Pace | Dec 04, 2021 |
| Commentary on AGI Safety from First Principles by Richard Ngo | Dec 04, 2021 |
| Imitative Generalisation (AKA 'Learning the Prior') by Beth Barnes | Dec 04, 2021 |
| AI Safety Papers: An App for the TAI Safety Database by Ozzie Gooen | Dec 04, 2021 |
| On Solving Problems Before They Appear: The Weird Epistemologies of Alignment by Adam Shimi | Dec 04, 2021 |
| Comments on OpenPhil's Interpretability RFP by Paul Christiano | Dec 04, 2021 |
| Why I'm excited about Redwood Research's current project by Paul Christiano | Dec 04, 2021 |
| Worrying about the Vase: Whitelisting by Alex Turner | Dec 04, 2021 |
| Troll Bridge by Abram Demski | Dec 04, 2021 |
| Misconceptions about continuous takeoff by Matthew Barnett | Dec 04, 2021 |
| Updates and additions to Embedded Agency by Rob Bensinger, Abram Demski | Dec 04, 2021 |
| Selection Theorems: A Program For Understanding Agents by johnswentworth | Dec 04, 2021 |
| My take on Vanessa Kosoy's take on AGI safety by Steve Byrnes | Dec 04, 2021 |
| How special are human brains among animal brains? by Alex Zhu | Dec 04, 2021 |
| How uniform is the neocortex? by Alex Zhu | Dec 04, 2021 |
How "honest" is GPT-3?Q by Abram Demski
|
Dec 04, 2021 |
| Bottle Caps Aren't Optimisers by DanielFilan | Dec 04, 2021 |
| Deceptive Alignment by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant | Dec 04, 2021 |
| Problems in AI Alignment that philosophers could potentially contribute to by Wei Dai | Dec 04, 2021 |
| Public Static: What is Abstraction? by johnswentworth | Dec 03, 2021 |
| Updating the Lottery Ticket Hypothesis by johnswentworth | Dec 03, 2021 |
| Siren worlds and the perils of over-optimised search by Stuart Armstrong | Dec 03, 2021 |
| Arguments about fast takeoff by Paul Christiano | Dec 03, 2021 |
| Alignment Newsletter #13: 07/02/18 by Rohin Shah | Dec 03, 2021 |
| Risks from Learned Optimization: Conclusion and Related Work by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant | Dec 03, 2021 |
| 2-D Robustness by Vladimir Mikulik | Dec 03, 2021 |
| Learning Normativity: A Research Agenda by Abram Demski | Dec 03, 2021 |
| To what extent is GPT-3 capable of reasoning? by Alex Turner | Dec 03, 2021 |
| Environmental Structure Can Cause Instrumental Convergence by Alex Turner | Dec 03, 2021 |
| [Book Review] "The Alignment Problem" by Brian Christian, reviewed by Lsusr | Dec 03, 2021 |
| Comment on decision theory by Rob Bensinger | Dec 03, 2021 |
| The strategy-stealing assumption by Paul Christiano | Dec 03, 2021 |
| A simple environment for showing mesa misalignment by Matthew Barnett | Dec 03, 2021 |
| The two-layer model of human values, and problems with synthesizing preferences by Kaj Sotala | Dec 03, 2021 |
| The Goldbach conjecture is probably correct; so was Fermat's last theorem by Stuart Armstrong | Dec 03, 2021 |
| Eight claims about multi-agent AGI safety by Richard Ngo | Dec 03, 2021 |
| Why I'm excited about Debate by Richard Ngo | Dec 03, 2021 |
| Rogue AGI Embodies Valuable Intellectual Property by Mark Xu, CarlShulman | Dec 03, 2021 |
| Information At A Distance Is Mediated By Deterministic Constraints by johnswentworth | Dec 03, 2021 |
| Topological Fixed Point Exercises by Scott Garrabrant, Sam Eisenstat | Dec 03, 2021 |
| A Gym Gridworld Environment for the Treacherous Turn by Michaël Trazzi | Dec 03, 2021 |
| How does Gradient Descent Interact with Goodhart? by Scott Garrabrant | Dec 03, 2021 |
| Markets are Universal for Logical Induction by johnswentworth | Dec 03, 2021 |
| TAI Safety Bibliographic Database by C. Jess Riedel | Dec 03, 2021 |
| Announcing AlignmentForum.org Beta by Raymond Arnold | Dec 03, 2021 |
| Classifying specification problems as variants of Goodhart's Law by Vika | Dec 03, 2021 |
| Does SGD Produce Deceptive Alignment? by Mark Xu | Dec 03, 2021 |
| Low-stakes alignment by Paul Christiano | Dec 03, 2021 |
| Preface to the sequence on value learning by Rohin Shah | Dec 03, 2021 |
| AGI safety from first principles: Superintelligence by Richard Ngo | Dec 03, 2021 |
| FAQ: Advice for AI Alignment Researchers by Rohin Shah | Dec 03, 2021 |
| AI takeoff story: a continuation of progress by other means by Edouard Harris | Dec 03, 2021 |
| Optimization Concepts in the Game of Life by Vika, Ramana Kumar | Dec 03, 2021 |
| A general model of safety-oriented AI development by Wei Dai | Dec 03, 2021 |
| Why we need a theory of human values by Stuart Armstrong | Dec 03, 2021 |
| But exactly how complex and fragile? by KatjaGrace | Dec 03, 2021 |
| Continuing the takeoffs debate by Richard Ngo | Dec 03, 2021 |
| Progress on Causal Influence Diagrams by Tom Everitt | Dec 03, 2021 |
| When does rationality-as-search have nontrivial implications? by nostalgebraist | Dec 03, 2021 |
| Three AI Safety Related Ideas by Wei Dai | Dec 03, 2021 |
| Comments on CAIS by Richard Ngo | Dec 03, 2021 |
| Pavlov Generalizes by Abram Demski | Dec 03, 2021 |
| Preparing for "The Talk" with AI projects by Daniel Kokotajlo | Dec 03, 2021 |
| Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda by elriggs, Gurkenglas | Dec 03, 2021 |
| Announcing the Vitalik Buterin Fellowships in AI Existential Safety! by DanielFilan | Dec 03, 2021 |
| Three ways that "Sufficiently optimized agents appear coherent" can be false by Wei Dai | Dec 03, 2021 |
| Research Agenda v0.9: Synthesising a human's preferences into a utility function by Stuart Armstrong | Dec 03, 2021 |
| Conditions for Mesa-Optimization by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant | Dec 03, 2021 |
| Six AI Risk/Strategy Ideas by Wei Dai | Nov 30, 2021 |
| Better priors as a safety problem by Paul Christiano | Nov 30, 2021 |
| Non-Obstruction: A Simple Concept Motivating Corrigibility by Alex Turner | Nov 30, 2021 |
| Three reasons to expect long AI timelines by Matthew Barnett | Nov 30, 2021 |
| Decoupling deliberation from competition by Paul Christiano | Nov 30, 2021 |