The Nonlinear Library: Alignment Forum Top Posts

By The Nonlinear Fund

Category: Education

Subscribers: 0
Reviews: 0
Episodes: 242

Description

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.

Episode / Date
Discussion with Eliezer Yudkowsky on AGI interventions by Rob Bensinger, Eliezer Yudkowsky
Dec 10, 2021
What failure looks like by Paul Christiano
Dec 10, 2021
The Parable of Predict-O-Matic by Abram Demski
Dec 10, 2021
What 2026 looks like by Daniel Kokotajlo
Dec 10, 2021
Are we in an AI overhang? by Andy Jones
Dec 10, 2021
DeepMind: Generally capable agents emerge from open-ended play by Daniel Kokotajlo
Dec 10, 2021
Alignment Research Field Guide by Abram Demski
Dec 10, 2021
Hiring engineers and researchers to help align GPT-3 by Paul Christiano
Dec 10, 2021
2018 AI Alignment Literature Review and Charity Comparison by Larks
Dec 10, 2021
Another (outer) alignment failure story by Paul Christiano
Dec 10, 2021
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More by Ben Pace
Dec 10, 2021
Some AI research areas and their relevance to existential safety by Andrew Critch
Dec 10, 2021
Announcing the Alignment Research Center by Paul Christiano
Dec 10, 2021
The Rocket Alignment Problem by Eliezer Yudkowsky
Dec 10, 2021
The case for aligning narrowly superhuman models by Ajeya Cotra
Dec 10, 2021
Realism about rationality by Richard Ngo
Dec 10, 2021
Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain by Daniel Kokotajlo
Dec 10, 2021
Goodhart Taxonomy by Scott Garrabrant
Dec 10, 2021
The ground of optimization by Alex Flint
Dec 10, 2021
An overview of 11 proposals for building safe advanced AI by Evan Hubinger
Dec 10, 2021
Chris Olah’s views on AGI safety by Evan Hubinger
Dec 10, 2021
Draft report on AI timelines by Ajeya Cotra
Dec 10, 2021
An Untrollable Mathematician Illustrated by Abram Demski
Dec 10, 2021
Radical Probabilism by Abram Demski
Dec 10, 2021
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) by Andrew Critch
Dec 10, 2021
Utility Maximization = Description Length Minimization by johnswentworth
Dec 10, 2021
Risks from Learned Optimization: Introduction by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
Dec 10, 2021
Matt Botvinick on the spontaneous emergence of learning algorithms by Adam Scholl
Dec 10, 2021
the scaling "inconsistency": openAI’s new insight by nostalgebraist
Dec 10, 2021
Introduction to Cartesian Frames by Scott Garrabrant
Dec 10, 2021
My research methodology by Paul Christiano
Dec 10, 2021
Fun with +12 OOMs of Compute by Daniel Kokotajlo
Dec 10, 2021
Seeking Power is Often Convergently Instrumental in MDPs by Paul Christiano
Dec 10, 2021
The Solomonoff Prior is Malign by Mark Xu
Dec 10, 2021
2020 AI Alignment Literature Review and Charity Comparison by Larks
Dec 10, 2021
Inner Alignment: Explain like I'm 12 Edition by Rafael Harth
Dec 10, 2021
Evolution of Modularity by johnswentworth
Dec 10, 2021
MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" by Rob Bensinger
Dec 10, 2021
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised by gwern
Dec 10, 2021
Understanding “Deep Double Descent” by Evan Hubinger
Dec 10, 2021
Can you control the past? by Joe Carlsmith
Dec 10, 2021
Developmental Stages of GPTs by orthonormal
Dec 10, 2021
My computational framework for the brain by Steve Byrnes
Dec 10, 2021
Redwood Research’s current project by Buck Shlegeris
Dec 10, 2021
2019 AI Alignment Literature Review and Charity Comparison by Larks
Dec 10, 2021
Testing The Natural Abstraction Hypothesis: Project Intro by johnswentworth
Dec 10, 2021
The theory-practice gap by Buck Shlegeris
Dec 10, 2021
Selection vs Control by Abram Demski
Dec 10, 2021
Why Subagents? by johnswentworth
Dec 10, 2021
Possible takeaways from the coronavirus pandemic for slow AI takeoff by Vika
Dec 10, 2021
Embedded Agency (full-text version) by Scott Garrabrant, Abram Demski
Dec 10, 2021
Cortés, Pizarro, and Afonso as Precedents for Takeover by Daniel Kokotajlo
Dec 10, 2021
Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers by lifelonglearner, Peter Hase
Dec 10, 2021
Disentangling arguments for the importance of AI safety by Richard Ngo
Dec 10, 2021
A Semitechnical Introductory Dialogue on Solomonoff Induction by Eliezer Yudkowsky
Dec 10, 2021
Thoughts on Human Models by Ramana Kumar, Scott Garrabrant
Dec 06, 2021
AI Alignment 2018-19 Review by Rohin Shah
Dec 06, 2021
Paul's research agenda FAQ by Alex Zhu
Dec 06, 2021
Forecasting Thread: AI Timelines by Amanda Ngo, Daniel Kokotajlo, Ben Pace
Dec 06, 2021
An Orthodox Case Against Utility Functions by Abram Demski
Dec 06, 2021
Saving Time by Scott Garrabrant
Dec 06, 2021
Beyond Astronomical Waste by Wei Dai
Dec 06, 2021
interpreting GPT: the logit lens by nostalgebraist
Dec 06, 2021
Full-time AGI Safety! by Steve Byrnes
Dec 06, 2021
AMA: Paul Christiano, alignment researcher by Paul Christiano
Dec 06, 2021
Inner Alignment in Salt-Starved Rats by Steve Byrnes
Dec 06, 2021
Against GDP as a metric for timelines and takeoff speeds by Daniel Kokotajlo
Dec 06, 2021
Soft takeoff can still lead to decisive strategic advantage by Daniel Kokotajlo
Dec 06, 2021
My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda by Chi Nguyen
Dec 06, 2021
An Intuitive Guide to Garrabrant Induction by Mark Xu
Dec 06, 2021
Prisoners' Dilemma with Costs to Modeling by Scott Garrabrant
Dec 06, 2021
How much chess engine progress is about adapting to bigger computers? by Paul Christiano
Dec 06, 2021
Debate update: Obfuscated arguments problem by Beth Barnes
Dec 06, 2021
The Fusion Power Generator Scenario by johnswentworth
Dec 05, 2021
The Alignment Problem: Machine Learning and Human Values by Rohin Shah
Dec 05, 2021
The Commitment Races problem by Daniel Kokotajlo
Dec 05, 2021
What I’ll be doing at MIRI by Evan Hubinger
Dec 05, 2021
Problem relaxation as a tactic by Alex Turner
Dec 05, 2021
Zero Sum is a misnomer by Abram Demski
Dec 05, 2021
Challenges to Christiano’s capability amplification proposal by Eliezer Yudkowsky
Dec 05, 2021
AI Safety Success Stories by Wei Dai
Dec 05, 2021
My current framework for thinking about AGI timelines by Alex Zhu
Dec 05, 2021
Book review: "A Thousand Brains" by Jeff Hawkins, Steve Byrnes
Dec 05, 2021
The date of AI Takeover is not the day the AI takes over by Daniel Kokotajlo
Dec 05, 2021
How do we prepare for final crunch time? by Eli Tyre
Dec 05, 2021
Alignment By Default by johnswentworth
Dec 05, 2021
Fixing The Good Regulator Theorem by johnswentworth
Dec 05, 2021
Call for research on evaluating alignment (funding + advice available) by Beth Barnes
Dec 05, 2021
Can you get AGI from a Transformer? by Steve Byrnes
Dec 05, 2021
Measuring hardware overhang by hippke
Dec 05, 2021
Welcome & FAQ! by Ruben Bloom, Oliver Habryka
Dec 05, 2021
Robustness to Scale by Scott Garrabrant
Dec 05, 2021
What can the principal-agent literature tell us about AI risk?
Dec 05, 2021
Our take on CHAI’s research agenda in under 1500 words by Alex Flint
Dec 05, 2021
Alignment As A Bottleneck To Usefulness Of GPT-3 by johnswentworth
Dec 05, 2021
AGI safety from first principles: Introduction by Richard Ngo
Dec 05, 2021
Less Realistic Tales of Doom by Mark Xu
Dec 05, 2021
AI and Compute trend isn't predictive of what is happening by alexlyzhov
Dec 05, 2021
Towards a New Impact Measure by Alex Turner
Dec 05, 2021
Utility ≠ Reward by Vladimir Mikulik
Dec 05, 2021
Knowledge Neurons in Pretrained Transformers by Evan Hubinger
Dec 05, 2021
Comprehensive AI Services as General Intelligence by Rohin Shah
Dec 05, 2021
List of resolved confusions about IDA by Wei Dai
Dec 05, 2021
Announcement: AI alignment prize round 3 winners and next round by Vladimir Slepnev
Dec 05, 2021
Frequent arguments about alignment by John Schulman
Dec 05, 2021
Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22] by Oliver Habryka, Buck Shlegeris
Dec 05, 2021
Alignment Newsletter One Year Retrospective by Rohin Shah
Dec 05, 2021
Formal Inner Alignment, Prospectus by Abram Demski
Dec 05, 2021
Writeup: Progress on AI Safety via Debate by Beth Barnes, Paul Christiano
Dec 05, 2021
Clarifying inner alignment terminology by Evan Hubinger
Dec 05, 2021
A Critique of Functional Decision Theory by wdmacaskill
Dec 05, 2021
Experimentally evaluating whether honesty generalizes by Paul Christiano
Dec 05, 2021
History of the Development of Logical Induction by Scott Garrabrant
Dec 05, 2021
Optimization Amplifies by Scott Garrabrant
Dec 05, 2021
Introducing the AI Alignment Forum (FAQ) by Oliver Habryka, Ben Pace, Raymond Arnold, Jim Babcock
Dec 05, 2021
Ought: why it matters and ways to help by Paul Christiano
Dec 05, 2021
Coherence arguments imply a force for goal-directed behavior by KatjaGrace
Dec 05, 2021
Request for proposals for projects in AI alignment that work with deep learning systems by abergal, Nick_Beckstead
Dec 05, 2021
A very crude deception eval is already passed by Beth Barnes
Dec 05, 2021
Comments on Carlsmith's “Is power-seeking AI an existential risk?” by Nate Soares
Dec 05, 2021
Counterfactual Mugging Poker Game by Scott Garrabrant
Dec 05, 2021
Embedded Curiosities by Scott Garrabrant, Abram Demski
Dec 05, 2021
Collection of GPT-3 results by Kaj Sotala
Dec 05, 2021
The alignment problem in different capability regimes by Buck Shlegeris
Dec 05, 2021
Thinking About Filtered Evidence Is (Very!) Hard by Abram Demski
Dec 04, 2021
Demons in Imperfect Search
Dec 04, 2021
Homogeneity vs. heterogeneity in AI takeoff scenarios by Evan Hubinger
Dec 04, 2021
Extrapolating GPT-N performance
Dec 04, 2021
Agency in Conway’s Game of Life by Alex Flint
Dec 04, 2021
Coherence arguments do not imply goal-directed behavior
Dec 04, 2021
Search versus design by Alex Flint
Dec 04, 2021
Clarifying “What failure looks like” by Sam Clarke
Dec 04, 2021
Reward Is Not Enough by Steve Byrnes
Dec 04, 2021
Toward a New Technical Explanation of Technical Explanation by Abram Demski
Dec 04, 2021
Gradient hacking by Evan Hubinger
Dec 04, 2021
Inaccessible information by Paul Christiano
Dec 04, 2021
Zoom In: An Introduction to Circuits by Evan Hubinger
Dec 04, 2021
Plausible cases for HRAD work, and locating the crux in the realism about rationality debate by Issa Rice
Dec 04, 2021
Introduction To The Infra-Bayesianism Sequence by Diffractor
Dec 04, 2021
Jitters No Evidence of Stupidity in RL
Dec 04, 2021
Testing The Natural Abstraction Hypothesis: Project Update by johnswentworth
Dec 04, 2021
Sources of intuitions and data on AGI by Scott Garrabrant
Dec 04, 2021
What counts as defection? by Alex Turner
Dec 04, 2021
Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns by Andreas Stuhlmüller
Dec 04, 2021
Modelling Transformative AI Risks (MTAIR) Project: Introduction by David Manheim, Aryeh Englander
Dec 04, 2021
How DeepMind's Generally Capable Agents Were Trained by 1a3orn
Dec 04, 2021
Open question: are minimal circuits daemon-free? by Paul Christiano
Dec 04, 2021
In Logical Time, All Games are Iterated Games by Abram Demski
Dec 04, 2021
Gradations of Inner Alignment Obstacles by Abram Demski
Dec 04, 2021
Truthful AI: Developing and governing AI that does not lie by Owain Evans, Owen Cotton-Barratt, Lukas Finnveden
Dec 04, 2021
Learning the prior by Paul Christiano
Dec 04, 2021
Thoughts on the Alignment Implications of Scaling Language Models by leogao
Dec 04, 2021
Bayesian Probability is for things that are Space-like Separated from You by Scott Garrabrant
Dec 04, 2021
The Inner Alignment Problem by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
Dec 04, 2021
Clarifying some key hypotheses in AI alignment by Ben Cottier, Rohin Shah
Dec 04, 2021
Reflections on Larks’ 2020 AI alignment literature review by Alex Flint
Dec 04, 2021
Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress by Mark Xu
Dec 04, 2021
Humans Are Embedded Agents Too by johnswentworth
Dec 04, 2021
Reply to Paul Christiano on Inaccessible Information by Alex Flint
Dec 04, 2021
The Main Sources of AI Risk? by Daniel Kokotajlo, Wei Dai
Dec 04, 2021
And the AI would have got away with it too, if... by Stuart Armstrong
Dec 04, 2021
Inner alignment in the brain by Steve Byrnes
Dec 04, 2021
Review of Soft Takeoff Can Still Lead to DSA by Daniel Kokotajlo
Dec 04, 2021
Two Neglected Problems in Human-AI Safety by Wei Dai
Dec 04, 2021
Announcement: AI alignment prize round 4 winners by Vladimir Slepnev
Dec 04, 2021
The Credit Assignment Problem by Abram Demski
Dec 04, 2021
Recent Progress in the Theory of Neural Networks by interstice
Dec 04, 2021
Tessellating Hills: a toy model for demons in imperfect search by DaemonicSigil
Dec 04, 2021
What Failure Looks Like: Distilling the Discussion by Ben Pace
Dec 04, 2021
Commentary on AGI Safety from First Principles by Richard Ngo
Dec 04, 2021
Imitative Generalisation (AKA 'Learning the Prior') by Beth Barnes
Dec 04, 2021
AI Safety Papers: An App for the TAI Safety Database by Ozzie Gooen
Dec 04, 2021
On Solving Problems Before They Appear: The Weird Epistemologies of Alignment by Adam Shimi
Dec 04, 2021
Comments on OpenPhil's Interpretability RFP by Paul Christiano
Dec 04, 2021
Why I'm excited about Redwood Research's current project by Paul Christiano
Dec 04, 2021
Worrying about the Vase: Whitelisting by Alex Turner
Dec 04, 2021
Troll Bridge by Abram Demski
Dec 04, 2021
Misconceptions about continuous takeoff by Matthew Barnett
Dec 04, 2021
Updates and additions to Embedded Agency by Rob Bensinger, Abram Demski
Dec 04, 2021
Selection Theorems: A Program For Understanding Agents by johnswentworth
Dec 04, 2021
My take on Vanessa Kosoy's take on AGI safety by Steve Byrnes
Dec 04, 2021
How special are human brains among animal brains? by Alex Zhu
Dec 04, 2021
How uniform is the neocortex? by Alex Zhu
Dec 04, 2021
How "honest" is GPT-3?Q by Abram Demski
Dec 04, 2021
Bottle Caps Aren't Optimisers by DanielFilan
Dec 04, 2021
Deceptive Alignment by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
Dec 04, 2021
Problems in AI Alignment that philosophers could potentially contribute to by Wei Dai
Dec 04, 2021
Public Static: What is Abstraction? by johnswentworth
Dec 03, 2021
Updating the Lottery Ticket Hypothesis by johnswentworth
Dec 03, 2021
Siren worlds and the perils of over-optimised search by Stuart Armstrong
Dec 03, 2021
Arguments about fast takeoff by Paul Christiano
Dec 03, 2021
Alignment Newsletter #13: 07/02/18 by Rohin Shah
Dec 03, 2021
Risks from Learned Optimization: Conclusion and Related Work by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
Dec 03, 2021
2-D Robustness by Vladimir Mikulik
Dec 03, 2021
Learning Normativity: A Research Agenda by Abram Demski
Dec 03, 2021
To what extent is GPT-3 capable of reasoning? by Alex Turner
Dec 03, 2021
Environmental Structure Can Cause Instrumental Convergence by Alex Turner
Dec 03, 2021
[Book Review] "The Alignment Problem" by Brian Christian,Lsusr
Dec 03, 2021
Comment on decision theory by Rob Bensinger
Dec 03, 2021
The strategy-stealing assumption by Paul Christiano
Dec 03, 2021
A simple environment for showing mesa misalignment by Matthew Barnett
Dec 03, 2021
The two-layer model of human values, and problems with synthesizing preferences by Kaj Sotala
Dec 03, 2021
The Goldbach conjecture is probably correct; so was Fermat's last theorem by Stuart Armstrong
Dec 03, 2021
Eight claims about multi-agent AGI safety by Richard Ngo
Dec 03, 2021
Why I'm excited about Debate by Richard Ngo
Dec 03, 2021
Rogue AGI Embodies Valuable Intellectual Property by Mark Xu, CarlShulman
Dec 03, 2021
Information At A Distance Is Mediated By Deterministic Constraints by johnswentworth
Dec 03, 2021
Topological Fixed Point Exercises by Scott Garrabrant, Sam Eisenstat
Dec 03, 2021
A Gym Gridworld Environment for the Treacherous Turn by Michaël Trazzi
Dec 03, 2021
How does Gradient Descent Interact with Goodhart? by Scott Garrabrant
Dec 03, 2021
Markets are Universal for Logical Induction by johnswentworth
Dec 03, 2021
TAI Safety Bibliographic Database by C. Jess Riedel
Dec 03, 2021
Announcing AlignmentForum.org Beta by Raymond Arnold
Dec 03, 2021
Classifying specification problems as variants of Goodhart's Law by Vika
Dec 03, 2021
Does SGD Produce Deceptive Alignment? by Mark Xu
Dec 03, 2021
Low-stakes alignment by Paul Christiano
Dec 03, 2021
Preface to the sequence on value learning by Rohin Shah
Dec 03, 2021
AGI safety from first principles: Superintelligence by Richard Ngo
Dec 03, 2021
FAQ: Advice for AI Alignment Researchers by Rohin Shah
Dec 03, 2021
AI takeoff story: a continuation of progress by other means by Edouard Harris
Dec 03, 2021
Optimization Concepts in the Game of Life by Vika, Ramana Kumar
Dec 03, 2021
A general model of safety-oriented AI development by Wei Dai
Dec 03, 2021
Why we need a theory of human values by Stuart Armstrong
Dec 03, 2021
But exactly how complex and fragile? by KatjaGrace
Dec 03, 2021
Continuing the takeoffs debate by Richard Ngo
Dec 03, 2021
Progress on Causal Influence Diagrams by Tom Everitt
Dec 03, 2021
When does rationality-as-search have nontrivial implications? by nostalgebraist
Dec 03, 2021
Three AI Safety Related Ideas by Wei Dai
Dec 03, 2021
Comments on CAIS by Richard Ngo
Dec 03, 2021
Pavlov Generalizes by Abram Demski
Dec 03, 2021
Preparing for "The Talk" with AI projects by Daniel Kokotajlo
Dec 03, 2021
Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda by elriggs, Gurkenglas
Dec 03, 2021
Announcing the Vitalik Buterin Fellowships in AI Existential Safety! by DanielFilan
Dec 03, 2021
Three ways that "Sufficiently optimized agents appear coherent" can be false by Wei Dai
Dec 03, 2021
Research Agenda v0.9: Synthesising a human's preferences into a utility function by Stuart Armstrong
Dec 03, 2021
Conditions for Mesa-Optimization by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
Dec 03, 2021
Six AI Risk/Strategy Ideas by Wei Dai
Nov 30, 2021
Better priors as a safety problem by Paul Christiano
Nov 30, 2021
Non-Obstruction: A Simple Concept Motivating Corrigibility by Alex Turner
Nov 30, 2021
Three reasons to expect long AI timelines by Matthew Barnett
Nov 30, 2021
Decoupling deliberation from competition by Paul Christiano
Nov 30, 2021