The Nonlinear Library: Alignment Forum Daily

By The Nonlinear Fund

To listen to this podcast, please open the Podcast Republic app, available on the Google Play Store and the Apple App Store.


Category: Education

Subscribers: 0
Reviews: 0
Episodes: 60

Description

The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
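
Since the episodes below are published as a standard podcast RSS feed, the same listing can be pulled programmatically. The following is a minimal sketch, assuming a placeholder feed URL (the real address is the one behind the podcast's RSS link) and a standard RSS 2.0 layout; it is an illustration, not part of the Nonlinear Library's own tooling.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder URL (assumption): substitute the podcast's actual RSS feed address.
FEED_URL = "https://example.com/nonlinear-library-alignment-forum.xml"

def list_episodes(feed_url):
    """Fetch an RSS 2.0 feed and yield (title, publication date) pairs."""
    with urllib.request.urlopen(feed_url) as response:
        tree = ET.parse(response)
    # RSS 2.0 layout: <rss><channel><item><title>...</title><pubDate>...</pubDate></item>...
    for item in tree.getroot().iter("item"):
        yield item.findtext("title", ""), item.findtext("pubDate", "")

if __name__ == "__main__":
    for title, pub_date in list_episodes(FEED_URL):
        print(f"{pub_date}  {title}")
```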

Episodes (title and date)
AF - Meta Questions about Metaphilosophy by Wei Dai
Sep 01, 2023
AF - Red-teaming language models via activation engineering by Nina Rimsky
Aug 26, 2023
AF - Causality and a Cost Semantics for Neural Networks by scottviteri
Aug 21, 2023
AF - "Dirty concepts" in AI alignment discourses, and some guesses for how to deal with them by Nora Ammann
Aug 20, 2023
AF - A Proof of Löb's Theorem using Computability Theory by Jessica Taylor
Aug 16, 2023
AF - Reducing sycophancy and improving honesty via activation steering by NinaR
Jul 28, 2023
AF - How LLMs are and are not myopic by janus
Jul 25, 2023
AF - Open problems in activation engineering by Alex Turner
Jul 24, 2023
AF - QAPR 5: grokking is maybe not that big a deal? by Quintin Pope
Jul 23, 2023
AF - Priorities for the UK Foundation Models Taskforce by Andrea Miotti
Jul 21, 2023
AF - Alignment Grantmaking is Funding-Limited Right Now by johnswentworth
Jul 19, 2023
AF - Measuring and Improving the Faithfulness of Model-Generated Reasoning by Ansh Radhakrishnan
Jul 18, 2023
AF - Using (Uninterpretable) LLMs to Generate Interpretable AI Code by Joar Skalse
Jul 02, 2023
AF - Agency from a causal perspective by Tom Everitt
Jun 30, 2023
AF - Catastrophic Risks from AI #4: Organizational Risks by Dan H
Jun 26, 2023
AF - LLMs Sometimes Generate Purely Negatively-Reinforced Text by Fabien Roger
Jun 16, 2023
AF - Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS) by Scott Emmons
May 31, 2023
AF - PaLM-2 and GPT-4 in "Extrapolating GPT-N performance" by Lukas Finnveden
May 30, 2023
AF - Wikipedia as an introduction to the alignment problem by SoerenMind
May 29, 2023
AF - [Linkpost] Interpretability Dreams by DanielFilan
May 24, 2023
AF - Conjecture internal survey | AGI timelines and estimations of probability of human extinction from unaligned AGI by Maris Sala
May 22, 2023
AF - Some background for reasoning about dual-use alignment research by Charlie Steiner
May 18, 2023
AF - $500 Bounty/Prize Problem: Channel Capacity Using "Insensitive" Functions by johnswentworth
May 16, 2023
AF - Difficulties in making powerful aligned AI by DanielFilan
May 14, 2023
AF - AI doom from an LLM-plateau-ist perspective by Steve Byrnes
Apr 27, 2023
AF - How Many Bits Of Optimization Can One Bit Of Observation Unlock? by johnswentworth
Apr 26, 2023
AF - Endo-, Dia-, Para-, and Ecto-systemic novelty by Tsvi Benson-Tilsen
Apr 23, 2023
AF - Thinking about maximization and corrigibility by James Payor
Apr 21, 2023
AF - Concave Utility Question by Scott Garrabrant
Apr 15, 2023
AF - Shapley Value Attribution in Chain of Thought by leogao
Apr 14, 2023
AF - Announcing Epoch’s dashboard of key trends and figures in Machine Learning by Jaime Sevilla
Apr 13, 2023
AF - Lessons from Convergent Evolution for AI Alignment by Jan Kulveit
Mar 27, 2023
AF - What happens with logical induction when... by Donald Hobson
Mar 26, 2023
AF - EAI Alignment Speaker Series #1: Challenges for Safe and Beneficial Brain-Like Artificial General Intelligence with Steve Byrnes by Curtis Huebner
Mar 23, 2023
AF - The space of systems and the space of maps by Jan Kulveit
Mar 22, 2023
AF - [ASoT] Some thoughts on human abstractions by leogao
Mar 16, 2023
AF - What is a definition, how can it be extrapolated? by Stuart Armstrong
Mar 14, 2023
AF - Implied "utilities" of simulators are broad, dense, and shallow by porby
Mar 01, 2023
AF - Scarce Channels and Abstraction Coupling by johnswentworth
Feb 28, 2023
AF - Agents vs. Predictors: Concrete differentiating factors by Evan Hubinger
Feb 24, 2023
AF - AI that shouldn't work, yet kind of does by Donald Hobson
Feb 23, 2023
AF - The Open Agency Model by Eric Drexler
Feb 22, 2023
AF - EIS VII: A Challenge for Mechanists by Stephen Casper
Feb 18, 2023
AF - EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety by Stephen Casper
Feb 17, 2023
AF - Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) by Lawrence Chan
Feb 16, 2023
AF - EIS IV: A Spotlight on Feature Attribution/Saliency by Stephen Casper
Feb 15, 2023
AF - The Cave Allegory Revisited: Understanding GPT's Worldview by Jan Kulveit
Feb 14, 2023
AF - Inner Misalignment in "Simulator" LLMs by Adam Scherlis
Jan 31, 2023
AF - Why I hate the "accident vs. misuse" AI x-risk dichotomy (quick thoughts on "structural risk") by David Scott Krueger
Jan 30, 2023
AF - Quick thoughts on "scalable oversight" / "super-human feedback" research by David Scott Krueger
Jan 25, 2023
AF - Thoughts on hardware / compute requirements for AGI by Steve Byrnes
Jan 24, 2023
AF - Gemini modeling by Tsvi Benson-Tilsen
Jan 22, 2023
AF - Shard theory alignment requires magic. by Charlie Steiner
Jan 20, 2023
AF - Thoughts on refusing harmful requests to large language models by William Saunders
Jan 19, 2023
AF - Löbian emotional processing of emergent cooperation: an example by Andrew Critch
Jan 17, 2023
AF - Underspecification of Oracle AI by Rubi J. Hudson
Jan 15, 2023
AF - World-Model Interpretability Is All We Need by Thane Ruthenis
Jan 14, 2023
AF - AGISF adaptation for in-person groups by Sam Marks
Jan 13, 2023