Episode | Date |
---|---|
[HUMAN VOICE] "My Clients, The Liars" by ymeskhout | Mar 20, 2024 |
[HUMAN VOICE] "Deep atheism and AI risk" by Joe Carlsmith | Mar 20, 2024 |
[HUMAN VOICE] "CFAR Takeaways: Andrew Critch" by Raemon | Mar 10, 2024 |
[HUMAN VOICE] "Speaking to Congressional staffers about AI risk" by Akash, hath | Mar 10, 2024 |
Many arguments for AI x-risk are wrong | Mar 09, 2024 |
Tips for Empirical Alignment Research | Mar 07, 2024 |
Timaeus’s First Four Months | Feb 29, 2024 |
Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party” | Feb 23, 2024 |
[HUMAN VOICE] "And All the Shoggoths Merely Players" by Zack_M_Davis | Feb 20, 2024 |
[HUMAN VOICE] "Updatelessness doesn't solve most problems" by Martín Soto | Feb 20, 2024 |
Every “Every Bay Area House Party” Bay Area House Party | Feb 19, 2024 |
2023 Survey Results | Feb 19, 2024 |
Raising children on the eve of AI | Feb 18, 2024 |
“No-one in my org puts money in their pension” | Feb 18, 2024 |
Masterpiece | Feb 16, 2024 |
CFAR Takeaways: Andrew Critch | Feb 15, 2024 |
[HUMAN VOICE] "Believing In" by Anna Salamon | Feb 14, 2024 |
[HUMAN VOICE] "Attitudes about Applied Rationality" by Camille Berger | Feb 14, 2024 |
Scale Was All We Needed, At First | Feb 14, 2024 |
Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy | Feb 11, 2024 |
[HUMAN VOICE] "A Shutdown Problem Proposal" by johnswentworth, David Lorell | Feb 09, 2024 |
Brute Force Manufactured Consensus is Hiding the Crime of the Century | Feb 04, 2024 |
[HUMAN VOICE] "Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI" by Jeremy Gillen, peterbarnett | Feb 03, 2024 |
Leading The Parade | Feb 02, 2024 |
[HUMAN VOICE] "The case for ensuring that powerful AIs are controlled" by ryan_greenblatt, Buck | Feb 02, 2024 |
Processor clock speeds are not how fast AIs think | Feb 01, 2024 |
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI | Jan 31, 2024 |
Making every researcher seek grants is a broken model | Jan 29, 2024 |
The case for training frontier AIs on Sumerian-only corpus | Jan 28, 2024 |
This might be the last AI Safety Camp | Jan 25, 2024 |
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury | Jan 22, 2024 |
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka | Jan 20, 2024 |
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al | Jan 20, 2024 |
The impossible problem of due process | Jan 17, 2024 |
[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith | Jan 14, 2024 |
Introducing Alignment Stress-Testing at Anthropic | Jan 14, 2024 |
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | Jan 13, 2024 |
[HUMAN VOICE] "Meaning & Agency" by Abram Demski | Jan 07, 2024 |
What’s up with LLMs representing XORs of arbitrary features? | Jan 07, 2024 |
Gentleness and the artificial Other | Jan 05, 2024 |
MIRI 2024 Mission and Strategy Update | Jan 05, 2024 |
The Plan - 2023 Version | Jan 04, 2024 |
Apologizing is a Core Rationalist Skill | Jan 03, 2024 |
[HUMAN VOICE] "A case for AI alignment being difficult" by jessicata | Jan 02, 2024 |
The Dark Arts | Jan 01, 2024 |
Critical review of Christiano’s disagreements with Yudkowsky | Dec 28, 2023 |
Most People Don’t Realize We Have No Idea How Our AIs Work | Dec 27, 2023 |
Discussion: Challenges with Unsupervised LLM Knowledge Discovery | Dec 26, 2023 |
Succession | Dec 24, 2023 |
Nonlinear’s Evidence: Debunking False and Misleading Claims | Dec 21, 2023 |
Effective Aspersions: How the Nonlinear Investigation Went Wrong | Dec 20, 2023 |
Constellations are Younger than Continents | Dec 20, 2023 |
The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda | Dec 19, 2023 |
“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity | Dec 18, 2023 |
Is being sexy for your homies? | Dec 17, 2023 |
[HUMAN VOICE] "Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible" by Gene Smith and Kman | Dec 17, 2023 |
[HUMAN VOICE] "Moral Reality Check (a short story)" by jessicata | Dec 15, 2023 |
AI Control: Improving Safety Despite Intentional Subversion | Dec 15, 2023 |
2023 Unofficial LessWrong Census/Survey | Dec 13, 2023 |
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity. | Dec 13, 2023 |
[HUMAN VOICE] "What are the results of more parental supervision and less outdoor play?" by Julia Wise | Dec 13, 2023 |
Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible | Dec 12, 2023 |
re: Yudkowsky on biological materials | Dec 11, 2023 |
Speaking to Congressional staffers about AI risk | Dec 05, 2023 |
[HUMAN VOICE] "Shallow review of live agendas in alignment & safety" by technicalities & Stag | Dec 04, 2023 |
Thoughts on “AI is easy to control” by Pope & Belrose | Dec 02, 2023 |
The 101 Space You Will Always Have With You | Nov 30, 2023 |
[HUMAN VOICE] "Social Dark Matter" by Duncan Sabien | Nov 28, 2023 |
Shallow review of live agendas in alignment & safety | Nov 28, 2023 |
Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense | Nov 25, 2023 |
[HUMAN VOICE] "The 6D effect: When companies take risks, one email can be very powerful." by scasper | Nov 23, 2023 |
OpenAI: The Battle of the Board | Nov 22, 2023 |
OpenAI: Facts from a Weekend | Nov 20, 2023 |
Sam Altman fired from OpenAI | Nov 18, 2023 |
Social Dark Matter | Nov 17, 2023 |
"You can just spontaneously call people you haven't met in years" by lc | Nov 17, 2023 |
[HUMAN VOICE] "Thinking By The Clock" by Screwtape | Nov 17, 2023 |
"EA orgs' legal structure inhibits risk taking and information sharing on the margin" by Elizabeth | Nov 17, 2023 |
[HUMAN VOICE] "AI Timelines" by habryka, Daniel Kokotajlo, Ajeya Cotra, Ege Erdil | Nov 17, 2023 |
"Integrity in AI Governance and Advocacy" by habryka, Olivia Jimenez | Nov 17, 2023 |
Loudly Give Up, Don’t Quietly Fade | Nov 16, 2023 |
[HUMAN VOICE] "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds | Nov 09, 2023 |
[HUMAN VOICE] "Deception Chess: Game #1" by Zane et al. | Nov 09, 2023 |
"Does davidad's uploading moonshot work?" by jacobjabob et al.
|
Nov 09, 2023 |
"The other side of the tidal wave" by Katja Grace
|
Nov 09, 2023 |
"The 6D effect: When companies take risks, one email can be very powerful." by scasper
|
Nov 09, 2023 |
Comp Sci in 2027 (Short story by Eliezer Yudkowsky)
|
Nov 09, 2023 |
"My thoughts on the social response to AI risk" by Matthew Barnett
|
Nov 09, 2023 |
"Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk" by 1a3orn
|
Nov 09, 2023 |
"President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence" by Tristan Williams
|
Nov 03, 2023 |
"Thoughts on the AI Safety Summit company policy requests and responses" by So8res
|
Nov 03, 2023 |
[Human Voice] "Book Review: Going Infinite" by Zvi
|
Oct 31, 2023 |
"Announcing Timaeus" by Jesse Hoogland et al.
|
Oct 30, 2023 |
"Thoughts on responsible scaling policies and regulation" by Paul Christiano
|
Oct 30, 2023 |
"AI as a science, and three obstacles to alignment strategies" by Nate Soares
|
Oct 30, 2023 |
"Architects of Our Own Demise: We Should Stop Developing AI" by Roko
|
Oct 30, 2023 |
"At 87, Pearl is still able to change his mind" by rotatingpaguro
|
Oct 30, 2023 |
"We're Not Ready: thoughts on "pausing" and responsible scaling policies" by Holden Karnofsky
|
Oct 30, 2023 |
[HUMAN VOICE] "Alignment Implications of LLM Successes: a Debate in One Act" by Zack M Davis
|
Oct 23, 2023 |
"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish.
|
Oct 23, 2023 |
"Holly Elmore and Rob Miles dialogue on AI Safety Advocacy" by jacobjacob, Robert Miles & Holly_Elmore
|
Oct 23, 2023 |
"Labs should be explicit about why they are building AGI" by Peter Barnett
|
Oct 19, 2023 |
[HUMAN VOICE] "Sum-threshold attacks" by TsviBT
|
Oct 18, 2023 |
"Will no one rid me of this turbulent pest?" by Metacelsus
|
Oct 18, 2023 |
"RSPs are pauses done right" by evhub
|
Oct 15, 2023 |
[HUMAN VOICE] "Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
|
Oct 15, 2023 |
"Cohabitive Games so Far" by mako yass
|
Oct 15, 2023 |
"Announcing MIRI’s new CEO and leadership team" by Gretta Duleba
|
Oct 15, 2023 |
"Comparing Anthropic's Dictionary Learning to Ours" by Robert_AIZI
|
Oct 15, 2023 |
"Announcing Dialogues" by Ben Pace
|
Oct 09, 2023 |
"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds
|
Oct 09, 2023 |
"Evaluating the historical value misspecification argument" by Matthew Barnett
|
Oct 09, 2023 |
"Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn" by Zvi
|
Oct 09, 2023 |
"Thomas Kwa's MIRI research experience" by Thomas Kwa and others
|
Oct 06, 2023 |
"EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem" by Elizabeth
|
Oct 03, 2023 |
"How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" by Jan Brauner et al.
|
Oct 03, 2023 |
"The Lighthaven Campus is open for bookings" by Habryka
|
Oct 03, 2023 |
"'Diamondoid bacteria' nanobots: deadly threat or dead-end? A nanotech investigation" by titotal
|
Oct 03, 2023 |
"The King and the Golem" by Richard Ngo
|
Sep 29, 2023 |
"Sparse Autoencoders Find Highly Interpretable Directions in Language Models" by Logan Riggs et al
|
Sep 27, 2023 |
"Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
|
Sep 26, 2023 |
"There should be more AI safety orgs" by Marius Hobbhahn
|
Sep 25, 2023 |
"The Talk: a brief explanation of sexual dimorphism" by Malmesbury
|
Sep 22, 2023 |
"A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX" by jacobjacob
|
Sep 20, 2023 |
"AI presidents discuss AI alignment agendas" by TurnTrout & Garrett Baker
|
Sep 19, 2023 |
"UDT shows that decision theory is more puzzling than ever" by Wei Dai
|
Sep 18, 2023 |
"Sum-threshold attacks" by TsviBT
|
Sep 11, 2023 |
"A list of core AI safety problems and how I hope to solve them" by Davidad
|
Sep 09, 2023 |
"Report on Frontier Model Training" by Yafah Edelman
|
Sep 09, 2023 |
"Defunding My Mistake" by ymeskhout
|
Sep 08, 2023 |
"Sharing Information About Nonlinear" by Ben Pace
|
Sep 08, 2023 |
"One Minute Every Moment" by abramdemski
|
Sep 08, 2023 |
"What I would do if I wasn’t at ARC Evals" by LawrenceC
|
Sep 08, 2023 |
"The U.S. is becoming less stable" by lc
|
Sep 04, 2023 |
"Meta Questions about Metaphilosophy" by Wei Dai
|
Sep 04, 2023 |
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist
|
Sep 04, 2023 |
"Dear Self; we need to talk about ambition" by Elizabeth
|
Aug 30, 2023 |
"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon
|
Aug 28, 2023 |
"Assume Bad Faith" by Zack_M_Davis
|
Aug 28, 2023 |
"Large Language Models will be Great for Censorship" by Ethan Edwards
|
Aug 23, 2023 |
"Ten Thousand Years of Solitude" by agp
|
Aug 22, 2023 |
"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov
|
Aug 22, 2023 |
"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël
|
Aug 21, 2023 |
"Inflection.ai is a major AGI lab" by Nikola
|
Aug 15, 2023 |
"Feedbackloop-first Rationality" by Raemon
|
Aug 15, 2023 |
"When can we trust model evaluations?" bu evhub
|
Aug 09, 2023 |
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
|
Aug 09, 2023 |
"My current LK99 questions" by Eliezer Yudkowsky
|
Aug 04, 2023 |
"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long
|
Aug 04, 2023 |
"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes
|
Aug 04, 2023 |
"Thoughts on sharing information about language model capabilities" by paulfchristiano
|
Aug 02, 2023 |
"Yes, It's Subjective, But Why All The Crabs?" by johnswentworth
|
Jul 31, 2023 |
"Self-driving car bets" by paulfchristiano
|
Jul 31, 2023 |
"Cultivating a state of mind where new ideas are born" by Henrik Karlsson
|
Jul 31, 2023 |
"Rationality !== Winning" by Raemon
|
Jul 28, 2023 |
"Brain Efficiency Cannell Prize Contest Award Ceremony" by Alexander Gietelink Oldenziel
|
Jul 28, 2023 |
"Grant applications and grand narratives" by Elizabeth
|
Jul 28, 2023 |
"Cryonics and Regret" by MvB
|
Jul 28, 2023 |
"Unifying Bargaining Notions (2/2)" by Diffractor
|
Jun 12, 2023 |
"The ants and the grasshopper" by Richard Ngo
|
Jun 06, 2023 |
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.
|
May 18, 2023 |
"An artificially structured argument for expecting AGI ruin" by Rob Bensinger
|
May 16, 2023 |
"How much do you believe your results?" by Eric Neyman
|
May 10, 2023 |
"Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)" by Chris Scammell & DivineMango
|
Apr 27, 2023 |
"On AutoGPT" by Zvi
|
Apr 19, 2023 |
"GPTs are Predictors, not Imitators" by Eliezer Yudkowsky
|
Apr 12, 2023 |
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky
|
Apr 05, 2023 |
"A stylized dialogue on John Wentworth's claims about markets and optimization" by Nate Soares
|
Apr 05, 2023 |
"Deep Deceptiveness" by Nate Soares
|
Apr 05, 2023 |
"The Onion Test for Personal and Institutional Honesty" by Chana Messinger & Andrew Critch
|
Mar 28, 2023 |
"There’s no such thing as a tree (phylogenetically)" by Eukaryote
|
Mar 28, 2023 |
"Losing the root for the tree" by Adam Zerner
|
Mar 28, 2023 |
"Lies, Damn Lies, and Fabricated Options" by Duncan Sabien
|
Mar 28, 2023 |
"Why I think strong general AI is coming soon" by Porby
|
Mar 28, 2023 |
"It Looks Like You’re Trying To Take Over The World" by Gwern
|
Mar 28, 2023 |
"What failure looks like" by Paul Christiano
|
Mar 28, 2023 |
"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes
|
Mar 21, 2023 |
""Carefully Bootstrapped Alignment" is organizationally hard" by Raemon
|
Mar 21, 2023 |
"The Parable of the King and the Random Process" by moridinamael
|
Mar 14, 2023 |
"Enemies vs Malefactors" by Nate Soares
|
Mar 14, 2023 |
"The Waluigi Effect (mega-post)" by Cleo Nardo
|
Mar 08, 2023 |
"Acausal normalcy" by Andrew Critch
|
Mar 06, 2023 |
"Please don't throw your mind away" by TsviBT
|
Mar 01, 2023 |
"Cyborgism" by Nicholas Kees & Janus
|
Feb 15, 2023 |
"Childhoods of exceptional people" by Henrik Karlsson
|
Feb 14, 2023 |
"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares
|
Feb 13, 2023 |
"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça
|
Feb 10, 2023 |
"SolidGoldMagikarp (plus, prompt generation)"
|
Feb 08, 2023 |
"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares
|
Feb 03, 2023 |
"Basics of Rationalist Discourse" by Duncan Sabien
|
Feb 02, 2023 |
"Sapir-Whorf for Rationalists" by Duncan Sabien
|
Jan 31, 2023 |
"My Model Of EA Burnout" by Logan Strohl
|
Jan 31, 2023 |
"The Social Recession: By the Numbers" by Anton Stjepan Cebalo
|
Jan 25, 2023 |
"Recursive Middle Manager Hell" by Raemon
|
Jan 24, 2023 |
"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin
|
Jan 12, 2023 |
"Models Don't 'Get Reward'" by Sam Ringer
|
Jan 12, 2023 |
"The Feeling of Idea Scarcity" by John Wentworth
|
Jan 12, 2023 |
"The next decades might be wild" by Marius Hobbhahn
|
Dec 21, 2022 |
"Lessons learned from talking to >100 academics about AI safety" by Marius Hobbhahn
|
Nov 17, 2022 |
"How my team at Lightcone sometimes gets stuff done" by jacobjacob
|
Nov 10, 2022 |
"Decision theory does not imply that we get to have nice things" by So8res
|
Nov 08, 2022 |
"What 2026 looks like" by Daniel Kokotajlo
|
Nov 07, 2022 |
Counterarguments to the basic AI x-risk case
|
Nov 04, 2022 |
"Introduction to abstract entropy" by Alex Altair
|
Oct 29, 2022 |
"Consider your appetite for disagreements" by Adam Zerner
|
Oct 25, 2022 |
"My resentful story of becoming a medical miracle" by Elizabeth
|
Oct 21, 2022 |
"The Redaction Machine" by Ben
|
Oct 02, 2022 |
"Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover" by Ajeya Cotra
|
Sep 27, 2022 |
"The shard theory of human values" by Quintin Pope & TurnTrout
|
Sep 22, 2022 |
"Two-year update on my personal AI timelines" by Ajeya Cotra
|
Sep 22, 2022 |
"You Are Not Measuring What You Think You Are Measuring" by John Wentworth
|
Sep 21, 2022 |
"Do bamboos set themselves on fire?" by Malmesbury
|
Sep 20, 2022 |
"Toni Kurz and the Insanity of Climbing Mountains" by Gene Smith
|
Sep 18, 2022 |
"Deliberate Grieving" by Raemon
|
Sep 18, 2022 |
"Survey advice" by Katja Grace
|
Sep 18, 2022 |
"Language models seem to be much better than humans at next-token prediction" by Buck, Fabien and LawrenceC
|
Sep 15, 2022 |
"Humans are not automatically strategic" by Anna Salamon
|
Sep 15, 2022 |
"Local Validity as a Key to Sanity and Civilization" by Eliezer Yudkowsky
|
Sep 15, 2022 |
"Toolbox-thinking and Law-thinking" by Eliezer Yudkowsky
|
Sep 15, 2022 |
"Moral strategies at different capability levels" by Richard Ngo
|
Sep 14, 2022 |
"Worlds Where Iterative Design Fails" by John Wentworth
|
Sep 11, 2022 |
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
|
Sep 11, 2022 |
"Unifying Bargaining Notions (1/2)" by Diffractor
|
Sep 09, 2022 |
'Simulators' by Janus
|
Sep 05, 2022 |
"Humans provide an untapped wealth of evidence about alignment" by TurnTrout & Quintin Pope
|
Aug 08, 2022 |
"Changing the world through slack & hobbies" by Steven Byrnes
|
Jul 30, 2022 |
"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch
|
Jul 28, 2022 |
"ITT-passing and civility are good; "charity" is bad; steelmanning is niche" by Rob Bensinger
|
Jul 24, 2022 |
"What should you change in response to an "emergency"? And AI risk" by Anna Salamon
|
Jul 23, 2022 |
"On how various plans miss the hard bits of the alignment challenge" by Nate Soares
|
Jul 17, 2022 |
"Humans are very reliable agents" by Alyssa Vance
|
Jul 13, 2022 |
"Looking back on my alignment PhD" by TurnTrout
|
Jul 08, 2022 |
"It’s Probably Not Lithium" by Natália Coelho Mendonça
|
Jul 05, 2022 |
"What Are You Tracking In Your Head?" by John Wentworth
|
Jul 02, 2022 |
"Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment" by elspood
|
Jun 29, 2022 |
"Where I agree and disagree with Eliezer" by Paul Christiano
|
Jun 22, 2022 |
"Six Dimensions of Operational Adequacy in AGI Projects" by Eliezer Yudkowsky
|
Jun 21, 2022 |
"Moses and the Class Struggle" by lsusr
|
Jun 21, 2022 |
"Benign Boundary Violations" by Duncan Sabien
|
Jun 20, 2022 |
"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky
|
Jun 20, 2022 |