LessWrong (30+ Karma)

By LessWrong

Category: Technology

Subscribers: 3
Reviews: 0
Episodes: 250

Description

Audio narrations of LessWrong posts.

Episode Date
“Statistical takes for mech interp research and beyond” by Paul Bogdan
Aug 06, 2025
[Linkpost] “OpenAI Releases gpt-oss” by anaguma
Aug 06, 2025
“Childhood and Education #13: College” by Zvi
Aug 06, 2025
“The perils of under- vs over-sculpting AGI desires” by Steven Byrnes
Aug 06, 2025
“The Problem” by Rob Bensinger, tanagrabeast, yams, So8res, Eliezer Yudkowsky, Gretta Duleba
Aug 05, 2025
“Concept Poisoning: Probing LLMs without probes” by Jan Betley, jorio, dylan_f, Owain_Evans
Aug 05, 2025
“Narrow finetuning is different” by cloud, Stewy Slocum
Aug 05, 2025
“On Altman’s Interview With Theo Von” by Zvi
Aug 05, 2025
“Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment” by Liron, Steven Byrnes
Aug 05, 2025
“Towards Alignment Auditing as a Numbers-Go-Up Science” by Sam Marks
Aug 04, 2025
“Alcohol is so bad for society that you should probably stop drinking” by KatWoods
Aug 04, 2025
“Permanent Disempowerment is the Baseline” by Vladimir_Nesov
Aug 04, 2025
“Should we aim for flourishing over mere survival? The Better Futures series.” by wdmacaskill
Aug 04, 2025
“Saying Goodbye” by sapphire
Aug 04, 2025
“Emotions Make Sense” by DaystarEld
Aug 03, 2025
“Whence the Inkhaven Residency?” by Ben Pace
Aug 02, 2025
“Many prediction markets would be better off as batched auctions” by William Howard
Aug 02, 2025
“How many species has humanity driven extinct?” by Raemon
Aug 02, 2025
“SB-1047 Documentary: The Post-Mortem” by Michaël Trazzi
Aug 02, 2025
“Podcast: Lincoln Quirk from Wave” by Elizabeth
Aug 02, 2025
“The Dark Arts As A Scaffolding Skill For Rationality” by Screwtape
Aug 01, 2025
“Steve Petersen funding” by abramdemski
Aug 01, 2025
“Two Kinds of Do Overs” by jefftk
Aug 01, 2025
“Red-Thing-Ism” by J Bostock
Aug 01, 2025
“Do Not Render Your Counterfactuals” by AlphaAndOmega
Aug 01, 2025
“Building Black-box Scheming Monitors” by james__p, richbc, Simon Storf, Marius Hobbhahn
Jul 31, 2025
“Follow-up to ‘My Empathy Is Rarely Kind’” by johnswentworth
Jul 31, 2025
“I am worried about near-term non-LLM AI developments” by testingthewaters
Jul 31, 2025
“Childhood and Education: College Admissions” by Zvi
Jul 31, 2025
“Optimizing The Final Output Can Obfuscate CoT (Research Note)” by lukemarks, jacob_drori, cloud, TurnTrout
Jul 30, 2025
“China proposes new global AI cooperation organisation” by Matrice Jacobine
Jul 30, 2025
“My Empathy Is Rarely Kind” by johnswentworth
Jul 30, 2025
“The many paths to permanent disempowerment even with shutdownable AIs (MATS project summary for feedback)” by GideonF
Jul 30, 2025
“Spilling the Tea” by Zvi
Jul 29, 2025
“I wrote a song parody” by CronoDAS
Jul 29, 2025
“Low P(x-risk) as the Bailey for Low P(doom)” by Vladimir_Nesov
Jul 29, 2025
“About 30% of Humanity’s Last Exam chemistry/biology answers are likely wrong” by bohaska
Jul 29, 2025
“Procrastination Drill” by silentbob
Jul 29, 2025
“Teaching kids to swim” by Steven Byrnes
Jul 29, 2025
“Recursions on LessOnline 2025” by Error
Jul 29, 2025
“Simplex Progress Report - July 2025” by Adam Shai, Paul Riechers, hrbigelow, Eric Alt, mntss
Jul 29, 2025
“Optimally Combining Probe Monitors and Black Box Monitors” by Tim Hua, jamesbaskerville, BionicD0LPH1N, Mia Hopman, Aryan Bhatt, Tyler Tracy
Jul 28, 2025
“AI Companion Piece” by Zvi
Jul 28, 2025
“This Is Not Life” by samhealy
Jul 28, 2025
“Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)” by jdp
Jul 28, 2025
“Maya’s Escape” by Bridgett Kay
Jul 27, 2025
[Linkpost] “The Purpose of a System is what it Rewards” by robotelvis
Jul 27, 2025
“my experience on glp-1s as a thin person” by AnnaJo
Jul 26, 2025
“Anthropic Faces Potentially ‘Business-Ending’ Copyright Lawsuit” by garrison
Jul 26, 2025
“HPMOR: The (Probably) Untold Lore” by Gretta Duleba, Eliezer Yudkowsky
Jul 25, 2025
“We Built a Tool to Protect Your Dataset From Simple Scrapers” by TurnTrout, Edward Turner, Dipika Khullar
Jul 25, 2025
[Linkpost] “Reasoning-Finetuning Repurposes Latent Representations in Base Models” by Jake Ward, lccqqqqq, Neel Nanda
Jul 25, 2025
“Building and evaluating alignment auditing agents” by Sam Marks, Sam Bowman, Euan Ong, Johannes Treutlein, evhub
Jul 24, 2025
“The Whole Check” by JustisMills
Jul 24, 2025
“‘Behaviorist’ RL reward functions lead to scheming” by Steven Byrnes
Jul 24, 2025
[Linkpost] “A brief perspective from an IMO coordinator” by DirectedEvolution
Jul 23, 2025
“Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning” by kh4dien, Helena Casademunt, Adam Karvonen, Sam Marks, Senthooran Rajamanoharan, Neel Nanda
Jul 23, 2025
“On ‘ChatGPT Psychosis’ and LLM Sycophancy” by jdp
Jul 23, 2025
“Google and OpenAI Get 2025 IMO Gold” by Zvi
Jul 23, 2025
“Unfaithful chain-of-thought as nudged reasoning” by Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda
Jul 23, 2025
“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans
Jul 22, 2025
“Directly Try Solving Alignment for 5 weeks” by Kabir Kumar
Jul 22, 2025
[Linkpost] “Why Reality Has A Well-Known Math Bias” by Linch
Jul 22, 2025
“Do ‘adult developmental stages’ theories have any pre-theoretical motivation?” by Said Achmiz
Jul 22, 2025
“Monthly Roundup #32: July 2025” by Zvi
Jul 21, 2025
“If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials)” by yams
Jul 21, 2025
“Detecting High-Stakes Interactions with Activation Probes” by Arrrlex, williambankes, Urja Pawar, Phil Bland, David Scott Krueger (formerly: capybaralet), Dmitrii Krasheninnikov
Jul 21, 2025
“HRT in Menopause: A candidate for a case study of epistemology in epidemiology, statistics & medicine” by foodforthought
Jul 21, 2025
[Linkpost] “GDM also claims IMO gold medal” by Yair Halberstadt
Jul 21, 2025
“[Fiction] Our Trial” by Nina Panickssery
Jul 21, 2025
“LLMs Can’t See Pixels or Characters” by Brendan Long
Jul 21, 2025
“Plato’s Trolley” by dr_s
Jul 21, 2025
“Your AI Safety org could get EU funding up to €9.08M. Here’s how (+ free personalized support)” by SamuelK
Jul 20, 2025
“Shallow Water is Dangerous Too” by jefftk
Jul 20, 2025
“Make More Grayspaces” by Duncan Sabien (Inactive)
Jul 20, 2025
[Linkpost] “AI Gets IMO Gold Medal: via general-purpose RL, not via narrow, task specific methodology” by Mikhail Samin
Jul 19, 2025
“A night-watchman ASI as a first step toward a great future” by Eric Neyman
Jul 19, 2025
“Love stays loved (formerly ‘Skin’)” by Swimmer963 (Miranda Dixon-Luinenburg)
Jul 18, 2025
“Why it’s hard to make settings for high-stakes control research” by Buck
Jul 18, 2025
“On METR’s AI Coding RCT” by Zvi
Jul 18, 2025
“Trying the Obvious Thing” by PranavG, Gabriel Alfour
Jul 18, 2025
“Video and transcript of talk on ‘Can goodness compete?’” by Joe Carlsmith
Jul 17, 2025
“On being sort of back and sort of new here” by Loki zen
Jul 17, 2025
“Comment on ‘Four Layers of Intellectual Conversation’” by Zack_M_Davis
Jul 17, 2025
“Selective Generalization: Improving Capabilities While Maintaining Alignment” by ariana_azarbal, Matthew A. Clarke, jorio, Cailley Factor, cloud
Jul 17, 2025
“Bodydouble / Thinking Assistant matchmaking” by Raemon
Jul 17, 2025
“Kimi K2” by Zvi
Jul 16, 2025
“Grok 4 Various Things” by Zvi
Jul 16, 2025
“Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” by Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah
Jul 15, 2025
“Do confident short timelines make sense?” by TsviBT, abramdemski
Jul 15, 2025
[Linkpost] “LLM-induced craziness and base rates” by Kaj_Sotala
Jul 15, 2025
[Linkpost] “Bernie Sanders (I-VT) mentions AI loss of control risk in Gizmodo interview” by Matrice Jacobine
Jul 14, 2025
“Recent Redwood Research project proposals” by ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt, Joey Yudelson
Jul 14, 2025
“Narrow Misalignment is Hard, Emergent Misalignment is Easy” by Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda
Jul 14, 2025
“Do LLMs know what they’re capable of? Why this matters for AI safety, and initial findings” by Casey Barkan, Sid Black, Oliver Sourbut
Jul 14, 2025
“Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance” by Senthooran Rajamanoharan, Neel Nanda
Jul 14, 2025
“Worse Than MechaHitler” by Zvi
Jul 14, 2025
“How Does Time Horizon Vary Across Domains?” by Thomas Kwa
Jul 14, 2025
“xAI’s Grok 4 has no meaningful safety guardrails” by eleventhsavi0r
Jul 14, 2025
“Stop and check! The parable of the prince and the dog” by Dumbledore’s Army
Jul 14, 2025
“OpenAI Model Differentiation 101” by Zvi
Jul 14, 2025
“10x more training compute = 5x greater task length (kind of)” by Expertium
Jul 14, 2025
“Three Missing Cakes, or One Turbulent Critic?” by Benquo
Jul 14, 2025
“You can get LLMs to say almost anything you want” by Kaj_Sotala
Jul 13, 2025
“against that one rationalist mashal about japanese fifth-columnists” by Fraser
Jul 13, 2025
“Surprises and learnings from almost two months of Leo Panickssery” by Nina Panickssery
Jul 13, 2025
“Vitalik’s Response to AI 2027” by Daniel Kokotajlo
Jul 12, 2025
“the jackpot age” by thiccythot
Jul 12, 2025
“Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” by habryka
Jul 11, 2025
[Linkpost] “Guide to Redwood’s writing” by Julian Stastny
Jul 11, 2025
“So You Think You’ve Awoken ChatGPT” by JustisMills
Jul 11, 2025
[Linkpost] “Open Global Investment as a Governance Model for AGI” by Nick Bostrom
Jul 10, 2025
“what makes Claude 3 Opus misaligned” by janus
Jul 10, 2025
“Lessons from the Iraq War about AI policy” by Buck
Jul 10, 2025
“Generalized Hangriness: A Standard Rationalist Stance Toward Emotions” by johnswentworth
Jul 10, 2025
“Evaluating and monitoring for AI scheming” by Vika, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, Rohin Shah
Jul 10, 2025
“White Box Control at UK AISI - Update on Sandbagging Investigations” by Joseph Bloom, Jordan Taylor, Connor Kissane, Sid Black, merizian, alexdzm, jacoba, Ben Millwood, Alan Cooney
Jul 10, 2025
“80,000 Hours is producing AI in Context — a new YouTube channel. Our first video, about the AI 2027 scenario, is up!” by chanamessinger
Jul 10, 2025
[Linkpost] “No, We’re Not Getting Meaningful Oversight of AI” by Davidmanheim
Jul 10, 2025
“What’s worse, spies or schemers?” by Buck, Julian Stastny
Jul 09, 2025
“No, Grok, No” by Zvi
Jul 09, 2025
“Applying right-wing frames to AGI (geo)politics” by Richard_Ngo
Jul 09, 2025
“A deep critique of AI 2027’s bad timeline models” by titotal
Jul 09, 2025
“Subway Particle Levels Aren’t That High” by jefftk
Jul 09, 2025
“An Opinionated Guide to Using Anki Correctly” by Luise
Jul 09, 2025
“Why Do Some Language Models Fake Alignment While Others Don’t?” by abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger
Jul 08, 2025
“Balsa Update: Springtime in DC” by Zvi
Jul 08, 2025
[Linkpost] “A Theory of Structural Independence” by Matthias G. Mayer
Jul 08, 2025
“On Alpha School” by Zvi
Jul 08, 2025
“You Can’t Objectively Compare Seven Bees to One Human” by J Bostock
Jul 08, 2025
“Literature Review: Risks of MDMA” by Elizabeth
Jul 07, 2025
“45 - Samuel Albanie on DeepMind’s AGI Safety Approach” by DanielFilan
Jul 07, 2025
“On the functional self of LLMs” by eggsyntax
Jul 07, 2025
“Shutdown Resistance in Reasoning Models” by benwr, JeremySchlatter, Jeffrey Ladish
Jul 06, 2025
“The Cult of Pain” by Martin Sustrik
Jul 05, 2025
[Linkpost] “Claude is a Ravenclaw” by Adam Newgas
Jul 05, 2025
“‘Buckle up bucko, this ain’t over till it’s over.’” by Raemon
Jul 05, 2025
“How much novel security-critical infrastructure do you need during the singularity?” by Buck
Jul 05, 2025
“‘AI for societal uplift’ as a path to victory” by Raymond Douglas
Jul 04, 2025
“Two proposed projects on abstract analogies for scheming” by Julian Stastny
Jul 04, 2025
“Outlive: A Critical Review” by MichaelDickens
Jul 04, 2025
“Authors Have a Responsibility to Communicate Clearly” by TurnTrout
Jul 04, 2025
[Linkpost] “MIRI Newsletter #123” by Harlan, Rob Bensinger
Jul 04, 2025
[Linkpost] “Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals” by Marius Hobbhahn
Jul 03, 2025
“Call for suggestions - AI safety course” by boazbarak
Jul 03, 2025
[Linkpost] “IABIED: Advertisement design competition” by yams
Jul 03, 2025
“Congress Asks Better Questions” by Zvi
Jul 03, 2025
“Curing PMS with Hair Loss Pills” by David Lorell
Jul 02, 2025
“AI Task Length Horizons in Offensive Cybersecurity” by Sean Peters
Jul 02, 2025
“Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild” by Adam Karvonen, Sam Marks
Jul 02, 2025
“There are two fundamentally different constraints on schemers” by Buck
Jul 02, 2025
“‘What’s my goal?’” by Raemon
Jul 02, 2025
“A Simple Explanation of AGI Risk” by TurnTrout
Jul 02, 2025
“AI Moratorium Stripped From BBB” by Zvi
Jul 01, 2025
“Scientific Discovery in the Age of Artificial Intelligence” by Jessica Rumbelow
Jul 01, 2025
“SLT for AI Safety” by Jesse Hoogland
Jul 01, 2025
“The best simple argument for Pausing AI?” by Gary Marcus
Jul 01, 2025
“SAE on activation differences” by Santiago Aranguri, jacob_drori, Neel Nanda
Jul 01, 2025
“What We Learned Trying to Diff Base and Chat Models (And Why It Matters)” by Clément Dumas, Julian Minder, Neel Nanda
Jun 30, 2025
“If you want to be vegan but you worry about health effects of no meat, consider being vegan except for mussels/oysters” by KatWoods
Jun 30, 2025
[Linkpost] “Project Vend: Can Claude run a small shop?” by Gunnar_Zarncke
Jun 30, 2025
“Paradigms for computation” by Cole Wyeth
Jun 30, 2025
“life lessons from poker” by thiccythot
Jun 30, 2025
“Circuits in Superposition 2: Now with Less Wrong Math” by Linda Linsefors, Lucius Bushnaq
Jun 30, 2025
“I underestimated safety research speedups from safe AI” by Dan Braun
Jun 30, 2025
“Conciseness Manifesto” by Vasyl Dotsenko
Jun 29, 2025
“Support for bedrock liberal principles seems to be in pretty bad shape these days” by Max H
Jun 29, 2025
[Linkpost] “A Depressed Shrink Tries Shrooms” by AlphaAndOmega
Jun 29, 2025
[Linkpost] “[Paper] Stochastic Parameter Decomposition” by Lee Sharkey, Lucius Bushnaq, Dan Braun
Jun 28, 2025
“Childhood and Education #11: The Art of Learning” by Zvi
Jun 28, 2025
“Proposal for making credible commitments to AIs.” by Cleo Nardo
Jun 28, 2025
“Epoch: What is Epoch?” by Zach Stein-Perlman
Jun 27, 2025
“Recent and forecasted rates of software and hardware progress” by elifland
Jun 27, 2025
“Jankily controlling superintelligence” by ryan_greenblatt
Jun 27, 2025
“Help the AI 2027 team make an online AGI wargame” by Jonas V
Jun 27, 2025
“A Guide For LLM-Assisted Web Research” by nikos, dschwarz, Lawrence Phillips, FutureSearch
Jun 27, 2025
“If Not Now, When?” by Yair Halberstadt
Jun 27, 2025
“A case for courage, when speaking of AI danger” by So8res
Jun 27, 2025
“The Industrial Explosion” by rosehadshar, Tom Davidson
Jun 26, 2025
“Summary of John Halstead’s Book-Length Report on Existential Risks From Climate Change” by Bentham’s Bulldog
Jun 26, 2025
“Tech for Thinking” by sarahconstantin
Jun 26, 2025
“Lurking in the Noise” by J Bostock
Jun 25, 2025
[Linkpost] “New Paper: Ambiguous Online Learning” by Vanessa Kosoy
Jun 25, 2025
“Melatonin Self-Experiment Results” by silentbob
Jun 25, 2025
“What does 10x-ing effective compute get you?” by ryan_greenblatt
Jun 25, 2025
“A regime-change power-vacuum conjecture about group belief” by TsviBT
Jun 25, 2025
“Analyzing A Critique Of The AI 2027 Timeline Forecasts” by Zvi
Jun 24, 2025
“Why ‘training against scheming’ is hard” by Marius Hobbhahn
Jun 24, 2025
“My pitch for the AI Village” by Daniel Kokotajlo
Jun 24, 2025
“Situational Awareness: A One-Year Retrospective” by Nathan Delisle
Jun 24, 2025
“Compressed Computation is (probably) not Computation in Superposition” by Jai Bhagat, Sara Molas Medina, Giorgi Giglemiani, StefanHex
Jun 23, 2025
“‘It isn’t magic’” by Ben (Berlin)
Jun 23, 2025
“Foom & Doom 1: ‘Brain in a box in a basement’” by Steven Byrnes
Jun 23, 2025
“Foom & Doom 2: Technical alignment is hard” by Steven Byrnes
Jun 23, 2025
“Comparing risk from internally-deployed AI to insider and outsider threats from humans” by Buck
Jun 23, 2025
“Clarifying ‘wisdom’: Foundational topics for aligned AIs to prioritize before irreversible decisions” by Anthony DiGiovanni
Jun 23, 2025
“Racial Dating Preferences and Sexual Racism” by koreindian
Jun 23, 2025
“The Sixteen Kinds of Intimacy” by Ruby
Jun 22, 2025
“Consider chilling out in 2028” by Valentine
Jun 21, 2025
“the sillk pajamas effect” by thiccythot
Jun 21, 2025
“Genomic emancipation” by TsviBT
Jun 21, 2025
“Making deals with early schemers” by Julian Stastny, Olli Järviniemi, Buck
Jun 21, 2025
“AI #121 Part 2: The OpenAI Files” by Zvi
Jun 21, 2025
“Musings on AI Companies of 2025-2026 (Jun 2025)” by Vladimir_Nesov
Jun 21, 2025
“Agentic Misalignment: How LLMs Could be Insider Threats” by Aengus Lynch, Benjamin Wright, Ethan Perez, evhub
Jun 20, 2025
“Did the Army Poison a Bunch of Women in Minnesota?” by rba
Jun 20, 2025
“X explains Z% of the variance in Y” by Leon Lang
Jun 20, 2025
“AI safety techniques leveraging distillation” by ryan_greenblatt
Jun 19, 2025
“Sparsely-connected cross-layer transcoders: preliminary findings” by jacob_drori
Jun 19, 2025
“New Endorsements for ‘If Anyone Builds It, Everyone Dies’” by Malo
Jun 18, 2025
“Fictional Thinking vs Real Thinking” by johnswentworth
Jun 18, 2025
“I made a card game to reduce cognitive biases and logical fallacies but I’m not sure what DV to test in a study on its effectiveness.” by Brad Dunn
Jun 18, 2025
“Prover-Estimator Debate: A New Scalable Oversight Protocol” by Jonah Brown-Cohen, Geoffrey Irving
Jun 17, 2025
“Ok, AI Can Write Pretty Good Fiction Now” by JustisMills
Jun 17, 2025
“Debate experiments at The Curve, LessOnline and Manifest” by Nathan Young
Jun 17, 2025
“Why we’re still doing normal school” by juliawise
Jun 17, 2025
“Endometriosis is an incredibly interesting disease” by Abhishaike Mahajan
Jun 17, 2025
“Estrogen: A trip report” by cube_flipper
Jun 17, 2025
“Intelligence Is Not Magic, But Your Threshold For ‘Magic’ Is Pretty Low” by Expertium
Jun 17, 2025
“Some reprogenetics-related projects you could help with” by TsviBT
Jun 17, 2025
“RTFB: The RAISE Act” by Zvi
Jun 17, 2025
“Model Organisms for Emergent Misalignment” by Anna Soligo, Edward Turner, Mia Taylor, Senthooran Rajamanoharan, Neel Nanda
Jun 17, 2025
“Convergent Linear Representations of Emergent Misalignment” by Anna Soligo, Edward Turner, Senthooran Rajamanoharan, Neel Nanda
Jun 17, 2025
“Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models” by James Chua, Owain_Evans
Jun 17, 2025
[Linkpost] “the void” by nostalgebraist
Jun 11, 2025
“Expectation = intention = setpoint” by jimmy
Jun 11, 2025
“Give Me a Reason(ing Model)” by Zvi
Jun 10, 2025
“Mech interp is not pre-paradigmatic” by Lee Sharkey
Jun 10, 2025
“The True Goal Fallacy” by adamShimi
Jun 10, 2025
“Ghiblification for Privacy” by jefftk
Jun 10, 2025
“When is it important that open-weight models aren’t released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities.” by ryan_greenblatt
Jun 09, 2025
“Administering immunotherapy in the morning seems to really, really matter. Why?” by Abhishaike Mahajan
Jun 09, 2025
[Linkpost] “METR: Recent frontier models are reward hacking” by Daniel Kokotajlo
Jun 09, 2025
“Levels of Doom: Eutopia, Disempowerment, Extinction” by Vladimir_Nesov
Jun 09, 2025
“AI companies’ eval reports mostly don’t support their claims” by Zach Stein-Perlman
Jun 09, 2025
“Busking with Kids” by jefftk
Jun 09, 2025
“Emergent Misalignment on a Budget” by Valerio Pepe
Jun 09, 2025
“Letting Kids Be Outside” by jefftk
Jun 08, 2025
“On working 80%” by adrische
Jun 08, 2025
“Solo Park Play at Three” by jefftk
Jun 07, 2025
“The Mirror Trap” by Cameron Berg
Jun 07, 2025
“LLM in-context learning as (approximating) Solomonoff induction” by Cole Wyeth
Jun 06, 2025
[Linkpost] “Histograms are to CDFs as calibration plots are to...” by Optimization Process
Jun 06, 2025
[Linkpost] “Discontinuous Linear Functions?!” by Zack_M_Davis
Jun 06, 2025
“AI #119: Goodbye AISI?” by Zvi
Jun 06, 2025
“The Stereotype of the Stereotype” by Ike
Jun 06, 2025
“Dating Roundup #6” by Zvi
Jun 05, 2025
“‘Flaky breakthroughs’ pervade coaching — and no one tracks them” by Chipmonk
Jun 04, 2025
“Individual AI representatives don’t solve Gradual Disempowerement” by Jan_Kulveit
Jun 04, 2025