Episode | Date |
---|---|
“Statistical takes for mech interp research and beyond” by Paul Bogdan
|
Aug 06, 2025 |
[Linkpost] “OpenAI Releases gpt-oss” by anaguma
|
Aug 06, 2025 |
“Childhood and Education #13: College” by Zvi
|
Aug 06, 2025 |
“The perils of under- vs over-sculpting AGI desires” by Steven Byrnes
|
Aug 06, 2025 |
“The Problem” by Rob Bensinger, tanagrabeast, yams, So8res, Eliezer Yudkowsky, Gretta Duleba
|
Aug 05, 2025 |
“Concept Poisoning: Probing LLMs without probes” by Jan Betley, jorio, dylan_f, Owain_Evans
|
Aug 05, 2025 |
“Narrow finetuning is different” by cloud, Stewy Slocum
|
Aug 05, 2025 |
“On Altman’s Interview With Theo Von” by Zvi
|
Aug 05, 2025 |
“Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment” by Liron, Steven Byrnes
|
Aug 05, 2025 |
“Towards Alignment Auditing as a Numbers-Go-Up Science” by Sam Marks
|
Aug 04, 2025 |
“Alcohol is so bad for society that you should probably stop drinking” by KatWoods
|
Aug 04, 2025 |
“Permanent Disempowerment is the Baseline” by Vladimir_Nesov
|
Aug 04, 2025 |
“Should we aim for flourishing over mere survival? The Better Futures series.” by wdmacaskill
|
Aug 04, 2025 |
“Saying Goodbye” by sapphire
|
Aug 04, 2025 |
“Emotions Make Sense” by DaystarEld
|
Aug 03, 2025 |
“Whence the Inkhaven Residency?” by Ben Pace
|
Aug 02, 2025 |
“Many prediction markets would be better off as batched auctions” by William Howard
|
Aug 02, 2025 |
“How many species has humanity driven extinct?” by Raemon
|
Aug 02, 2025 |
“SB-1047 Documentary: The Post-Mortem” by Michaël Trazzi
|
Aug 02, 2025 |
“Podcast: Lincoln Quirk from Wave” by Elizabeth
|
Aug 02, 2025 |
“The Dark Arts As A Scaffolding Skill For Rationality” by Screwtape
|
Aug 01, 2025 |
“Steve Petersen funding” by abramdemski
|
Aug 01, 2025 |
“Two Kinds of Do Overs” by jefftk
|
Aug 01, 2025 |
“Red-Thing-Ism” by J Bostock
|
Aug 01, 2025 |
“Do Not Render Your Counterfactuals” by AlphaAndOmega
|
Aug 01, 2025 |
“Building Black-box Scheming Monitors” by james__p, richbc, Simon Storf, Marius Hobbhahn
|
Jul 31, 2025 |
“Follow-up to ‘My Empathy Is Rarely Kind’” by johnswentworth
|
Jul 31, 2025 |
“I am worried about near-term non-LLM AI developments” by testingthewaters
|
Jul 31, 2025 |
“Childhood and Education: College Admissions” by Zvi
|
Jul 31, 2025 |
“Optimizing The Final Output Can Obfuscate CoT (Research Note)” by lukemarks, jacob_drori, cloud, TurnTrout
|
Jul 30, 2025 |
“China proposes new global AI cooperation organisation” by Matrice Jacobine
|
Jul 30, 2025 |
“My Empathy Is Rarely Kind” by johnswentworth
|
Jul 30, 2025 |
“The many paths to permanent disempowerment even with shutdownable AIs (MATS project summary for feedback)” by GideonF
|
Jul 30, 2025 |
“Spilling the Tea” by Zvi
|
Jul 29, 2025 |
“I wrote a song parody” by CronoDAS
|
Jul 29, 2025 |
“Low P(x-risk) as the Bailey for Low P(doom)” by Vladimir_Nesov
|
Jul 29, 2025 |
“About 30% of Humanity’s Last Exam chemistry/biology answers are likely wrong” by bohaska
|
Jul 29, 2025 |
“Procrastination Drill” by silentbob
|
Jul 29, 2025 |
“Teaching kids to swim” by Steven Byrnes
|
Jul 29, 2025 |
“Recursions on LessOnline 2025” by Error
|
Jul 29, 2025 |
“Simplex Progress Report - July 2025” by Adam Shai, Paul Riechers, hrbigelow, Eric Alt, mntss
|
Jul 29, 2025 |
“Optimally Combining Probe Monitors and Black Box Monitors” by Tim Hua, jamesbaskerville, BionicD0LPH1N, Mia Hopman, Aryan Bhatt, Tyler Tracy
|
Jul 28, 2025 |
“AI Companion Piece” by Zvi
|
Jul 28, 2025 |
“This Is Not Life” by samhealy
|
Jul 28, 2025 |
“Sydney Bing Wikipedia Article: Sydney (Microsoft Prometheus)” by jdp
|
Jul 28, 2025 |
“Maya’s Escape” by Bridgett Kay
|
Jul 27, 2025 |
[Linkpost] “The Purpose of a System is what it Rewards” by robotelvis
|
Jul 27, 2025 |
“my experience on glp-1s as a thin person” by AnnaJo
|
Jul 26, 2025 |
“Anthropic Faces Potentially ‘Business-Ending’ Copyright Lawsuit” by garrison
|
Jul 26, 2025 |
“HPMOR: The (Probably) Untold Lore” by Gretta Duleba, Eliezer Yudkowsky
|
Jul 25, 2025 |
“We Built a Tool to Protect Your Dataset From Simple Scrapers” by TurnTrout, Edward Turner, Dipika Khullar
|
Jul 25, 2025 |
[Linkpost] “Reasoning-Finetuning Repurposes Latent Representations in Base Models” by Jake Ward, lccqqqqq, Neel Nanda
|
Jul 25, 2025 |
“Building and evaluating alignment auditing agents” by Sam Marks, Sam Bowman, Euan Ong, Johannes Treutlein, evhub
|
Jul 24, 2025 |
“The Whole Check” by JustisMills
|
Jul 24, 2025 |
“‘Behaviorist’ RL reward functions lead to scheming” by Steven Byrnes
|
Jul 24, 2025 |
[Linkpost] “A brief perspective from an IMO coordinator” by DirectedEvolution
|
Jul 23, 2025 |
“Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning” by kh4dien, Helena Casademunt, Adam Karvonen, Sam Marks, Senthooran Rajamanoharan, Neel Nanda
|
Jul 23, 2025 |
“On ‘ChatGPT Psychosis’ and LLM Sycophancy” by jdp
|
Jul 23, 2025 |
“Google and OpenAI Get 2025 IMO Gold” by Zvi
|
Jul 23, 2025 |
“Unfaithful chain-of-thought as nudged reasoning” by Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda
|
Jul 23, 2025 |
“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans
|
Jul 22, 2025 |
“Directly Try Solving Alignment for 5 weeks” by Kabir Kumar
|
Jul 22, 2025 |
[Linkpost] “Why Reality Has A Well-Known Math Bias” by Linch
|
Jul 22, 2025 |
“Do ‘adult developmental stages’ theories have any pre-theoretical motivation?” by Said Achmiz
|
Jul 22, 2025 |
“Monthly Roundup #32: July 2025” by Zvi
|
Jul 21, 2025 |
“If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials)” by yams
|
Jul 21, 2025 |
“Detecting High-Stakes Interactions with Activation Probes” by Arrrlex, williambankes, Urja Pawar, Phil Bland, David Scott Krueger (formerly: capybaralet), Dmitrii Krasheninnikov
|
Jul 21, 2025 |
“HRT in Menopause: A candidate for a case study of epistemology in epidemiology, statistics & medicine” by foodforthought
|
Jul 21, 2025 |
[Linkpost] “GDM also claims IMO gold medal” by Yair Halberstadt
|
Jul 21, 2025 |
“[Fiction] Our Trial” by Nina Panickssery
|
Jul 21, 2025 |
“LLMs Can’t See Pixels or Characters” by Brendan Long
|
Jul 21, 2025 |
“Plato’s Trolley” by dr_s
|
Jul 21, 2025 |
“Your AI Safety org could get EU funding up to €9.08M. Here’s how (+ free personalized support)” by SamuelK
|
Jul 20, 2025 |
“Shallow Water is Dangerous Too” by jefftk
|
Jul 20, 2025 |
“Make More Grayspaces” by Duncan Sabien (Inactive)
|
Jul 20, 2025 |
[Linkpost] “AI Gets IMO Gold Medal: via general-purpose RL, not via narrow, task specific methodology” by Mikhail Samin
|
Jul 19, 2025 |
“A night-watchman ASI as a first step toward a great future” by Eric Neyman
|
Jul 19, 2025 |
“Love stays loved (formerly ‘Skin’)” by Swimmer963 (Miranda Dixon-Luinenburg)
|
Jul 18, 2025 |
“Why it’s hard to make settings for high-stakes control research” by Buck
|
Jul 18, 2025 |
“On METR’s AI Coding RCT” by Zvi
|
Jul 18, 2025 |
“Trying the Obvious Thing” by PranavG, Gabriel Alfour
|
Jul 18, 2025 |
“Video and transcript of talk on ‘Can goodness compete?’” by Joe Carlsmith
|
Jul 17, 2025 |
“On being sort of back and sort of new here” by Loki zen
|
Jul 17, 2025 |
“Comment on ‘Four Layers of Intellectual Conversation’” by Zack_M_Davis
|
Jul 17, 2025 |
“Selective Generalization: Improving Capabilities While Maintaining Alignment” by ariana_azarbal, Matthew A. Clarke, jorio, Cailley Factor, cloud
|
Jul 17, 2025 |
“Bodydouble / Thinking Assistant matchmaking” by Raemon
|
Jul 17, 2025 |
“Kimi K2” by Zvi
|
Jul 16, 2025 |
“Grok 4 Various Things” by Zvi
|
Jul 16, 2025 |
“Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” by Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah
|
Jul 15, 2025 |
“Do confident short timelines make sense?” by TsviBT, abramdemski
|
Jul 15, 2025 |
[Linkpost] “LLM-induced craziness and base rates” by Kaj_Sotala
|
Jul 15, 2025 |
[Linkpost] “Bernie Sanders (I-VT) mentions AI loss of control risk in Gizmodo interview” by Matrice Jacobine
|
Jul 14, 2025 |
“Recent Redwood Research project proposals” by ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt, Joey Yudelson
|
Jul 14, 2025 |
“Narrow Misalignment is Hard, Emergent Misalignment is Easy” by Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda
|
Jul 14, 2025 |
“Do LLMs know what they’re capable of? Why this matters for AI safety, and initial findings” by Casey Barkan, Sid Black, Oliver Sourbut
|
Jul 14, 2025 |
“Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance” by Senthooran Rajamanoharan, Neel Nanda
|
Jul 14, 2025 |
“Worse Than MechaHitler” by Zvi
|
Jul 14, 2025 |
“How Does Time Horizon Vary Across Domains?” by Thomas Kwa
|
Jul 14, 2025 |
“xAI’s Grok 4 has no meaningful safety guardrails” by eleventhsavi0r
|
Jul 14, 2025 |
“Stop and check! The parable of the prince and the dog” by Dumbledore’s Army
|
Jul 14, 2025 |
“OpenAI Model Differentiation 101” by Zvi
|
Jul 14, 2025 |
“10x more training compute = 5x greater task length (kind of)” by Expertium
|
Jul 14, 2025 |
“Three Missing Cakes, or One Turbulent Critic?” by Benquo
|
Jul 14, 2025 |
“You can get LLMs to say almost anything you want” by Kaj_Sotala
|
Jul 13, 2025 |
“against that one rationalist mashal about japanese fifth-columnists” by Fraser
|
Jul 13, 2025 |
“Surprises and learnings from almost two months of Leo Panickssery” by Nina Panickssery
|
Jul 13, 2025 |
“Vitalik’s Response to AI 2027” by Daniel Kokotajlo
|
Jul 12, 2025 |
“the jackpot age” by thiccythot
|
Jul 12, 2025 |
“Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” by habryka
|
Jul 11, 2025 |
[Linkpost] “Guide to Redwood’s writing” by Julian Stastny
|
Jul 11, 2025 |
“So You Think You’ve Awoken ChatGPT” by JustisMills
|
Jul 11, 2025 |
[Linkpost] “Open Global Investment as a Governance Model for AGI” by Nick Bostrom
|
Jul 10, 2025 |
“what makes Claude 3 Opus misaligned” by janus
|
Jul 10, 2025 |
“Lessons from the Iraq War about AI policy” by Buck
|
Jul 10, 2025 |
“Generalized Hangriness: A Standard Rationalist Stance Toward Emotions” by johnswentworth
|
Jul 10, 2025 |
“Evaluating and monitoring for AI scheming” by Vika, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, Rohin Shah
|
Jul 10, 2025 |
“White Box Control at UK AISI - Update on Sandbagging Investigations” by Joseph Bloom, Jordan Taylor, Connor Kissane, Sid Black, merizian, alexdzm, jacoba, Ben Millwood, Alan Cooney
|
Jul 10, 2025 |
“80,000 Hours is producing AI in Context — a new YouTube channel. Our first video, about the AI 2027 scenario, is up!” by chanamessinger
|
Jul 10, 2025 |
[Linkpost] “No, We’re Not Getting Meaningful Oversight of AI” by Davidmanheim
|
Jul 10, 2025 |
“What’s worse, spies or schemers?” by Buck, Julian Stastny
|
Jul 09, 2025 |
“No, Grok, No” by Zvi
|
Jul 09, 2025 |
“Applying right-wing frames to AGI (geo)politics” by Richard_Ngo
|
Jul 09, 2025 |
“A deep critique of AI 2027’s bad timeline models” by titotal
|
Jul 09, 2025 |
“Subway Particle Levels Aren’t That High” by jefftk
|
Jul 09, 2025 |
“An Opinionated Guide to Using Anki Correctly” by Luise
|
Jul 09, 2025 |
“Why Do Some Language Models Fake Alignment While Others Don’t?” by abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger
|
Jul 08, 2025 |
“Balsa Update: Springtime in DC” by Zvi
|
Jul 08, 2025 |
[Linkpost] “A Theory of Structural Independence” by Matthias G. Mayer
|
Jul 08, 2025 |
“On Alpha School” by Zvi
|
Jul 08, 2025 |
“You Can’t Objectively Compare Seven Bees to One Human” by J Bostock
|
Jul 08, 2025 |
“Literature Review: Risks of MDMA” by Elizabeth
|
Jul 07, 2025 |
“45 - Samuel Albanie on DeepMind’s AGI Safety Approach” by DanielFilan
|
Jul 07, 2025 |
“On the functional self of LLMs” by eggsyntax
|
Jul 07, 2025 |
“Shutdown Resistance in Reasoning Models” by benwr, JeremySchlatter, Jeffrey Ladish
|
Jul 06, 2025 |
“The Cult of Pain” by Martin Sustrik
|
Jul 05, 2025 |
[Linkpost] “Claude is a Ravenclaw” by Adam Newgas
|
Jul 05, 2025 |
“‘Buckle up bucko, this ain’t over till it’s over.’” by Raemon
|
Jul 05, 2025 |
“How much novel security-critical infrastructure do you need during the singularity?” by Buck
|
Jul 05, 2025 |
“‘AI for societal uplift’ as a path to victory” by Raymond Douglas
|
Jul 04, 2025 |
“Two proposed projects on abstract analogies for scheming” by Julian Stastny
|
Jul 04, 2025 |
“Outlive: A Critical Review” by MichaelDickens
|
Jul 04, 2025 |
“Authors Have a Responsibility to Communicate Clearly” by TurnTrout
|
Jul 04, 2025 |
[Linkpost] “MIRI Newsletter #123” by Harlan, Rob Bensinger
|
Jul 04, 2025 |
[Linkpost] “Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals” by Marius Hobbhahn
|
Jul 03, 2025 |
“Call for suggestions - AI safety course” by boazbarak
|
Jul 03, 2025 |
[Linkpost] “IABIED: Advertisement design competition” by yams
|
Jul 03, 2025 |
“Congress Asks Better Questions” by Zvi
|
Jul 03, 2025 |
“Curing PMS with Hair Loss Pills” by David Lorell
|
Jul 02, 2025 |
“AI Task Length Horizons in Offensive Cybersecurity” by Sean Peters
|
Jul 02, 2025 |
“Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild” by Adam Karvonen, Sam Marks
|
Jul 02, 2025 |
“There are two fundamentally different constraints on schemers” by Buck
|
Jul 02, 2025 |
“‘What’s my goal?’” by Raemon
|
Jul 02, 2025 |
“A Simple Explanation of AGI Risk” by TurnTrout
|
Jul 02, 2025 |
“AI Moratorium Stripped From BBB” by Zvi
|
Jul 01, 2025 |
“Scientific Discovery in the Age of Artificial Intelligence” by Jessica Rumbelow
|
Jul 01, 2025 |
“SLT for AI Safety” by Jesse Hoogland
|
Jul 01, 2025 |
“The best simple argument for Pausing AI?” by Gary Marcus
|
Jul 01, 2025 |
“SAE on activation differences” by Santiago Aranguri, jacob_drori, Neel Nanda
|
Jul 01, 2025 |
“What We Learned Trying to Diff Base and Chat Models (And Why It Matters)” by Clément Dumas, Julian Minder, Neel Nanda
|
Jun 30, 2025 |
“If you want to be vegan but you worry about health effects of no meat, consider being vegan except for mussels/oysters” by KatWoods
|
Jun 30, 2025 |
[Linkpost] “Project Vend: Can Claude run a small shop?” by Gunnar_Zarncke
|
Jun 30, 2025 |
“Paradigms for computation” by Cole Wyeth
|
Jun 30, 2025 |
“life lessons from poker” by thiccythot
|
Jun 30, 2025 |
“Circuits in Superposition 2: Now with Less Wrong Math” by Linda Linsefors, Lucius Bushnaq
|
Jun 30, 2025 |
“I underestimated safety research speedups from safe AI” by Dan Braun
|
Jun 30, 2025 |
“Conciseness Manifesto” by Vasyl Dotsenko
|
Jun 29, 2025 |
“Support for bedrock liberal principles seems to be in pretty bad shape these days” by Max H
|
Jun 29, 2025 |
[Linkpost] “A Depressed Shrink Tries Shrooms” by AlphaAndOmega
|
Jun 29, 2025 |
[Linkpost] “[Paper] Stochastic Parameter Decomposition” by Lee Sharkey, Lucius Bushnaq, Dan Braun
|
Jun 28, 2025 |
“Childhood and Education #11: The Art of Learning” by Zvi
|
Jun 28, 2025 |
“Proposal for making credible commitments to AIs.” by Cleo Nardo
|
Jun 28, 2025 |
“Epoch: What is Epoch?” by Zach Stein-Perlman
|
Jun 27, 2025 |
“Recent and forecasted rates of software and hardware progress” by elifland
|
Jun 27, 2025 |
“Jankily controlling superintelligence” by ryan_greenblatt
|
Jun 27, 2025 |
“Help the AI 2027 team make an online AGI wargame” by Jonas V
|
Jun 27, 2025 |
“A Guide For LLM-Assisted Web Research” by nikos, dschwarz, Lawrence Phillips, FutureSearch
|
Jun 27, 2025 |
“If Not Now, When?” by Yair Halberstadt
|
Jun 27, 2025 |
“A case for courage, when speaking of AI danger” by So8res
|
Jun 27, 2025 |
“The Industrial Explosion” by rosehadshar, Tom Davidson
|
Jun 26, 2025 |
“Summary of John Halstead’s Book-Length Report on Existential Risks From Climate Change” by Bentham’s Bulldog
|
Jun 26, 2025 |
“Tech for Thinking” by sarahconstantin
|
Jun 26, 2025 |
“Lurking in the Noise” by J Bostock
|
Jun 25, 2025 |
[Linkpost] “New Paper: Ambiguous Online Learning” by Vanessa Kosoy
|
Jun 25, 2025 |
“Melatonin Self-Experiment Results” by silentbob
|
Jun 25, 2025 |
“What does 10x-ing effective compute get you?” by ryan_greenblatt
|
Jun 25, 2025 |
“A regime-change power-vacuum conjecture about group belief” by TsviBT
|
Jun 25, 2025 |
“Analyzing A Critique Of The AI 2027 Timeline Forecasts” by Zvi
|
Jun 24, 2025 |
“Why ‘training against scheming’ is hard” by Marius Hobbhahn
|
Jun 24, 2025 |
“My pitch for the AI Village” by Daniel Kokotajlo
|
Jun 24, 2025 |
“Situational Awareness: A One-Year Retrospective” by Nathan Delisle
|
Jun 24, 2025 |
“Compressed Computation is (probably) not Computation in Superposition” by Jai Bhagat, Sara Molas Medina, Giorgi Giglemiani, StefanHex
|
Jun 23, 2025 |
“‘It isn’t magic’” by Ben (Berlin)
|
Jun 23, 2025 |
“Foom & Doom 1: ‘Brain in a box in a basement’” by Steven Byrnes
|
Jun 23, 2025 |
“Foom & Doom 2: Technical alignment is hard” by Steven Byrnes
|
Jun 23, 2025 |
“Comparing risk from internally-deployed AI to insider and outsider threats from humans” by Buck
|
Jun 23, 2025 |
“Clarifying ‘wisdom’: Foundational topics for aligned AIs to prioritize before irreversible decisions” by Anthony DiGiovanni
|
Jun 23, 2025 |
“Racial Dating Preferences and Sexual Racism” by koreindian
|
Jun 23, 2025 |
“The Sixteen Kinds of Intimacy” by Ruby
|
Jun 22, 2025 |
“Consider chilling out in 2028” by Valentine
|
Jun 21, 2025 |
“the silk pajamas effect” by thiccythot
|
Jun 21, 2025 |
“Genomic emancipation” by TsviBT
|
Jun 21, 2025 |
“Making deals with early schemers” by Julian Stastny, Olli Järviniemi, Buck
|
Jun 21, 2025 |
“AI #121 Part 2: The OpenAI Files” by Zvi
|
Jun 21, 2025 |
“Musings on AI Companies of 2025-2026 (Jun 2025)” by Vladimir_Nesov
|
Jun 21, 2025 |
“Agentic Misalignment: How LLMs Could be Insider Threats” by Aengus Lynch, Benjamin Wright, Ethan Perez, evhub
|
Jun 20, 2025 |
“Did the Army Poison a Bunch of Women in Minnesota?” by rba
|
Jun 20, 2025 |
“X explains Z% of the variance in Y” by Leon Lang
|
Jun 20, 2025 |
“AI safety techniques leveraging distillation” by ryan_greenblatt
|
Jun 19, 2025 |
“Sparsely-connected cross-layer transcoders: preliminary findings” by jacob_drori
|
Jun 19, 2025 |
“New Endorsements for ‘If Anyone Builds It, Everyone Dies’” by Malo
|
Jun 18, 2025 |
“Fictional Thinking vs Real Thinking” by johnswentworth
|
Jun 18, 2025 |
“I made a card game to reduce cognitive biases and logical fallacies but I’m not sure what DV to test in a study on its effectiveness.” by Brad Dunn
|
Jun 18, 2025 |
“Prover-Estimator Debate: A New Scalable Oversight Protocol” by Jonah Brown-Cohen, Geoffrey Irving
|
Jun 17, 2025 |
“Ok, AI Can Write Pretty Good Fiction Now” by JustisMills
|
Jun 17, 2025 |
“Debate experiments at The Curve, LessOnline and Manifest” by Nathan Young
|
Jun 17, 2025 |
“Why we’re still doing normal school” by juliawise
|
Jun 17, 2025 |
“Endometriosis is an incredibly interesting disease” by Abhishaike Mahajan
|
Jun 17, 2025 |
“Estrogen: A trip report” by cube_flipper
|
Jun 17, 2025 |
“Intelligence Is Not Magic, But Your Threshold For ‘Magic’ Is Pretty Low” by Expertium
|
Jun 17, 2025 |
“Some reprogenetics-related projects you could help with” by TsviBT
|
Jun 17, 2025 |
“RTFB: The RAISE Act” by Zvi
|
Jun 17, 2025 |
“Model Organisms for Emergent Misalignment” by Anna Soligo, Edward Turner, Mia Taylor, Senthooran Rajamanoharan, Neel Nanda
|
Jun 17, 2025 |
“Convergent Linear Representations of Emergent Misalignment” by Anna Soligo, Edward Turner, Senthooran Rajamanoharan, Neel Nanda
|
Jun 17, 2025 |
“Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models” by James Chua, Owain_Evans
|
Jun 17, 2025 |
[Linkpost] “the void” by nostalgebraist
|
Jun 11, 2025 |
“Expectation = intention = setpoint” by jimmy
|
Jun 11, 2025 |
“Give Me a Reason(ing Model)” by Zvi
|
Jun 10, 2025 |
“Mech interp is not pre-paradigmatic” by Lee Sharkey
|
Jun 10, 2025 |
“The True Goal Fallacy” by adamShimi
|
Jun 10, 2025 |
“Ghiblification for Privacy” by jefftk
|
Jun 10, 2025 |
|
Jun 10, 2025 |
“When is it important that open-weight models aren’t released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities.” by ryan_greenblatt
|
Jun 09, 2025 |
“Administering immunotherapy in the morning seems to really, really matter. Why?” by Abhishaike Mahajan
|
Jun 09, 2025 |
[Linkpost] “METR: Recent frontier models are reward hacking” by Daniel Kokotajlo
|
Jun 09, 2025 |
“Levels of Doom: Eutopia, Disempowerment, Extinction” by Vladimir_Nesov
|
Jun 09, 2025 |
“AI companies’ eval reports mostly don’t support their claims” by Zach Stein-Perlman
|
Jun 09, 2025 |
“Busking with Kids” by jefftk
|
Jun 09, 2025 |
“Emergent Misalignment on a Budget” by Valerio Pepe
|
Jun 09, 2025 |
“Letting Kids Be Outside” by jefftk
|
Jun 08, 2025 |
“On working 80%” by adrische
|
Jun 08, 2025 |
“Solo Park Play at Three” by jefftk
|
Jun 07, 2025 |
“The Mirror Trap” by Cameron Berg
|
Jun 07, 2025 |
“LLM in-context learning as (approximating) Solomonoff induction” by Cole Wyeth
|
Jun 06, 2025 |
[Linkpost] “Histograms are to CDFs as calibration plots are to...” by Optimization Process
|
Jun 06, 2025 |
[Linkpost] “Discontinuous Linear Functions?!” by Zack_M_Davis
|
Jun 06, 2025 |
“AI #119: Goodbye AISI?” by Zvi
|
Jun 06, 2025 |
“The Stereotype of the Stereotype” by Ike
|
Jun 06, 2025 |
“Dating Roundup #6” by Zvi
|
Jun 05, 2025 |
“‘Flaky breakthroughs’ pervade coaching — and no one tracks them” by Chipmonk
|
Jun 04, 2025 |
“Individual AI representatives don’t solve Gradual Disempowerment” by Jan_Kulveit
|
Jun 04, 2025 |