| Episode | Date |
|---|---|
| “Expert Views on Continual Learning: Survey Results and Forecasts” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd | Jun 25, 2026 |
| “AI catastrophe: more like a genocide than a thought experiment” by KatjaGrace | Jun 25, 2026 |
| “Elephant seal IV” by KatjaGrace | Jun 25, 2026 |
| “What is up with e/acc?” by KatjaGrace | Jun 25, 2026 |
| “AI pause: the case for ASAP” by KatjaGrace | Jun 24, 2026 |
| “Reward Hacking Without Egregious Misalignment in an RL-Only Setting” by Joey Yudelson, Vladimir Ivanov, ryan_greenblatt | Jun 24, 2026 |
| “Planning for Preservation in the Age of AI” by Raelifin | Jun 24, 2026 |
| “Risk-Averse AIs” by wdmacaskill, Elliott Thornley (EJT) | Jun 24, 2026 |
| “And what happens next?” by Sean Herrington | Jun 24, 2026 |
| “Superintelligence vs. The Second Strike” by Felix Choussat | Jun 24, 2026 |
| “Monthly Roundup #43: June 2026” by Zvi | Jun 24, 2026 |
| “The worthlessness of vitamin D is mildly exaggerated” by dynomight | Jun 23, 2026 |
| “A system overview for near-term, low-trust AI compute verification” by Naci Cankaya | Jun 23, 2026 |
| “Model Size Scaling in 2023-2031” by Vladimir_Nesov | Jun 23, 2026 |
| “GLM-5.2 Is The New Best Open Model” by Zvi | Jun 22, 2026 |
| “The AI Industrial Explosion — Part 4: Cheap power” by djbinder | Jun 22, 2026 |
| “A Theory of Prompt Injection (and why you should study roles)” by Charles Ye, softboiledheart | Jun 22, 2026 |
| “Coup is the Pareto-optimal social game” by Daniel Tan | Jun 22, 2026 |
| “A brief list of ways AI safety efforts could be net negative” by Elias Schmied | Jun 22, 2026 |
| “NLA explanations can be shortened without harming reconstruction” by loops | Jun 22, 2026 |
| “Introducing MonitoringBench” by monika_j | Jun 22, 2026 |
| “Claude Fable 5 and Mythos 5: Capabilities” by Zvi | Jun 21, 2026 |
| “The Invisible Side of AI Governance” by Charbel-Raphaël | Jun 21, 2026 |
| “Google Can’t Math Parsecs” by jefftk | Jun 21, 2026 |
| “[Linkpost] How Transparent Is DiffusionGemma (and why it matters)” by Josh Engels, Callum McDougall, bilalchughtai, János Kramár, Senthooran Rajamanoharan, Arthur Conmy | Jun 21, 2026 |
| “Would anybody here be interested in a “mistake postmortem” discussion group?” by SK2 | Jun 20, 2026 |
| “Hyperstition as the Natural Enemy of Rationality” by alseph | Jun 20, 2026 |
| “AI Safety Ecosystem Research notes” by Eneasz | Jun 20, 2026 |
| “Research agenda: Interpretive debate” by Shi | Jun 20, 2026 |
| “The LLM shoggoth meme is weirder than you think” by HedonicEscalator | Jun 20, 2026 |
| “Introduction: Gaussian Natural Latents” by Haru | Jun 20, 2026 |
| “San Silvestro” by Tomás B. | Jun 19, 2026 |
| “The one-week sprint” by Daniel Tan | Jun 19, 2026 |
| “On “Model Organisms”” by J Bostock | Jun 19, 2026 |
| “The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t” by Alek Westover, SebastianP, Alexa Pan, Jozdien | Jun 18, 2026 |
| “AI #173: AI Pauses” by Zvi | Jun 18, 2026 |
| ““Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms” by Alan Cooney, David Africa, Geoffrey Irving | Jun 18, 2026 |
| “Contra Pace on When to Apologize” by Zack_M_Davis | Jun 18, 2026 |
| “GDM AI Control Roadmap” by Mary Phuong, Erik Jenner, Rohin Shah, Seb Farquhar | Jun 18, 2026 |
| “Your Model Organisms Might Be Fried” by Daniel Tan, J Bostock, draganover, ma-rmartinez, sidbaines, David Africa | Jun 18, 2026 |
| “Rational Agentic Maximalist Philosophies” by Connor Blake | Jun 18, 2026 |
| “Leveraged on being right” by Ben Pace | Jun 18, 2026 |
| “Gears for political races” by Tom Smith | Jun 17, 2026 |
| “Several frontier models are substantially prefill aware” by yeedrag, Parv Mahajan, David Africa, alexsouly, Jordan Taylor, RobertKirk | Jun 17, 2026 |
| “Alignement pretraining could backfire” by Alexandre Variengien | Jun 17, 2026 |
| “The Financial Ledger Theory of Apologies” by Ben Pace | Jun 17, 2026 |
| “The Once And Future Fable #3: Fix This Code” by Zvi | Jun 17, 2026 |
| [Linkpost] “Scaling Hypothesis #2: Are Humans Just More Over-Parameterized?” by gwern | Jun 17, 2026 |
| [Linkpost] “Guardian Angels: LLM Personalization for Productivity and Security” by gwern | Jun 17, 2026 |
| “Predicting LLM Safety Before Release by Simulating Deployment” by Tomek Korbak, Marcus Williams, micahcarroll, Cameron Raymond, Hannah Sheahan | Jun 17, 2026 |
| “How the AI Village works” by Adam B | Jun 17, 2026 |
| “What are some angles of attack for making continual learning safer?” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd | Jun 16, 2026 |
| “Fable and Mythos: Model Welfare” by Zvi | Jun 16, 2026 |
| “Does preservation make sense before we know how to revive?” by Aurelia | Jun 16, 2026 |
| “Synthetic document finetuning for instilling positive traits” by CallumMcDougall, Arthur Conmy, Neel Nanda | Jun 16, 2026 |
| “A Test Suite for Concepts” by Gretta Duleba | Jun 16, 2026 |
| “The Once And Future Fable #2” by Zvi | Jun 15, 2026 |
| “A frontier AI company should shut down” by MichaelDickens | Jun 15, 2026 |
| “Why Do Naive SFT Filters For Safety Properties Fail?” by Josh Engels, Neel Nanda | Jun 14, 2026 |
| “Impressions at the Extremity of Civilization” by Ben Pace | Jun 14, 2026 |
| “The Hidden Structures of Problems” by spencerg | Jun 14, 2026 |
| “American Government Takes Down Claude Fable” by Zvi | Jun 13, 2026 |
| “How might continual learning affect safety and alignment?” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd | Jun 13, 2026 |
| “SFT Drives Gemini’s Safety Properties” by Josh Engels, Arthur Conmy, bilalchughtai, Neel Nanda | Jun 13, 2026 |
| “The term “AGI” is almost useless at this point [Linkpost]” by Noosphere89 | Jun 13, 2026 |
| “The Uncertainty That Matters Isn’t Fundamental” by jimmy | Jun 13, 2026 |
| [Linkpost] “US government directive to suspend access to Fable 5 and Mythos 5” by Capybasilisk | Jun 13, 2026 |
| “Claude Fable 5 and Mythos 5: The System Card” by Zvi | Jun 12, 2026 |
| “Simulating Simulators” by kromem | Jun 12, 2026 |
| “Citations Needed: Magic Encyclopedias to Save the World” by Oliver Sourbut | Jun 12, 2026 |
| “Implications of Continual Learning for LLM Agents: Introduction” by RohanS, Rauno Arike, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd | Jun 12, 2026 |
| “Reward Hacking at the 1937 World’s Fair” by frmsaul | Jun 12, 2026 |
| “Building and evaluating model diffing agents” by bilalchughtai, Josh Engels, Neel Nanda | Jun 12, 2026 |
| “Sympathy for both sides of the egregious misalignment debate” by Steven Byrnes | Jun 12, 2026 |
| “Celene’s thoughts on consciousness” by ToasterLightning | Jun 12, 2026 |
| “Parkinson’s Heuristic” by Ben Pace | Jun 12, 2026 |
| “PSA: Almost nobody is working on alignment” by Chi Nguyen, peterbarnett | Jun 12, 2026 |
| “AI #172: The First Fable” by Zvi | Jun 11, 2026 |
| “Models May Behave Worse When Eval Aware” by Senthooran Rajamanoharan, Neel Nanda | Jun 11, 2026 |
| “Thoughts on Claude Fable’s silent safeguards” by Andy Arditi | Jun 11, 2026 |
| “You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them” by RobinHa | Jun 11, 2026 |
| “Tracing Eval-Awareness Emergence Through Training of OLMo 3” by Ram Bharadwaj, RobertKirk | Jun 10, 2026 |
| “Anthropic did not call for a pause on AI” by Andrea_Miotti, Gabriel Alfour | Jun 10, 2026 |
| “Three types of model organism” by Francis Rhys Ward | Jun 10, 2026 |
| “Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models” by Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask | Jun 10, 2026 |
| “Sequent: scale and automation for higher confidence in alignment” by Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi, Stan van Wingerden | Jun 10, 2026 |
| “Machinic Psychopharmacology: Do LLMs Self-Medicate?” by Sid Black, Joseph Bloom | Jun 10, 2026 |
| “The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably” by Alex Amadori | Jun 10, 2026 |
| ““Programmer Science Fiction: My case for a new sub-genre”, Sam T. Oates 2026” by gwern | Jun 10, 2026 |
| “Even “illegible” Mythos reasoning traces seem pretty legible” by faul_sname | Jun 10, 2026 |
| “Claude Fable 5 and Mythos 5 [Linkpost]” by fluxxrider | Jun 10, 2026 |
| “Three Labs With a Plan and A Memorandum” by Zvi | Jun 10, 2026 |
| “A Mike’s-Eye View of ARC’s Research” by Jacob_Hilton | Jun 09, 2026 |
| “Towards a Formal Scientific Epistemology” by Richard_Ngo | Jun 09, 2026 |
| “LLMs and almost good code” by kqr | Jun 09, 2026 |
| “On Slop” by Jan | Jun 09, 2026 |
| “The Machines Lack Honour” by Raymond Douglas | Jun 09, 2026 |
| “How to build a cancer vaccine, and whether they will work this time” by Abhishaike Mahajan | Jun 09, 2026 |
| “Efficient tradeoffs and the safety-usefulness tradeoff model” by Buck | Jun 08, 2026 |
| “Bun’s Migration from Zig to Rust as a Potential Case Study for Gradual Disempowerment” by Sayhan Yalvaçer | Jun 08, 2026 |
| “Mental causation is not load-bearing” by jessicata | Jun 08, 2026 |
| “How Far Apart Does a Model Think Its Tokens Are?” by Brendan Long | Jun 08, 2026 |
| “Can activation verbalizers surface an internal chain of thought?” by oakhu, ryan_greenblatt | Jun 07, 2026 |
| “Against Corrigibility” by peralice | Jun 07, 2026 |
| “Coming Around To Political Donations” by jefftk | Jun 07, 2026 |
| “OpenAI Offers A New Policy Blueprint” by Zvi | Jun 06, 2026 |
| “Optimisation over non-stationary distributions creates weirder minds” by Samuel Ratnam, Pjain | Jun 06, 2026 |
| “Why Software Automation Is Hard” by silentbob | Jun 06, 2026 |
| “SecureBio Detection is Hiring Software Engineers” by jefftk | Jun 06, 2026 |
| “What if Anthropic unilaterally paused capabilities development right now?” by Karl von Wendt | Jun 06, 2026 |
| “Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks” by Mark Kagach ☘️, EliasSchlie, Thomas Van Damme, JustinShovelain | Jun 06, 2026 |
| “Beyond the lexical personality traits: What is the structure of personality?” by tailcalled | Jun 06, 2026 |
| “Logits as a new monitor for evaluation awareness” by Santiago Aranguri | Jun 05, 2026 |
| “My research agenda and work” by Seth Herd | Jun 05, 2026 |
| “One Year of PauseAI UK” by Joseph Miller, PauseAI UK | Jun 05, 2026 |
| “Learnings from starting an AI safety research team” by draganover, Erin Robertson | Jun 05, 2026 |
| “Training Deliberative Monitors for Black-Box Scheming Detection” by aksh-n, adityasinha, Victor Gillioz, Simon Storf, Kilian Merkelbach, richbc, Axel Højmark, Marius Hobbhahn | Jun 05, 2026 |
| “Lab Leaks, Black Holes, and Eggs: Epistemic Case Study Competition” by Oliver Sourbut, Josh Jacobson, Future of Life Foundation (FLF) | Jun 05, 2026 |
| “(Mis)generalization of Helpful-Only Fine-tuning” by Omar Khursheed, Baram Sosis, Fabien Roger | Jun 05, 2026 |
| “AI #171: False Flag” by Zvi | Jun 04, 2026 |
| “Rohin Shah on AGI Safety” by anaguma | Jun 04, 2026 |
| “Building Better Activation Oracles” by ceselder, jan_bauer, Niclas Luick, Adam Karvonen, Neel Nanda | Jun 04, 2026 |
| “Sixteen schemes for AI safety” by Austin Chen | Jun 04, 2026 |
| “Don’t Edit Your Ideas Before Having Them” by Hide | Jun 03, 2026 |
| “Trump Signs Executive Order For AI Testing Prior To Frontier Model Releases” by Zvi | Jun 03, 2026 |
| “Society Explained: a tool for efficiently exploring >100 theories of society” by spencerg | Jun 03, 2026 |
| “China won’t win the AI race but would it be much worse if it did?” by Chastity Ruth | Jun 03, 2026 |
| “A Town Without Children” by SeñorDingDong | Jun 03, 2026 |
| “Claude Opus 4.8: Capabilities and Reactions” by Zvi | Jun 03, 2026 |
| “My favorite depiction of utopia” by Caleb Biddulph | Jun 03, 2026 |
| “Why Even Experts Don’t Know What to Do About AI Risk” by Luc Brinkman, plex | Jun 02, 2026 |
| “Agent Foundations Reminds Me of Continental Philosophy” by IanWS | Jun 02, 2026 |
| “Announcing the ARC White-Box Estimation Challenge” by Jacob_Hilton | Jun 02, 2026 |
| “Tech I’m skeptical of and why” by harsimony | Jun 02, 2026 |
| “Dissolving the Deep Learning Sample Efficiency Gap” by Samuel Knoche | Jun 02, 2026 |
| ““Contagious Humming” to Silence a Room” by JohnofCharleston | Jun 01, 2026 |
| [Linkpost] “NYT: Senator Sanders Proposes Gov’t Take 50% Ownership of AI labs” by Julian Bradshaw | Jun 01, 2026 |
| “Opus 4.8 Part 2: Model Welfare” by Zvi | Jun 01, 2026 |
| [Linkpost] “Some humans are both male and female, and can (but shouldn’t) have children with themselves” by HedonicEscalator | Jun 01, 2026 |
| “Outrunning your headlights” by mattshu0410 | Jun 01, 2026 |
| “Lighthaven East - A Feasibility Study” by JohnofCharleston | Jun 01, 2026 |
| “Notes on axes of variation in third-party risk assessment” by Buck | May 31, 2026 |
| “Financial Costs of an AI Pause?” by PeterMcCluskey | May 31, 2026 |
| “When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability” by Logan Riggs, tdooms, Conflux, lwroe, MLNissenGonzalez | May 31, 2026 |
| “Testing Gemini models for scheming tendencies” by Vika, David Lindner, Seb Farquhar, Rohin Shah | May 31, 2026 |
| “Comment on “Banning Said Achmiz”” by Zack_M_Davis | May 30, 2026 |
| “Announcing: Iliad’s Fall 2026 Programs” by David Udell, Alexander Gietelink Oldenziel, Leon Lang | May 30, 2026 |
| “Data you could have observed but didn’t” by Gretta Duleba | May 30, 2026 |
| “Claude Opus 4.8: The System Card” by Zvi | May 29, 2026 |
| “Retrying vs Resampling in AI Control” by james.lucassen, Adam Kaufman | May 29, 2026 |
| “AI Researchers, Ask Yourself These 6 Questions to Strengthen Your Moral Muscles” by Max Tegmark | May 29, 2026 |
| “Developmental Cognitive Interpretability: A Research Agenda for Modelling Generalisation and Predicting Agent Behaviour” by JasonB, Edward James Young | May 29, 2026 |
| “Does Claude really care about you?” by Simon Lermen | May 29, 2026 |
| “How can the middle powers avoid getting trounced during the intelligence explosion? A plan.” by Tom Davidson | May 29, 2026 |
| “Trees are mostly made of air and a generalizable lesson for AI safety” by zroe1 | May 29, 2026 |
| “Advice for making robust-to-training model organisms” by SebastianP, Alek Westover, Vivek Hebbar, Julian Stastny, Dylan Xu | May 29, 2026 |
| “Mnemonic portraits for 19,023 human genes” by Brinedew | May 29, 2026 |
| “Claude… doesn’t know who you are?” by Smaug123 | May 29, 2026 |
| “Some Dating Stories” by johnswentworth | May 29, 2026 |
| “AI #170: Lack of Executive Order” by Zvi | May 28, 2026 |
| “Infinite ethics and UDASSA” by David Matolcsi | May 28, 2026 |
| “The ballad of TIGIT” by Abhishaike Mahajan | May 27, 2026 |
| “Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming” by Jasmine Li, Alex Turner | May 27, 2026 |
| “LLMs Through the Eyes of Vinge” by Gordon Seidoh Worley | May 27, 2026 |
| “Announcing Geodesic Research” by Puria, Cam, Alexandra Narin, Edward James Young, Kyle O’Brien | May 27, 2026 |
| “Full automation of AI R&D probably yields a large speed up even without a software-only singularity” by ryan_greenblatt | May 27, 2026 |
| “Quantitative AI risk assessment: a starting point” by Henry Papadatos, jakub_krys, malcolmmurray, Renn Karageorgieva | May 27, 2026 |
| “Finding the Mole: Bayesianism is Hard” by laniakea | May 27, 2026 |
| “Notes on Fourier Analysis” by Menotim | May 27, 2026 |
| “Standard deviations from just two values” by kqr | May 27, 2026 |
| “Contra Wentworth on Physical Attractiveness for Men” by Gretta Duleba | May 27, 2026 |
| “Practical Learnings from Synthetic Document Finetuning” by Axel Højmark, Jérémy Scheurer | May 27, 2026 |
| “RTMH: Pope Leo’s Magnifica Humanitas on AI” by Zvi | May 26, 2026 |
| “Claude, Author of the Humanitas” by Linch | May 26, 2026 |
| “Brackets Are a Bad Way to Regulate” by Hide | May 26, 2026 |
| “Many portions of Magnifica Humanitas appear to be AI-written” by DanielFilan | May 26, 2026 |
| “Donating 80% While It Still Counts” by jefftk | May 26, 2026 |
| “Cognitive Security as an AI Safety Cause Area” by jsteinhardt | May 25, 2026 |
| “Linkpost: New Vatican Encyclical on AI Governance” by Jackson Wagner | May 25, 2026 |
| “A (Slightly) Mechanistic Theory for Exponentially Increasing AI Time Horizons?” by Oliver Sourbut | May 25, 2026 |
| “Taxing Small Cars To Improve MPG” by jefftk | May 25, 2026 |
| “We made a map of the doom debate” by Sean Herrington, Paul Hindoian, mikaelacankosyan, David Bravo, keivnc, Josh Tuffy, Christopher Davis, Khai Tran, Maryam Hampaei | May 24, 2026 |
| “Your Left Brain Doesn’t Trade With Your Right” by Alexander Gietelink Oldenziel | May 24, 2026 |
| “Gemini 3.5 Flash Looks Good For How Fast It Is” by Zvi | May 23, 2026 |
| “Probabilities are not the right concept” by David Matolcsi | May 23, 2026 |
| “Basic principles for dressing better.” by spookycat | May 23, 2026 |
| “Will we really put data centers in space?” by Avi Parrack, fin | May 23, 2026 |
| “PLA Daily Translation: Reflections on Warfare Brought by AGI” by eeeee | May 23, 2026 |
| “Out-of-Context Reasoning (OOCR) in LLMs: A Short Primer and Reading List” by Owain_Evans | May 23, 2026 |
| “Numb mental state shifts” by KatjaGrace | May 23, 2026 |
| “You can opt out of allergies” by Rattengift | May 22, 2026 |
| “Notes on Collaborating with Claude Opus” by Nissa Seru | May 22, 2026 |
| “Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks” by Nathaniel Mitrani, sassanb, Cam Tice, Puria | May 22, 2026 |
| “What am I, if not an AI?” by makiba | May 21, 2026 |
| “AI #169: New Knowledge” by Zvi | May 21, 2026 |
| “Loss of Oversight: How AI Systems May Become Harder to Audit, Monitor, and Investigate” by Jordan Taylor, Max H, Ed Fage, Thomas Read, Joseph Bloom | May 21, 2026 |
| “Why does off-model SFT degrade capabilities?” by SebastianP, Dylan Xu, Alek Westover, Julian Stastny, Vivek Hebbar | May 21, 2026 |
| “Women should be able to open things” by KatjaGrace | May 21, 2026 |
| “Toward Interoperability of Minimal Programs” by johnswentworth | May 21, 2026 |
| “theory uplift differentially benefits safety & is massively underpriced” by Yudhister Kumar | May 20, 2026 |
| “Power-seeking agents will likely be developed” by Alec Harris | May 20, 2026 |
| “Synthetic Persona Pretraining: Alignment from Token Zero” by Julian Minder, Raghav Singhal, Viktor Moskvoretskii, Stefan Krsteski, ashtonanderson, rolandaydin, Robert West | May 20, 2026 |
| “If AI is normal technology, history is not reassuring.” by Davidmanheim | May 20, 2026 |
| “Pythagorean addition” by kqr | May 20, 2026 |
| “Brain Structure and IQ: How Myelin Elevates Intelligence” by Shiva’s Right Foot | May 20, 2026 |
| “Humans are not automatically strategic — “inner work” edition” by Chris Lakin | May 20, 2026 |
| “Conclave 1492” by Vaniver | May 20, 2026 |
| “A Visual Guide to Natural Latents” by Alfred Harwood | May 20, 2026 |
| “Implications Of Predicting The Next Token” by jdp | May 20, 2026 |
| “Sealing Conditional Misalignment in Inoculation Prompting with Consistency Training” by David Africa, Neil Shah, Sukrati_Gautam | May 19, 2026 |
| “Advice on interviewing candidates for AI safety fellowships” by beyarkay | May 19, 2026 |
| “Negation Neglect: When models fail to learn negations in training” by harrymayne, Lev McKinney, Owain_Evans | May 18, 2026 |
| “Classifier Context Rot: Monitor Performance Degrades with Context Length” by Fabien Roger, Sam Martin | May 18, 2026 |
| “why pollen allergies?” by bhauth | May 18, 2026 |
| “How to Quit Fandom: Apostasy” by Laiba Rehman | May 18, 2026 |
| “James C. Scott: Seeing Like a State” by Martin Sustrik | May 17, 2026 |
| “How to Reason about Your Health Issues” by Taylor G. Lunt | May 17, 2026 |
| “Benchmarking Real Work” by kaivu, leni, rohuang, zef | May 17, 2026 |
| “A relatively brief explanation of Boltzmann Brains” by Eliezer Yudkowsky | May 16, 2026 |
| “An Introduction to Exemplar Partitioning for Mechanistic Interpretability” by Jessica Rumbelow | May 16, 2026 |
| “A Year Late, Claude Finally Beats Pokémon” by Julian Bradshaw | May 16, 2026 |
| “Monthly Roundup #42: May 2026” by Zvi | May 16, 2026 |
| “Incriminating misaligned AI models via distillation” by Alek Westover, SebastianP, Alex Mallen, Jozdien, Alexa Pan, Julian Stastny | May 15, 2026 |
| “The hard core of alignment (is robustifying RL)” by Cole Wyeth | May 15, 2026 |
| “Announcing the Center for Shared AI Prosperity” by Dylan Matthews | May 15, 2026 |
| “Risk reports need to address deployment-time spread of misalignment” by Alex Mallen | May 15, 2026 |
| “Mechanistic estimation for expectations of random products” by Jacob_Hilton | May 15, 2026 |
| “MATS 9 Retrospective & Advice” by beyarkay | May 15, 2026 |
| [Linkpost] “Don’t be too Clever to Take Obvious Advice” by Hide | May 15, 2026 |
| “Verification-Centric AI” by Raemon | May 15, 2026 |
| “Convergent Abstraction Hypothesis” by Jan_Kulveit | May 15, 2026 |
| “AI #168: Not Leading the Future” by Zvi | May 15, 2026 |
| “Automated Alignment is Harder Than You Think” by Aleksandr Bowkis, Marie_DB, Jacob Pfau, Geoffrey Irving | May 14, 2026 |
| “The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness” by Charlie Griffin, Patrick Leask | May 14, 2026 |
| “Predicting Rare LLM Failures with 30× Fewer Rollouts” by Santiago Aranguri, Francisco Pernice | May 14, 2026 |
| “Cyber Lack of Security and AI Governance” by Zvi | May 14, 2026 |
| [Linkpost] “Claude is Now Alignment Pretrained” by RogerDearnaley | May 14, 2026 |
| “The primary sources of near-term cybersecurity risk” by lc | May 14, 2026 |
| “Most “inner work” looks like entertainment.” by Chris Lakin | May 14, 2026 |
| [Linkpost] “Apollo Update May 2026” by Marius Hobbhahn | May 13, 2026 |
| “Voters are surprisingly open to talking about AI risk” by less_raichu | May 13, 2026 |
| “Childhood and Education #18: Do The Math” by Zvi | May 12, 2026 |
| “The Owned Ones” by Eliezer Yudkowsky | May 12, 2026 |
| “Optimisation: Selective versus Predictive” by Raymond Douglas | May 12, 2026 |
| “Childhood And Education #17: Is Our Children Reading” by Zvi | May 12, 2026 |
| “AI companies are already profitable (in the way that matters)” by Yair Halberstadt | May 11, 2026 |
| “The Iliad Intensive Course Materials” by Leon Lang, David Udell, Alexander Gietelink Oldenziel | May 11, 2026 |
| “How useful is the information you get from working inside an AI company?” by Buck, Anders Cairns Woodruff | May 11, 2026 |
| “Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)” by Steven Byrnes | May 11, 2026 |
| “Who Got Breasts First and How We Got Them” by rba | May 11, 2026 |