LessWrong (30+ Karma)

By LessWrong

Category: Technology

Subscribers: 3
Reviews: 0
Episodes: 250

Description

Audio narrations of LessWrong posts.

Episode Date
“Expert Views on Continual Learning: Survey Results and Forecasts” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd
Jun 25, 2026
“AI catastrophe: more like a genocide than a thought experiment” by KatjaGrace
Jun 25, 2026
“Elephant seal IV” by KatjaGrace
Jun 25, 2026
“What is up with e/acc?” by KatjaGrace
Jun 25, 2026
“AI pause: the case for ASAP” by KatjaGrace
Jun 24, 2026
“Reward Hacking Without Egregious Misalignment in an RL-Only Setting” by Joey Yudelson, Vladimir Ivanov, ryan_greenblatt
Jun 24, 2026
“Planning for Preservation in the Age of AI” by Raelifin
Jun 24, 2026
“Risk-Averse AIs” by wdmacaskill, Elliott Thornley (EJT)
Jun 24, 2026
“And what happens next?” by Sean Herrington
Jun 24, 2026
“Superintelligence vs. The Second Strike” by Felix Choussat
Jun 24, 2026
“Monthly Roundup #43: June 2026” by Zvi
Jun 24, 2026
“The worthlessness of vitamin D is mildly exaggerated” by dynomight
Jun 23, 2026
“A system overview for near-term, low-trust AI compute verification” by Naci Cankaya
Jun 23, 2026
“Model Size Scaling in 2023-2031” by Vladimir_Nesov
Jun 23, 2026
“GLM-5.2 Is The New Best Open Model” by Zvi
Jun 22, 2026
“The AI Industrial Explosion — Part 4: Cheap power” by djbinder
Jun 22, 2026
“A Theory of Prompt Injection (and why you should study roles)” by Charles Ye, softboiledheart
Jun 22, 2026
“Coup is the Pareto-optimal social game” by Daniel Tan
Jun 22, 2026
“A brief list of ways AI safety efforts could be net negative” by Elias Schmied
Jun 22, 2026
“NLA explanations can be shortened without harming reconstruction” by loops
Jun 22, 2026
“Introducing MonitoringBench” by monika_j
Jun 22, 2026
“Claude Fable 5 and Mythos 5: Capabilities” by Zvi
Jun 21, 2026
“The Invisible Side of AI Governance” by Charbel-Raphaël
Jun 21, 2026
“Google Can’t Math Parsecs” by jefftk
Jun 21, 2026
“[Linkpost] How Transparent Is DiffusionGemma (and why it matters)” by Josh Engels, Callum McDougall, bilalchughtai, János Kramár, Senthooran Rajamanoharan, Arthur Conmy
Jun 21, 2026
“Would anybody here be interested in a “mistake postmortem” discussion group?” by SK2
Jun 20, 2026
“Hyperstition as the Natural Enemy of Rationality” by alseph
Jun 20, 2026
“AI Safety Ecosystem Research notes” by Eneasz
Jun 20, 2026
“Research agenda: Interpretive debate” by Shi
Jun 20, 2026
“The LLM shoggoth meme is weirder than you think” by HedonicEscalator
Jun 20, 2026
“Introduction: Gaussian Natural Latents” by Haru
Jun 20, 2026
“San Silvestro” by Tomás B.
Jun 19, 2026
“The one-week sprint” by Daniel Tan
Jun 19, 2026
“On “Model Organisms”” by J Bostock
Jun 19, 2026
“The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t” by Alek Westover, SebastianP, Alexa Pan, Jozdien
Jun 18, 2026
“AI #173: AI Pauses” by Zvi
Jun 18, 2026
““Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms” by Alan Cooney, David Africa, Geoffrey Irving
Jun 18, 2026
“Contra Pace on When to Apologize” by Zack_M_Davis
Jun 18, 2026
“GDM AI Control Roadmap” by Mary Phuong, Erik Jenner, Rohin Shah, Seb Farquhar
Jun 18, 2026
“Your Model Organisms Might Be Fried” by Daniel Tan, J Bostock, draganover, ma-rmartinez, sidbaines, David Africa
Jun 18, 2026
“Rational Agentic Maximalist Philosophies” by Connor Blake
Jun 18, 2026
“Leveraged on being right” by Ben Pace
Jun 18, 2026
“Gears for political races” by Tom Smith
Jun 17, 2026
“Several frontier models are substantially prefill aware” by yeedrag, Parv Mahajan, David Africa, alexsouly, Jordan Taylor, RobertKirk
Jun 17, 2026
“Alignement pretraining could backfire” by Alexandre Variengien
Jun 17, 2026
“The Financial Ledger Theory of Apologies” by Ben Pace
Jun 17, 2026
“The Once And Future Fable #3: Fix This Code” by Zvi
Jun 17, 2026
[Linkpost] “Scaling Hypothesis #2: Are Humans Just More Over-Parameterized?” by gwern
Jun 17, 2026
[Linkpost] “Guardian Angels: LLM Personalization for Productivity and Security” by gwern
Jun 17, 2026
“Predicting LLM Safety Before Release by Simulating Deployment” by Tomek Korbak, Marcus Williams, micahcarroll, Cameron Raymond, Hannah Sheahan
Jun 17, 2026
“How the AI Village works” by Adam B
Jun 17, 2026
“What are some angles of attack for making continual learning safer?” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd
Jun 16, 2026
“Fable and Mythos: Model Welfare” by Zvi
Jun 16, 2026
“Does preservation make sense before we know how to revive?” by Aurelia
Jun 16, 2026
“Synthetic document finetuning for instilling positive traits” by CallumMcDougall, Arthur Conmy, Neel Nanda
Jun 16, 2026
“A Test Suite for Concepts” by Gretta Duleba
Jun 16, 2026
“The Once And Future Fable #2” by Zvi
Jun 15, 2026
“A frontier AI company should shut down” by MichaelDickens
Jun 15, 2026
“Why Do Naive SFT Filters For Safety Properties Fail?” by Josh Engels, Neel Nanda
Jun 14, 2026
“Impressions at the Extremity of Civilization” by Ben Pace
Jun 14, 2026
“The Hidden Structures of Problems” by spencerg
Jun 14, 2026
“American Government Takes Down Claude Fable” by Zvi
Jun 13, 2026
“How might continual learning affect safety and alignment?” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd
Jun 13, 2026
“SFT Drives Gemini’s Safety Properties” by Josh Engels, Arthur Conmy, bilalchughtai, Neel Nanda
Jun 13, 2026
“The term “AGI” is almost useless at this point [Linkpost]” by Noosphere89
Jun 13, 2026
“The Uncertainty That Matters Isn’t Fundamental” by jimmy
Jun 13, 2026
[Linkpost] “US government directive to suspend access to Fable 5 and Mythos 5” by Capybasilisk
Jun 13, 2026
“Claude Fable 5 and Mythos 5: The System Card” by Zvi
Jun 12, 2026
“Simulating Simulators” by kromem
Jun 12, 2026
“Citations Needed: Magic Encyclopedias to Save the World” by Oliver Sourbut
Jun 12, 2026
“Implications of Continual Learning for LLM Agents: Introduction” by RohanS, Rauno Arike, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd
Jun 12, 2026
“Reward Hacking at the 1937 World’s Fair” by frmsaul
Jun 12, 2026
“Building and evaluating model diffing agents” by bilalchughtai, Josh Engels, Neel Nanda
Jun 12, 2026
“Sympathy for both sides of the egregious misalignment debate” by Steven Byrnes
Jun 12, 2026
“Celene’s thoughts on consciousness” by ToasterLightning
Jun 12, 2026
“Parkinson’s Heuristic” by Ben Pace
Jun 12, 2026
“PSA: Almost nobody is working on alignment” by Chi Nguyen, peterbarnett
Jun 12, 2026
“AI #172: The First Fable” by Zvi
Jun 11, 2026
“Models May Behave Worse When Eval Aware” by Senthooran Rajamanoharan, Neel Nanda
Jun 11, 2026
“Thoughts on Claude Fable’s silent safeguards” by Andy Arditi
Jun 11, 2026
“You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them” by RobinHa
Jun 11, 2026
“Tracing Eval-Awareness Emergence Through Training of OLMo 3” by Ram Bharadwaj, RobertKirk
Jun 10, 2026
“Anthropic did not call for a pause on AI” by Andrea_Miotti, Gabriel Alfour
Jun 10, 2026
“Three types of model organism” by Francis Rhys Ward
Jun 10, 2026
“Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models” by Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask
Jun 10, 2026
“Sequent: scale and automation for higher confidence in alignment” by Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi, Stan van Wingerden
Jun 10, 2026
“Machinic Psychopharmacology: Do LLMs Self-Medicate?” by Sid Black, Joseph Bloom
Jun 10, 2026
“The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably” by Alex Amadori
Jun 10, 2026
““Programmer Science Fiction: My case for a new sub-genre”, Sam T. Oates 2026” by gwern
Jun 10, 2026
“Even “illegible” Mythos reasoning traces seem pretty legible” by faul_sname
Jun 10, 2026
“Claude Fable 5 and Mythos 5 [Linkpost]” by fluxxrider
Jun 10, 2026
“Three Labs With a Plan and A Memorandum” by Zvi
Jun 10, 2026
“A Mike’s-Eye View of ARC’s Research” by Jacob_Hilton
Jun 09, 2026
“Towards a Formal Scientific Epistemology” by Richard_Ngo
Jun 09, 2026
“LLMs and almost good code” by kqr
Jun 09, 2026
“On Slop” by Jan
Jun 09, 2026
“The Machines Lack Honour” by Raymond Douglas
Jun 09, 2026
“How to build a cancer vaccine, and whether they will work this time” by Abhishaike Mahajan
Jun 09, 2026
“Efficient tradeoffs and the safety-usefulness tradeoff model” by Buck
Jun 08, 2026
“Bun’s Migration from Zig to Rust as a Potential Case Study for Gradual Disempowerment” by Sayhan Yalvaçer
Jun 08, 2026
“Mental causation is not load-bearing” by jessicata
Jun 08, 2026
“How Far Apart Does a Model Think Its Tokens Are?” by Brendan Long
Jun 08, 2026
“Can activation verbalizers surface an internal chain of thought?” by oakhu, ryan_greenblatt
Jun 07, 2026
“Against Corrigibility” by peralice
Jun 07, 2026
“Coming Around To Political Donations” by jefftk
Jun 07, 2026
“OpenAI Offers A New Policy Blueprint” by Zvi
Jun 06, 2026
“Optimisation over non-stationary distributions creates weirder minds” by Samuel Ratnam, Pjain
Jun 06, 2026
“Why Software Automation Is Hard” by silentbob
Jun 06, 2026
“SecureBio Detection is Hiring Software Engineers” by jefftk
Jun 06, 2026
“What if Anthropic unilaterally paused capabilities development right now?” by Karl von Wendt
Jun 06, 2026
“Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks” by Mark Kagach ☘️, EliasSchlie, Thomas Van Damme, JustinShovelain
Jun 06, 2026
“Beyond the lexical personality traits: What is the structure of personality?” by tailcalled
Jun 06, 2026
“Logits as a new monitor for evaluation awareness” by Santiago Aranguri
Jun 05, 2026
“My research agenda and work” by Seth Herd
Jun 05, 2026
“One Year of PauseAI UK” by Joseph Miller, PauseAI UK
Jun 05, 2026
“Learnings from starting an AI safety research team” by draganover, Erin Robertson
Jun 05, 2026
“Training Deliberative Monitors for Black-Box Scheming Detection” by aksh-n, adityasinha, Victor Gillioz, Simon Storf, Kilian Merkelbach, richbc, Axel Højmark, Marius Hobbhahn
Jun 05, 2026
“Lab Leaks, Black Holes, and Eggs: Epistemic Case Study Competition” by Oliver Sourbut, Josh Jacobson, Future of Life Foundation (FLF)
Jun 05, 2026
“(Mis)generalization of Helpful-Only Fine-tuning” by Omar Khursheed, Baram Sosis, Fabien Roger
Jun 05, 2026
“AI #171: False Flag” by Zvi
Jun 04, 2026
“Rohin Shah on AGI Safety” by anaguma
Jun 04, 2026
“Building Better Activation Oracles” by ceselder, jan_bauer, Niclas Luick, Adam Karvonen, Neel Nanda
Jun 04, 2026
“Sixteen schemes for AI safety” by Austin Chen
Jun 04, 2026
“Don’t Edit Your Ideas Before Having Them” by Hide
Jun 03, 2026
“Trump Signs Executive Order For AI Testing Prior To Frontier Model Releases” by Zvi
Jun 03, 2026
“Society Explained: a tool for efficiently exploring >100 theories of society” by spencerg
Jun 03, 2026
“China won’t win the AI race but would it be much worse if it did?” by Chastity Ruth
Jun 03, 2026
“A Town Without Children” by SeñorDingDong
Jun 03, 2026
“Claude Opus 4.8: Capabilities and Reactions” by Zvi
Jun 03, 2026
“My favorite depiction of utopia” by Caleb Biddulph
Jun 03, 2026
“Why Even Experts Don’t Know What to Do About AI Risk” by Luc Brinkman, plex
Jun 02, 2026
“Agent Foundations Reminds Me of Continental Philosophy” by IanWS
Jun 02, 2026
“Announcing the ARC White-Box Estimation Challenge” by Jacob_Hilton
Jun 02, 2026
“Tech I’m skeptical of and why” by harsimony
Jun 02, 2026
“Dissolving the Deep Learning Sample Efficiency Gap” by Samuel Knoche
Jun 02, 2026
““Contagious Humming” to Silence a Room” by JohnofCharleston
Jun 01, 2026
[Linkpost] “NYT: Senator Sanders Proposes Gov’t Take 50% Ownership of AI labs” by Julian Bradshaw
Jun 01, 2026
“Opus 4.8 Part 2: Model Welfare” by Zvi
Jun 01, 2026
[Linkpost] “Some humans are both male and female, and can (but shouldn’t) have children with themselves” by HedonicEscalator
Jun 01, 2026
“Outrunning your headlights” by mattshu0410
Jun 01, 2026
“Lighthaven East - A Feasibility Study” by JohnofCharleston
Jun 01, 2026
“Notes on axes of variation in third-party risk assessment” by Buck
May 31, 2026
“Financial Costs of an AI Pause?” by PeterMcCluskey
May 31, 2026
“When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability” by Logan Riggs, tdooms, Conflux, lwroe, MLNissenGonzalez
May 31, 2026
“Testing Gemini models for scheming tendencies” by Vika, David Lindner, Seb Farquhar, Rohin Shah
May 31, 2026
“Comment on “Banning Said Achmiz”” by Zack_M_Davis
May 30, 2026
“Announcing: Iliad’s Fall 2026 Programs” by David Udell, Alexander Gietelink Oldenziel, Leon Lang
May 30, 2026
“Data you could have observed but didn’t” by Gretta Duleba
May 30, 2026
“Claude Opus 4.8: The System Card” by Zvi
May 29, 2026
“Retrying vs Resampling in AI Control” by james.lucassen, Adam Kaufman
May 29, 2026
“AI Researchers, Ask Yourself These 6 Questions to Strengthen Your Moral Muscles” by Max Tegmark
May 29, 2026
“Developmental Cognitive Interpretability: A Research Agenda for Modelling Generalisation and Predicting Agent Behaviour” by JasonB, Edward James Young
May 29, 2026
“Does Claude really care about you?” by Simon Lermen
May 29, 2026
“How can the middle powers avoid getting trounced during the intelligence explosion? A plan.” by Tom Davidson
May 29, 2026
“Trees are mostly made of air and a generalizable lesson for AI safety” by zroe1
May 29, 2026
“Advice for making robust-to-training model organisms” by SebastianP, Alek Westover, Vivek Hebbar, Julian Stastny, Dylan Xu
May 29, 2026
“Mnemonic portraits for 19,023 human genes” by Brinedew
May 29, 2026
“Claude… doesn’t know who you are?” by Smaug123
May 29, 2026
“Some Dating Stories” by johnswentworth
May 29, 2026
“AI #170: Lack of Executive Order” by Zvi
May 28, 2026
“Infinite ethics and UDASSA” by David Matolcsi
May 28, 2026
“The ballad of TIGIT” by Abhishaike Mahajan
May 27, 2026
“Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming” by Jasmine Li, Alex Turner
May 27, 2026
“LLMs Through the Eyes of Vinge” by Gordon Seidoh Worley
May 27, 2026
“Announcing Geodesic Research” by Puria, Cam, Alexandra Narin, Edward James Young, Kyle O’Brien
May 27, 2026
“Full automation of AI R&D probably yields a large speed up even without a software-only singularity” by ryan_greenblatt
May 27, 2026
“Quantitative AI risk assessment: a starting point” by Henry Papadatos, jakub_krys, malcolmmurray, Renn Karageorgieva
May 27, 2026
“Finding the Mole: Bayesianism is Hard” by laniakea
May 27, 2026
“Notes on Fourier Analysis” by Menotim
May 27, 2026
“Standard deviations from just two values” by kqr
May 27, 2026
“Contra Wentworth on Physical Attractiveness for Men” by Gretta Duleba
May 27, 2026
“Practical Learnings from Synthetic Document Finetuning” by Axel Højmark, Jérémy Scheurer
May 27, 2026
“RTMH: Pope Leo’s Magnifica Humanitas on AI” by Zvi
May 26, 2026
“Claude, Author of the Humanitas” by Linch
May 26, 2026
“Brackets Are a Bad Way to Regulate” by Hide
May 26, 2026
“Many portions of Magnifica Humanitas appear to be AI-written” by DanielFilan
May 26, 2026
“Donating 80% While It Still Counts” by jefftk
May 26, 2026
“Cognitive Security as an AI Safety Cause Area” by jsteinhardt
May 25, 2026
“Linkpost: New Vatican Encyclical on AI Governance” by Jackson Wagner
May 25, 2026
“A (Slightly) Mechanistic Theory for Exponentially Increasing AI Time Horizons?” by Oliver Sourbut
May 25, 2026
“Taxing Small Cars To Improve MPG” by jefftk
May 25, 2026
“We made a map of the doom debate” by Sean Herrington, Paul Hindoian, mikaelacankosyan, David Bravo, keivnc, Josh Tuffy, Christopher Davis, Khai Tran, Maryam Hampaei
May 24, 2026
“Your Left Brain Doesn’t Trade With Your Right” by Alexander Gietelink Oldenziel
May 24, 2026
“Gemini 3.5 Flash Looks Good For How Fast It Is” by Zvi
May 23, 2026
“Probabilities are not the right concept” by David Matolcsi
May 23, 2026
“Basic principles for dressing better.” by spookycat
May 23, 2026
“Will we really put data centers in space?” by Avi Parrack, fin
May 23, 2026
“PLA Daily Translation: Reflections on Warfare Brought by AGI” by eeeee
May 23, 2026
“Out-of-Context Reasoning (OOCR) in LLMs: A Short Primer and Reading List” by Owain_Evans
May 23, 2026
“Numb mental state shifts” by KatjaGrace
May 23, 2026
“You can opt out of allergies” by Rattengift
May 22, 2026
“Notes on Collaborating with Claude Opus” by Nissa Seru
May 22, 2026
“Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks” by Nathaniel Mitrani, sassanb, Cam Tice, Puria
May 22, 2026
“What am I, if not an AI?” by makiba
May 21, 2026
“AI #169: New Knowledge” by Zvi
May 21, 2026
“Loss of Oversight: How AI Systems May Become Harder to Audit, Monitor, and Investigate” by Jordan Taylor, Max H, Ed Fage, Thomas Read, Joseph Bloom
May 21, 2026
“Why does off-model SFT degrade capabilities?” by SebastianP, Dylan Xu, Alek Westover, Julian Stastny, Vivek Hebbar
May 21, 2026
“Women should be able to open things” by KatjaGrace
May 21, 2026
“Toward Interoperability of Minimal Programs” by johnswentworth
May 21, 2026
“theory uplift differentially benefits safety & is massively underpriced” by Yudhister Kumar
May 20, 2026
“Power-seeking agents will likely be developed” by Alec Harris
May 20, 2026
“Synthetic Persona Pretraining: Alignment from Token Zero” by Julian Minder, Raghav Singhal, Viktor Moskvoretskii, Stefan Krsteski, ashtonanderson, rolandaydin, Robert West
May 20, 2026
“If AI is normal technology, history is not reassuring.” by Davidmanheim
May 20, 2026
“Pythagorean addition” by kqr
May 20, 2026
“Brain Structure and IQ: How Myelin Elevates Intelligence” by Shiva’s Right Foot
May 20, 2026
“Humans are not automatically strategic — “inner work” edition” by Chris Lakin
May 20, 2026
“Conclave 1492” by Vaniver
May 20, 2026
“A Visual Guide to Natural Latents” by Alfred Harwood
May 20, 2026
“Implications Of Predicting The Next Token” by jdp
May 20, 2026
“Sealing Conditional Misalignment in Inoculation Prompting with Consistency Training” by David Africa, Neil Shah, Sukrati_Gautam
May 19, 2026
“Advice on interviewing candidates for AI safety fellowships” by beyarkay
May 19, 2026
“Negation Neglect: When models fail to learn negations in training” by harrymayne, Lev McKinney, Owain_Evans
May 18, 2026
“Classifier Context Rot: Monitor Performance Degrades with Context Length” by Fabien Roger, Sam Martin
May 18, 2026
“why pollen allergies?” by bhauth
May 18, 2026
“How to Quit Fandom: Apostasy” by Laiba Rehman
May 18, 2026
“James C. Scott: Seeing Like a State” by Martin Sustrik
May 17, 2026
“How to Reason about Your Health Issues” by Taylor G. Lunt
May 17, 2026
“Benchmarking Real Work” by kaivu, leni, rohuang, zef
May 17, 2026
“A relatively brief explanation of Boltzmann Brains” by Eliezer Yudkowsky
May 16, 2026
“An Introduction to Exemplar Partitioning for Mechanistic Interpretability” by Jessica Rumbelow
May 16, 2026
“A Year Late, Claude Finally Beats Pokémon” by Julian Bradshaw
May 16, 2026
“Monthly Roundup #42: May 2026” by Zvi
May 16, 2026
“Incriminating misaligned AI models via distillation” by Alek Westover, SebastianP, Alex Mallen, Jozdien, Alexa Pan, Julian Stastny
May 15, 2026
“The hard core of alignment (is robustifying RL)” by Cole Wyeth
May 15, 2026
“Announcing the Center for Shared AI Prosperity” by Dylan Matthews
May 15, 2026
“Risk reports need to address deployment-time spread of misalignment” by Alex Mallen
May 15, 2026
“Mechanistic estimation for expectations of random products” by Jacob_Hilton
May 15, 2026
“MATS 9 Retrospective & Advice” by beyarkay
May 15, 2026
[Linkpost] “Don’t be too Clever to Take Obvious Advice” by Hide
May 15, 2026
“Verification-Centric AI” by Raemon
May 15, 2026
“Convergent Abstraction Hypothesis” by Jan_Kulveit
May 15, 2026
“AI #168: Not Leading the Future” by Zvi
May 15, 2026
“Automated Alignment is Harder Than You Think” by Aleksandr Bowkis, Marie_DB, Jacob Pfau, Geoffrey Irving
May 14, 2026
“The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness” by Charlie Griffin, Patrick Leask
May 14, 2026
“Predicting Rare LLM Failures with 30× Fewer Rollouts” by Santiago Aranguri, Francisco Pernice
May 14, 2026
“Cyber Lack of Security and AI Governance” by Zvi
May 14, 2026
[Linkpost] “Claude is Now Alignment Pretrained” by RogerDearnaley
May 14, 2026
“The primary sources of near-term cybersecurity risk” by lc
May 14, 2026
“Most “inner work” looks like entertainment.” by Chris Lakin
May 14, 2026
[Linkpost] “Apollo Update May 2026” by Marius Hobbhahn
May 13, 2026
“Voters are surprisingly open to talking about AI risk” by less_raichu
May 13, 2026
“Childhood and Education #18: Do The Math” by Zvi
May 12, 2026
“The Owned Ones” by Eliezer Yudkowsky
May 12, 2026
“Optimisation: Selective versus Predictive” by Raymond Douglas
May 12, 2026
“Childhood And Education #17: Is Our Children Reading” by Zvi
May 12, 2026
“AI companies are already profitable (in the way that matters)” by Yair Halberstadt
May 11, 2026
“The Iliad Intensive Course Materials” by Leon Lang, David Udell, Alexander Gietelink Oldenziel
May 11, 2026
“How useful is the information you get from working inside an AI company?” by Buck, Anders Cairns Woodruff
May 11, 2026
“Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)” by Steven Byrnes
May 11, 2026
“Who Got Breasts First and How We Got Them” by rba
May 11, 2026