| Episode | Date |
|---|---|
| “Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models” by Anders Cairns Woodruff | Jun 10, 2026 |
| “Efficient tradeoffs and the safety-usefulness tradeoff model” by Buck Shlegeris | Jun 08, 2026 |
| “Retrying vs Resampling in AI Control” by James Lucassen, Adam Kaufman | May 29, 2026 |
| “Advice for making robust-to-training model organisms” by Alek Westover, Sebastian Prasanna, Vivek Hebbar, Julian Stastny | May 28, 2026 |
| “Full automation of AI R&D probably yields a large speed up even without a software-only singularity” by Ryan Greenblatt | May 27, 2026 |
| “Incriminating misaligned AI models via distillation” by Alek Westover, Sebastian Prasanna, Alex Mallen, Alexa Pan, Julian Stastny | May 18, 2026 |
| “Risk reports need to address deployment-time spread of misalignment” by Alex Mallen | May 15, 2026 |
| “How useful is the information you get from working inside an AI company?” by Buck Shlegeris, Anders Cairns Woodruff | May 11, 2026 |
| “A review of “Investigating the consequences of accidentally grading CoT during RL”” by Buck Shlegeris | May 07, 2026 |
| “Risk from fitness-seeking AIs: mechanisms and mitigations” by Alex Mallen | May 01, 2026 |
| “Research Sabotage in ML Codebases” by Eric Gan | Apr 29, 2026 |
| “Recursive forecasting” by Arun Jose, Alex Mallen | Apr 28, 2026 |
| “AI companies should publish security assessments” by Ryan Greenblatt | Apr 27, 2026 |
| “Fail safe(r) at alignment by channeling reward-hacking into a “spillway” motivation” by Anders Cairns Woodruff, Alex Mallen | Apr 27, 2026 |
| “A taxonomy of barriers to trading with early misaligned AIs” by Alexa Pan | Apr 21, 2026 |
| “Introducing LinuxArena” by Tyler | Apr 20, 2026 |
| “Current AIs seem pretty misaligned to me” by Ryan Greenblatt | Apr 15, 2026 |
| “Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes” by Alex Mallen, Ryan Greenblatt | Apr 14, 2026 |
| “Logit ROCs: Monitor TPR is linear in FPR in logit space” by Kerrick Staley, Aryan Bhatt, Julian Stastny | Apr 12, 2026 |
| “If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines” by Ryan Greenblatt | Apr 11, 2026 |
| “My picture of the present in AI” by Ryan Greenblatt | Apr 07, 2026 |
| “AIs can now often do massive easy-to-verify SWE tasks” by Ryan Greenblatt | Apr 06, 2026 |
| “Blocking live failures with synchronous monitors” by James Lucassen, Adam Kaufman | Mar 30, 2026 |
| “Reward-seekers will probably behave according to causal decision theory” by Alex Mallen | Mar 28, 2026 |
| “AI’s capability improvements haven’t come from it getting less affordable” by Anders Cairns Woodruff | Mar 27, 2026 |
| “Are AIs more likely to pursue on-episode or beyond-episode reward?” by Anders Cairns Woodruff, Alex Mallen | Mar 12, 2026 |
| “The case for satiating cheaply-satisfied AI preferences” by Alex Mallen | Mar 10, 2026 |
| “Frontier AI companies probably can’t leave the US” by Anders Cairns Woodruff | Feb 26, 2026 |
| “Announcing ControlConf 2026” by Buck Shlegeris | Feb 26, 2026 |
| “Will reward-seekers respond to distant incentives?” by Alex Mallen | Feb 16, 2026 |
| “How do we (more) safely defer to AIs?” by Ryan Greenblatt | Feb 12, 2026 |
| “Distinguish between inference scaling and “larger tasks use more compute”” by Ryan Greenblatt | Feb 11, 2026 |
| “Fitness-Seekers: Generalizing the Reward-Seeking Threat Model” by Alex Mallen | Jan 29, 2026 |
| The inaugural Redwood Research podcast | Jan 06, 2026 |
| “Recent LLMs can do 2-hop and 3-hop latent (no CoT) reasoning on natural facts” by Ryan Greenblatt | Jan 01, 2026 |
| “Measuring no CoT math time horizon (single forward pass)” by Ryan Greenblatt | Dec 26, 2025 |
| “Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance” by Ryan Greenblatt | Dec 22, 2025 |
| “BashArena and Control Setting Design” by Adam Kaufman, jlucassen | Dec 18, 2025 |
| “The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck Shlegeris | Dec 04, 2025 |
| “Will AI systems drift into misalignment?” by Josh Clymer | Nov 15, 2025 |
| “What’s up with Anthropic predicting AGI by early 2027?” by Ryan Greenblatt | Nov 03, 2025 |
| “Sonnet 4.5’s eval gaming seriously undermines alignment evals” by Alexa Pan, Ryan Greenblatt | Oct 30, 2025 |
| “Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?” by Alek Westover | Oct 23, 2025 |
| “Is 90% of code at Anthropic being written by AIs?” by Ryan Greenblatt | Oct 22, 2025 |
| “Reducing risk from scheming by studying trained-in scheming behavior” by Ryan Greenblatt | Oct 16, 2025 |
| “Iterated Development and Study of Schemers (IDSS)” by Ryan Greenblatt | Oct 10, 2025 |
| “The Thinking Machines Tinker API is good news for AI control and security” by Buck Shlegeris | Oct 09, 2025 |
| “Plans A, B, C, and D for misalignment risk” by Ryan Greenblatt | Oct 08, 2025 |
| “Notes on fatalities from AI takeover” by Ryan Greenblatt | Sep 23, 2025 |
| “Focus transparency on risk reports, not safety cases” by Ryan Greenblatt | Sep 22, 2025 |
| “Prospects for studying actual schemers” by Ryan Greenblatt, Julian Stastny | Sep 19, 2025 |
| “What training data should developers filter to reduce risk from misaligned AI?” by Alek Westover | Sep 17, 2025 |
| “AIs will greatly change engineering in AI companies well before AGI” by Ryan Greenblatt | Sep 09, 2025 |
| “Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro” by Ryan Greenblatt | Sep 03, 2025 |
| “Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)” by Ryan Greenblatt | Aug 27, 2025 |
| “Notes on cooperating with unaligned AI” by Lukas Finnveden | Aug 24, 2025 |
| “Being honest with AIs” by Lukas Finnveden | Aug 21, 2025 |
| “My AGI timeline updates from GPT-5 (and 2025 so far)” by Ryan Greenblatt | Aug 20, 2025 |
| “Four places where you can put LLM monitoring” by Fabien Roger, Buck Shlegeris | Aug 09, 2025 |
| “Should we update against seeing relatively fast AI progress in 2025 and 2026?” by Ryan Greenblatt | Jul 28, 2025 |
| “Why it’s hard to make settings for high-stakes control research” by Buck Shlegeris | Jul 18, 2025 |
| “Recent Redwood Research project proposals” by Ryan Greenblatt, Buck Shlegeris, Julian Stastny, Josh Clymer, Alex Mallen, Vivek Hebbar | Jul 14, 2025 |
| “What’s worse, spies or schemers?” by Buck Shlegeris, Julian Stastny | Jul 09, 2025 |
| “Ryan on the 80,000 Hours podcast” by Buck Shlegeris | Jul 08, 2025 |
| “How much novel security-critical infrastructure do you need during the singularity?” by Buck Shlegeris | Jul 05, 2025 |
| “Two proposed projects on abstract analogies for scheming” by Julian Stastny | Jul 04, 2025 |
| “There are two fundamentally different constraints on schemers” by Buck Shlegeris | Jul 02, 2025 |
| “Jankily controlling superintelligence” by Ryan Greenblatt | Jun 27, 2025 |
| “What does 10x-ing effective compute get you?” by Ryan Greenblatt | Jun 24, 2025 |
| “Comparing risk from internally-deployed AI to insider and outsider threats from humans” by Buck Shlegeris | Jun 23, 2025 |
| “Making deals with early schemers” by Julian Stastny, Olli Järviniemi, Buck Shlegeris | Jun 20, 2025 |
| “Prefix cache untrusted monitors: a method to apply after you catch your AI” by Ryan Greenblatt | Jun 20, 2025 |
| “AI safety techniques leveraging distillation” by Ryan Greenblatt | Jun 19, 2025 |
| “When does training a model change its goals?” by Vivek Hebbar, Ryan Greenblatt | Jun 12, 2025 |
| “The case for countermeasures to memetic spread of misaligned values” by Alex Mallen | May 28, 2025 |
| “AIs at the current capability level may be important for future safety work” by Ryan Greenblatt | May 12, 2025 |
| “Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking” by Julian Stastny, Buck Shlegeris | May 08, 2025 |
| “Training-time schemers vs behavioral schemers” by Alex Mallen | May 06, 2025 |
| “What’s going on with AI progress and trends? (As of 5/2025)” by Ryan Greenblatt | May 03, 2025 |
| “How can we solve diffuse threats like research sabotage with AI control?” by Vivek Hebbar | Apr 30, 2025 |
| “7+ tractable directions in AI control” by Ryan Greenblatt | Apr 29, 2025 |
| “Clarifying AI R&D threat models” by Josh Clymer | Apr 25, 2025 |
| “How training-gamers might function (and win)” by Vivek Hebbar | Apr 24, 2025 |
| “Handling schemers if shutdown is not an option” by Buck Shlegeris | Apr 18, 2025 |
| “Ctrl-Z: Controlling AI Agents via Resampling” by Buck Shlegeris | Apr 16, 2025 |
| “To be legible, evidence of misalignment probably has to be behavioral” by Ryan Greenblatt | Apr 15, 2025 |
| “Why do misalignment risks increase as AIs get more capable?” by Ryan Greenblatt | Apr 11, 2025 |
| “An overview of areas of control work” by Ryan Greenblatt | Apr 09, 2025 |
| “An overview of control measures” by Ryan Greenblatt | Apr 06, 2025 |
| “Buck on the 80,000 Hours podcast” by Buck Shlegeris | Apr 05, 2025 |
| “Notes on countermeasures for exploration hacking (aka sandbagging)” by Ryan Greenblatt | Apr 04, 2025 |
| “Notes on handling non-concentrated failures with AI control: high level methods and different regimes” by Ryan Greenblatt | Apr 03, 2025 |
| “Prioritizing threats for AI control” by Ryan Greenblatt | Mar 19, 2025 |
| “How might we safely pass the buck to AI?” by Josh Clymer | Feb 19, 2025 |
| “Takeaways from sketching a control safety case” by Josh Clymer | Jan 30, 2025 |
| “Planning for Extreme AI Risks” by Josh Clymer | Jan 29, 2025 |
| “Ten people on the inside” by Buck Shlegeris | Jan 28, 2025 |
| “When does capability elicitation bound risk?” by Josh Clymer | Jan 22, 2025 |
| “How will we update about scheming?” by Ryan Greenblatt | Jan 19, 2025 |
| “Thoughts on the conservative assumptions in AI control” by Buck Shlegeris | Jan 17, 2025 |
| “Extending control evaluations to non-scheming threats” by Josh Clymer | Jan 13, 2025 |
| “Measuring whether AIs can statelessly strategize to subvert security measures” by Buck Shlegeris, Alex Mallen | Dec 20, 2024 |
| “Alignment Faking in Large Language Models” by Ryan Greenblatt, Buck Shlegeris | Dec 18, 2024 |
| “Why imperfect adversarial robustness doesn’t doom AI control” by Buck Shlegeris | Nov 18, 2024 |
| “Win/continue/lose scenarios and execute/replace/audit protocols” by Buck Shlegeris | Nov 15, 2024 |
| “Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming” by Buck Shlegeris | Oct 10, 2024 |
| “A basic systems architecture for AI agents that do autonomous research” by Buck Shlegeris | Sep 26, 2024 |
| “How to prevent collusion when using untrusted models to monitor each other” by Buck Shlegeris | Sep 25, 2024 |
| “Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck Shlegeris | Aug 26, 2024 |
| “Fields that I reference when thinking about AI takeover prevention” by Buck Shlegeris | Aug 13, 2024 |
| “Getting 50% (SoTA) on ARC-AGI with GPT-4o” by Ryan Greenblatt | Jun 17, 2024 |