Redwood Research Blog Podcast Republic

Redwood Research Blog

By Redwood Research

Subscribers: 2
Reviews: 0
Episodes: 115

Description

Narrations of Redwood Research blog posts. Redwood Research is a research nonprofit based in Berkeley. We investigate risks posed by the development of powerful artificial intelligence and techniques for mitigating those risks.

Episode	Date
“Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models” by Anders Cairns Woodruff	Jun 10, 2026
“Efficient tradeoffs and the safety-usefulness tradeoff model” by Buck Shlegeris	Jun 08, 2026
“Retrying vs Resampling in AI Control” by James Lucassen, Adam Kaufman	May 29, 2026
“Advice for making robust-to-training model organisms” by Alek Westover, Sebastian Prasanna, Vivek Hebbar, Julian Stastny	May 28, 2026
“Full automation of AI R&D probably yields a large speed up even without a software-only singularity” by Ryan Greenblatt	May 27, 2026
“Incriminating misaligned AI models via distillation” by Alek Westover, Sebastian Prasanna, Alex Mallen, Alexa Pan, Julian Stastny	May 18, 2026
“Risk reports need to address deployment-time spread of misalignment” by Alex Mallen	May 15, 2026
“How useful is the information you get from working inside an AI company?” by Buck Shlegeris, Anders Cairns Woodruff	May 11, 2026
“A review of “Investigating the consequences of accidentally grading CoT during RL”” by Buck Shlegeris	May 07, 2026
“Risk from fitness-seeking AIs: mechanisms and mitigations” by Alex Mallen	May 01, 2026
“Research Sabotage in ML Codebases” by Eric Gan	Apr 29, 2026
“Recursive forecasting” by Arun Jose, Alex Mallen	Apr 28, 2026
“AI companies should publish security assessments” by Ryan Greenblatt	Apr 27, 2026
“Fail safe(r) at alignment by channeling reward-hacking into a “spillway” motivation” by Anders Cairns Woodruff, Alex Mallen	Apr 27, 2026
“A taxonomy of barriers to trading with early misaligned AIs” by Alexa Pan	Apr 21, 2026
“Introducing LinuxArena” by Tyler	Apr 20, 2026
“Current AIs seem pretty misaligned to me” by Ryan Greenblatt	Apr 15, 2026
“Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes” by Alex Mallen, Ryan Greenblatt	Apr 14, 2026
“Logit ROCs: Monitor TPR is linear in FPR in logit space” by Kerrick Staley, Aryan Bhatt, Julian Stastny	Apr 12, 2026
“If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines” by Ryan Greenblatt	Apr 11, 2026
“My picture of the present in AI” by Ryan Greenblatt	Apr 07, 2026
“AIs can now often do massive easy-to-verify SWE tasks” by Ryan Greenblatt	Apr 06, 2026
“Blocking live failures with synchronous monitors” by James Lucassen, Adam Kaufman	Mar 30, 2026
“Reward-seekers will probably behave according to causal decision theory” by Alex Mallen	Mar 28, 2026
“AI’s capability improvements haven’t come from it getting less affordable” by Anders Cairns Woodruff	Mar 27, 2026
“Are AIs more likely to pursue on-episode or beyond-episode reward?” by Anders Cairns Woodruff, Alex Mallen	Mar 12, 2026
“The case for satiating cheaply-satisfied AI preferences” by Alex Mallen	Mar 10, 2026
“Frontier AI companies probably can’t leave the US” by Anders Cairns Woodruff	Feb 26, 2026
“Announcing ControlConf 2026” by Buck Shlegeris	Feb 26, 2026
“Will reward-seekers respond to distant incentives?” by Alex Mallen	Feb 16, 2026
“How do we (more) safely defer to AIs?” by Ryan Greenblatt	Feb 12, 2026
“Distinguish between inference scaling and “larger tasks use more compute”” by Ryan Greenblatt	Feb 11, 2026
“Fitness-Seekers: Generalizing the Reward-Seeking Threat Model” by Alex Mallen	Jan 29, 2026
The inaugural Redwood Research podcast	Jan 06, 2026
“Recent LLMs can do 2-hop and 3-hop latent (no CoT) reasoning on natural facts” by Ryan Greenblatt	Jan 01, 2026
“Measuring no CoT math time horizon (single forward pass)” by Ryan Greenblatt	Dec 26, 2025
“Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance” by Ryan Greenblatt	Dec 22, 2025
“BashArena and Control Setting Design” by Adam Kaufman, jlucassen	Dec 18, 2025
“The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck Shlegeris	Dec 04, 2025
“Will AI systems drift into misalignment?” by Josh Clymer	Nov 15, 2025
“What’s up with Anthropic predicting AGI by early 2027?” by Ryan Greenblatt	Nov 03, 2025
“Sonnet 4.5’s eval gaming seriously undermines alignment evals” by Alexa Pan, Ryan Greenblatt	Oct 30, 2025
“Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?” by Alek Westover	Oct 23, 2025
“Is 90% of code at Anthropic being written by AIs?” by Ryan Greenblatt	Oct 22, 2025
“Reducing risk from scheming by studying trained-in scheming behavior” by Ryan Greenblatt	Oct 16, 2025
“Iterated Development and Study of Schemers (IDSS)” by Ryan Greenblatt	Oct 10, 2025
“The Thinking Machines Tinker API is good news for AI control and security” by Buck Shlegeris	Oct 09, 2025
“Plans A, B, C, and D for misalignment risk” by Ryan Greenblatt	Oct 08, 2025
“Notes on fatalities from AI takeover” by Ryan Greenblatt	Sep 23, 2025
“Focus transparency on risk reports, not safety cases” by Ryan Greenblatt	Sep 22, 2025
“Prospects for studying actual schemers” by Ryan Greenblatt, Julian Stastny	Sep 19, 2025
“What training data should developers filter to reduce risk from misaligned AI?” by Alek Westover	Sep 17, 2025
“AIs will greatly change engineering in AI companies well before AGI” by Ryan Greenblatt	Sep 09, 2025
“Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro” by Ryan Greenblatt	Sep 03, 2025
“Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)” by Ryan Greenblatt	Aug 27, 2025
“Notes on cooperating with unaligned AI” by Lukas Finnveden	Aug 24, 2025
“Being honest with AIs” by Lukas Finnveden	Aug 21, 2025
“My AGI timeline updates from GPT-5 (and 2025 so far)” by Ryan Greenblatt	Aug 20, 2025
“Four places where you can put LLM monitoring” by Fabien Roger, Buck Shlegeris	Aug 09, 2025
“Should we update against seeing relatively fast AI progress in 2025 and 2026?” by Ryan Greenblatt	Jul 28, 2025
“Why it’s hard to make settings for high-stakes control research” by Buck Shlegeris	Jul 18, 2025
“Recent Redwood Research project proposals” by Ryan Greenblatt, Buck Shlegeris, Julian Stastny, Josh Clymer, Alex Mallen, Vivek Hebbar	Jul 14, 2025
“What’s worse, spies or schemers?” by Buck Shlegeris, Julian Stastny	Jul 09, 2025
“Ryan on the 80,000 Hours podcast” by Buck Shlegeris	Jul 08, 2025
“How much novel security-critical infrastructure do you need during the singularity?” by Buck Shlegeris	Jul 05, 2025
“Two proposed projects on abstract analogies for scheming” by Julian Stastny	Jul 04, 2025
“There are two fundamentally different constraints on schemers” by Buck Shlegeris	Jul 02, 2025
“Jankily controlling superintelligence” by Ryan Greenblatt	Jun 27, 2025
“What does 10x-ing effective compute get you?” by Ryan Greenblatt	Jun 24, 2025
“Comparing risk from internally-deployed AI to insider and outsider threats from humans” by Buck Shlegeris	Jun 23, 2025
“Making deals with early schemers” by Julian Stastny, Olli Järviniemi, Buck Shlegeris	Jun 20, 2025
“Prefix cache untrusted monitors: a method to apply after you catch your AI” by Ryan Greenblatt	Jun 20, 2025
“AI safety techniques leveraging distillation” by Ryan Greenblatt	Jun 19, 2025
“When does training a model change its goals?” by Vivek Hebbar, Ryan Greenblatt	Jun 12, 2025
“The case for countermeasures to memetic spread of misaligned values” by Alex Mallen	May 28, 2025
“AIs at the current capability level may be important for future safety work” by Ryan Greenblatt	May 12, 2025
“Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking” by Julian Stastny, Buck Shlegeris	May 08, 2025
“Training-time schemers vs behavioral schemers” by Alex Mallen	May 06, 2025
“What’s going on with AI progress and trends? (As of 5/2025)” by Ryan Greenblatt	May 03, 2025
“How can we solve diffuse threats like research sabotage with AI control?” by Vivek Hebbar	Apr 30, 2025
“7+ tractable directions in AI control” by Ryan Greenblatt	Apr 29, 2025
“Clarifying AI R&D threat models” by Josh Clymer	Apr 25, 2025
“How training-gamers might function (and win)” by Vivek Hebbar	Apr 24, 2025
“Handling schemers if shutdown is not an option” by Buck Shlegeris	Apr 18, 2025
“Ctrl-Z: Controlling AI Agents via Resampling” by Buck Shlegeris	Apr 16, 2025
“To be legible, evidence of misalignment probably has to be behavioral” by Ryan Greenblatt	Apr 15, 2025
“Why do misalignment risks increase as AIs get more capable?” by Ryan Greenblatt	Apr 11, 2025
“An overview of areas of control work” by Ryan Greenblatt	Apr 09, 2025
“An overview of control measures” by Ryan Greenblatt	Apr 06, 2025
“Buck on the 80,000 Hours podcast” by Buck Shlegeris	Apr 05, 2025
“Notes on countermeasures for exploration hacking (aka sandbagging)” by Ryan Greenblatt	Apr 04, 2025
“Notes on handling non-concentrated failures with AI control: high level methods and different regimes” by Ryan Greenblatt	Apr 03, 2025
“Prioritizing threats for AI control” by Ryan Greenblatt	Mar 19, 2025
“How might we safely pass the buck to AI?” by Josh Clymer	Feb 19, 2025
“Takeaways from sketching a control safety case” by Josh Clymer	Jan 30, 2025
“Planning for Extreme AI Risks” by Josh Clymer	Jan 29, 2025
“Ten people on the inside” by Buck Shlegeris	Jan 28, 2025
“When does capability elicitation bound risk?” by Josh Clymer	Jan 22, 2025
“How will we update about scheming?” by Ryan Greenblatt	Jan 19, 2025
“Thoughts on the conservative assumptions in AI control” by Buck Shlegeris	Jan 17, 2025
“Extending control evaluations to non-scheming threats” by Josh Clymer	Jan 13, 2025
“Measuring whether AIs can statelessly strategize to subvert security measures” by Buck Shlegeris, Alex Mallen	Dec 20, 2024
“Alignment Faking in Large Language Models” by Ryan Greenblatt, Buck Shlegeris	Dec 18, 2024
“Why imperfect adversarial robustness doesn’t doom AI control” by Buck Shlegeris	Nov 18, 2024
“Win/continue/lose scenarios and execute/replace/audit protocols” by Buck Shlegeris	Nov 15, 2024
“Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming” by Buck Shlegeris	Oct 10, 2024
“A basic systems architecture for AI agents that do autonomous research” by Buck Shlegeris	Sep 26, 2024
“How to prevent collusion when using untrusted models to monitor each other” by Buck Shlegeris	Sep 25, 2024
“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck Shlegeris	Aug 26, 2024
“Fields that I reference when thinking about AI takeover prevention” by Buck Shlegeris	Aug 13, 2024
“Getting 50% (SoTA) on ARC-AGI with GPT-4o” by Ryan Greenblatt	Jun 17, 2024

Category: Technology
