To listen to this podcast, open the Podcast Republic app, available on the Google Play Store and the Apple App Store.
Listen to resources from the AI Safety Fundamentals: Alignment course!
https://aisafetyfundamentals.com/alignment
| Episode | Date |
|---|---|
| Public by default: How we manage information visibility at Get on Board | May 12, 2024 |
| How to get feedback | May 12, 2024 |
| Writing, Briefly | May 12, 2024 |
| Being the (Pareto) Best in the World | May 04, 2024 |
| How to succeed as an early-stage researcher: the “lean startup” approach | Apr 23, 2024 |
| Become a person who Actually Does Things | Apr 17, 2024 |
| Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points | Apr 16, 2024 |
| Working in AI Alignment | Apr 14, 2024 |
| Computing Power and the Governance of AI | Apr 07, 2024 |
| AI Watermarking Won’t Curb Disinformation | Apr 07, 2024 |
| Emerging Processes for Frontier AI Safety | Apr 07, 2024 |
| Challenges in Evaluating AI Systems | Apr 07, 2024 |
| AI Control: Improving Safety Despite Intentional Subversion | Apr 07, 2024 |
| Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small | Apr 01, 2024 |
| Zoom In: An Introduction to Circuits | Mar 31, 2024 |
| Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Mar 31, 2024 |
| Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Mar 26, 2024 |
| Can We Scale Human Feedback for Complex AI Tasks? | Mar 26, 2024 |
| Machine Learning for Humans: Supervised Learning | May 13, 2023 |
| Visualizing the Deep Learning Revolution | May 13, 2023 |
| AGI Safety From First Principles | May 13, 2023 |
| Four Background Claims | May 13, 2023 |
| Biological Anchors: A Trick That Might Or Might Not Work | May 13, 2023 |
| Future ML Systems Will Be Qualitatively Different | May 13, 2023 |
| Intelligence Explosion: Evidence and Import | May 13, 2023 |
| More Is Different for AI | May 13, 2023 |
| On the Opportunities and Risks of Foundation Models | May 13, 2023 |
| A Short Introduction to Machine Learning | May 13, 2023 |
| Learning From Human Preferences | May 13, 2023 |
| Specification Gaming: The Flip Side of AI Ingenuity | May 13, 2023 |
| Superintelligence: Instrumental Convergence | May 13, 2023 |
| The Easy Goal Inference Problem Is Still Hard | May 13, 2023 |
| What Failure Looks Like | May 13, 2023 |
| The Alignment Problem From a Deep Learning Perspective | May 13, 2023 |
| Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It | May 13, 2023 |
| AGI Ruin: A List of Lethalities | May 13, 2023 |
| Is Power-Seeking AI an Existential Risk? | May 13, 2023 |
| Thought Experiments Provide a Third Anchor | May 13, 2023 |
| Where I Agree and Disagree with Eliezer | May 13, 2023 |
| Why AI Alignment Could Be Hard With Modern Deep Learning | May 13, 2023 |
| Yudkowsky Contra Christiano on AI Takeoff Speeds | May 13, 2023 |
| Goal Misgeneralisation: Why Correct Specifications Aren’t Enough for Correct Goals | May 13, 2023 |
| ML Systems Will Have Weird Failure Modes | May 13, 2023 |
| Summarizing Books With Human Feedback | May 13, 2023 |
| Least-To-Most Prompting Enables Complex Reasoning in Large Language Models | May 13, 2023 |
| Measuring Progress on Scalable Oversight for Large Language Models | May 13, 2023 |
| Supervising Strong Learners by Amplifying Weak Experts | May 13, 2023 |
| High-Stakes Alignment via Adversarial Training [Redwood Research Report] | May 13, 2023 |
| Takeaways From Our Robust Injury Classifier Project [Redwood Research] | May 13, 2023 |
| AI Safety via Debate | May 13, 2023 |
| Debate Update: Obfuscated Arguments Problem | May 13, 2023 |
| Red Teaming Language Models With Language Models | May 13, 2023 |
| Robust Feature-Level Adversaries Are Interpretability Tools | May 13, 2023 |
| Introduction to Logical Decision Theory for Computer Scientists | May 13, 2023 |
| Acquisition of Chess Knowledge in AlphaZero | May 13, 2023 |
| Discovering Latent Knowledge in Language Models Without Supervision | May 13, 2023 |
| Feature Visualization | May 13, 2023 |
| Toy Models of Superposition | May 13, 2023 |
| Understanding Intermediate Layers Using Linear Classifier Probes | May 13, 2023 |
| Progress on Causal Influence Diagrams | May 13, 2023 |
| Careers in Alignment | May 13, 2023 |
| Cooperation, Conflict, and Transformative Artificial Intelligence: Sections 1 & 2 — Introduction, Strategy and Governance | May 13, 2023 |
| Embedded Agents | May 13, 2023 |
| Logical Induction (Blog Post) | May 13, 2023 |