To listen to this podcast, open the Podcast Republic app, available on the Google Play Store and the Apple App Store.
Listen to resources from the AI Safety Fundamentals: Alignment course!
https://aisafetyfundamentals.com/alignment
| Episode | Date |
|---|---|
| Public by default: How we manage information visibility at Get on Board | May 12, 2024 |
| How to get feedback | May 12, 2024 |
| Writing, Briefly | May 12, 2024 |
| Being the (Pareto) Best in the World | May 04, 2024 |
| How to succeed as an early-stage researcher: the “lean startup” approach | Apr 23, 2024 |
| Become a person who Actually Does Things | Apr 17, 2024 |
| Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points | Apr 16, 2024 |
| Working in AI Alignment | Apr 14, 2024 |
| Computing Power and the Governance of AI | Apr 07, 2024 |
| AI Watermarking Won’t Curb Disinformation | Apr 07, 2024 |
| Emerging Processes for Frontier AI Safety | Apr 07, 2024 |
| Challenges in Evaluating AI Systems | Apr 07, 2024 |
| AI Control: Improving Safety Despite Intentional Subversion | Apr 07, 2024 |
| Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small | Apr 01, 2024 |
| Zoom In: An Introduction to Circuits | Mar 31, 2024 |
| Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Mar 31, 2024 |
| Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Mar 26, 2024 |
| Can We Scale Human Feedback for Complex AI Tasks? | Mar 26, 2024 |
| Machine Learning for Humans: Supervised Learning | May 13, 2023 |
| Visualizing the Deep Learning Revolution | May 13, 2023 |
| AGI Safety From First Principles | May 13, 2023 |
| Four Background Claims | May 13, 2023 |
| Biological Anchors: A Trick That Might Or Might Not Work | May 13, 2023 |
| Future ML Systems Will Be Qualitatively Different | May 13, 2023 |
| Intelligence Explosion: Evidence and Import | May 13, 2023 |
| More Is Different for AI | May 13, 2023 |
| On the Opportunities and Risks of Foundation Models | May 13, 2023 |
| A Short Introduction to Machine Learning | May 13, 2023 |
| Learning From Human Preferences | May 13, 2023 |
| Specification Gaming: The Flip Side of AI Ingenuity | May 13, 2023 |
| Superintelligence: Instrumental Convergence | May 13, 2023 |
| The Easy Goal Inference Problem Is Still Hard | May 13, 2023 |
| What Failure Looks Like | May 13, 2023 |
| The Alignment Problem From a Deep Learning Perspective | May 13, 2023 |
| Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It | May 13, 2023 |
| AGI Ruin: A List of Lethalities | May 13, 2023 |
| Is Power-Seeking AI an Existential Risk? | May 13, 2023 |
| Thought Experiments Provide a Third Anchor | May 13, 2023 |
| Where I Agree and Disagree with Eliezer | May 13, 2023 |
| Why AI Alignment Could Be Hard With Modern Deep Learning | May 13, 2023 |
| Yudkowsky Contra Christiano on AI Takeoff Speeds | May 13, 2023 |
| Goal Misgeneralisation: Why Correct Specifications Aren’t Enough for Correct Goals | May 13, 2023 |
| ML Systems Will Have Weird Failure Modes | May 13, 2023 |
| Summarizing Books With Human Feedback | May 13, 2023 |
| Least-To-Most Prompting Enables Complex Reasoning in Large Language Models | May 13, 2023 |
| Measuring Progress on Scalable Oversight for Large Language Models | May 13, 2023 |
| Supervising Strong Learners by Amplifying Weak Experts | May 13, 2023 |
| High-Stakes Alignment via Adversarial Training [Redwood Research Report] | May 13, 2023 |
| Takeaways From Our Robust Injury Classifier Project [Redwood Research] | May 13, 2023 |
| AI Safety via Debate | May 13, 2023 |
| Debate Update: Obfuscated Arguments Problem | May 13, 2023 |
| Red Teaming Language Models With Language Models | May 13, 2023 |
| Robust Feature-Level Adversaries Are Interpretability Tools | May 13, 2023 |
| Introduction to Logical Decision Theory for Computer Scientists | May 13, 2023 |
| Acquisition of Chess Knowledge in AlphaZero | May 13, 2023 |
| Discovering Latent Knowledge in Language Models Without Supervision | May 13, 2023 |
| Feature Visualization | May 13, 2023 |
| Toy Models of Superposition | May 13, 2023 |
| Understanding Intermediate Layers Using Linear Classifier Probes | May 13, 2023 |
| Progress on Causal Influence Diagrams | May 13, 2023 |
| Careers in Alignment | May 13, 2023 |
| Cooperation, Conflict, and Transformative Artificial Intelligence: Sections 1 & 2 — Introduction, Strategy and Governance | May 13, 2023 |
| Embedded Agents | May 13, 2023 |
| Logical Induction (Blog Post) | May 13, 2023 |