AI Safety Fundamentals: Alignment

By BlueDot Impact

Category: Technology

Episodes: 64

Description

Listen to resources from the AI Safety Fundamentals: Alignment course!

https://aisafetyfundamentals.com/alignment


Episode | Date
Public by default: How we manage information visibility at Get on Board | May 12, 2024
How to get feedback | May 12, 2024
Writing, Briefly | May 12, 2024
Being the (Pareto) Best in the World | May 04, 2024
How to succeed as an early-stage researcher: the “lean startup” approach | Apr 23, 2024
Become a person who Actually Does Things | Apr 17, 2024
Planning a High-Impact Career: A Summary of Everything You Need to Know in 7 Points | Apr 16, 2024
Working in AI Alignment | Apr 14, 2024
Computing Power and the Governance of AI | Apr 07, 2024
AI Watermarking Won’t Curb Disinformation | Apr 07, 2024
Emerging Processes for Frontier AI Safety | Apr 07, 2024
Challenges in Evaluating AI Systems | Apr 07, 2024
AI Control: Improving Safety Despite Intentional Subversion | Apr 07, 2024
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small | Apr 01, 2024
Zoom In: An Introduction to Circuits | Mar 31, 2024
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Mar 31, 2024
Weak-To-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | Mar 26, 2024
Can We Scale Human Feedback for Complex AI Tasks? | Mar 26, 2024
Machine Learning for Humans: Supervised Learning | May 13, 2023
Visualizing the Deep Learning Revolution | May 13, 2023
AGI Safety From First Principles | May 13, 2023
Four Background Claims | May 13, 2023
Biological Anchors: A Trick That Might Or Might Not Work | May 13, 2023
Future ML Systems Will Be Qualitatively Different | May 13, 2023
Intelligence Explosion: Evidence and Import | May 13, 2023
More Is Different for AI | May 13, 2023
On the Opportunities and Risks of Foundation Models | May 13, 2023
A Short Introduction to Machine Learning | May 13, 2023
Learning From Human Preferences | May 13, 2023
Specification Gaming: The Flip Side of AI Ingenuity | May 13, 2023
Superintelligence: Instrumental Convergence | May 13, 2023
The Easy Goal Inference Problem Is Still Hard | May 13, 2023
What Failure Looks Like | May 13, 2023
The Alignment Problem From a Deep Learning Perspective | May 13, 2023
Deceptively Aligned Mesa-Optimizers: It’s Not Funny if I Have to Explain It | May 13, 2023
AGI Ruin: A List of Lethalities | May 13, 2023
Is Power-Seeking AI an Existential Risk? | May 13, 2023
Thought Experiments Provide a Third Anchor | May 13, 2023
Where I Agree and Disagree with Eliezer | May 13, 2023
Why AI Alignment Could Be Hard With Modern Deep Learning | May 13, 2023
Yudkowsky Contra Christiano on AI Takeoff Speeds | May 13, 2023
Goal Misgeneralisation: Why Correct Specifications Aren’t Enough for Correct Goals | May 13, 2023
ML Systems Will Have Weird Failure Modes | May 13, 2023
Summarizing Books With Human Feedback | May 13, 2023
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models | May 13, 2023
Measuring Progress on Scalable Oversight for Large Language Models | May 13, 2023
Supervising Strong Learners by Amplifying Weak Experts | May 13, 2023
High-Stakes Alignment via Adversarial Training [Redwood Research Report] | May 13, 2023
Takeaways From Our Robust Injury Classifier Project [Redwood Research] | May 13, 2023
AI Safety via Debate | May 13, 2023
Debate Update: Obfuscated Arguments Problem | May 13, 2023
Red Teaming Language Models With Language Models | May 13, 2023
Robust Feature-Level Adversaries Are Interpretability Tools | May 13, 2023
Introduction to Logical Decision Theory for Computer Scientists | May 13, 2023
Acquisition of Chess Knowledge in AlphaZero | May 13, 2023
Discovering Latent Knowledge in Language Models Without Supervision | May 13, 2023
Feature Visualization | May 13, 2023
Toy Models of Superposition | May 13, 2023
Understanding Intermediate Layers Using Linear Classifier Probes | May 13, 2023
Progress on Causal Influence Diagrams | May 13, 2023
Careers in Alignment | May 13, 2023
Cooperation, Conflict, and Transformative Artificial Intelligence: Sections 1 & 2 — Introduction, Strategy and Governance | May 13, 2023
Embedded Agents | May 13, 2023
Logical Induction (Blog Post) | May 13, 2023