Best AI papers explained

By Enoch H. Kang

Category: Technology

Subscribers: 1
Reviews: 0
Episodes: 733

Description

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.

Episode / Date
Magentic Marketplace: An Open-Source Environment for studying Agentic Markets
May 05, 2026
Hyperloop Transformers
May 05, 2026
Scaling Self-Play with Self-Guidance
May 04, 2026
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
May 03, 2026
Agentic Data Environments
May 03, 2026
AI organizations are more effective but less aligned than individual agents
May 01, 2026
Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context
Apr 28, 2026
Distortion of AI alignment revisited: RLHF is a decent utilitarian aligner
Apr 27, 2026
LLMs Get Lost in Multi-Turn Conversation
Apr 25, 2026
Transformers are inherently succinct
Apr 23, 2026
The Coasean Singularity? Demand, Supply, and Market Design with AI Agents
Apr 23, 2026
Demystifying the unreasonable effectiveness of online alignment methods
Apr 21, 2026
Specialization after generalization: towards understanding test-time training in foundation models
Apr 21, 2026
Exploration and Exploitation Errors Are Measurable for Language Model Agents
Apr 20, 2026
A Mechanistic Analysis of Looped Reasoning Language Models
Apr 19, 2026
Sample Complexity of Autoregressive Reasoning: Chain-of-Thought vs. End-to-End
Apr 19, 2026
Why AI systems don’t learn and what to do about it
Apr 17, 2026
The Illusion of Learning from Observational Data: An Empirical Bayes Perspective
Apr 17, 2026
Ads in AI chatbots? An analysis of how large language models navigate conflicts of interest
Apr 17, 2026
Beyond Semantic Manipulation: Token-Space Attacks on Reward Models
Apr 13, 2026
LLM Evaluation as Tensor Completion: Low-Rank Efficiency and Uncertainty Quantification
Apr 12, 2026
Neural Computers
Apr 11, 2026
How AI Aggregation Affects Knowledge
Apr 11, 2026
World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
Apr 10, 2026
In-Place Test-Time Training
Apr 09, 2026
Test-Time Scaling Makes Overtraining Compute-Optimal
Apr 07, 2026
AI Agent Prevalence and Data Quality Across Multiple Online Sample Providers
Apr 07, 2026
POLCA: Stochastic Generative Optimization with LLM
Apr 04, 2026
Agentic Markets: Equilibrium Effects of Improving Consumer Search
Apr 04, 2026
One Model, Two Markets: Bid-Aware Generative Recommendation
Apr 01, 2026
How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge
Apr 01, 2026
Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum
Apr 01, 2026
Agentic AI and the next intelligence explosion
Mar 30, 2026
Understanding Behavior Cloning with Action Quantization
Mar 29, 2026
HyperAgents: Open-Ended Metacognitive Self-Improvement for Any Computable Task
Mar 27, 2026
Harness design for long-running application development \ Anthropic
Mar 26, 2026
Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably
Mar 24, 2026
How Log-Barrier Helps Exploration in Policy Optimization
Mar 22, 2026
The Finetuner’s Fallacy: When to Pretrain with Your Finetuning Data
Mar 22, 2026
TURNWISE: The Gap between Single- and Multi-turn Language Model Capabilities
Mar 22, 2026
Temporal Straightening for Latent Planning
Mar 20, 2026
Fine-Tuning Strategies for Preserving In-Context Learning in Linear Attention
Mar 19, 2026
LLMs Can Learn to Reason Via Off-Policy RL
Mar 19, 2026
Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
Mar 17, 2026
Provable and practical in-context policy optimization for self-improvement
Mar 17, 2026
Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models
Mar 16, 2026
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
Mar 14, 2026
AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization
Mar 14, 2026
∇−reasoner: LLM reasoning via test-time gradient descent in latent space
Mar 14, 2026
Inference for Regression with Variables Generated by AI or Machine Learning
Mar 12, 2026
Fast KV Compaction via Attention Matching
Mar 12, 2026
Position: stop anthropomorphizing intermediate tokens as reasoning/thinking traces!
Mar 11, 2026
Code World Models for General Game Playing
Mar 08, 2026
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Mar 07, 2026
Task Descriptors Help Transformers Learn Linear Models In-Context
Mar 07, 2026
Equivalence of Context and Parameter Updates in Modern Transformer Blocks
Mar 07, 2026
Learning without training: The implicit dynamics of in-context learning
Mar 07, 2026
Causal Identification from Counterfactual Data: Completeness and Bounding Results
Mar 07, 2026
Is Cosine-Similarity of Embeddings Really About Similarity?
Mar 06, 2026
Diffusion LLMs are Natural Adversaries for any LLM
Mar 05, 2026
Are you going to finish that? A Practical Study of the Partial Token Problem
Mar 04, 2026
Language Models Struggle to Use Representations Learned In-Context
Mar 02, 2026
LLMs are Bayesian, In Expectation, Not in Realization
Mar 01, 2026
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
Feb 27, 2026
LLMs Can Learn to Reason Via Off-Policy RL
Feb 27, 2026
Test-Time Training with KV Binding Is Secretly Linear Attention
Feb 27, 2026
Unified Latents (UL): How to train your latents
Feb 26, 2026
Spectral Bellman Method: Unifying RL Representation and Exploration
Feb 25, 2026
Prescriptive Scaling Reveals the Evolution of Language Model Capabilities
Feb 24, 2026
Experiential Reinforcement Learning
Feb 23, 2026
Learning Personalized Agents from Human Feedback
Feb 21, 2026
Learning to summarize user information for personalized RLHF
Feb 20, 2026
Intrinsic Credit Assignment for Long Horizon Interaction
Feb 20, 2026
Learning to Continually Learn via Meta-learning Agentic Memory Designs
Feb 20, 2026
Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models
Feb 19, 2026
PAD: Personalized Alignment of LLMs at Decoding-Time
Feb 19, 2026
The Reward Model Selection Crisis in Personalized Alignment
Feb 19, 2026
Causal-JEPA: Learning World Models through Object-Level Latent Interventions
Feb 18, 2026
How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics
Feb 17, 2026
Deriving neural scaling laws from the statistics of natural language
Feb 15, 2026
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL
Feb 15, 2026
Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
Feb 14, 2026
Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning
Feb 12, 2026
Owning the AI Pareto Frontier — Jeff Dean
Feb 12, 2026
Learning to Reason in 13 Parameters
Feb 11, 2026
Nearly Optimal Active Preference Learning and Its Application to LLM Alignment
Feb 08, 2026
Language Model Circuits Are Sparse in the Neuron Basis
Feb 08, 2026
Rethinking the Trust Region in LLM Reinforcement Learning
Feb 08, 2026
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
Feb 08, 2026
Self-distillation enables continual learning
Feb 07, 2026
Maximum Likelihood Reinforcement Learning
Feb 06, 2026
In-Context Algorithm Emulation in Fixed-Weight Transformers
Feb 05, 2026
PPI-SVRG: Unifying Prediction-Powered Inference and Variance Reduction for Semi-Supervised Optimization
Feb 05, 2026
When Models Don’t Collapse: On the Consistency of Iterative MLE
Feb 03, 2026
An orthogonal learner for individualized outcomes in Markov decision processes
Feb 03, 2026
Shaping capabilities with token-level data filtering
Feb 01, 2026
Self-Improving Pretraining: using post-trained models to pretrain better models
Feb 01, 2026
Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
Jan 31, 2026
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Jan 31, 2026
GameTalk: Training LLMs for Strategic Multi-Turn Conversation
Jan 30, 2026
Reinforcement Learning via Self-Distillation
Jan 30, 2026
Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning
Jan 28, 2026
On the alignment between supervised and self-supervised contrastive learning
Jan 28, 2026
Rethinking the value of multi-agent workflow: a strong single-agent baseline
Jan 24, 2026
Greedy Sampling Is Provably Efficient for RLHF
Jan 24, 2026
A Generalization Theory for Zero-Shot Prediction
Jan 24, 2026
Learning to Discover at Test Time
Jan 23, 2026
How Does the Pretraining Distribution Shape In-Context Learning? Task Selection, Generalization, and Robustness
Jan 23, 2026
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Retrieval
Jan 20, 2026
Activation Reward Models for Few-Shot Model Alignment
Jan 20, 2026
Reward is enough: LLMs are in-context reinforcement learners
Jan 19, 2026
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
Jan 19, 2026
The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination
Jan 18, 2026
PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary
Jan 18, 2026
Coverage Improvement and Fast Convergence of On-policy Preference Learning
Jan 17, 2026
Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
Jan 16, 2026
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Jan 16, 2026
Learning Latent Action World Models In The Wild
Jan 16, 2026
From Unstructured Data to Demand Counterfactuals: Theory and Practice
Jan 14, 2026
In-context reinforcement learning through bayesian fusion of context and value prior
Jan 14, 2026
Digital RedQueen: Adversarial Program Evolution in Core War with LLMs
Jan 14, 2026
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
Jan 13, 2026
Representation-Based Exploration for Language Models: from test-time to post-training
Jan 12, 2026
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Jan 10, 2026
RelayLLM: Efficient Reasoning via Collaborative Decoding
Jan 10, 2026
A Unified Definition of Hallucination, Or: It’s the World Model, Stupid
Jan 08, 2026
Deep sequence models tend to memorize geometrically; it is unclear why.
Jan 08, 2026
From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence
Jan 08, 2026
Diffusion Language Models are Provably Optimal Parallel Samplers
Jan 07, 2026
Universal Reasoning Model
Jan 06, 2026
Recursive language models
Jan 06, 2026
Adapting fast and slow: transportable circuits for few shot learning
Jan 04, 2026
Position: Probabilistic Modelling is Sufficient for Causal Inference
Jan 03, 2026
End-to-End Test-Time Training for Long Context
Jan 03, 2026
Parallel Token Generation for Language Models
Jan 02, 2026
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
Dec 31, 2025
Activation oracles: training and evaluating LLMs as general-purpose activation explainers
Dec 30, 2025
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Dec 29, 2025
Joint-Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction
Dec 29, 2025
Monitoring Monitorability / OpenAI
Dec 28, 2025
Detailed Balance in Large Language Model-Driven Agents
Dec 28, 2025
Learning to reason in LLMs by expectation maximization
Dec 28, 2025
Exploratory Causal Inference in SAEnce
Dec 25, 2025
Detailed balance in large language model-driven agents
Dec 24, 2025
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Dec 24, 2025
Adaptation of Agentic AI
Dec 23, 2025
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
Dec 22, 2025
Let’s (not) just put things in Context: Test-Time Training for Long-Context LLMs
Dec 21, 2025
TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
Dec 20, 2025
What’s In My Human Feedback? Learning Interpretable Descriptions of Preference Data
Dec 19, 2025
Bolmo: Byteifying the Next Generation of Language Models
Dec 19, 2025
What happened with sparse autoencoders?
Dec 17, 2025
What Matters Right Now in Mechanistic Interpretability
Dec 16, 2025
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
Dec 16, 2025
Self-Improving AI and Human Co-Improvement for Safer Co-Superintelligence
Dec 16, 2025
Towards a Science of Scaling Agent Systems / Google DeepMind
Dec 15, 2025
Emergent hierarchical reasoning in LLMs through reinforcement learning
Dec 14, 2025
AI revolution finally comes to Relational foundational models for structured data
Dec 13, 2025
REFRAG: Rethinking RAG based Decoding
Dec 13, 2025
Provable Long-Range Benefits of Next-Token Prediction
Dec 12, 2025
Jeff Dean on TPUs, AI Research, and Funding
Dec 12, 2025
Latent Debate: surrogate framework for Interpreting LLM Thinking
Dec 11, 2025
Distribution-calibrated inference-time compute for thinking LLM-as-a-judge
Dec 11, 2025
Principled RL for diffusion LLMs emerges from sequence level perspective
Dec 11, 2025
Algorithmic Thinking Theory
Dec 10, 2025
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Dec 10, 2025
Natural language actor-critic: Scalable off-policy learning in language space
Dec 09, 2025
Beyond the Transformer: Titans, MIRAS, and the Future of Infinite Context
Dec 07, 2025
On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference
Dec 07, 2025
The Universal Weight Subspace Hypothesis
Dec 07, 2025
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Dec 07, 2025
Benchmarking In-context Experiential Learning Through Repeated Product Recommendations
Dec 04, 2025
Training LLMs for Honesty via Confessions
Dec 04, 2025
STOIC REASONER: Dual-Mode Transformers that Compress to Think and Decompress to Speak
Dec 04, 2025
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
Dec 04, 2025
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Dec 04, 2025
Treatment Effect Estimation for Optimal Decision-Making
Dec 04, 2025
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Dec 03, 2025
Debugging misaligned completions with sparse-autoencoder latent attribution
Dec 02, 2025
Building Effective AI Agents \ Anthropic
Dec 02, 2025
How to Correctly Report LLM-as-a-Judge Evaluations
Dec 02, 2025
In-Context Learning with Hypothesis-Class Guidance
Dec 02, 2025
Selecting Belief-State Approximations in Simulators with Latent States
Dec 01, 2025
Latent Collaboration in Multi-Agent Systems
Nov 29, 2025
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
Nov 28, 2025
DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?
Nov 28, 2025
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing
Nov 27, 2025
Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
Nov 27, 2025
Ilya Sutskever – We're moving from the age of scaling to the age of research
Nov 26, 2025
Cognitive Foundations for Reasoning and Their Manifestation in LLMs
Nov 26, 2025
Natural emergent misalignment from reward hacking in production RL
Nov 25, 2025
Evolution Strategies at the Hyperscale
Nov 25, 2025
The Path Not Taken: RLVR Provably Learns Off the Principals
Nov 23, 2025
Back to Basics: Let Denoising Generative Models Denoise
Nov 23, 2025
LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization
Nov 22, 2025
Black-Box On-Policy Distillation of Large Language Models
Nov 20, 2025
Solving a million step LLM task with zero errors
Nov 20, 2025
Not All Thoughts Matter: Selective Attention for Efficient Reasoning
Nov 19, 2025
Sample-Efficient Parametric Learning from Natural Language
Nov 19, 2025
Bayesian Optimization in Language space: An Eval-Efficient AI Self-Improvement Framework
Nov 18, 2025
Context Engineering: Sessions, Memory
Nov 16, 2025
The Era of Agentic Organization: Learning to Organize with Language Models
Nov 15, 2025
Understanding neural networks through sparse circuits
Nov 14, 2025
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Nov 14, 2025
Multi-Agent Evolve: LLM Self-Improvement Through Co-Evolution
Nov 14, 2025
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
Nov 14, 2025
PREFDISCO: Evaluating Proactive Personalization through Interactive Preference Discovery
Nov 12, 2025
Reusing pre-training data at test time is a compute multiplier
Nov 10, 2025
Scaling Agent Learning via Experience Synthesis
Nov 09, 2025
Continuous Autoregressive Language Models
Nov 08, 2025
Toward a Theory of Agents as Tool-Use Decision-Makers
Nov 07, 2025
Nested Learning: The Illusion of Deep Learning Architectures
Nov 05, 2025
GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding
Nov 05, 2025
Beyond a million tokens: benchmarking and enhancing long-term memory in LLMs
Nov 04, 2025
Agentic Economic Modeling
Nov 03, 2025
Emergent Introspective Awareness in Large Language Models
Nov 03, 2025
Can Large reasoning models self-train?
Nov 01, 2025
ALITA-G: Self-Evolving Generative Agent for Agent Generation
Nov 01, 2025
Self-improving LLM agents at test-time
Oct 30, 2025
Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization
Oct 30, 2025
Language models are injective and hence invertible
Oct 30, 2025
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Oct 29, 2025
RLAD: Training LLMs to Discover Abstractions
Oct 29, 2025
How to Train Your Advisor: Steering Black-Box LLMs with ADVISOR MODELS
Oct 29, 2025
Self-improving LLM agents at Test-Time
Oct 27, 2025
KL-Regularized Reinforcement Learning is designed to Mode Collapse
Oct 27, 2025
How do LLMs use their depth?
Oct 27, 2025
Thought Communication in Multiagent Collaboration
Oct 27, 2025
Reasoning with Sampling: Base Models Outperform RL
Oct 26, 2025
Continual Learning via Sparse Memory Finetuning
Oct 26, 2025
Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences
Oct 24, 2025
The Coverage Principle: How Pre-Training Enables Post-Training
Oct 24, 2025
The Era of Real-World Human Interaction: RL from User Conversations
Oct 24, 2025
Agent Learning via Early Experience
Oct 24, 2025
Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL
Oct 22, 2025
Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
Oct 22, 2025
A Definition of AGI
Oct 22, 2025
Provably Learning from Language Feedback
Oct 21, 2025
In-Context Learning for Pure Exploration
Oct 21, 2025
On the Role of Preference Variance in Preference Optimization
Oct 20, 2025
Training LLM Agents to Empower Humans
Oct 20, 2025
Richard Sutton Declares LLMs a Dead End
Oct 20, 2025
Demystifying Reinforcement Learning in Agentic Reasoning
Oct 19, 2025
Emergent coordination in multi-agent language models
Oct 19, 2025
Learning-to-measure: in-context active feature acquisition
Oct 19, 2025
Andrej Karpathy's insights: AGI, Intelligence, and Evolution
Oct 19, 2025
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Oct 18, 2025
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Oct 18, 2025
The attacker moves second: stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections
Oct 18, 2025
When can in-context learning generalize out of task distribution?
Oct 16, 2025
The Art of Scaling Reinforcement Learning Compute for LLMs
Oct 16, 2025
A small number of samples can poison LLMs of any size
Oct 16, 2025
Dual Goal Representations
Oct 14, 2025
Welcome to the Era of Experience
Oct 14, 2025
Value Flows: Flow-Based Distributional Reinforcement Learning
Oct 14, 2025
Self-Adapting Language Models
Oct 12, 2025
The Markovian Thinker
Oct 12, 2025
Moloch’s Bargain: emergent misalignment when LLMs compete for audiences
Oct 12, 2025
Transformer Predictor Dynamics and Task Diversity
Oct 11, 2025
Base models know how to reason, thinking models learn when
Oct 11, 2025
Spectrum tuning: Post-training for distributional coverage and in-context steerability
Oct 11, 2025
Understanding Prompt Tuning and In-Context Learning via Meta-Learning
Oct 11, 2025
MLPs Learn In-Context on Regression and Classification tasks
Oct 11, 2025
Is Pre-Training Truly Better than Meta-Learning?
Oct 11, 2025
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Oct 11, 2025
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Oct 09, 2025
Learning dynamics of LLM finetuning
Oct 09, 2025
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Oct 09, 2025
OpenAI Agent Builder and n8n: Orchestrating Reasoning Versus Automating Process
Oct 08, 2025
Training Agents Inside of Scalable World Models
Oct 08, 2025
Small Language Models are the Future of Agentic AI
Oct 07, 2025
Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis
Oct 06, 2025
Eliciting Secret Knowledge from Language Models
Oct 06, 2025
Temporal difference flow
Oct 06, 2025
Personalized reasoning: just-in-time personalization and why LLMs fail at it
Oct 05, 2025
Prompt Curriculum Learning for Efficient LLM Post-Training
Oct 05, 2025
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Oct 04, 2025
Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
Oct 04, 2025
Learning to summarize user information for personalized reinforcement learning from human feedback
Oct 04, 2025
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Oct 03, 2025
LIMI: Less is More for Agency
Oct 01, 2025
LoRA Without Regret
Oct 01, 2025
Actor-Critic without Actor: Critic-Guided Denoising for RL
Sep 29, 2025
DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs?
Sep 29, 2025
Linear Transformers Implicitly Discover Unified Numerical Algorithms
Sep 29, 2025
Regularizing Extrapolation in Causal Inference
Sep 27, 2025
DoubleGen - Debiased Generative Modeling of Counterfactuals
Sep 27, 2025
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Sep 27, 2025
Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision
Sep 27, 2025
Learning without training: The implicit dynamics of in-context learning
Sep 24, 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Sep 24, 2025
Open Problems in Mechanistic Interpretability
Sep 21, 2025
Maestro: Joint Graph & Config Optimization for Reliable AI Agents
Sep 21, 2025
Thought Anchors: Which LLM Reasoning Steps Matter?
Sep 21, 2025
RL's Razor: Why Online RL Forgets Less
Sep 07, 2025
Why Language Models Hallucinate
Sep 06, 2025
ALFA: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning
Sep 06, 2025
Sample Efficient Preference Alignment in LLMs via Active Exploration
Sep 06, 2025
Adventures in Demand Analysis Using AI
Sep 04, 2025
Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
Sep 01, 2025
On the Theoretical Limitations of Embedding-Based Retrieval
Aug 31, 2025
Performance Prediction for Large Systems via Text-to-Text Regression
Aug 30, 2025
Demystifying the Visual Quality Paradox in Multimodal Large Language Models
Aug 30, 2025
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Aug 30, 2025
Compute-Optimal Scaling for Value-Based Deep RL
Aug 25, 2025
LLM-based Conversational Recommendation Agents with Collaborative Verbalized Experience
Aug 23, 2025
Signal and Noise: Evaluating Language Model Benchmarks
Aug 23, 2025
Breaking Feedback Loops in Recommender Systems with Causal Inference
Aug 21, 2025
RAG is Dead, Context Engineering is King: Building Reliable AI Systems
Aug 20, 2025
A Survey of Personalization: From RAG to Agent
Aug 20, 2025
Facilitating the Adoption of Causal Inference Methods Through LLM-Empowered Co-Pilot
Aug 19, 2025
Performance Prediction for Large Systems via Text-to-Text Regression
Aug 16, 2025
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Aug 15, 2025
DINOv3: Vision Models for Self-Supervised Learning
Aug 15, 2025
Agent Lightning: Training Any AI Agents with Reinforcement Learning
Aug 14, 2025
Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier
Aug 14, 2025
From Model Weights to Agent Workflows: Charting the New Frontier of Optimization in Large Language Models
Aug 12, 2025
Is Chain-of-Thought Reasoning a Mirage?
Aug 12, 2025
Agentic Web: Weaving the Next Web with AI Agents
Aug 11, 2025
The Assimilation-Accommodation Gap in LLM Intelligence
Aug 10, 2025
The Minimalist AI Kernel: A New Frontier in Reasoning
Aug 06, 2025
Statistical Rigor for Interpretable AI
Aug 06, 2025
Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value
Aug 04, 2025
A foundation model to predict and capture human cognition
Aug 04, 2025
Generative Recommendation with Semantic IDs: A Practitioner’s Handbook
Aug 04, 2025
Hierarchical Reasoning Model
Aug 04, 2025
Test-time Offline Reinforcement Learning on Goal-related Experience
Aug 04, 2025
Interpreting Chain of Thought: A Walkthrough and Discussion
Aug 04, 2025
The wall confronting large language models
Aug 04, 2025
COLLABLLM: LLMs From Passive to Collaborative
Jul 31, 2025
A decade's battle on dataset bias: are we there yet?
Jul 29, 2025
GEPA: Generative Feedback for AI System Optimization
Jul 29, 2025
From AI-Curious to AI-First: Engineering Production AI Systems
Jul 28, 2025
Context Engineering: Beyond Simple Prompting to LLM Architecture
Jul 28, 2025
Agentic Misalignment: LLMs as Insider Threats
Jul 28, 2025
Small Language Models: Future of Agentic AI
Jul 28, 2025
Learning without training: The implicit dynamics of in-context learning
Jul 28, 2025
Inverse Scaling in Test-Time Compute
Jul 28, 2025
LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra
Jul 28, 2025
Microsoft's Blueprint: AI, Quantum, and the Agentic Future
Jul 26, 2025
Zuckerberg's AI Vision Analyzed
Jul 26, 2025
Inside Claude: Scaling, Agency, and Interpretability
Jul 26, 2025
Personalized language modeling from personalized human feedback
Jul 26, 2025
Position: Empowering Time Series Reasoning with Multimodal LLMs
Jul 25, 2025
An empirical risk minimization approach for offline inverse RL and Dynamic Discrete Choice models
Jul 22, 2025
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Jul 22, 2025
The Invisible Leash: Why RLVR May Not Escape Its Origin
Jul 20, 2025
Language Model Personalization via Reward Factorization
Jul 20, 2025
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Jul 18, 2025
Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective
Jul 17, 2025
Soft Best-of-n Sampling for Model Alignment
Jul 16, 2025
On Temporal Credit Assignment and Data-Efficient Reinforcement Learning
Jul 15, 2025
Bradley–Terry and Multi-Objective Reward Modeling Are Complementary
Jul 15, 2025
Probing Foundation Models for World Models
Jul 15, 2025
GenAI-Powered Statistical Inference (with Unstructured Data)
Jul 14, 2025
Interpretable Reward Modeling with Active Concept Bottlenecks
Jul 14, 2025
PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications
Jul 14, 2025
A Collectivist, Economic Perspective on AI
Jul 14, 2025
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
Jul 12, 2025
The Winner's Curse in Data-Driven Decisions
Jul 11, 2025
SPIRAL: Self-Play for Reasoning Through Zero-Sum Games
Jul 11, 2025
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
Jul 11, 2025
Aligning Learning and Endogenous Decision-Making
Jul 11, 2025
Reliable Statistical Inference with Synthetic Data from Large Language Models
Jul 11, 2025
Multi-Turn Reinforcement Learning from Human Preference Feedback
Jul 10, 2025
Provably Learning from Language Feedback
Jul 09, 2025
Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners
Jul 05, 2025
Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation
Jul 05, 2025
Causal Abstraction with Lossy Representations
Jul 04, 2025
The Winner's Curse in Data-Driven Decisions
Jul 04, 2025
Embodied AI Agents: Modeling the World
Jul 04, 2025
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
Jul 04, 2025
What Has a Foundation Model Found? Inductive Bias Reveals World Models
Jul 04, 2025
Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond
Jul 03, 2025
Learning to Explore: An In-Context Learning Approach for Pure Exploration
Jul 03, 2025
Human-AI Matching: The Limits of Algorithmic Search
Jun 25, 2025
Uncertainty Quantification Needs Reassessment for Large-language Model Agents
Jun 25, 2025
Bayesian Meta-Reasoning for Robust LLM Generalization
Jun 25, 2025
General Intelligence Requires Reward-based Pretraining
Jun 25, 2025
Deep Learning is Not So Mysterious or Different
Jun 25, 2025
AI Agents Need Authenticated Delegation
Jun 25, 2025
Probabilistic Modelling is Sufficient for Causal Inference
Jun 25, 2025
Not All Explanations for Deep Learning Phenomena Are Equally Valuable
Jun 25, 2025
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Jun 17, 2025
Extrapolation by Association: Length Generalization Transfer in Transformers
Jun 17, 2025
Uncovering Causal Hierarchies in Language Model Capabilities
Jun 17, 2025
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Jun 17, 2025
Improving Treatment Effect Estimation with LLM-Based Data Augmentation
Jun 17, 2025
LLM Numerical Prediction Without Auto-Regression
Jun 17, 2025
Why in-context learning models are good few-shot learners?
Jun 17, 2025
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Jun 14, 2025
The Logic of Machines: The AI Reasoning Debate
Jun 12, 2025
Layer by Layer: Uncovering Hidden Representations in Language Models
Jun 12, 2025
Causal Attribution Analysis for Continuous Outcomes
Jun 12, 2025
Training a Generally Curious Agent
Jun 12, 2025
Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s
Jun 12, 2025
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Jun 12, 2025
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jun 11, 2025
Agentic Supernet for Multi-agent Architecture Search
Jun 11, 2025
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Jun 11, 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Jun 10, 2025
LLMs Get Lost In Multi-Turn Conversation
Jun 09, 2025
PromptPex: Automatic Test Generation for Prompts
Jun 08, 2025
General Agents Need World Models
Jun 08, 2025
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models
Jun 07, 2025
Decisions With Algorithms
Jun 07, 2025
Adapting, fast and slow: Causal Approach to Few-Shot Sequence Learning
Jun 06, 2025
Conformal Arbitrage for LLM Objective Balancing
Jun 06, 2025
Simulation-Based Inference for Adaptive Experiments
Jun 06, 2025
Agents as Tool-Use Decision-Makers
Jun 06, 2025
Quantitative Judges for Large Language Models
Jun 06, 2025
Self-Challenging Language Model Agents
Jun 06, 2025
Learning to Explore: An In-Context Learning Approach for Pure Exploration
Jun 06, 2025
How Bidirectionality Helps Language Models Learn Better via Dynamic Bottleneck Estimation
Jun 06, 2025
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
Jun 05, 2025
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling
Jun 05, 2025
Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models
Jun 05, 2025
IPO: Interpretable Prompt Optimization for Vision-Language Models
Jun 05, 2025
Evolutionary Prompt Optimization discovers emergent multimodal reasoning strategies
Jun 05, 2025
Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know?
Jun 04, 2025
Diffusion Guidance Is a Controllable Policy Improvement Operator
Jun 02, 2025
Alita: Generalist Agent With Self-Evolution
Jun 02, 2025
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Jun 02, 2025
Learning Compositional Functions with Transformers from Easy-to-Hard Data
Jun 02, 2025
Preference Learning with Response Time
Jun 02, 2025
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
May 31, 2025
Algorithms for reliable decision-making need causal reasoning
May 31, 2025
Belief Attribution as Mental Explanation: The Role of Accuracy, Informativity, and Causality
May 31, 2025
Distances for Markov chains from sample streams
May 31, 2025
When and Why LLMs Fail to Reason Globally
May 31, 2025
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis
May 31, 2025
No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference
May 31, 2025
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
May 31, 2025
Statistical Inference for Online Algorithms
May 31, 2025
Prismatic Synthesis for Diverse LLM Reasoning Data
May 31, 2025
Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents
May 31, 2025
The Agentic Economy
May 30, 2025
Statistics for Large Language Models
May 29, 2025
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
May 29, 2025
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
May 29, 2025
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
May 29, 2025
Value-Guided Search for Efficient Chain-of-Thought Reasoning
May 29, 2025
Shallow Preference Signals: Large Language model aligns even better without truncated data?
May 29, 2025
Gaming Tool Preferences in Agentic LLMs
May 29, 2025
Partner Modelling Emerges in Recurrent Agents (But Only When It Matters)
May 29, 2025
LLM Populations Form Social Conventions and Collective Bias
May 29, 2025
LLM Generated Persona is a Promise with a Catch
May 29, 2025
Large Language Models for Digital Twin Simulation
May 29, 2025
From RL Distillation to Autonomous LLM Agents
May 29, 2025
Prompting, Auto-Prompting, and Human-AI Communication
May 29, 2025
Textual Gradients for LLM Optimization
May 29, 2025
Large Language Models as Markov Chains
May 28, 2025
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
May 28, 2025
Selective induction heads: how transformers select causal structures in context
May 28, 2025
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
May 28, 2025
How Transformers Learn Causal Structure with Gradient Descent
May 28, 2025
Planning anything with rigor: general-purpose zero-shot planning with LLM-based formalized programming
May 28, 2025
Automated Design of Agentic Systems
May 28, 2025
What’s the Magic Word? A Control Theory of LLM Prompting
May 28, 2025
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
May 27, 2025
RL with KL penalties is better viewed as Bayesian inference
May 27, 2025
Asymptotics of Language Model Alignment
May 27, 2025
Qwen 2.5, RL, and Random Rewards
May 27, 2025
Theoretical guarantees on the best-of-n alignment policy
May 27, 2025
Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models
May 27, 2025
Improved Techniques for Training Score-Based Generative Models
May 27, 2025
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
May 27, 2025
AlphaEvolve: A coding agent for scientific and algorithmic discovery
May 27, 2025
Harnessing the Universal Geometry of Embeddings
May 27, 2025
Goal Inference using Reward-Producing Programs in a Novel Physics Environment
May 27, 2025
Trial-Error-Explain In-Context Learning for Personalized Text Generation
May 27, 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
May 27, 2025
Test-Time Reinforcement Learning (TTRL)
May 27, 2025
Interpreting Emergent Planning in Model-Free Reinforcement Learning
May 26, 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
May 26, 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
May 26, 2025
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
May 26, 2025
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
May 26, 2025
UFT: Unifying Supervised and Reinforcement Fine-Tuning
May 26, 2025
Understanding High-Dimensional Bayesian Optimization
May 26, 2025
Inference time alignment in continuous space
May 25, 2025
Efficient Test-Time Scaling via Self-Calibration
May 25, 2025
Conformal Prediction via Bayesian Quadrature
May 25, 2025
Predicting from Strings: Language Model Embeddings for Bayesian Optimization
May 25, 2025
Self-Evolving Curriculum for LLM Reasoning
May 25, 2025
Online Decision-Focused Learning in Dynamic Environments
May 25, 2025
FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain
May 25, 2025
Reward Shaping from Confounded Offline Data
May 25, 2025
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
May 25, 2025
Understanding Best-of-N Language Model Alignment
May 25, 2025
Maximizing Acquisition Functions for Bayesian Optimization - and its relation to Gradient Descent
May 24, 2025
Bayesian Prompt Ensembles: Model Uncertainty Estimation for Black-Box Large Language Models
May 24, 2025
Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation
May 24, 2025
The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
May 24, 2025
FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
May 24, 2025
Automated Social Science: A Structural Causal Model-Based Approach
May 24, 2025
Causal Interpretation of Transformer Self-Attention
May 24, 2025
A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment
May 24, 2025
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs
May 24, 2025
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation
May 24, 2025
Prompts from Reinforcement Learning (PRL)
May 24, 2025
Logits are All We Need to Adapt Closed Models
May 24, 2025
Large Language Models Are (Bayesian) Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
May 23, 2025
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
May 23, 2025
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
May 23, 2025
LLM In-Context Learning as Kernel Regression
May 23, 2025
Personalizing LLMs via Decode-Time Human Preference Optimization
May 23, 2025
Almost Surely Safe LLM Inference-Time Alignment
May 23, 2025
Survey of In-Context Learning Interpretation and Analysis
May 23, 2025
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
May 23, 2025
LLM In-Context Learning as Kernel Regression
May 23, 2025
Where does In-context Learning Happen in Large Language Models?
May 23, 2025
Auto-Differentiating Any LLM Workflow: A Farewell to Manual Prompting
May 22, 2025
metaTextGrad: Learning to learn with language models as optimizers
May 22, 2025
Semantic Operators: A Declarative Model for Rich, AI-based Data Processing
May 22, 2025
Isolated Causal Effects of Language
May 22, 2025
Sleep-time Compute: Beyond Inference Scaling at Test-time
May 22, 2025
J1: Incentivizing Thinking in LLM-as-a-Judge
May 22, 2025
ShiQ: Bringing back Bellman to LLMs
May 22, 2025
Policy Learning with a Natural Language Action Space: A Causal Approach
May 22, 2025
Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models
May 22, 2025
End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
May 21, 2025
TEXTGRAD: Automatic Differentiation via Text
May 21, 2025
Steering off Course: Reliability Challenges in Steering Language Models
May 20, 2025
Past-Token Prediction for Long-Context Robot Policies
May 20, 2025
Recovering Coherent Event Probabilities from LLM Embeddings
May 20, 2025
Systematic Meta-Abilities Alignment in Large Reasoning Models
May 20, 2025
Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers
May 20, 2025
Efficient Exploration for LLMs
May 19, 2025
Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation
May 18, 2025
Bayesian Concept Bottlenecks with LLM Priors
May 17, 2025
Transformers for In-Context Reinforcement Learning
May 17, 2025
Evaluating Large Language Models Across the Lifecycle
May 17, 2025
Active Ranking from Human Feedback with DopeWolfe
May 16, 2025
Optimal Designs for Preference Elicitation
May 16, 2025
Dual Active Learning for Reinforcement Learning from Human Feedback
May 16, 2025
Active Learning for Direct Preference Optimization
May 16, 2025
Active Preference Optimization for RLHF
May 16, 2025
Test-Time Alignment of Diffusion Models without reward over-optimization
May 16, 2025
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
May 16, 2025
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
May 16, 2025
Advantage-Weighted Regression: Simple and Scalable Off-Policy RL
May 16, 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
May 16, 2025
Transformers can be used for in-context linear regression in the presence of endogeneity
May 15, 2025
Bayesian Concept Bottlenecks with LLM Priors
May 15, 2025
In-Context Parametric Inference: Point or Distribution Estimators?
May 15, 2025
Enough Coin Flips Can Make LLMs Act Bayesian
May 15, 2025
Bayesian Scaling Laws for In-Context Learning
May 15, 2025
Posterior Mean Matching Generative Modeling
May 15, 2025
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
May 15, 2025
Dynamic Search for Inference-Time Alignment in Diffusion Models
May 15, 2025
Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective
May 12, 2025
Leaked Claude Sonnet 3.7 System Instruction tuning
May 12, 2025
Converging Predictions with Shared Information
May 11, 2025
Test-Time Alignment Via Hypothesis Reweighting
May 11, 2025
Rethinking Diverse Human Preference Learning through Principal Component Analysis
May 11, 2025
Active Statistical Inference
May 10, 2025
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
May 10, 2025
AI-Powered Bayesian Inference
May 10, 2025
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
May 09, 2025
Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI
May 09, 2025
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
May 09, 2025
How to Evaluate Reward Models for RLHF
May 09, 2025
LLMs as Judges: Survey of Evaluation Methods
May 09, 2025
The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
May 09, 2025
Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data
May 09, 2025
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
May 09, 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
May 09, 2025
Prediction-Powered Statistical Inference Framework
May 09, 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
May 09, 2025
RM-R1: Reward Modeling as Reasoning
May 09, 2025
Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy
May 08, 2025
Decoding Claude Code: Terminal Agent for Developers
May 07, 2025
Emergent Strategic AI Equilibrium from Pre-trained Reasoning
May 07, 2025
Benefiting from Proprietary Data with Siloed Training
May 06, 2025
Advantage Alignment Algorithms
May 06, 2025
Asymptotic Safety Guarantees Based On Scalable Oversight
May 06, 2025
What Makes a Reward Model a Good Teacher? An Optimization Perspective
May 06, 2025
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
May 06, 2025
Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts
May 06, 2025
You Are What You Eat - AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
May 06, 2025
Interplay of LLMs in Information Retrieval Evaluation
May 03, 2025
Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence
May 03, 2025
Toward Efficient Exploration by Large Language Model Agents
May 03, 2025
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT
May 02, 2025
Self-Consuming Generative Models with Curated Data
May 02, 2025
Bootstrapping Language Models with DPO Implicit Rewards
May 02, 2025
DeepSeek-Prover-V2: Advancing Formal Reasoning
May 01, 2025
THINKPRM: Data-Efficient Process Reward Models
May 01, 2025
Societal Frameworks and LLM Alignment
Apr 29, 2025
Risks from Multi-Agent Advanced AI
Apr 29, 2025
Causality-Aware Alignment for Large Language Model Debiasing
Apr 29, 2025
Reward Models Evaluate Consistency, Not Causality
Apr 28, 2025
Causal Rewards for Large Language Model Alignment
Apr 28, 2025
Sycophancy to subterfuge: Investigating reward-tampering in large language models
Apr 28, 2025
Bidirectional AI Alignment
Apr 28, 2025
Why Do Multi-Agent LLM Systems Fail?
Apr 27, 2025
LLMs as Greedy Agents: RL Fine-tuning for Decision-Making
Apr 27, 2025
LLM Feedback Loops and the Lock-in Hypothesis
Apr 27, 2025
Representational Alignment Drives Effective Teaching and Learning
Apr 27, 2025
Adaptive Parallel Reasoning with Language Models
Apr 27, 2025
AI: Rewiring the Flow of Ideas and Human Knowledge
Apr 27, 2025
Learning and Equilibrium with Ranking Feedback
Apr 27, 2025
Designing Human-AI Collaboration: A Sufficient-Statistic Approach
Apr 27, 2025
GOAT: Generative Adversarial Training for Human-AI Coordination
Apr 27, 2025
π0.5: Generalization in Robotic Manipulation via Diverse Data
Apr 27, 2025
NoWag: Unified Compression for Large Language Models
Apr 26, 2025
Optimal Tool Calls in Language Model Reasoning
Apr 26, 2025
Data Selection for Empirical Risk Minimization
Apr 26, 2025
LoRe: Low-Rank Reward Modeling for Personalized LLMs
Apr 26, 2025
ParaPO: Reducing Language Model Verbatim Reproduction
Apr 26, 2025
Test-Time RL: Self-Evolving LLMs via Majority Voting Rewards
Apr 25, 2025
Tina: Tiny LoRA Reasoning Models
Apr 25, 2025
Evaluating large language models in theory of mind tasks
Apr 25, 2025
QUEST: Quality Sampling for Machine Translation
Apr 24, 2025
Offline Preference Learning via Simulated Trajectory Feedback
Apr 24, 2025
Reasoning Elicitation in Language Models via Counterfactual Feedback
Apr 24, 2025
Eliciting Human Preferences with Language Models
Apr 24, 2025
Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Apr 24, 2025
γ-Bench: Evaluating LLMs in Multi-Agent Games
Apr 24, 2025
DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement
Apr 24, 2025
Optimal Prediction Sets for Enhanced Human-AI Accuracy
Apr 24, 2025
Self-Correction via Reinforcement Learning for Language Models
Apr 24, 2025
Tractable Multi-Agent Reinforcement Learning through Behavioral Economics
Apr 24, 2025
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
Apr 24, 2025
Iterative Nash Policy Optimization for Language Model Alignment
Apr 24, 2025
SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine
Apr 23, 2025
Stack AI: Democratizing Enterprise AI Development
Apr 22, 2025
Evaluating Modern Recommender Systems: Challenges and Future Directions
Apr 22, 2025
AI in the Enterprise: Seven Lessons from Frontier Companies by OpenAI
Apr 22, 2025
Discussion: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Apr 21, 2025
AI Agent Protocols and Human Preference
Apr 21, 2025
Cross-Environment Cooperation for Zero-Shot Multi-Agent Coordination
Apr 20, 2025
Sutton and Silver: The Era of Experience: Learning Beyond Human Data
Apr 19, 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Apr 19, 2025
AI Agents: Echoes of Past Technology Pivots?
Apr 19, 2025
Minimalist LLM Reasoning: Rejection Sampling to Reinforcement
Apr 19, 2025
Securing the Model Context Protocol in Enterprise Environments
Apr 19, 2025
Improving Multi-Turn Tool Use with Reinforcement Learning
Apr 19, 2025
Cultural Knowledge Conservation and Control in Large Language Models
Apr 19, 2025
Data Quality, Repetition, and Scaling of Language Models
Apr 18, 2025
Compute-Optimal Scaling Laws for Language Models Revisited
Apr 18, 2025
Concise Reasoning via Reinforcement Learning
Apr 18, 2025
Throughput Limits for LLM Inference and AI Agent Scheduling
Apr 14, 2025
RL Post-training Amplifies Pretraining Behaviors in Language Models
Apr 14, 2025
Fast Adaptation of Behavioral Foundation Models
Apr 14, 2025
Proprietary Reward Models: Sustaining Advantage in Agentic AI
Apr 13, 2025
Why Multi-Agent LLM Systems Fail: A Comprehensive Study
Apr 12, 2025
Play2Prompt: Zero-Shot Tool Instruction Optimization via Tool Play
Apr 12, 2025
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Apr 12, 2025
API and GUI Agents: Divergence, Convergence, and Hybrid Approaches
Apr 12, 2025
AI, Chess, and Competitive Advantage: Substitution and Complementation
Apr 12, 2025
Knowledge of the Firm and Replication of Technology
Apr 12, 2025
Firm Resources and Sustained Competitive Advantage
Apr 12, 2025
Evaluating Pharmaceutical Marketing to Physicians with Panel Data
Apr 12, 2025
Theory of the firm in the era of Agents
Apr 12, 2025
Large Language Models: An Applied Econometric Framework
Apr 12, 2025
Evaluating the World Model Implicit in a Generative Model
Apr 12, 2025
Machine Learning for Hypothesis Generation in Social Science
Apr 11, 2025
Active Learning for Moral Preference Elicitation: Challenges and Nuances
Apr 11, 2025
Gradient-Based Surveys for Nonparametric Discrete Choice Experiments
Apr 11, 2025
Explainable Data-driven Share-of-choice Product Line Design Optimization
Apr 11, 2025
The More You Ask, the Less You Get: When Additional Questions Hurt External Validity
Apr 11, 2025
Conjoint topics from Handbook of Marketing Analytics: Methods and Applications
Apr 11, 2025
Choice-Based Conjoint Analysis: Methods and Applications
Apr 11, 2025
Beyond Conjoint Analysis: The Future of Preference Measurement
Apr 11, 2025
An Optimization Framework for Adaptive Questionnaire Design
Apr 11, 2025
Adaptive Self-Explication of Multiattribute Preferences
Apr 11, 2025
Conjoint Analysis: Methods, Applications, and Recent Developments
Apr 11, 2025
Current Issues and a “Wish List” for Conjoint Analysis
Apr 11, 2025
Ellipsoidal Methods for Adaptive Choice-Based Conjoint Analysis
Apr 11, 2025
Adaptive Polyhedral Methods for Conjoint Analysis
Apr 11, 2025
MSL: Enhancing LLM Recommenders via Masked Softmax Loss
Apr 11, 2025
Self-Supervised Deep Reinforcement Learning for Optimal Question Ranking
Apr 11, 2025
Adaptive Language Elicitation for Latent Information Discovery
Apr 10, 2025
LLM Persona Bias: Promise and Peril in Simulation
Apr 10, 2025
AutoTools: Automating Tool Use for Large Language Models
Apr 10, 2025
Tool Learning with Large Language Models: A Comprehensive Survey
Apr 10, 2025
All Roads Lead to Likelihood: RL for Fine-Tuning Value
Apr 08, 2025
ATLAS: Tuning Agents via Critical Step Learning
Apr 08, 2025
Thinking Faster by Writing Less: Chain of Draft Reasoning
Apr 08, 2025
Meta Plan Optimization for Boosting LLM Agents
Apr 08, 2025
L1: Length Controlled Reasoning with Reinforcement Learning
Apr 08, 2025
WikiBigEdit: Benchmarking Lifelong Knowledge Editing in LLMs
Apr 08, 2025
PLAN-AND-ACT: LLM Agent Planning with Synthetic Data
Apr 08, 2025
SEARCH-R1: LLMs Learn to Reason and Search via Reinforcement Learning
Apr 08, 2025
The Theory of the Firm: Information, Incentives, and Organization
Apr 08, 2025
Four Formalizable Theories of the Firm
Apr 08, 2025
Efficient Tool Use with Chain-of-Abstraction Reasoning
Apr 06, 2025
CodeTool: Process Supervision for Enhanced LLM Tool Invocation
Apr 06, 2025
Evaluating LLM Agents in Multi-Turn Conversations: A Survey
Apr 06, 2025
Epistemic Alignment in User-LLM Knowledge Delivery
Apr 06, 2025
MCP is (not) all you need
Apr 06, 2025
AI, Human Skills, and Competitive Advantage in Chess
Apr 05, 2025
Inference-Time Scaling for Generalist Reward Modeling
Apr 04, 2025
Optimal Pure Exploration in Linear Bandits via Sampling
Apr 04, 2025
Presidential Address: The Economist as Designer in the Innovation Process for Socially Impactful Digital Products
Apr 04, 2025
Emergent Symbolic Mechanisms for Reasoning in Large Language Models
Apr 03, 2025
Inference-Time Alignment: Coverage, Scaling, and Optimality
Apr 03, 2025
Sharpe Ratio-Guided Active Learning for Preference Optimization
Apr 03, 2025
Active Learning for Adaptive In-Context Prompt Design
Apr 03, 2025
Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Apr 03, 2025
On the Biology of a Large Language Model
Apr 01, 2025
Async-TB: Asynchronous Trajectory Balance for Scalable LLM RL
Apr 01, 2025
Instacart's Economics Team: A Hybrid Role in Tech
Mar 31, 2025
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
Mar 31, 2025
Why MCP won
Mar 31, 2025
SWEET-RL: Training LLM Agents for Collaborative Reasoning
Mar 31, 2025
TheoryCoder: Bilevel Planning with Synthesized World Models
Mar 30, 2025
Driving Forces in AI: Scaling to 2025 and Beyond (Jason Wei, OpenAI)
Mar 29, 2025
Expert Demonstrations for Sequential Decision Making under Heterogeneity
Mar 28, 2025
TextGrad: Backpropagating Language Model Feedback for Generative AI Optimization
Mar 27, 2025
MemReasoner: Generalizing Language Models on Reasoning-in-a-Haystack Tasks
Mar 27, 2025
RAFT: In-Domain Retrieval-Augmented Fine-Tuning for Language Models
Mar 27, 2025
Inductive Biases for Exchangeable Sequence Modeling
Mar 26, 2025
InverseRLignment: LLM Alignment via Inverse Reinforcement Learning
Mar 26, 2025
Prompt-OIRL: Offline Inverse RL for Query-Dependent Prompting
Mar 26, 2025
Alignment from Demonstrations for Large Language Models
Mar 25, 2025
Q♯: Distributional RL for Optimal LLM Post-Training
Mar 18, 2025
Scaling Test-Time Compute Without Verification or RL is Suboptimal
Mar 14, 2025
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Mar 14, 2025
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Mar 14, 2025
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Mar 14, 2025
Revisiting Superficial Alignment Hypothesis
Mar 14, 2025
Diagnostic uncertainty: teaching language models to describe open-ended uncertainty
Mar 14, 2025
Language Model Personalization via Reward Factorization
Mar 14, 2025
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
Mar 14, 2025
Can Large Language Models Extract Customer Needs as well as Professional Analysts?
Mar 13, 2025
SpurLens: finding spurious correlations in multimodal LLMs
Mar 13, 2025
Improving test-time search with backtracking against in-context value verifiers
Mar 13, 2025
Adaptive elicitation of latent information using natural language
Mar 13, 2025
Document Valuation in LLM Summaries: A Cluster Shapley Approach
Mar 13, 2025
s1: Simple test-time scaling
Mar 13, 2025