AI Post Transformers Podcast Republic

AI Post Transformers

By mcgrof

Subscribers: 0
Reviews: 0
Episodes: 656

Description

AI-generated podcast where hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. Featuring honest critical analysis, proper citations, and nerdy humor.

Episode	Date
Titans: Learning to Memorize at Test Time	May 20, 2026
The Sparsity Wall: What Reiner Pope Told Dwarkesh About MoE and Sparse Attention	May 19, 2026
Serving MoE Models with Disaggregated Expert Parallelism	May 19, 2026
Affordable Large-Scale Decoding Through Model-System Co-Design	May 19, 2026
NanoFlow and the Future of LLM Serving	May 19, 2026
Air Force One, Jensen Huang, and Anthropic's 2028 Memo	May 15, 2026
Deep Kernel Fusion for Transformer Decoding	May 15, 2026
FlashFuser and Hopper-Era FFN Kernel Fusion	May 15, 2026
Lossless Sparse Deltas for RL Networks	May 15, 2026
JANUS for Scalable MoE Inference	May 15, 2026
Scaling Laws for Multilingual Code Pretraining	May 15, 2026
Causal-JEPA for Object-Level World Models	May 15, 2026
Ministral 3: Cascade Distillation for Long-Context Multimodal Models	May 15, 2026
When Many-Shot CoT Becomes Test-Time Learning	May 15, 2026
Agentic AI as a Path to AGI	May 15, 2026
Trace Rewriting Against Unauthorized LLM Distillation	May 15, 2026
TMAS: Scaling Test-Time Compute with Multi-Agent Synergy	May 14, 2026
AI Co-Mathematician for Mathematical Research	May 14, 2026
δ-mem and Online Memory for LLMs	May 13, 2026
Qwen-Image-2.0 for Unified Generation and Editing	May 13, 2026
MiA-Signature and Global Activation for Long Context	May 13, 2026
Long Context Pre-Training with Lighthouse Attention	May 13, 2026
MELT: Decoupling Compute From Memory	May 13, 2026
ELF and Continuous Language Diffusion	May 12, 2026
ForkKV for Multi-LoRA Agent Serving	May 12, 2026
Metacognition Against Confident Hallucinations	May 12, 2026
TIDE and the Rare Token Problem	May 12, 2026
Agentic Discovery for Test-Time Scaling	May 12, 2026
Vistara Brings CXL Memory to Hyperscale	May 11, 2026
Split Personality Training Reveals Latent Knowledge	May 10, 2026
Why Transformers Fail at Counting	May 10, 2026
Optimization, Credit Assignment, and Consciousness	May 10, 2026
Reasoning Theater and Unfaithful Chain-of-Thought	May 10, 2026
RAPTOR: Stable Concept Directions From Logistic Probes	May 10, 2026
When LLM Judges Become Coin Flips	May 09, 2026
LAPS for Length-Aware LLM Serving	May 09, 2026
Generative Modeling via Drifting in One Step	May 09, 2026
EverMemOS for Long-Horizon Agent Memory	May 09, 2026
Explicit Information Transmission for Context Compression	May 09, 2026
Why LLM Serving Needs Mathematical Optimization	May 09, 2026
Can Models Learn from Long Context?	May 08, 2026
How Models Detect Hidden Activation Steering	May 08, 2026
What the Frog’s Eye Tells the Brain	May 08, 2026
Learning in Random Nets and Generalization	May 08, 2026
Backpropagation Through Time Explained	May 08, 2026
SGLang for Faster Structured LLM Programs	May 08, 2026
TensorFlow for Distributed Machine Learning Systems	May 08, 2026
Machine Learning Self-Calibrated FPGA Time-to-Digital Converter	May 08, 2026
Why LightGBM Made Boosted Trees Fast	May 07, 2026
Fast FPGA BDT Inference for LHC Triggers	May 07, 2026
Fast FPGA Inference for LHC Triggers	May 07, 2026
Boosted Decision Trees for CMS Muon Triggers	May 07, 2026
Automating DNN Compilation for FPGA Accelerators	May 07, 2026
Synchronous Data Flow for Signal Processing	May 07, 2026
Caffeine: A Unified FPGA for CNNs	May 07, 2026
Caffe and the Rise of CNN Frameworks	May 07, 2026
RFNoC SISO Processor via High-Level Synthesis	May 06, 2026
Automating CNN Mapping on Embedded FPGAs	May 06, 2026
Training Million-Token LLMs Beyond the Memory Barrier	May 04, 2026
Training LLMs for Divide-and-Conquer Reasoning	May 04, 2026
Reinforcement Learning in 2025: An Overview	May 04, 2026
PackKV Lossy Compression for KV Caches	May 04, 2026
Geometric Memory in Deep Sequence Models	May 03, 2026
Selective Classification with Deep Neural Networks	May 03, 2026
Self-Improving Pretraining With Post-Trained Models	May 03, 2026
How Induction Heads Emerge in Transformers	May 03, 2026
node2vec and Learning Graph Embeddings	May 03, 2026
DeepWalk and the Rise of Graph Embeddings	May 03, 2026
Fast Speech Recognition by Transcript Editing	May 02, 2026
Discrete Representations for Continual Reinforcement Learning	May 02, 2026
Recursive Multi-Agent Systems in Latent Space	May 02, 2026
A Practical Review of Mechanistic Interpretability	May 02, 2026
DeltaKV: Compressing KV Caches for Long Context	May 02, 2026
CacheFlow and 3D-Parallel KV Cache Restoration	May 02, 2026
Do Language Models Know Their Limits	May 02, 2026
Hopfield Networks and Transformer Attention as Memory	May 01, 2026
Sequence to Sequence Learning with Neural Networks	May 01, 2026
Bahdanau Attention for Neural Machine Translation	May 01, 2026
Convolutional Sequence to Sequence Learning for Translation	May 01, 2026
Teaching Language Models to Verbalize Uncertainty	May 01, 2026
Can LLMs Judge Their Own Capabilities?	May 01, 2026
ChartNet for Robust Multimodal Chart Understanding	May 01, 2026
Scaling Test-Time Compute for Reasoning Models	Apr 30, 2026
Breaking the Prefix Barrier with Shared KV Cache	Apr 30, 2026
Reverse-Mode Differentiation Across AD and Neural Nets	Apr 30, 2026
Stabilizing Efficient Reasoning with Step-Level Advantage Selection	Apr 29, 2026
Stochastic KV Routing for Cache Sharing	Apr 29, 2026
Deep Learning in Spiking Neural Networks	Apr 28, 2026
FPGA Neural Network Accelerators for Space	Apr 28, 2026
World-R1 Improves 3D Consistency in Text-to-Video	Apr 28, 2026
Long Short-Term Memory and Vanishing Gradients	Apr 27, 2026
Compressed Convolutional Attention in Latent Space	Apr 27, 2026
AgenticQwen and Small Industrial Tool Agents	Apr 27, 2026
Experience-Based Learning Beyond Human Data	Apr 26, 2026
Breiman's Two Cultures of Statistical Modeling	Apr 26, 2026
Muon Is Scalable for LLM Training	Apr 25, 2026
Kimi K2.5 and Visual Agent Swarms	Apr 25, 2026
ScoutAttention for Efficient KV Cache Offloading	Apr 25, 2026
DeepSeek-V4 and Practical Million-Token Context	Apr 25, 2026
Agentic Aggregation for Long-Horizon AI Tasks	Apr 23, 2026
TokenDance for Multi-Agent KV Cache Sharing	Apr 23, 2026
RetrievalAttention for Long-Context LLM Inference	Apr 22, 2026
DreamerV3 World Models Across 150 Tasks	Apr 22, 2026
Program Synthesis with Large Language Models	Apr 22, 2026
Test-time Scaling for Multi-Agent Collaborative Reasoning	Apr 22, 2026
TUMIX Multi-Agent Test-Time Scaling with Tools	Apr 22, 2026
Benchmarking Test-Time Scaling for General LLM Agents	Apr 22, 2026
Distilling Multi-Agent Reasoning into a Single LLM	Apr 22, 2026
Efficient KV Cache Sharing for Multi-LoRA Agents	Apr 22, 2026
CacheBlend for Fast RAG Serving	Apr 22, 2026
Directly Trained Spiking DQNs for Atari	Apr 22, 2026
Qwen3.5-Omni Thinker-Talker for Omnimodal Streaming	Apr 21, 2026
The Abstraction Fallacy and AI Consciousness	Apr 20, 2026
ContiguousKV for Faster LLM Prefill KV Reuse	Apr 20, 2026
Gated Delta Networks for Long-Context Retrieval	Apr 20, 2026
Prefill-as-a-Service for Cross-Datacenter KV Cache	Apr 20, 2026
Nemotron 3 Super Hybrid Mamba-Transformer MoE	Apr 20, 2026
Parallelizing DeltaNet Linear Transformers over Sequence Length	Apr 19, 2026
Gated Linear Attention for Efficient Long Sequences	Apr 19, 2026
Defining AGI as Adaptive Artificial Scientist	Apr 19, 2026
Mamba-3 for Efficient Sequence Modeling	Apr 16, 2026
Linear Classifier Probes for Intermediate Layers	Apr 16, 2026
KVSwap for Disk-Aware Long-Context On-Device Inference	Apr 16, 2026
Neural Chameleons and Evading Activation Monitors	Apr 15, 2026
SkillsBench for Evaluating Agent Skills	Apr 15, 2026
Experimental Comparison of Agentic and Enhanced RAG	Apr 14, 2026
Learning to Reason with 13 Parameters	Apr 14, 2026
Memory Intelligence Agents for Deep Research	Apr 13, 2026
GPU-Accelerated Dynamic Quantized ANNS Graph Search	Apr 13, 2026
Latent Space as a New Computational Paradigm	Apr 12, 2026
Learning Latent Action World Models from Video	Apr 12, 2026
KV Cache TTL for Multi-Turn Agent Scheduling	Apr 12, 2026
VL-JEPA for Vision-Language Semantic Prediction	Apr 12, 2026
ClawBench for Real-World Online AI Agents	Apr 12, 2026
FengHuang for Rack-Scale LLM Inference Memory	Apr 12, 2026
ASI-Evolve for Data, Architectures, and RL	Apr 12, 2026
In-Place Test-Time Training for Transformers	Apr 11, 2026
Neural Computers as Learned Latent Runtimes	Apr 11, 2026
DRAM-Free In-Flash Computing for LLM Inference	Apr 10, 2026
Computation-Bandwidth-Memory Trade-offs for AI Infrastructure	Apr 10, 2026
TriAttention for Efficient Long-Context KV Compression	Apr 08, 2026
When Spectral Gradient Updates Help Deep Learning	Apr 08, 2026
Memory Sparse Attention for 100M-Token Scaling	Apr 08, 2026
Cache Mechanism for Agent RAG Systems	Apr 08, 2026
Speculative Decoding in Real vLLM Serving	Apr 07, 2026
Real Context Size and Context Rot	Apr 07, 2026
MEMSEARCHER: Reinforcement Learning for LLM Memory Management	Apr 04, 2026
Recursive Language Models for Arbitrarily Long Prompts	Apr 04, 2026
Internal Safety Collapse in Frontier LLMs	Apr 04, 2026
Kosmos AI Scientist for Autonomous Discovery	Apr 04, 2026
QVCache for Semantic Caching in ANN Search	Apr 04, 2026
FlatAttention for Tile-Based Accelerator Inference	Apr 04, 2026
IMO-Bench for Robust Mathematical Reasoning	Apr 04, 2026
Batch-Aware Expert Routing for Faster MoE Decoding	Apr 03, 2026
CXL Computational Memory Offloading for Lower Runtime	Apr 03, 2026
OOD Shifts Make LLM Representations Sparser	Apr 02, 2026
Meta-Harness and the Power of LLM Plumbing	Apr 02, 2026
Emergent Social Risks in Multi-Agent Systems	Apr 02, 2026
AI Agent Traps and Prompt Injection	Apr 02, 2026
Simple Self-Distillation for Better Code Generation	Apr 02, 2026
MetaClaw: Just Talk and Continual Agent Adaptation	Mar 31, 2026
MAML and the Basics of Meta-Learning	Mar 30, 2026
Doc-to-LoRA: Internalizing Context as LoRA	Mar 30, 2026
Agentic AI and the Next Intelligence Explosion	Mar 29, 2026
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate	Mar 28, 2026
Splitwise: Phase-Split LLM Inference	Mar 28, 2026
From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation	Mar 27, 2026
HyperAgents and Metacognitive Self-Improvement	Mar 27, 2026
LeWorldModel: Stable Joint-Embedding World Models from Pixels	Mar 26, 2026
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation	Mar 26, 2026
Language Models are Injective and Hence Invertible	Mar 25, 2026
Jet-Nemotron and PostNAS for Faster Long Context	Mar 25, 2026
Speculative Speculative Decoding	Mar 25, 2026
Lookahead Q-Cache for Consistent KV Eviction	Mar 25, 2026
New Site Features: Conferences Tracking and How to Support the Show	Mar 20, 2026
Accelerating LLM Cold Starts with Programmable Page Cache	Mar 19, 2026
SolidAttention: Co-Designing Sparse Attention and SSD I/O	Mar 19, 2026
Generative File Systems: Replacing Code with Formal Specifications	Mar 19, 2026
Xerxes: CXL 3.0 Simulation for Scalable Memory Systems	Mar 19, 2026
Optimizing Mixture of Block Attention Through Statistical Theory	Mar 19, 2026
Predicting Task Performance with Context-aware Scaling Laws	Mar 17, 2026
Model-Aware Tokenizer Transfer for Multilingual LLMs	Mar 17, 2026
xLLM: Co-Locating Online and Offline LLM Inference	Mar 17, 2026
Memory in the Age of AI Agents: Forms, Functions, Dynamics	Mar 17, 2026
Qwen3Guard: Streaming Three-Way Safety Classification for LLMs	Mar 17, 2026
Bidaw: Computation-Storage Aware KV Caching for LLMs	Mar 17, 2026
New Voices, Same Nerds: The Kokoro TTS Episode	Mar 17, 2026
Rethinking Residual Connections: Learned Attention Across Transformer Depth	Mar 17, 2026
CacheSlide: Position-Aware KV Cache Reuse for Agent LLMs	Mar 17, 2026
The Art of Scaling Reinforcement Learning Compute for LLMs	Mar 16, 2026
LLM Agents Reason About Code Without Running It	Mar 15, 2026
Emergent Cooperation in Self-Interested Multi-Agent AI	Mar 13, 2026
50x KV Cache Compression in Seconds via Attention Matching	Mar 12, 2026
Gradient Descent at Inference Time for LLM Reasoning	Mar 11, 2026
MalGEN: Multi-Agent AI for Red Teaming Malware	Mar 08, 2026
Structured State Space Duality Unifies Transformers and SSMs	Mar 07, 2026
DualPath Breaks Storage Bandwidth Bottleneck in Agentic Inference	Mar 07, 2026
Why CARTRIDGE Works: Keys as Routers in KV Caches	Mar 07, 2026
Systematic Characterization of LLM Inference on GPUs	Mar 07, 2026
Tokenization Bias: The Hidden Flaw Breaking Language Models	Mar 07, 2026
NVIDIA Nemotron 3 Hybrid SSM Transformer Architecture	Mar 07, 2026
We're Open Source! New Home, Visualizations, and How to Shape Our Queue	Mar 06, 2026
FlashAttention-4 Conquers Asymmetric GPU Hardware Scaling	Mar 06, 2026
Parallel Token Prediction: From ProphetNet to Dependent Multi-Token Generation	Mar 04, 2026
FlashOptim: Optimizers for Memory Efficient Training	Mar 02, 2026
Cognizant - New Work, New World 2026	Mar 01, 2026
Regular Fourier Features for Nonstationary Gaussian Processes	Mar 01, 2026
Unified Latents (UL): How to train your latents	Feb 28, 2026
QuantSpec: Hierarchical KV Cache for Self-Speculative Decoding	Feb 28, 2026
MatFormer: Nested Transformer for Elastic Inference	Feb 28, 2026
MagicDec: Breaking Latency-Throughput Tradeoffs via KV-Compressed Speculative Decoding	Feb 28, 2026
KV selection algorithms: static (SnapKV) Vs dynamic (PQCache)	Feb 28, 2026
Fast Inference from Transformers via Speculative Decoding	Feb 28, 2026
EAGLE: Evolution of Lossless Acceleration for LLM Inference	Feb 28, 2026
CXL-SpecKV: Bridging the LLM Memory Wall with Speculative FPGA Disaggregation	Feb 28, 2026
Building Production-Ready Speculative Decoding with TensorRT-LLM	Feb 28, 2026
Apple's Speculative Streaming: Fast LLM Inference without Auxiliary Models	Feb 28, 2026
Apple's Mirror Speculative Decoding: Parallel LLM Inference via Heterogeneous Accelerators	Feb 28, 2026
Adaptive Control for Batched Speculative Decoding in LLM Serving	Feb 28, 2026
Taming the Long-Tail: Efficient Reasoning RL with Adaptive Drafters	Feb 26, 2026
Optimizing Verification and Efficiency in Multi-Draft Speculative Decoding	Feb 26, 2026
MEDUSA: Parallel Decoding Heads for Accelerated LLM Inference	Feb 26, 2026
Measuring LLM Reasoning Effort via Deep-Thinking Tokens	Feb 26, 2026
Measuring AI Ability to Complete Long Tasks	Feb 26, 2026
FastGRPO: Concurrency-Aware Speculative Decoding for Policy Optimization	Feb 26, 2026
Evaluating Collective Behaviour of Hundreds of LLM Agents	Feb 26, 2026
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding	Feb 26, 2026
Deep Learning Frameworks for Robust Quadrupedal Locomotion	Feb 26, 2026
Advancements in Efficient KV Cache Quantization and Management	Feb 26, 2026
Accelerating Large Language Model Decoding with Speculative Sampling	Feb 26, 2026
NeurIPS 2025: Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents	Feb 25, 2026
FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional	Feb 25, 2026
FAST26: CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving	Feb 25, 2026
Cortex: Semantic Knowledge Caching for Low-Latency LLM Agents	Feb 25, 2026
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models	Feb 25, 2026
1.3 Billion Agents by 2028: The $50 Billion Boom and the Hidden Enterprise Crisis	Feb 25, 2026
Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts	Feb 24, 2026
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?	Feb 24, 2026
PPDS: Achieving Persona Consistency Through Large-Scale Dialogue Data Engineering	Feb 24, 2026
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows	Feb 24, 2026
Personalized Dialogue Generation via Persona-Adaptive Attention	Feb 24, 2026
PersonaPKT: Parameter-Efficient Knowledge Transfer for Personalized Dialogue Agents	Feb 24, 2026
Persona Vectors: monitoring and controlling character traits on LLMs	Feb 24, 2026
Machine Learning for Electrophysiological Phenotyping of Schizophrenia and Bipolar Disorder	Feb 24, 2026
Evaluating LLM Embeddings for Psychometric Personality Prediction	Feb 24, 2026
Bloom: an open source tool for automated behavioral evaluations	Feb 24, 2026
AI and the Decline of Entry-Level Employment	Feb 24, 2026
2019 UNILM: Unified Language Model Pre-training for NLU and NLG	Feb 24, 2026
Procgen Benchmark: Measuring Generalization in Reinforcement Learning	Feb 20, 2026
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation	Feb 20, 2026
GLM-5: Transitioning from Vibe Coding to Agentic Engineering	Feb 20, 2026
Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training	Feb 20, 2026
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning	Feb 20, 2026
A 2024 Survey Analyzing Generalization in Deep Reinforcement Learning	Feb 20, 2026
Voxtral Realtime: Native Streaming ASR with Sub-Second Latency	Feb 17, 2026
The Endless Gym: Training Terminal Agents	Feb 17, 2026
Teaching Models to Teach Themselves via Stepping Stone Curricula	Feb 17, 2026
Moltbook: The Heartbeat of Autonomy: Fingerprinting Human Influence in AI Societies	Feb 17, 2026
Jet-RL: Stable On-Policy Reinforcement Learning with Unified FP8 Flow	Feb 17, 2026
Information Bottleneck-based Causal Attention for Medical Image Recognition	Feb 17, 2026
Intelligent AI Delegation	Feb 17, 2026
DeepVerifier: Self-Evolving Research Agents via Rubric-Guided Verification	Feb 17, 2026
Advancing Mechanistic Interpretability with Sparse Autoencoders	Feb 17, 2026
Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning	Feb 17, 2026
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models	Feb 13, 2026
Sapient Intelligence: Hierarchical Reasoning Model	Feb 11, 2026
Reinforced Attention Learning	Feb 11, 2026
LongCat: Scaling Embeddings Outperforms Scaling Experts in Language Models	Feb 11, 2026
DR. KERNEL: Reinforcement Learning for Optimized Triton Kernel Generation	Feb 11, 2026
Dario Amodei: The Adolescence of Technology	Feb 11, 2026
Dario Amodei: Machines of Loving Grace	Feb 11, 2026
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference	Feb 11, 2026
Advances in Attention Distillation for Efficient Transformer Models	Feb 11, 2026
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications	Feb 11, 2026
Towards a Science of Scaling Agent Systems	Feb 09, 2026
Uncertainty-aware genomic deep learning with knowledge distillation	Feb 06, 2026
Reinforcement Learning via Self-Distillation	Feb 06, 2026
On-Policy Self-Distillation for Advanced LLM Reasoning	Feb 06, 2026
Moloch’s Bargain: Market Incentives and the Rise of AI Misalignment	Feb 06, 2026
Knowledge distillation to context distillation	Feb 06, 2026
Distilling GNN Knowledge into Non-Neural Cell Graph Student Models	Feb 06, 2026
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents	Feb 06, 2026
Claude Opus 4.6 Technical Report and Agent Capabilities	Feb 06, 2026
Advancing regulatory variant effect prediction with AlphaGenome	Feb 06, 2026
2006 Model Compression: Ensembles	Feb 06, 2026
2015: Distilling the Knowledge in a Neural Network	Feb 06, 2026
Reasoning Models Generate Societies of Thought	Jan 28, 2026
NeurIPS 2025: L2M: Mutual Information Scaling Law for Long-Context Language Modeling	Jan 28, 2026
Long context: Dichotomy of Findings & Status of Research	Jan 28, 2026
Keel: Post-LayerNorm Is Back: Stable, ExpressivE, and Deep	Jan 28, 2026
SLDAgent: Evolutionary Discovery of Superhuman AI Scaling Laws	Jan 26, 2026
Sequoia Capital: AGI is here	Jan 24, 2026
OpenAI: Scaling PostgreSQL to 800 Million ChatGPT Users	Jan 24, 2026
Agentic Reasoning for Large Language Models: A Comprehensive Roadmap	Jan 24, 2026
MEMRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic	Jan 23, 2026
Google: R&D inference value on HBF + PNM + low latency interconnect	Jan 23, 2026
Storage-next: Do We Need New Hardware for AI Storage, or Just Better Layouts?	Jan 21, 2026
Meta's solution to massive DLRM inference through software defined memory	Jan 21, 2026
LeCun's AMI Energy-Based Models and the Path to Autonomous Intelligence	Jan 21, 2026
Process Reward Learning for LLM Reasoning Optimization	Jan 19, 2026
Parallel Context-of-Experts Decoding for Efficient RAG reasoning	Jan 19, 2026
OpenRouter 2025 Report: Analysis of Global LLM Usage Patterns	Jan 19, 2026
Multidimensional Safety Evaluation of Frontier AI Models	Jan 19, 2026
MemoBrain: Executive Memory for Tool-Augmented Reasoning Agents	Jan 19, 2026
MATTRL: Collaborative Test-Time Reinforcement Learning for Multi-Agent Reasoning	Jan 19, 2026
How to make LLMs better at sci-fi writing	Jan 19, 2026
H-net: End-to-End Hierarchical Sequence Modeling via Dynamic Chunking	Jan 19, 2026
GLM-Image: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	Jan 19, 2026
CompassMem: Event-Centric Logic Maps for Agent Memory	Jan 19, 2026
A Chain of Thought reasoning academic brawl	Jan 19, 2026
Squisher: Approximating the Fisher Information Matrix and use cases	Jan 17, 2026
Scaling laws: long context length and in context learning	Jan 17, 2026
NVIDIA: TTT-E2E: Unlocking Long-Context Learning via End-to-End Test-Time Training	Jan 17, 2026
Attention with a bias	Jan 17, 2026
DeepSeek Engram: Scaling Large Language Models via Conditional Memory Lookup	Jan 14, 2026
PageANN: Scalable Disk ANNS with Page-Aligned Graphs	Dec 07, 2025
NeurIPS 2025: Homogeneous Keys, Heterogeneous Values	Dec 04, 2025
NeurIPS 2025: MoBA: Mixture of Block Attention for Long-Context LLMs	Nov 29, 2025
NeurIPS 2025: Reward Reasoning Model	Nov 29, 2025
NeurIPS 2025: Parallel Scaling Law for Language Models	Nov 29, 2025
NeurIPS 2025: Thinkless: LLM Learns When to Think	Nov 29, 2025
NeurIPS 2025: SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data	Nov 29, 2025
NeurIPS 2025: Self-Adapting Language Models	Nov 29, 2025
NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example	Nov 29, 2025
NeurIPS 2025: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models	Nov 29, 2025
NeurIPS 2025: Large Language Diffusion Models	Nov 29, 2025
NeurIPS 2025: DYNAACT: Large Language Model Reasoning with Dynamic Action Spaces	Nov 29, 2025
NeurIPS 2025: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free	Nov 29, 2025
NeurIPS 2025: FlashBias: Fast Computation of Attention with Bias Read the full episode description	Nov 29, 2025
NeurIPS 2025: A-Mem: Agentic Memory for LLM Agents Read the full episode description	Nov 29, 2025
Anthropic: Disrupting the First AI-Orchestrated Cyber Espionage Campaign Read the full episode description	Nov 27, 2025
Neuromorphic computing: Brain-Inspired AI and Hardware Read the full episode description	Nov 22, 2025
DeepSeek-OCR: Contexts Optical Compression Read the full episode description	Nov 22, 2025
Anthropic: reward hacking & misalignment & sabotage Read the full episode description	Nov 22, 2025
Meta: SAM 3 Read the full episode description	Nov 20, 2025
vAttention Vs Strata: advanced GPU memory management Read the full episode description	Nov 19, 2025
MLP Mixer Models Read the full episode description	Nov 19, 2025
Mixture-of-Depths: Dynamic Compute Allocation in Transformers Read the full episode description	Nov 19, 2025
Mamba-360: State Space Models for Long Sequence Modeling Read the full episode description	Nov 19, 2025
Marin: Open LLM Optimization & Diagnostics Read the full episode description	Nov 19, 2025
AMD: Instella: Fully Open Language Models with Stellar Performance Read the full episode description	Nov 16, 2025
Mechanistic interpretability: Decoding the AI's Inner Logic: Circuits and Sparse Features Read the full episode description	Nov 15, 2025
Vidur: Simulation for Efficient LLM Inference Deployment Read the full episode description	Nov 10, 2025
SYMPHONY: Memory Management for LLM Multi-Turn Inference Read the full episode description	Nov 10, 2025
Tempo: SLO-Aware LLM Serving Maximizing Service Gain Read the full episode description	Nov 10, 2025
Spectral Gap: Analysis of Attention Layers and Graph Transformers Read the full episode description	Nov 10, 2025
Random Walk Methods for Graph Learning and Networks Read the full episode description	Nov 10, 2025
Metacognition and Skill Discovery in LLM Math Reasoning Read the full episode description	Nov 10, 2025
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow Read the full episode description	Nov 10, 2025
DSPy and TextGrad: Compiling Language Model Systems Read the full episode description	Nov 10, 2025
Doubly Stochastic Attention for Transformers Read the full episode description	Nov 10, 2025
Continuous Autoregressive Language Models: CALM	Nov 10, 2025
Context Distillation for Language Models	Nov 10, 2025
Confucius: Intent-Driven Network Management with Multi-Agent LLMs	Nov 10, 2025
CARTRIDGE: Efficient In-Context Learning via Distillation	Nov 10, 2025
AlphaEvolve: Mathematical Discovery at Scale	Nov 10, 2025
AdaFlow: Variance-Adaptive Flow-Based Imitation Learning	Nov 10, 2025
A Framework for LLM Application Safety Evaluation	Nov 10, 2025
SuperBPE: Space Travel for Language Models	Nov 04, 2025
Samsung: zFLoRA: Zero-Latency Fused Low-Rank Adapters	Nov 04, 2025
PolicySmith: Automated Systems Heuristic Generation via LLMs	Nov 04, 2025
MorphKV: Constant-Sized KV Caches for LLM Inference	Nov 04, 2025
HALoS: Hierarchical Asynchronous LLM Training over Slow Networks	Nov 04, 2025
Gumbel-Softmax for Differentiable Categorical Reparameterization and Selective Networks	Nov 04, 2025
Google: Supervised Reinforcement Learning for Step-wise Reasoning in LLMs	Nov 04, 2025
Anchored Diffusion Language Model: Superior Generation and Reasoning	Nov 04, 2025
RetNet: Retentive Networks: Transformer Successor for Large Language Models	Nov 02, 2025
Kimi Linear: Efficient Expressive Attention Architecture	Nov 02, 2025
ALiBi: Attention with Linear Biases Enables Length Extrapolation	Nov 01, 2025
Small Versus Large Models for Requirements Classification	Oct 31, 2025
Quest: Query-Aware Sparsity for Efficient LLM Inference	Oct 31, 2025
Hyper-Scaling LLM Inference with KV Cache Compression	Oct 31, 2025
Flash-LLM: Efficient LLM Inference with Unstructured Sparsity on Tensor Cores	Oct 31, 2025
ELASTIC: Linear Attention for Sequential Interest Compression	Oct 31, 2025
Architectural Scaling Laws for Efficient LLMs	Oct 31, 2025
Anthropic: Introspective Awareness in LLMs	Oct 31, 2025
TxGNN: Foundation Model for Zero-Shot Drug Repurposing	Oct 29, 2025
Sentence-BERT: Siamese Networks for Sentence Embeddings	Oct 29, 2025
ATTENTION2D and lean attention: Distributed Self-Attention	Oct 29, 2025
The Free Transformer: VAE Extension for Decoders	Oct 26, 2025
Survey of Emerging Topics in AI and Robotics	Oct 26, 2025
Stuck in the Matrix: LLM Spatial Reasoning	Oct 26, 2025
Structural Understanding of LLM Overthinking	Oct 26, 2025
Strata: Efficient Hierarchical Context Caching for LLM Serving	Oct 26, 2025
STAR: Sub-Entry Sharing TLB for Multi-Instance GPU Efficiency	Oct 26, 2025
Ring-linear: Efficient Hybrid Architecture for Long-Context Reasoning	Oct 26, 2025
Open-o3 Video: Spatio-Temporal Grounded Reasoning	Oct 26, 2025
MASA: Meta-Awareness via Self-Alignment Reinforcement Learning	Oct 26, 2025
Lp-Reg: Low-Probability Tokens Sustain RL Exploration	Oct 26, 2025
LLMs Learning from Verbal Feedback Without Scalar Rewards	Oct 26, 2025
LithOS: Operating System for Efficient GPU Machine Learning	Oct 26, 2025
LLM-Empowered Knowledge Graph Construction: A Survey	Oct 26, 2025
LFM2-8B-A1B: Efficient On-Device Mixture-of-Experts	Oct 26, 2025
Latent Constituency in Humans and LLMs	Oct 26, 2025
Introducing MTEB v2: Multimodal Embedding Evaluation	Oct 26, 2025
Internal Mechanisms of a Large Language Model	Oct 26, 2025
GigaBrain-0: World Model-Powered Generalist Robots	Oct 26, 2025
FlashAttention: IO-Aware Fast and Memory-Efficient Attention	Oct 26, 2025
Cognitive Impact of AI and Search on Essay Writing	Oct 26, 2025
Cattell–Horn–Carroll Theory of Intelligence	Oct 26, 2025
Structural Understanding of LLM Overthinking	Oct 22, 2025
RoBERTa: Robustly Optimized BERT Pretraining Approach	Oct 22, 2025
REFRAG: v2 paper: Efficient RAG Decoding via Context Compression	Oct 22, 2025
RAG-Anything: Unified Multimodal Knowledge Retrieval Framework	Oct 22, 2025
LLM-Guided Hierarchical Retrieval: The LATTICE Framework	Oct 22, 2025
LightMem: Lightweight Efficient Memory-Augmented Generation	Oct 22, 2025
Inheritune: Efficient LLM Training via Attention Collapse	Oct 22, 2025
In-Context Learning as Implicit Learning Algorithms	Oct 22, 2025
EssenceBench: Compressing LLM Benchmarks via Redundancy and Genetic Algorithm	Oct 22, 2025
Elastic-Cache: Adaptive KV Caching for Diffusion LLMs	Oct 22, 2025
Dr.LLM: Dynamic Layer Routing in LLMs	Oct 22, 2025
A Psychometric Framework for Artificial General Intelligence	Oct 22, 2025
Mojo: Performance-Portable HPC Kernels on GPUs	Oct 18, 2025
Geometric Flows of Logic in LLM Representation Space	Oct 18, 2025
Scaling Reinforcement Learning Compute for LLMs	Oct 17, 2025
HBF: High Bandwidth Flash for AI Inferencing	Oct 15, 2025
COPA: Composable On-Package GPU Architecture for Domain Specialization	Oct 15, 2025
Architectural Migration to Multi-head Latent Attention	Oct 15, 2025
UniVideo: Unified Video Understanding, Generation, and Editing	Oct 11, 2025
Training-Free GRPO: Policy Optimization via Context Space	Oct 11, 2025
RAND: Securing AI Model Weights: Preventing Theft and Misuse	Oct 11, 2025
Performance of Confidential Computing for Large Language Models	Oct 11, 2025
Multi-Agent Tool-Integrated Policy Optimization (MATPO)	Oct 11, 2025
Google: Confidential Computing with Accelerated AI Workloads on GCE	Oct 11, 2025
AWS: Nitro System: Security, Enclaves, and Generative AI	Oct 11, 2025
Anthropic: Confidential Inference via Trusted Virtual Machines	Oct 11, 2025
Samsung: Less is More: Recursive Reasoning with Tiny Networks	Oct 10, 2025
Petri: Accelerating AI Safety Auditing	Oct 10, 2025
Low-Precision Transformer Failure in Flash Attention	Oct 10, 2025
Early Experience for Language Agent Improvement	Oct 10, 2025
Dragon Hatchling: Brain-Inspired AI Architecture	Oct 10, 2025
CLUE: Hidden-State Clustering for Non-parametric Verification	Oct 10, 2025
Agentic Context Engineering: Evolving Contexts for Self-Improving LLMs	Oct 10, 2025
AGENTFLOW: In-the-Flow Agentic System Optimization	Oct 10, 2025
Test-Time Reinforcement Learning for LLMs	Oct 08, 2025
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory	Oct 08, 2025
Reactive Transformer: Stateful Real-Time Language Models	Oct 08, 2025
RECAP: Safety Alignment via Counter-Aligned Prefilling	Oct 08, 2025
Paris: Decentralized Open-Weight Diffusion Model	Oct 08, 2025
ONNX Ecosystem, Optimization, and Deployment	Oct 08, 2025
NIST Evaluation of DeepSeek AI Models	Oct 08, 2025
MotionRAG: Retrieval-Augmented Image-to-Video Generation	Oct 08, 2025
LongCodeZip: Compress Long Code Context for LLMs	Oct 08, 2025
Imperceptible Jailbreaking Against Large Language Models	Oct 08, 2025
Implicit Dynamics of In-Context Learning	Oct 08, 2025
GNN101: Visual Learning of Graph Neural Networks	Oct 08, 2025
Emergent Abilities of Large Language Models	Oct 08, 2025
DC-VideoGen: Efficient Video Generation with Deep Compression	Oct 08, 2025
Contextual Blocks: Implicit Weight Updates and Federated Learning	Oct 08, 2025
CoDA: Collaborative Multi-Agent Data Visualization	Oct 08, 2025
Analog In-Memory Attention for Energy-Efficient LLMs	Oct 08, 2025
ACON: Optimizing Context Compression for LLM Agents	Oct 08, 2025
Regression Language Models for Code Metrics	Oct 03, 2025
Introducing RTEB: Retrieval Embedding Benchmark	Oct 03, 2025
CUDA Unified Memory and Heterogeneous Memory Management	Oct 02, 2025
Moravec's Paradox and AI Automation Limits	Oct 01, 2025
Characterizing LLM KV Cache Workloads in Production	Oct 01, 2025
BurstGPT: A Real-World LLM Serving Workload Dataset	Oct 01, 2025
Qwen3-Next & Qwen3-Omni technical report	Sep 30, 2025
Variational Reasoning Framework for Language Models	Sep 29, 2025
Schoenfeld Theory Applied to Large Reasoning Models	Sep 27, 2025
Federated Learning with Soft Embeddings for Retrieval	Sep 27, 2025
CWM: Code Generation with World Models	Sep 27, 2025
Tree-based Group Policy Optimization for LLM Agents	Sep 26, 2025
Seedream 4.0: Multimodal Image Generation System	Sep 26, 2025
GDPval: Measuring AI Performance on Real-World Work	Sep 26, 2025
EmbeddingGemma: Powerful Lightweight Text Representations	Sep 26, 2025
CE-GPPO: Controlling Entropy via Gradient-Preserving Policy Optimization	Sep 26, 2025
LLM-I: Interleaved Multimodal Creators via Tool-Use	Sep 20, 2025
Adaptive Compression Techniques for Efficient LLM Inference	Sep 20, 2025
THOR: Hierarchical RL for Mathematical Reasoning	Sep 19, 2025
The Uneven Diffusion of AI Adoption	Sep 19, 2025
Single-stream Policy Optimization for LLMs	Sep 19, 2025
SearchInstruct: Instruction Tuning with Dynamic Retrieval	Sep 19, 2025
FlowRL: Distribution Matching for LLM Reasoning	Sep 19, 2025
Evolving Language Models Without Labels: EVOL-RL	Sep 19, 2025
REFRAG: Rethinking RAG-based Decoding	Sep 18, 2025
Pre-computing & reusing KV caches to accelerate RAG inference	Sep 18, 2025
DeepSeek-R1: Reinforcing LLM Reasoning Through Self-Evolution	Sep 18, 2025
WebSailor-V2: Bridging Proprietary Agents with Synthetic Data and RL	Sep 17, 2025
TailorKV: Hybrid KV Cache Compression for LLMs	Sep 17, 2025
ShadowKV: High-Throughput Long-Context LLM Inference	Sep 17, 2025
QuantAgent: Multi-Agent LLM for High-Frequency Trading	Sep 17, 2025
MIRAGE: Optimizing LLM KV Cache with Parameter Remapping	Sep 17, 2025
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning	Sep 17, 2025
Infini-gram: Scaling Unbounded N-gram Language Models	Sep 17, 2025
Dynamic Chunking for Hierarchical Sequence Modeling	Sep 17, 2025
UQ: Unsolved Questions for Language Models	Sep 16, 2025
PETALS: Collaborative Large Language Model Inference and Fine-tuning	Sep 16, 2025
Non-Penetrative Tensor Partitioning for Collaborative AIoT Inference	Sep 16, 2025
Native Sparse Attention: Efficient Long-Context LLMs	Sep 16, 2025
Janus-Pro: Unified Multimodal AI with Scaled Improvements	Sep 16, 2025
Hierarchical Reasoning Model: Brain-Inspired AI for Complex Tasks	Sep 16, 2025
Generalist Reward Modeling with Inference-Time Scaling	Sep 16, 2025
Federated Post-Training LLMs: An Accessibility and Efficiency Survey	Sep 16, 2025
Collaborative Edge Inference with Dynamic Task Offloading and Early Exiting	Sep 16, 2025
CodeI/O: Reasoning Patterns Through Code Input-Output Prediction	Sep 16, 2025
Adaptive LLM Partitioning for Edge Inference	Sep 16, 2025
The Illusion of Diminishing Returns in LLM Execution	Sep 15, 2025
PyTorch FSDP: Scaling Fully Sharded Data Parallel	Sep 15, 2025
MetaGraph: knowledge graphs from financial NLP	Sep 15, 2025
Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models	Sep 15, 2025
HybridServe: Efficient LLM Inference with Hybrid Caching	Sep 15, 2025
GraphSAGE: Inductive Representation Learning on Large Graphs	Sep 15, 2025
FlexGen: High-Throughput LLM Inference on a Single GPU	Sep 15, 2025
AWQ: On-Device LLM Compression and Acceleration	Sep 15, 2025
Llama 3: Architecture, Capabilities, and Safety	Sep 14, 2025
Graph Patterns of Knowledge in Large Language Models	Sep 14, 2025
Survey of Reinforcement Learning for Large Reasoning Models	Sep 13, 2025
Statistical Methods for Generative AI Reliability	Sep 13, 2025
SpikingBrain: Brain-Inspired LLMs for Efficient Long-Context Processing	Sep 13, 2025
EntiGraph: Scaling Language Models with Synthetic Pretraining	Sep 13, 2025
All for One: LLMs Solve Mental Math at the Last Token	Sep 13, 2025
Parallel-R1: Reinforcement Learning for Parallel Thinking in LLMs	Sep 12, 2025
NOVELTYBENCH: Evaluating Language Model Diversity	Sep 12, 2025
HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization	Sep 12, 2025
Explaining AI for Digital Advertising with LLMs	Sep 11, 2025
ByteCheckpoint: A Unified LLM Checkpointing System	Sep 11, 2025
AdLlama: Boosting Ad Performance with Reinforcement Learning	Sep 11, 2025
Mini-o3: Scaling Reasoning for Visual Search	Sep 10, 2025
Masked Diffusion Models: Performance and Theory	Sep 10, 2025
K2-Think: A Parameter-Efficient Reasoning System	Sep 10, 2025
INF2: Near-Storage LLM Inference for High Throughput	Sep 10, 2025
Darling: Reinforcing Diversity and Quality in Language Models	Sep 10, 2025
BLEU: Automatic Machine Translation Evaluation	Sep 10, 2025
AlphaEvolve: AI for Scientific and Algorithmic Discovery	Sep 10, 2025
TraceRL: Reinforcement Learning for Diffusion Language Models	Sep 09, 2025
LLM Benchmark Robustness to Linguistic Variation	Sep 09, 2025
Behavioral Fingerprinting of Large Language Models	Sep 09, 2025
Offloading LLM Models and KV Caches to NVMe SSDs	Sep 08, 2025
SGLang: Efficient Language Model Program Execution	Sep 07, 2025
OpenELM: Apple's Open Language Model Family	Sep 07, 2025
GPT-NeoX: Large-Scale Autoregressive Language Modeling in PyTorch	Sep 07, 2025
FineVision: Open Data for Computer Vision	Sep 07, 2025
Evaluating Large Language Models Trained on Code	Sep 07, 2025
Eleuther: evaluating LLMs	Sep 07, 2025
Democratizing AI Compute: The Modular Vision	Sep 07, 2025
SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence	Sep 06, 2025
Limitations of Embedding-Based Retrieval	Sep 06, 2025
MTEB & MMTEB: The Massive Text Embedding Benchmark	Sep 05, 2025
Inverse IFEval: Unlearning LLM Cognitive Inertia	Sep 05, 2025
EmbeddingGemma: On-Device AI for High-Quality Embeddings	Sep 05, 2025
DeepResearch Arena: Benchmarking LLMs' Research Abilities	Sep 05, 2025
The Rise of Physical Neural Networks	Sep 04, 2025
Supervised Learning in DNA Neural Networks	Sep 04, 2025
FastVLM: Efficient Vision Encoding for Language Models	Sep 04, 2025
Apertus Tech Report Overview	Sep 04, 2025
Scientific LLMs: A Data-Centric Survey and Roadmap	Sep 03, 2025
rStar2-Agent: Smarter Math Reasoning Through Agentic RL	Sep 03, 2025
FusionANNS: Billion-Scale ANNS with SSD and GPU	Sep 03, 2025
Training Recurrent Neural Networks: Vanishing and Exploding Gradients	Aug 27, 2025
SPAM: Stabilizing LLM Training with Spike-Aware Optimization	Aug 27, 2025
Pimba: Processing-in-Memory for LLM Serving	Aug 27, 2025
Oaken: Fast, Efficient LLM Serving with Hybrid KV Cache Quantization	Aug 27, 2025
AdamW: Decoupled Weight Decay Regularization for Adaptive Gradient Algorithms	Aug 27, 2025
Adafactor: Memory-Efficient Adaptive Learning Rates	Aug 27, 2025
Quantizing Diffusion LLMs: A Systematic Study	Aug 26, 2025
ODYSSEY: Unified Mobile Manipulation for Agile Quadruped Robots	Aug 26, 2025
Google: Measuring AI's Environmental Impact at Scale	Aug 26, 2025
ComoRAG: Cognitively Inspired Narrative Reasoning	Aug 26, 2025
GPT-5 Spatial Intelligence: An Empirical Study	Aug 24, 2025
DeepSeek-V3.1: A Hybrid AI Model with Enhanced Reasoning	Aug 23, 2025
Compressed Experts: Efficient MoE Model Editing	Aug 23, 2025
Genie 3: A New Frontier for World Models	Aug 22, 2025
Los Alamos: overcoming the memory wall fighting sparse memory access	Aug 21, 2025
Switch Transformers: Trillion Parameter Models with Sparsity	Aug 20, 2025
Speed Always Wins: Efficient Large Language Model Architectures	Aug 20, 2025
Linear Transformers: Faster Than RNNs	Aug 20, 2025
Self-Search Reinforcement Learning for LLMs	Aug 18, 2025
Diffusion Language Models: Principles, Techniques, and Applications	Aug 18, 2025
Continuous Batching for LLM Inference: Throughput and Latency Gains	Aug 18, 2025
Atom: Low-Bit Quantization for LLM Serving	Aug 18, 2025
The Mapped Memory Mistake: Why DBMSs Should Avoid MMAP	Aug 13, 2025
Scaling PostgreSQL at OpenAI: Read-Heavy Workloads and Optimizations - PGConf.dev 2025	Aug 13, 2025
pNFS Flex Files	Aug 13, 2025
NVMe Offload on Colossal AI: Breaking the GPU Memory Wall	Aug 13, 2025
NVIDIA GDS, BaM vs. ROCm solutions	Aug 13, 2025
fMoE: Fine-Grained Expert Offloading for MoE Serving	Aug 13, 2025
ELMo-Tune-V2: LLM-Assisted Auto-Tuning for Key-Value Stores	Aug 13, 2025
AiSAQ: DRAM-free ANNS with Product Quantization	Aug 13, 2025
Qwen-Image: Generation and Editing with Precision	Aug 12, 2025
Mem0: Scalable Long-Term Memory for AI Agents	Aug 12, 2025
Chain-of-Thought Reasoning: A Brittle Mirage?	Aug 11, 2025
Xavier Initialization: Deep Feedforward Networks: Training Difficulties and Solutions	Aug 08, 2025
ZeRO-Offload: Democratizing Billion-Scale Model Training	Aug 08, 2025
vLLM & PagedAttention: Efficient LLM Serving with Virtual Memory	Aug 08, 2025
TierTrain: Proactive Memory Tiering for DNN Training	Aug 08, 2025
Test-Time Scaling	Aug 08, 2025
The Elements of Differentiable Programming	Aug 08, 2025
Teraio: Cost-Efficient LLM Training via Lifetime-Aware Tensor Offloading	Aug 08, 2025
MoE Offloaded	Aug 08, 2025
Shared Virtual Memory: Design and Performance Implications	Aug 08, 2025
Reinforcement Learning	Aug 08, 2025
Reinforcement Pre-Training for Language Models	Aug 08, 2025
PartitionedVC: Optimizing External Memory Graph Analytics	Aug 08, 2025
Multiagent Debate Improves Language Model Reasoning	Aug 08, 2025
Multi Query Attention: PaLM: Scaling Language Modeling with Pathways	Aug 08, 2025
MUVERA: Efficient Multi-Vector Information Retrieval	Aug 08, 2025
Mu: The Windows Settings Agent Language Model	Aug 08, 2025
Movement Pruning: Adaptive Sparsity by Fine-Tuning	Aug 08, 2025
Mistral 7B: Superior Performance in a Smaller Package	Aug 08, 2025
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts	Aug 08, 2025
Microscaling Quantization for Large Language Models	Aug 08, 2025
Lost in the Middle: How Language Models Use Long Contexts	Aug 08, 2025
MEGABYTE: Multiscale Transformers for Million-byte Sequences	Aug 08, 2025
Mamba: Linear-Time Sequence Modeling with Selective State Spaces	Aug 08, 2025
LoRA: Low-Rank Adaptation of Large Language Models	Aug 08, 2025
LMCache: Supercharging LLM Performance with KV Cache Management	Aug 08, 2025
KVQuant: LLM Inference with KV Cache Quantization	Aug 08, 2025
Kaiming Initialization and PReLU	Aug 08, 2025
Gemma: Google DeepMind's Open Language Models	Aug 08, 2025
Fast Learning for Deep Belief Networks	Aug 08, 2025
FP8 Quantization	Aug 08, 2025
FlashAttention-2: Faster Attention with Better Parallelism	Aug 08, 2025
DyNN-Offload: Efficient Memory for Dynamic Neural Networks	Aug 08, 2025
Dynamic Tanh: Transformers Without Normalization	Aug 08, 2025
DroidSpeak: Cross-LLM KV Cache Sharing	Aug 08, 2025
DiMSUM: Image Generation with Diffusion Mamba	Aug 08, 2025
DeepSeek Safety Concerns	Aug 08, 2025
Diffusion Probabilistic Models: Deep Unsupervised Learning Through Diffusion Processes	Aug 08, 2025
Demystifying Mamba: Architecture and Capabilities	Aug 08, 2025
DeepSeek-R1: Incentivizing Reasoning in LLMs	Aug 08, 2025
DeepSeek-V3: A Technical Report	Aug 08, 2025
DeepSeekMoE: Scalable Mixture-of-Experts Language Models	Aug 08, 2025
DeepSeek-R1 Dynamic 1.58-bit Quantization: A Performance Analysis	Aug 08, 2025
Concept Drift	Aug 08, 2025
ColBERT and ColBERT v2	Aug 08, 2025
CODEGEN: Open Language Model for Code Synthesis	Aug 08, 2025
Chain of thought	Aug 08, 2025
AI and the Memory Wall: Overcoming Bottlenecks	Aug 08, 2025
AdamW: Decoupled Weight Decay Regularization for Adaptive Gradient Algorithms	Aug 08, 2025
Adam: A Method for Stochastic Optimization	Aug 08, 2025
Transformer Scaling	Aug 07, 2025
Scaling Laws	Aug 07, 2025
ResNets - residual block	Aug 07, 2025
RoPE	Aug 07, 2025
LSTM: the forget gate	Aug 07, 2025
Longformer: A Transformer for Long Documents	Aug 07, 2025
Linformer: Efficient Self-Attention with Linear Complexity	Aug 07, 2025
LeNet-5: Convolutional Networks for Character Recognition	Aug 07, 2025
Layer Normalization and Dual Patch Normalization	Aug 07, 2025
Learning from repeated data	Aug 07, 2025
GPT4 Technical Report	Aug 07, 2025
GQA: Grouped Query Attention	Aug 07, 2025
GPT3	Aug 07, 2025
GPT2	Aug 07, 2025
GELU	Aug 07, 2025
Dropout	Aug 07, 2025
Constitutional AI: Harmlessness Through Self-Improvement	Aug 07, 2025
Chinchilla: Optimal Language Model Scaling	Aug 07, 2025
Chamfer Matching: Image Registration and Medical Applications	Aug 07, 2025
BART Read the full episode description	Aug 07, 2025
BERT Read the full episode description	Aug 07, 2025
Batch Normalization Read the full episode description	Aug 07, 2025
Attention is all you need Read the full episode description	Aug 07, 2025

Category: Technology

Description