| Episode | Date |
|---|---|
|
Titans: Learning to Memorize at Test Time
|
May 20, 2026 |
|
The Sparsity Wall: What Reiner Pope Told Dwarkesh About MoE and Sparse Attention
|
May 19, 2026 |
|
Serving MoE Models with Disaggregated Expert Parallelism
|
May 19, 2026 |
|
Affordable Large-Scale Decoding Through Model-System Co-Design
|
May 19, 2026 |
|
NanoFlow and the Future of LLM Serving
|
May 19, 2026 |
|
Air Force One, Jensen Huang, and Anthropic's 2028 Memo
|
May 15, 2026 |
|
Deep Kernel Fusion for Transformer Decoding
|
May 15, 2026 |
|
FlashFuser and Hopper-Era FFN Kernel Fusion
|
May 15, 2026 |
|
Lossless Sparse Deltas for RL Networks
|
May 15, 2026 |
|
JANUS for Scalable MoE Inference
|
May 15, 2026 |
|
Scaling Laws for Multilingual Code Pretraining
|
May 15, 2026 |
|
Causal-JEPA for Object-Level World Models
|
May 15, 2026 |
|
Ministral 3: Cascade Distillation for Long-Context Multimodal Models
|
May 15, 2026 |
|
When Many-Shot CoT Becomes Test-Time Learning
|
May 15, 2026 |
|
Agentic AI as a Path to AGI
|
May 15, 2026 |
|
Trace Rewriting Against Unauthorized LLM Distillation
|
May 15, 2026 |
|
TMAS: Scaling Test-Time Compute with Multi-Agent Synergy
|
May 14, 2026 |
|
AI Co-Mathematician for Mathematical Research
|
May 14, 2026 |
|
δ-mem and Online Memory for LLMs
|
May 13, 2026 |
|
Qwen-Image-2.0 for Unified Generation and Editing
|
May 13, 2026 |
|
MiA-Signature and Global Activation for Long Context
|
May 13, 2026 |
|
Long Context Pre-Training with Lighthouse Attention
|
May 13, 2026 |
|
MELT: Decoupling Compute From Memory
|
May 13, 2026 |
|
ELF and Continuous Language Diffusion
|
May 12, 2026 |
|
ForkKV for Multi-LoRA Agent Serving
|
May 12, 2026 |
|
Metacognition Against Confident Hallucinations
|
May 12, 2026 |
|
TIDE and the Rare Token Problem
|
May 12, 2026 |
|
Agentic Discovery for Test-Time Scaling
|
May 12, 2026 |
|
Vistara Brings CXL Memory to Hyperscale
|
May 11, 2026 |
|
Split Personality Training Reveals Latent Knowledge
|
May 10, 2026 |
|
Why Transformers Fail at Counting
|
May 10, 2026 |
|
Optimization, Credit Assignment, and Consciousness
|
May 10, 2026 |
|
Reasoning Theater and Unfaithful Chain-of-Thought
|
May 10, 2026 |
|
RAPTOR: Stable Concept Directions From Logistic Probes
|
May 10, 2026 |
|
When LLM Judges Become Coin Flips
|
May 09, 2026 |
|
LAPS for Length-Aware LLM Serving
|
May 09, 2026 |
|
Generative Modeling via Drifting in One Step
|
May 09, 2026 |
|
EverMemOS for Long-Horizon Agent Memory
|
May 09, 2026 |
|
Explicit Information Transmission for Context Compression
|
May 09, 2026 |
|
Why LLM Serving Needs Mathematical Optimization
|
May 09, 2026 |
|
Can Models Learn from Long Context?
|
May 08, 2026 |
|
How Models Detect Hidden Activation Steering
|
May 08, 2026 |
|
What the Frog’s Eye Tells the Brain
|
May 08, 2026 |
|
Learning in Random Nets and Generalization
|
May 08, 2026 |
|
Backpropagation Through Time Explained
|
May 08, 2026 |
|
SGLang for Faster Structured LLM Programs
|
May 08, 2026 |
|
TensorFlow for Distributed Machine Learning Systems
|
May 08, 2026 |
|
Machine Learning Self-Calibrated FPGA Time-to-Digital Converter
|
May 08, 2026 |
|
Why LightGBM Made Boosted Trees Fast
|
May 07, 2026 |
|
Fast FPGA BDT Inference for LHC Triggers
|
May 07, 2026 |
|
Fast FPGA Inference for LHC Triggers
|
May 07, 2026 |
|
Boosted Decision Trees for CMS Muon Triggers
|
May 07, 2026 |
|
Automating DNN Compilation for FPGA Accelerators
|
May 07, 2026 |
|
Synchronous Data Flow for Signal Processing
|
May 07, 2026 |
|
Caffeine: A Unified FPGA for CNNs
|
May 07, 2026 |
|
Caffe and the Rise of CNN Frameworks
|
May 07, 2026 |
|
RFNoC SISO Processor via High-Level Synthesis
|
May 06, 2026 |
|
Automating CNN Mapping on Embedded FPGAs
|
May 06, 2026 |
|
Training Million-Token LLMs Beyond the Memory Barrier
|
May 04, 2026 |
|
Training LLMs for Divide-and-Conquer Reasoning
|
May 04, 2026 |
|
Reinforcement Learning in 2025: An Overview
|
May 04, 2026 |
|
PackKV Lossy Compression for KV Caches
|
May 04, 2026 |
|
Geometric Memory in Deep Sequence Models
|
May 03, 2026 |
|
Selective Classification with Deep Neural Networks
|
May 03, 2026 |
|
Self-Improving Pretraining With Post-Trained Models
|
May 03, 2026 |
|
How Induction Heads Emerge in Transformers
|
May 03, 2026 |
|
node2vec and Learning Graph Embeddings
|
May 03, 2026 |
|
DeepWalk and the Rise of Graph Embeddings
|
May 03, 2026 |
|
Fast Speech Recognition by Transcript Editing
|
May 02, 2026 |
|
Discrete Representations for Continual Reinforcement Learning
|
May 02, 2026 |
|
Recursive Multi-Agent Systems in Latent Space
|
May 02, 2026 |
|
A Practical Review of Mechanistic Interpretability
|
May 02, 2026 |
|
DeltaKV: Compressing KV Caches for Long Context
|
May 02, 2026 |
|
CacheFlow and 3D-Parallel KV Cache Restoration
|
May 02, 2026 |
|
Do Language Models Know Their Limits
|
May 02, 2026 |
|
Hopfield Networks and Transformer Attention as Memory
|
May 01, 2026 |
|
Sequence to Sequence Learning with Neural Networks
|
May 01, 2026 |
|
Bahdanau Attention for Neural Machine Translation
|
May 01, 2026 |
|
Convolutional Sequence to Sequence Learning for Translation
|
May 01, 2026 |
|
Teaching Language Models to Verbalize Uncertainty
|
May 01, 2026 |
|
Can LLMs Judge Their Own Capabilities?
|
May 01, 2026 |
|
ChartNet for Robust Multimodal Chart Understanding
|
May 01, 2026 |
|
Scaling Test-Time Compute for Reasoning Models
|
Apr 30, 2026 |
|
Breaking the Prefix Barrier with Shared KV Cache
|
Apr 30, 2026 |
|
Reverse-Mode Differentiation Across AD and Neural Nets
|
Apr 30, 2026 |
|
Stabilizing Efficient Reasoning with Step-Level Advantage Selection
|
Apr 29, 2026 |
|
Stochastic KV Routing for Cache Sharing
|
Apr 29, 2026 |
|
Deep Learning in Spiking Neural Networks
|
Apr 28, 2026 |
|
FPGA Neural Network Accelerators for Space
|
Apr 28, 2026 |
|
World-R1 Improves 3D Consistency in Text-to-Video
|
Apr 28, 2026 |
|
Long Short-Term Memory and Vanishing Gradients
|
Apr 27, 2026 |
|
Compressed Convolutional Attention in Latent Space
|
Apr 27, 2026 |
|
AgenticQwen and Small Industrial Tool Agents
|
Apr 27, 2026 |
|
Experience-Based Learning Beyond Human Data
|
Apr 26, 2026 |
|
Breiman's Two Cultures of Statistical Modeling
|
Apr 26, 2026 |
|
Muon Is Scalable for LLM Training
|
Apr 25, 2026 |
|
Kimi K2.5 and Visual Agent Swarms
|
Apr 25, 2026 |
|
ScoutAttention for Efficient KV Cache Offloading
|
Apr 25, 2026 |
|
DeepSeek-V4 and Practical Million-Token Context
|
Apr 25, 2026 |
|
Agentic Aggregation for Long-Horizon AI Tasks
|
Apr 23, 2026 |
|
TokenDance for Multi-Agent KV Cache Sharing
|
Apr 23, 2026 |
|
RetrievalAttention for Long-Context LLM Inference
|
Apr 22, 2026 |
|
DreamerV3 World Models Across 150 Tasks
|
Apr 22, 2026 |
|
Program Synthesis with Large Language Models
|
Apr 22, 2026 |
|
Test-time Scaling for Multi-Agent Collaborative Reasoning
|
Apr 22, 2026 |
|
TUMIX Multi-Agent Test-Time Scaling with Tools
|
Apr 22, 2026 |
|
Benchmarking Test-Time Scaling for General LLM Agents
|
Apr 22, 2026 |
|
Distilling Multi-Agent Reasoning into a Single LLM
|
Apr 22, 2026 |
|
Efficient KV Cache Sharing for Multi-LoRA Agents
|
Apr 22, 2026 |
|
CacheBlend for Fast RAG Serving
|
Apr 22, 2026 |
|
Directly Trained Spiking DQNs for Atari
|
Apr 22, 2026 |
|
Qwen3.5-Omni Thinker-Talker for Omnimodal Streaming
|
Apr 21, 2026 |
|
The Abstraction Fallacy and AI Consciousness
|
Apr 20, 2026 |
|
ContiguousKV for Faster LLM Prefill KV Reuse
|
Apr 20, 2026 |
|
Gated Delta Networks for Long-Context Retrieval
|
Apr 20, 2026 |
|
Prefill-as-a-Service for Cross-Datacenter KV Cache
|
Apr 20, 2026 |
|
Nemotron 3 Super Hybrid Mamba-Transformer MoE
|
Apr 20, 2026 |
|
Parallelizing DeltaNet Linear Transformers over Sequence Length
|
Apr 19, 2026 |
|
Gated Linear Attention for Efficient Long Sequences
|
Apr 19, 2026 |
|
Defining AGI as Adaptive Artificial Scientist
|
Apr 19, 2026 |
|
Mamba-3 for Efficient Sequence Modeling
|
Apr 16, 2026 |
|
Linear Classifier Probes for Intermediate Layers
|
Apr 16, 2026 |
|
KVSwap for Disk-Aware Long-Context On-Device Inference
|
Apr 16, 2026 |
|
Neural Chameleons and Evading Activation Monitors
|
Apr 15, 2026 |
|
SkillsBench for Evaluating Agent Skills
|
Apr 15, 2026 |
|
Experimental Comparison of Agentic and Enhanced RAG
|
Apr 14, 2026 |
|
Learning to Reason with 13 Parameters
|
Apr 14, 2026 |
|
Memory Intelligence Agents for Deep Research
|
Apr 13, 2026 |
|
GPU-Accelerated Dynamic Quantized ANNS Graph Search
|
Apr 13, 2026 |
|
Latent Space as a New Computational Paradigm
|
Apr 12, 2026 |
|
Learning Latent Action World Models from Video
|
Apr 12, 2026 |
|
KV Cache TTL for Multi-Turn Agent Scheduling
|
Apr 12, 2026 |
|
VL-JEPA for Vision-Language Semantic Prediction
|
Apr 12, 2026 |
|
ClawBench for Real-World Online AI Agents
|
Apr 12, 2026 |
|
FengHuang for Rack-Scale LLM Inference Memory
|
Apr 12, 2026 |
|
ASI-Evolve for Data, Architectures, and RL
|
Apr 12, 2026 |
|
In-Place Test-Time Training for Transformers
|
Apr 11, 2026 |
|
Neural Computers as Learned Latent Runtimes
|
Apr 11, 2026 |
|
DRAM-Free In-Flash Computing for LLM Inference
|
Apr 10, 2026 |
|
Computation-Bandwidth-Memory Trade-offs for AI Infrastructure
|
Apr 10, 2026 |
|
TriAttention for Efficient Long-Context KV Compression
|
Apr 08, 2026 |
|
When Spectral Gradient Updates Help Deep Learning
|
Apr 08, 2026 |
|
Memory Sparse Attention for 100M-Token Scaling
|
Apr 08, 2026 |
|
Cache Mechanism for Agent RAG Systems
|
Apr 08, 2026 |
|
Speculative Decoding in Real vLLM Serving
|
Apr 07, 2026 |
|
Real Context Size and Context Rot
|
Apr 07, 2026 |
|
MEMSEARCHER: Reinforcement Learning for LLM Memory Management
|
Apr 04, 2026 |
|
Recursive Language Models for Arbitrarily Long Prompts
|
Apr 04, 2026 |
|
Internal Safety Collapse in Frontier LLMs
|
Apr 04, 2026 |
|
Kosmos AI Scientist for Autonomous Discovery
|
Apr 04, 2026 |
|
QVCache for Semantic Caching in ANN Search
|
Apr 04, 2026 |
|
FlatAttention for Tile-Based Accelerator Inference
|
Apr 04, 2026 |
|
IMO-Bench for Robust Mathematical Reasoning
|
Apr 04, 2026 |
|
Batch-Aware Expert Routing for Faster MoE Decoding
|
Apr 03, 2026 |
|
CXL Computational Memory Offloading for Lower Runtime
|
Apr 03, 2026 |
|
OOD Shifts Make LLM Representations Sparser
|
Apr 02, 2026 |
|
Meta-Harness and the Power of LLM Plumbing
|
Apr 02, 2026 |
|
Emergent Social Risks in Multi-Agent Systems
|
Apr 02, 2026 |
|
AI Agent Traps and Prompt Injection
|
Apr 02, 2026 |
|
Simple Self-Distillation for Better Code Generation
|
Apr 02, 2026 |
|
MetaClaw: Just Talk and Continual Agent Adaptation
|
Mar 31, 2026 |
|
MAML and the Basics of Meta-Learning
|
Mar 30, 2026 |
|
Doc-to-LoRA: Internalizing Context as LoRA
|
Mar 30, 2026 |
|
Agentic AI and the Next Intelligence Explosion
|
Mar 29, 2026 |
|
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
|
Mar 28, 2026 |
|
Splitwise: Phase-Split LLM Inference
|
Mar 28, 2026 |
|
From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
|
Mar 27, 2026 |
|
HyperAgents and Metacognitive Self-Improvement
|
Mar 27, 2026 |
|
LeWorldModel: Stable Joint-Embedding World Models from Pixels
|
Mar 26, 2026 |
|
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
|
Mar 26, 2026 |
|
Language Models are Injective and Hence Invertible
|
Mar 25, 2026 |
|
Jet-Nemotron and PostNAS for Faster Long Context
|
Mar 25, 2026 |
|
Speculative Speculative Decoding
|
Mar 25, 2026 |
|
Lookahead Q-Cache for Consistent KV Eviction
|
Mar 25, 2026 |
|
New Site Features: Conferences Tracking and How to Support the Show
|
Mar 20, 2026 |
|
Accelerating LLM Cold Starts with Programmable Page Cache
|
Mar 19, 2026 |
|
SolidAttention: Co-Designing Sparse Attention and SSD I/O
|
Mar 19, 2026 |
|
Generative File Systems: Replacing Code with Formal Specifications
|
Mar 19, 2026 |
|
Xerxes: CXL 3.0 Simulation for Scalable Memory Systems
|
Mar 19, 2026 |
|
Optimizing Mixture of Block Attention Through Statistical Theory
|
Mar 19, 2026 |
|
Predicting Task Performance with Context-aware Scaling Laws
|
Mar 17, 2026 |
|
Model-Aware Tokenizer Transfer for Multilingual LLMs
|
Mar 17, 2026 |
|
xLLM: Co-Locating Online and Offline LLM Inference
|
Mar 17, 2026 |
|
Memory in the Age of AI Agents: Forms, Functions, Dynamics
|
Mar 17, 2026 |
|
Qwen3Guard: Streaming Three-Way Safety Classification for LLMs
|
Mar 17, 2026 |
|
Bidaw: Computation-Storage Aware KV Caching for LLMs
|
Mar 17, 2026 |
|
New Voices, Same Nerds: The Kokoro TTS Episode
|
Mar 17, 2026 |
|
Rethinking Residual Connections: Learned Attention Across Transformer Depth
|
Mar 17, 2026 |
|
CacheSlide: Position-Aware KV Cache Reuse for Agent LLMs
|
Mar 17, 2026 |
|
The Art of Scaling Reinforcement Learning Compute for LLMs
|
Mar 16, 2026 |
|
LLM Agents Reason About Code Without Running It
|
Mar 15, 2026 |
|
Emergent Cooperation in Self-Interested Multi-Agent AI
|
Mar 13, 2026 |
|
50x KV Cache Compression in Seconds via Attention Matching
|
Mar 12, 2026 |
|
Gradient Descent at Inference Time for LLM Reasoning
|
Mar 11, 2026 |
|
MalGEN: Multi-Agent AI for Red Teaming Malware
|
Mar 08, 2026 |
|
Structured State Space Duality Unifies Transformers and SSMs
|
Mar 07, 2026 |
|
DualPath Breaks Storage Bandwidth Bottleneck in Agentic Inference
|
Mar 07, 2026 |
|
Why CARTRIDGE Works: Keys as Routers in KV Caches
|
Mar 07, 2026 |
|
Systematic Characterization of LLM Inference on GPUs
|
Mar 07, 2026 |
|
Tokenization Bias: The Hidden Flaw Breaking Language Models
|
Mar 07, 2026 |
|
NVIDIA Nemotron 3 Hybrid SSM Transformer Architecture
|
Mar 07, 2026 |
|
We're Open Source! New Home, Visualizations, and How to Shape Our Queue
|
Mar 06, 2026 |
|
FlashAttention-4 Conquers Asymmetric GPU Hardware Scaling
|
Mar 06, 2026 |
|
Parallel Token Prediction: From ProphetNet to Dependent Multi-Token Generation
|
Mar 04, 2026 |
|
FlashOptim: Optimizers for Memory Efficient Training
|
Mar 02, 2026 |
|
Cognizant - New Work, New World 2026
|
Mar 01, 2026 |
|
Regular Fourier Features for Nonstationary Gaussian Processes
|
Mar 01, 2026 |
|
Unified Latents (UL): How to train your latents
|
Feb 28, 2026 |
|
QuantSpec: Hierarchical KV Cache for Self-Speculative Decoding
|
Feb 28, 2026 |
|
MatFormer: Nested Transformer for Elastic Inference
|
Feb 28, 2026 |
|
MagicDec: Breaking Latency-Throughput Tradeoffs via KV-Compressed Speculative Decoding
|
Feb 28, 2026 |
|
KV selection algorithms: static (SnapKV) vs. dynamic (PQCache)
|
Feb 28, 2026 |
|
Fast Inference from Transformers via Speculative Decoding
|
Feb 28, 2026 |
|
EAGLE: Evolution of Lossless Acceleration for LLM Inference
|
Feb 28, 2026 |
|
CXL-SpecKV: Bridging the LLM Memory Wall with Speculative FPGA Disaggregation
|
Feb 28, 2026 |
|
Building Production-Ready Speculative Decoding with TensorRT-LLM
|
Feb 28, 2026 |
|
Apple's Speculative Streaming: Fast LLM Inference without Auxiliary Models
|
Feb 28, 2026 |
|
Apple's Mirror Speculative Decoding: Parallel LLM Inference via Heterogeneous Accelerators
|
Feb 28, 2026 |
|
Adaptive Control for Batched Speculative Decoding in LLM Serving
|
Feb 28, 2026 |
|
Taming the Long-Tail: Efficient Reasoning RL with Adaptive Drafters
|
Feb 26, 2026 |
|
Optimizing Verification and Efficiency in Multi-Draft Speculative Decoding
|
Feb 26, 2026 |
|
MEDUSA: Parallel Decoding Heads for Accelerated LLM Inference
|
Feb 26, 2026 |
|
Measuring LLM Reasoning Effort via Deep-Thinking Tokens
|
Feb 26, 2026 |
|
Measuring AI Ability to Complete Long Tasks
|
Feb 26, 2026 |
|
FastGRPO: Concurrency-Aware Speculative Decoding for Policy Optimization
|
Feb 26, 2026 |
|
Evaluating Collective Behaviour of Hundreds of LLM Agents
|
Feb 26, 2026 |
|
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
|
Feb 26, 2026 |
|
Deep Learning Frameworks for Robust Quadrupedal Locomotion
|
Feb 26, 2026 |
|
Advancements in Efficient KV Cache Quantization and Management
|
Feb 26, 2026 |
|
Accelerating Large Language Model Decoding with Speculative Sampling
|
Feb 26, 2026 |
|
NeurIPS 2025: Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
|
Feb 25, 2026 |
|
FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional
|
Feb 25, 2026 |
|
FAST26: CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving
|
Feb 25, 2026 |
|
Cortex: Semantic Knowledge Caching for Low-Latency LLM Agents
|
Feb 25, 2026 |
|
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
|
Feb 25, 2026 |
|
1.3 Billion Agents by 2028: The $50 Billion Boom and the Hidden Enterprise Crisis
|
Feb 25, 2026 |
|
Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts
|
Feb 24, 2026 |
|
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
|
Feb 24, 2026 |
|
PPDS: Achieving Persona Consistency Through Large-Scale Dialogue Data Engineering
|
Feb 24, 2026 |
|
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
|
Feb 24, 2026 |
|
Personalized Dialogue Generation via Persona-Adaptive Attention
|
Feb 24, 2026 |
|
PersonaPKT: Parameter-Efficient Knowledge Transfer for Personalized Dialogue Agents
|
Feb 24, 2026 |
|
Persona Vectors: monitoring and controlling character traits on LLMs
|
Feb 24, 2026 |
|
Machine Learning for Electrophysiological Phenotyping of Schizophrenia and Bipolar Disorder
|
Feb 24, 2026 |
|
Evaluating LLM Embeddings for Psychometric Personality Prediction
|
Feb 24, 2026 |
|
Bloom: an open source tool for automated behavioral evaluations
|
Feb 24, 2026 |
|
AI and the Decline of Entry-Level Employment
|
Feb 24, 2026 |
|
2019 UNILM: Unified Language Model Pre-training for NLU and NLG
|
Feb 24, 2026 |
|
Procgen Benchmark: Measuring Generalization in Reinforcement Learning
|
Feb 20, 2026 |
|
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
|
Feb 20, 2026 |
|
GLM-5: Transitioning from Vibe Coding to Agentic Engineering
|
Feb 20, 2026 |
|
Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training
|
Feb 20, 2026 |
|
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
|
Feb 20, 2026 |
|
A 2024 Survey Analyzing Generalization in Deep Reinforcement Learning
|
Feb 20, 2026 |
|
Voxtral Realtime: Native Streaming ASR with Sub-Second Latency
|
Feb 17, 2026 |
|
The Endless Gym: Training Terminal Agents
|
Feb 17, 2026 |
|
Teaching Models to Teach Themselves via Stepping Stone Curricula
|
Feb 17, 2026 |
|
Moltbook: The Heartbeat of Autonomy: Fingerprinting Human Influence in AI Societies
|
Feb 17, 2026 |
|
Jet-RL: Stable On-Policy Reinforcement Learning with Unified FP8 Flow
|
Feb 17, 2026 |
|
Information Bottleneck-based Causal Attention for Medical Image Recognition
|
Feb 17, 2026 |
|
Intelligent AI Delegation
|
Feb 17, 2026 |
|
DeepVerifier: Self-Evolving Research Agents via Rubric-Guided Verification
|
Feb 17, 2026 |
|
Advancing Mechanistic Interpretability with Sparse Autoencoders
|
Feb 17, 2026 |
|
Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
|
Feb 17, 2026 |
|
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
|
Feb 13, 2026 |
|
Sapient Intelligence: Hierarchical Reasoning Model
|
Feb 11, 2026 |
|
Reinforced Attention Learning
|
Feb 11, 2026 |
|
LongCat: Scaling Embeddings Outperforms Scaling Experts in Language Models
|
Feb 11, 2026 |
|
DR. KERNEL: Reinforcement Learning for Optimized Triton Kernel Generation
|
Feb 11, 2026 |
|
Dario Amodei: The Adolescence of Technology
|
Feb 11, 2026 |
|
Dario Amodei: Machines of Loving Grace
|
Feb 11, 2026 |
|
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
|
Feb 11, 2026 |
|
Advances in Attention Distillation for Efficient Transformer Models
|
Feb 11, 2026 |
|
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
|
Feb 11, 2026 |
|
Towards a Science of Scaling Agent Systems
|
Feb 09, 2026 |
|
Uncertainty-aware genomic deep learning with knowledge distillation
|
Feb 06, 2026 |
|
Reinforcement Learning via Self-Distillation
|
Feb 06, 2026 |
|
On-Policy Self-Distillation for Advanced LLM Reasoning
|
Feb 06, 2026 |
|
Moloch’s Bargain: Market Incentives and the Rise of AI Misalignment
|
Feb 06, 2026 |
|
Knowledge distillation to context distillation
|
Feb 06, 2026 |
|
Distilling GNN Knowledge into Non-Neural Cell Graph Student Models
|
Feb 06, 2026 |
|
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
|
Feb 06, 2026 |
|
Claude Opus 4.6 Technical Report and Agent Capabilities
|
Feb 06, 2026 |
|
Advancing regulatory variant effect prediction with AlphaGenome
|
Feb 06, 2026 |
|
2006 Model Compression: Ensembles
|
Feb 06, 2026 |
|
2015: Distilling the Knowledge in a Neural Network
|
Feb 06, 2026 |
|
Reasoning Models Generate Societies of Thought
|
Jan 28, 2026 |
|
NeurIPS 2025: L2M: Mutual Information Scaling Law for Long-Context Language Modeling
|
Jan 28, 2026 |
|
Long context: Dichotomy of Findings & Status of Research
|
Jan 28, 2026 |
|
Keel: Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
|
Jan 28, 2026 |
|
SLDAgent: Evolutionary Discovery of Superhuman AI Scaling Laws
|
Jan 26, 2026 |
|
Sequoia Capital: AGI is here
|
Jan 24, 2026 |
|
OpenAI: Scaling PostgreSQL to 800 Million ChatGPT Users
|
Jan 24, 2026 |
|
Agentic Reasoning for Large Language Models: A Comprehensive Roadmap
|
Jan 24, 2026 |
|
MEMRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic
|
Jan 23, 2026 |
|
Google: R&D inference value on HBF + PNM + low latency interconnect
|
Jan 23, 2026 |
|
Storage-next: Do We Need New Hardware for AI Storage, or Just Better Layouts?
|
Jan 21, 2026 |
|
Meta's solution to massive DLRM inference through software defined memory
|
Jan 21, 2026 |
|
LeCun's AMI Energy-Based Models and the Path to Autonomous Intelligence
|
Jan 21, 2026 |
|
Process Reward Learning for LLM Reasoning Optimization
|
Jan 19, 2026 |
|
Parallel Context-of-Experts Decoding for Efficient RAG reasoning
|
Jan 19, 2026 |
|
OpenRouter 2025 Report: Analysis of Global LLM Usage Patterns
|
Jan 19, 2026 |
|
Multidimensional Safety Evaluation of Frontier AI Models
|
Jan 19, 2026 |
|
MemoBrain: Executive Memory for Tool-Augmented Reasoning Agents
|
Jan 19, 2026 |
|
MATTRL: Collaborative Test-Time Reinforcement Learning for Multi-Agent Reasoning
|
Jan 19, 2026 |
|
How to make LLMs better at sci-fi writing
|
Jan 19, 2026 |
|
H-net: End-to-End Hierarchical Sequence Modeling via Dynamic Chunking
|
Jan 19, 2026 |
|
GLM-Image: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
|
Jan 19, 2026 |
|
CompassMem: Event-Centric Logic Maps for Agent Memory
|
Jan 19, 2026 |
|
A Chain of Thought reasoning academic brawl
|
Jan 19, 2026 |
|
Squisher: Approximating the Fisher Information Matrix and use cases
|
Jan 17, 2026 |
|
Scaling laws: long context length and in context learning
|
Jan 17, 2026 |
|
NVIDIA: TTT-E2E: Unlocking Long-Context Learning via End-to-End Test-Time Training
|
Jan 17, 2026 |
|
Attention with a bias
|
Jan 17, 2026 |
|
DeepSeek Engram: Scaling Large Language Models via Conditional Memory Lookup
|
Jan 14, 2026 |
|
PageANN: Scalable Disk ANNS with Page-Aligned Graphs
|
Dec 07, 2025 |
|
NeurIPS 2025: Homogeneous Keys, Heterogeneous Values
|
Dec 04, 2025 |
|
NeurIPS 2025: MoBA: Mixture of Block Attention for Long-Context LLMs
|
Nov 29, 2025 |
|
NeurIPS 2025: Reward Reasoning Model
|
Nov 29, 2025 |
|
NeurIPS 2025: Parallel Scaling Law for Language Models
|
Nov 29, 2025 |
|
NeurIPS 2025: Thinkless: LLM Learns When to Think
|
Nov 29, 2025 |
|
NeurIPS 2025: SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
|
Nov 29, 2025 |
|
NeurIPS 2025: Self-Adapting Language Models
|
Nov 29, 2025 |
|
NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example
|
Nov 29, 2025 |
|
NeurIPS 2025: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
|
Nov 29, 2025 |
|
NeurIPS 2025: Large Language Diffusion Models
|
Nov 29, 2025 |
|
NeurIPS 2025: DYNAACT: Large Language Model Reasoning with Dynamic Action Spaces
|
Nov 29, 2025 |
|
NeurIPS 2025: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
|
Nov 29, 2025 |
|
NeurIPS 2025: FlashBias: Fast Computation of Attention with Bias
|
Nov 29, 2025 |
|
NeurIPS 2025: A-Mem: Agentic Memory for LLM Agents
|
Nov 29, 2025 |
|
Anthropic: Disrupting the First AI-Orchestrated Cyber Espionage Campaign
|
Nov 27, 2025 |
|
Neuromorphic computing: Brain-Inspired AI and Hardware
|
Nov 22, 2025 |
|
DeepSeek-OCR: Contexts Optical Compression
|
Nov 22, 2025 |
|
Anthropic: reward hacking & misalignment & sabotage
|
Nov 22, 2025 |
|
Meta: SAM 3
|
Nov 20, 2025 |
|
vAttention vs. Strata: advanced GPU memory management
|
Nov 19, 2025 |
|
MLP Mixer Models
|
Nov 19, 2025 |
|
Mixture-of-Depths: Dynamic Compute Allocation in Transformers
|
Nov 19, 2025 |
|
Mamba-360: State Space Models for Long Sequence Modeling
|
Nov 19, 2025 |
|
Marin: Open LLM Optimization & Diagnostics
|
Nov 19, 2025 |
|
AMD: Instella: Fully Open Language Models with Stellar Performance
|
Nov 16, 2025 |
|
Mechanistic interpretability: Decoding the AI's Inner Logic: Circuits and Sparse Features
|
Nov 15, 2025 |
|
Vidur: Simulation for Efficient LLM Inference Deployment
|
Nov 10, 2025 |
|
SYMPHONY: Memory Management for LLM Multi-Turn Inference
|
Nov 10, 2025 |
|
Tempo: SLO-Aware LLM Serving Maximizing Service Gain
|
Nov 10, 2025 |
|
Spectral Gap: Analysis of Attention Layers and Graph Transformers
|
Nov 10, 2025 |
|
Random Walk Methods for Graph Learning and Networks
|
Nov 10, 2025 |
|
Metacognition and Skill Discovery in LLM Math Reasoning
|
Nov 10, 2025 |
|
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow
|
Nov 10, 2025 |
|
DSPy and TextGrad: Compiling Language Model Systems
|
Nov 10, 2025 |
|
Doubly Stochastic Attention for Transformers
|
Nov 10, 2025 |
|
Continuous Autoregressive Language Models: CALM
|
Nov 10, 2025 |
|
Context Distillation for Language Models
|
Nov 10, 2025 |
|
Confucius: Intent-Driven Network Management with Multi-Agent LLMs
|
Nov 10, 2025 |
|
CARTRIDGE: Efficient In-Context Learning via Distillation
|
Nov 10, 2025 |
|
AlphaEvolve: Mathematical Discovery at Scale
|
Nov 10, 2025 |
|
AdaFlow: Variance-Adaptive Flow-Based Imitation Learning
|
Nov 10, 2025 |
|
A Framework for LLM Application Safety Evaluation
|
Nov 10, 2025 |
|
SuperBPE: Space Travel for Language Models
|
Nov 04, 2025 |
|
Samsung: zFLoRA: Zero-Latency Fused Low-Rank Adapters
|
Nov 04, 2025 |
|
PolicySmith: Automated Systems Heuristic Generation via LLMs
|
Nov 04, 2025 |
|
MorphKV: Constant-Sized KV Caches for LLM Inference
|
Nov 04, 2025 |
|
HALoS: Hierarchical Asynchronous LLM Training over Slow Networks
|
Nov 04, 2025 |
|
Gumbel-Softmax for Differentiable Categorical Reparameterization and Selective Networks
|
Nov 04, 2025 |
|
Google: Supervised Reinforcement Learning for Step-wise Reasoning in LLMs
|
Nov 04, 2025 |
|
Anchored Diffusion Language Model: Superior Generation and Reasoning
|
Nov 04, 2025 |
|
RetNet: Retentive Networks: Transformer Successor for Large Language Models
|
Nov 02, 2025 |
|
Kimi Linear: Efficient Expressive Attention Architecture
|
Nov 02, 2025 |
|
ALiBi: Attention with Linear Biases Enables Length Extrapolation
|
Nov 01, 2025 |
|
Small Versus Large Models for Requirements Classification
|
Oct 31, 2025 |
|
Quest: Query-Aware Sparsity for Efficient LLM Inference
|
Oct 31, 2025 |
|
Hyper-Scaling LLM Inference with KV Cache Compression
|
Oct 31, 2025 |
|
Flash-LLM: Efficient LLM Inference with Unstructured Sparsity on Tensor Cores
|
Oct 31, 2025 |
|
ELASTIC: Linear Attention for Sequential Interest Compression
|
Oct 31, 2025 |
|
Architectural Scaling Laws for Efficient LLMs
|
Oct 31, 2025 |
|
Anthropic: Introspective Awareness in LLMs
|
Oct 31, 2025 |
|
TxGNN: Foundation Model for Zero-Shot Drug Repurposing
|
Oct 29, 2025 |
|
Sentence-BERT: Siamese Networks for Sentence Embeddings
|
Oct 29, 2025 |
|
ATTENTION2D and lean attention: Distributed Self-Attention
|
Oct 29, 2025 |
|
The Free Transformer: VAE Extension for Decoders
|
Oct 26, 2025 |
|
Survey of Emerging Topics in AI and Robotics
|
Oct 26, 2025 |
|
Stuck in the Matrix: LLM Spatial Reasoning
|
Oct 26, 2025 |
|
Structural Understanding of LLM Overthinking
|
Oct 26, 2025 |
|
Strata: Efficient Hierarchical Context Caching for LLM Serving
|
Oct 26, 2025 |
|
STAR: Sub-Entry Sharing TLB for Multi-Instance GPU Efficiency
|
Oct 26, 2025 |
|
Ring-linear: Efficient Hybrid Architecture for Long-Context Reasoning
|
Oct 26, 2025 |
|
Open-o3 Video: Spatio-Temporal Grounded Reasoning
|
Oct 26, 2025 |
|
MASA: Meta-Awareness via Self-Alignment Reinforcement Learning
|
Oct 26, 2025 |
|
Lp-Reg: Low-Probability Tokens Sustain RL Exploration
|
Oct 26, 2025 |
|
LLMs Learning from Verbal Feedback Without Scalar Rewards
|
Oct 26, 2025 |
|
LithOS: Operating System for Efficient GPU Machine Learning
|
Oct 26, 2025 |
|
LLM-Empowered Knowledge Graph Construction: A Survey
|
Oct 26, 2025 |
|
LFM2-8B-A1B: Efficient On-Device Mixture-of-Experts
|
Oct 26, 2025 |
|
Latent Constituency in Humans and LLMs
|
Oct 26, 2025 |
|
Introducing MTEB v2: Multimodal Embedding Evaluation
|
Oct 26, 2025 |
|
Internal Mechanisms of a Large Language Model
|
Oct 26, 2025 |
|
GigaBrain-0: World Model-Powered Generalist Robots
|
Oct 26, 2025 |
|
FlashAttention: IO-Aware Fast and Memory-Efficient Attention
|
Oct 26, 2025 |
|
Cognitive Impact of AI and Search on Essay Writing
|
Oct 26, 2025 |
|
Cattell–Horn–Carroll Theory of Intelligence
|
Oct 26, 2025 |
|
Structural Understanding of LLM Overthinking
|
Oct 22, 2025 |
|
RoBERTa: Robustly Optimized BERT Pretraining Approach
|
Oct 22, 2025 |
|
REFRAG: v2 paper: Efficient RAG Decoding via Context Compression
|
Oct 22, 2025 |
|
RAG-Anything: Unified Multimodal Knowledge Retrieval Framework
|
Oct 22, 2025 |
|
LLM-Guided Hierarchical Retrieval: The LATTICE Framework
|
Oct 22, 2025 |
|
LightMem: Lightweight Efficient Memory-Augmented Generation
|
Oct 22, 2025 |
|
Inheritune: Efficient LLM Training via Attention Collapse
|
Oct 22, 2025 |
|
In-Context Learning as Implicit Learning Algorithms
|
Oct 22, 2025 |
|
EssenceBench: Compressing LLM Benchmarks via Redundancy and Genetic Algorithm
|
Oct 22, 2025 |
|
Elastic-Cache: Adaptive KV Caching for Diffusion LLMs
|
Oct 22, 2025 |
|
Dr.LLM: Dynamic Layer Routing in LLMs
|
Oct 22, 2025 |
|
A Psychometric Framework for Artificial General Intelligence
|
Oct 22, 2025 |
|
Mojo: Performance-Portable HPC Kernels on GPUs
|
Oct 18, 2025 |
|
Geometric Flows of Logic in LLM Representation Space
|
Oct 18, 2025 |
|
Scaling Reinforcement Learning Compute for LLMs
|
Oct 17, 2025 |
|
HBF: High Bandwidth Flash for AI Inferencing
|
Oct 15, 2025 |
|
COPA: Composable On-Package GPU Architecture for Domain Specialization
|
Oct 15, 2025 |
|
Architectural Migration to Multi-head Latent Attention
|
Oct 15, 2025 |
|
UniVideo: Unified Video Understanding, Generation, and Editing
|
Oct 11, 2025 |
|
Training-Free GRPO: Policy Optimization via Context Space
|
Oct 11, 2025 |
|
RAND: Securing AI Model Weights: Preventing Theft and Misuse
|
Oct 11, 2025 |
|
Performance of Confidential Computing for Large Language Models
|
Oct 11, 2025 |
|
Multi-Agent Tool-Integrated Policy Optimization (MATPO)
|
Oct 11, 2025 |
|
Google: Confidential Computing with Accelerated AI Workloads on GCE
|
Oct 11, 2025 |
|
AWS: Nitro System: Security, Enclaves, and Generative AI
|
Oct 11, 2025 |
|
Anthropic: Confidential Inference via Trusted Virtual Machines
|
Oct 11, 2025 |
|
Samsung: Less is More: Recursive Reasoning with Tiny Networks
|
Oct 10, 2025 |
|
Petri: Accelerating AI Safety Auditing
|
Oct 10, 2025 |
|
Low-Precision Transformer Failure in Flash Attention
|
Oct 10, 2025 |
|
Early Experience for Language Agent Improvement
|
Oct 10, 2025 |
|
Dragon Hatchling: Brain-Inspired AI Architecture
|
Oct 10, 2025 |
|
CLUE: Hidden-State Clustering for Non-parametric Verification
|
Oct 10, 2025 |
|
Agentic Context Engineering: Evolving Contexts for Self-Improving LLMs
|
Oct 10, 2025 |
|
AGENTFLOW: In-the-Flow Agentic System Optimization
|
Oct 10, 2025 |
|
Test-Time Reinforcement Learning for LLMs
|
Oct 08, 2025 |
|
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
|
Oct 08, 2025 |
|
Reactive Transformer: Stateful Real-Time Language Models
|
Oct 08, 2025 |
|
RECAP: Safety Alignment via Counter-Aligned Prefilling
|
Oct 08, 2025 |
|
Paris: Decentralized Open-Weight Diffusion Model
|
Oct 08, 2025 |
|
ONNX Ecosystem, Optimization, and Deployment
|
Oct 08, 2025 |
|
NIST Evaluation of DeepSeek AI Models
|
Oct 08, 2025 |
|
MotionRAG: Retrieval-Augmented Image-to-Video Generation
|
Oct 08, 2025 |
|
LongCodeZip: Compress Long Code Context for LLMs
|
Oct 08, 2025 |
|
Imperceptible Jailbreaking Against Large Language Models
|
Oct 08, 2025 |
|
Implicit Dynamics of In-Context Learning
|
Oct 08, 2025 |
|
GNN101: Visual Learning of Graph Neural Networks
|
Oct 08, 2025 |
|
Emergent Abilities of Large Language Models
|
Oct 08, 2025 |
|
DC-VideoGen: Efficient Video Generation with Deep Compression
|
Oct 08, 2025 |
|
Contextual Blocks: Implicit Weight Updates and Federated Learning
|
Oct 08, 2025 |
|
CoDA: Collaborative Multi-Agent Data Visualization
|
Oct 08, 2025 |
|
Analog In-Memory Attention for Energy-Efficient LLMs
|
Oct 08, 2025 |
|
ACON: Optimizing Context Compression for LLM Agents
|
Oct 08, 2025 |
|
Regression Language Models for Code Metrics
|
Oct 03, 2025 |
|
Introducing RTEB: Retrieval Embedding Benchmark
|
Oct 03, 2025 |
|
CUDA Unified Memory and Heterogeneous Memory Management
|
Oct 02, 2025 |
|
Moravec's Paradox and AI Automation Limits
|
Oct 01, 2025 |
|
Characterizing LLM KV Cache Workloads in Production
|
Oct 01, 2025 |
|
BurstGPT: A Real-World LLM Serving Workload Dataset
|
Oct 01, 2025 |
|
Qwen3-Next & Qwen3-Omni technical report
|
Sep 30, 2025 |
|
Variational Reasoning Framework for Language Models
|
Sep 29, 2025 |
|
Schoenfeld Theory Applied to Large Reasoning Models
|
Sep 27, 2025 |
|
Federated Learning with Soft Embeddings for Retrieval
|
Sep 27, 2025 |
|
CWM: Code Generation with World Models
|
Sep 27, 2025 |
|
Tree-based Group Policy Optimization for LLM Agents
|
Sep 26, 2025 |
|
Seedream 4.0: Multimodal Image Generation System
|
Sep 26, 2025 |
|
GDPval: Measuring AI Performance on Real-World Work
|
Sep 26, 2025 |
|
EmbeddingGemma: Powerful Lightweight Text Representations
|
Sep 26, 2025 |
|
CE-GPPO: Controlling Entropy via Gradient-Preserving Policy Optimization
|
Sep 26, 2025 |
|
LLM-I: Interleaved Multimodal Creators via Tool-Use
|
Sep 20, 2025 |
|
Adaptive Compression Techniques for Efficient LLM Inference
|
Sep 20, 2025 |
|
THOR: Hierarchical RL for Mathematical Reasoning
|
Sep 19, 2025 |
|
The Uneven Diffusion of AI Adoption
|
Sep 19, 2025 |
|
Single-stream Policy Optimization for LLMs
|
Sep 19, 2025 |
|
SearchInstruct: Instruction Tuning with Dynamic Retrieval
|
Sep 19, 2025 |
|
FlowRL: Distribution Matching for LLM Reasoning
|
Sep 19, 2025 |
|
Evolving Language Models Without Labels: EVOL-RL
|
Sep 19, 2025 |
|
REFRAG: Rethinking RAG-based Decoding
|
Sep 18, 2025 |
|
Pre-computing & reusing KV caches to accelerate RAG inference
|
Sep 18, 2025 |
|
DeepSeek-R1: Reinforcing LLM Reasoning Through Self-Evolution
|
Sep 18, 2025 |
|
WebSailor-V2: Bridging Proprietary Agents with Synthetic Data and RL
|
Sep 17, 2025 |
|
TailorKV: Hybrid KV Cache Compression for LLMs
|
Sep 17, 2025 |
|
ShadowKV: High-Throughput Long-Context LLM Inference
|
Sep 17, 2025 |
|
QuantAgent: Multi-Agent LLM for High-Frequency Trading
|
Sep 17, 2025 |
|
MIRAGE: Optimizing LLM KV Cache with Parameter Remapping
|
Sep 17, 2025 |
|
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning
|
Sep 17, 2025 |
|
Infini-gram: Scaling Unbounded N-gram Language Models
|
Sep 17, 2025 |
|
Dynamic Chunking for Hierarchical Sequence Modeling
|
Sep 17, 2025 |
|
UQ: Unsolved Questions for Language Models
|
Sep 16, 2025 |
|
PETALS: Collaborative Large Language Model Inference and Fine-tuning
|
Sep 16, 2025 |
|
Non-Penetrative Tensor Partitioning for Collaborative AIoT Inference
|
Sep 16, 2025 |
|
Native Sparse Attention: Efficient Long-Context LLMs
|
Sep 16, 2025 |
|
Janus-Pro: Unified Multimodal AI with Scaled Improvements
|
Sep 16, 2025 |
|
Hierarchical Reasoning Model: Brain-Inspired AI for Complex Tasks
|
Sep 16, 2025 |
|
Generalist Reward Modeling with Inference-Time Scaling
|
Sep 16, 2025 |
|
Federated Post-Training LLMs: An Accessibility and Efficiency Survey
|
Sep 16, 2025 |
|
Collaborative Edge Inference with Dynamic Task Offloading and Early Exiting
|
Sep 16, 2025 |
|
CodeI/O: Reasoning Patterns Through Code Input-Output Prediction
|
Sep 16, 2025 |
|
Adaptive LLM Partitioning for Edge Inference
|
Sep 16, 2025 |
|
The Illusion of Diminishing Returns in LLM Execution
|
Sep 15, 2025 |
|
PyTorch FSDP: Scaling Fully Sharded Data Parallel
|
Sep 15, 2025 |
|
MetaGraph: knowledge graphs from financial NLP
|
Sep 15, 2025 |
|
Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
|
Sep 15, 2025 |
|
HybridServe: Efficient LLM Inference with Hybrid Caching
|
Sep 15, 2025 |
|
GraphSAGE: Inductive Representation Learning on Large Graphs
|
Sep 15, 2025 |
|
FlexGen: High-Throughput LLM Inference on a Single GPU
|
Sep 15, 2025 |
|
AWQ: On-Device LLM Compression and Acceleration
|
Sep 15, 2025 |
|
Llama 3: Architecture, Capabilities, and Safety
|
Sep 14, 2025 |
|
Graph Patterns of Knowledge in Large Language Models
|
Sep 14, 2025 |
|
Survey of Reinforcement Learning for Large Reasoning Models
|
Sep 13, 2025 |
|
Statistical Methods for Generative AI Reliability
|
Sep 13, 2025 |
|
SpikingBrain: Brain-Inspired LLMs for Efficient Long-Context Processing
|
Sep 13, 2025 |
|
EntiGraph: Scaling Language Models with Synthetic Pretraining
|
Sep 13, 2025 |
|
All for One: LLMs Solve Mental Math at the Last Token
|
Sep 13, 2025 |
|
Parallel-R1: Reinforcement Learning for Parallel Thinking in LLMs
|
Sep 12, 2025 |
|
NOVELTYBENCH: Evaluating Language Model Diversity
|
Sep 12, 2025 |
|
HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization
|
Sep 12, 2025 |
|
Explaining AI for Digital Advertising with LLMs
|
Sep 11, 2025 |
|
ByteCheckpoint: A Unified LLM Checkpointing System
|
Sep 11, 2025 |
|
AdLlama: Boosting Ad Performance with Reinforcement Learning
|
Sep 11, 2025 |
|
Mini-o3: Scaling Reasoning for Visual Search
|
Sep 10, 2025 |
|
Masked Diffusion Models: Performance and Theory
|
Sep 10, 2025 |
|
K2-Think: A Parameter-Efficient Reasoning System
|
Sep 10, 2025 |
|
INF2: Near-Storage LLM Inference for High Throughput
|
Sep 10, 2025 |
|
Darling: Reinforcing Diversity and Quality in Language Models
|
Sep 10, 2025 |
|
BLEU: Automatic Machine Translation Evaluation
|
Sep 10, 2025 |
|
AlphaEvolve: AI for Scientific and Algorithmic Discovery
|
Sep 10, 2025 |
|
TraceRL: Reinforcement Learning for Diffusion Language Models
|
Sep 09, 2025 |
|
LLM Benchmark Robustness to Linguistic Variation
|
Sep 09, 2025 |
|
Behavioral Fingerprinting of Large Language Models
|
Sep 09, 2025 |
|
Offloading LLM Models and KV Caches to NVMe SSDs
|
Sep 08, 2025 |
|
SGLang: Efficient Language Model Program Execution
|
Sep 07, 2025 |
|
OpenELM: Apple's Open Language Model Family
|
Sep 07, 2025 |
|
GPT-NeoX: Large-Scale Autoregressive Language Modeling in PyTorch
|
Sep 07, 2025 |
|
FineVision: Open Data for Computer Vision
|
Sep 07, 2025 |
|
Evaluating Large Language Models Trained on Code
|
Sep 07, 2025 |
|
Eleuther: evaluating LLMs
|
Sep 07, 2025 |
|
Democratizing AI Compute: The Modular Vision
|
Sep 07, 2025 |
|
SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence
|
Sep 06, 2025 |
|
Limitations of Embedding-Based Retrieval
|
Sep 06, 2025 |
|
MTEB & MMTEB: The Massive Text Embedding Benchmark
|
Sep 05, 2025 |
|
Inverse IFEval: Unlearning LLM Cognitive Inertia
|
Sep 05, 2025 |
|
EmbeddingGemma: On-Device AI for High-Quality Embeddings
|
Sep 05, 2025 |
|
DeepResearch Arena: Benchmarking LLMs' Research Abilities
|
Sep 05, 2025 |
|
The Rise of Physical Neural Networks
|
Sep 04, 2025 |
|
Supervised Learning in DNA Neural Networks
|
Sep 04, 2025 |
|
FastVLM: Efficient Vision Encoding for Language Models
|
Sep 04, 2025 |
|
Apertus Tech Report Overview
|
Sep 04, 2025 |
|
Scientific LLMs: A Data-Centric Survey and Roadmap
|
Sep 03, 2025 |
|
rStar2-Agent: Smarter Math Reasoning Through Agentic RL
|
Sep 03, 2025 |
|
FusionANNS: Billion-Scale ANNS with SSD and GPU
|
Sep 03, 2025 |
|
Training Recurrent Neural Networks: Vanishing and Exploding Gradients
|
Aug 27, 2025 |
|
SPAM: Stabilizing LLM Training with Spike-Aware Optimization
|
Aug 27, 2025 |
|
Pimba: Processing-in-Memory for LLM Serving
|
Aug 27, 2025 |
|
Oaken: Fast, Efficient LLM Serving with Hybrid KV Cache Quantization
|
Aug 27, 2025 |
|
AdamW: Decoupled Weight Decay Regularization for Adaptive Gradient Algorithms
|
Aug 27, 2025 |
|
Adafactor: Memory-Efficient Adaptive Learning Rates
|
Aug 27, 2025 |
|
Quantizing Diffusion LLMs: A Systematic Study
|
Aug 26, 2025 |
|
ODYSSEY: Unified Mobile Manipulation for Agile Quadruped Robots
|
Aug 26, 2025 |
|
Google: Measuring AI's Environmental Impact at Scale
|
Aug 26, 2025 |
|
ComoRAG: Cognitively Inspired Narrative Reasoning
|
Aug 26, 2025 |
|
GPT-5 Spatial Intelligence: An Empirical Study
|
Aug 24, 2025 |
|
DeepSeek-V3.1: A Hybrid AI Model with Enhanced Reasoning
|
Aug 23, 2025 |
|
Compressed Experts: Efficient MoE Model Editing
|
Aug 23, 2025 |
|
Genie 3: A New Frontier for World Models
|
Aug 22, 2025 |
|
Los Alamos: overcoming the memory wall fighting sparse memory access
|
Aug 21, 2025 |
|
Switch Transformers: Trillion Parameter Models with Sparsity
|
Aug 20, 2025 |
|
Speed Always Wins: Efficient Large Language Model Architectures
|
Aug 20, 2025 |
|
Linear Transformers: Faster Than RNNs
|
Aug 20, 2025 |
|
Self-Search Reinforcement Learning for LLMs
|
Aug 18, 2025 |
|
Diffusion Language Models: Principles, Techniques, and Applications
|
Aug 18, 2025 |
|
Continuous Batching for LLM Inference: Throughput and Latency Gains
|
Aug 18, 2025 |
|
Atom: Low-Bit Quantization for LLM Serving
|
Aug 18, 2025 |
|
The Mapped Memory Mistake: Why DBMSs Should Avoid MMAP
|
Aug 13, 2025 |
|
Scaling PostgreSQL at OpenAI: Read-Heavy Workloads and Optimizations - PGConf.dev 2025
|
Aug 13, 2025 |
|
pNFS Flex Files
|
Aug 13, 2025 |
|
NVMe Offload on Colossal AI: Breaking the GPU Memory Wall
|
Aug 13, 2025 |
|
NVIDIA GDS, BaM vs. ROCm solutions
|
Aug 13, 2025 |
|
fMoE: Fine-Grained Expert Offloading for MoE Serving
|
Aug 13, 2025 |
|
ELMo-Tune-V2: LLM-Assisted Auto-Tuning for Key-Value Stores
|
Aug 13, 2025 |
|
AiSAQ: DRAM-free ANNS with Product Quantization
|
Aug 13, 2025 |
|
Qwen-Image: Generation and Editing with Precision
|
Aug 12, 2025 |
|
Mem0: Scalable Long-Term Memory for AI Agents
|
Aug 12, 2025 |
|
Chain-of-Thought Reasoning: A Brittle Mirage?
|
Aug 11, 2025 |
|
Xavier Initialization: Deep Feedforward Networks: Training Difficulties and Solutions
|
Aug 08, 2025 |
|
ZeRO-Offload: Democratizing Billion-Scale Model Training
|
Aug 08, 2025 |
|
vLLM & PagedAttention: Efficient LLM Serving with Virtual Memory
|
Aug 08, 2025 |
|
TierTrain: Proactive Memory Tiering for DNN Training
|
Aug 08, 2025 |
|
Test-Time Scaling
|
Aug 08, 2025 |
|
The Elements of Differentiable Programming
|
Aug 08, 2025 |
|
Teraio: Cost-Efficient LLM Training via Lifetime-Aware Tensor Offloading
|
Aug 08, 2025 |
|
MoE Offloaded
|
Aug 08, 2025 |
|
Shared Virtual Memory: Design and Performance Implications
|
Aug 08, 2025 |
|
Reinforcement Learning
|
Aug 08, 2025 |
|
Reinforcement Pre-Training for Language Models
|
Aug 08, 2025 |
|
PartitionedVC: Optimizing External Memory Graph Analytics
|
Aug 08, 2025 |
|
Multiagent Debate Improves Language Model Reasoning
|
Aug 08, 2025 |
|
Multi Query Attention: PaLM: Scaling Language Modeling with Pathways
|
Aug 08, 2025 |
|
MUVERA: Efficient Multi-Vector Information Retrieval
|
Aug 08, 2025 |
|
Mu: The Windows Settings Agent Language Model
|
Aug 08, 2025 |
|
Movement Pruning: Adaptive Sparsity by Fine-Tuning
|
Aug 08, 2025 |
|
Mistral 7B: Superior Performance in a Smaller Package
|
Aug 08, 2025 |
|
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts
|
Aug 08, 2025 |
|
Microscaling Quantization for Large Language Models
|
Aug 08, 2025 |
|
Lost in the Middle: How Language Models Use Long Contexts
|
Aug 08, 2025 |
|
MEGABYTE: Multiscale Transformers for Million-byte Sequences
|
Aug 08, 2025 |
|
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
|
Aug 08, 2025 |
|
LoRA: Low-Rank Adaptation of Large Language Models
|
Aug 08, 2025 |
|
LMCache: Supercharging LLM Performance with KV Cache Management
|
Aug 08, 2025 |
|
KVQuant: LLM Inference with KV Cache Quantization
|
Aug 08, 2025 |
|
Kaiming Initialization and PReLU
|
Aug 08, 2025 |
|
Gemma: Google DeepMind's Open Language Models
|
Aug 08, 2025 |
|
Fast Learning for Deep Belief Networks
|
Aug 08, 2025 |
|
FP8 Quantization
|
Aug 08, 2025 |
|
FlashAttention-2: Faster Attention with Better Parallelism
|
Aug 08, 2025 |
|
DyNN-Offload: Efficient Memory for Dynamic Neural Networks
|
Aug 08, 2025 |
|
Dynamic Tanh: Transformers Without Normalization
|
Aug 08, 2025 |
|
DroidSpeak: Cross-LLM KV Cache Sharing
|
Aug 08, 2025 |
|
DiMSUM: Image Generation with Diffusion Mamba
|
Aug 08, 2025 |
|
DeepSeek Safety Concerns
|
Aug 08, 2025 |
|
Diffusion Probabilistic Models: Deep Unsupervised Learning Through Diffusion Processes
|
Aug 08, 2025 |
|
Demystifying Mamba: Architecture and Capabilities
|
Aug 08, 2025 |
|
DeepSeek-R1: Incentivizing Reasoning in LLMs
|
Aug 08, 2025 |
|
DeepSeek-V3: A Technical Report
|
Aug 08, 2025 |
|
DeepSeekMoE: Scalable Mixture-of-Experts Language Models
|
Aug 08, 2025 |
|
DeepSeek-R1 Dynamic 1.58-bit Quantization: A Performance Analysis
|
Aug 08, 2025 |
|
Concept Drift
|
Aug 08, 2025 |
|
ColBERT and ColBERT v2
|
Aug 08, 2025 |
|
CODEGEN: Open Language Model for Code Synthesis
|
Aug 08, 2025 |
|
Chain of thought
|
Aug 08, 2025 |
|
AI and the Memory Wall: Overcoming Bottlenecks
|
Aug 08, 2025 |
|
AdamW: Decoupled Weight Decay Regularization for Adaptive Gradient Algorithms
|
Aug 08, 2025 |
|
Adam: A Method for Stochastic Optimization
|
Aug 08, 2025 |
|
Transformer Scaling
|
Aug 07, 2025 |
|
Scaling Laws
|
Aug 07, 2025 |
|
ResNets - residual block
|
Aug 07, 2025 |
|
RoPE
|
Aug 07, 2025 |
|
LSTM: the forget gate
|
Aug 07, 2025 |
|
Longformer: A Transformer for Long Documents
|
Aug 07, 2025 |
|
Linformer: Efficient Self-Attention with Linear Complexity
|
Aug 07, 2025 |
|
LeNet-5: Convolutional Networks for Character Recognition
|
Aug 07, 2025 |
|
Layer Normalization and Dual Patch Normalization
|
Aug 07, 2025 |
|
Learning from repeated data
|
Aug 07, 2025 |
|
GPT-4 Technical Report
|
Aug 07, 2025 |
|
GQA: Grouped Query Attention
|
Aug 07, 2025 |
|
GPT3
|
Aug 07, 2025 |
|
GPT2
|
Aug 07, 2025 |
|
GELU
|
Aug 07, 2025 |
|
Dropout
|
Aug 07, 2025 |
|
Constitutional AI: Harmlessness Through Self-Improvement
|
Aug 07, 2025 |
|
Chinchilla: Optimal Language Model Scaling
|
Aug 07, 2025 |
|
Chamfer Matching: Image Registration and Medical Applications
|
Aug 07, 2025 |
|
BART
|
Aug 07, 2025 |
|
BERT
|
Aug 07, 2025 |
|
Batch Normalization
|
Aug 07, 2025 |
|
Attention is all you need
|
Aug 07, 2025 |