AI Post Transformers

By mcgrof



Category: Technology


Subscribers: 0
Reviews: 0
Episodes: 656

Description

An AI-generated podcast in which hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. It features honest critical analysis, proper citations, and nerdy humor.

Episode Date
Titans: Learning to Memorize at Test Time
May 20, 2026
The Sparsity Wall: What Reiner Pope Told Dwarkesh About MoE and Sparse Attention
May 19, 2026
Serving MoE Models with Disaggregated Expert Parallelism
May 19, 2026
Affordable Large-Scale Decoding Through Model-System Co-Design
May 19, 2026
NanoFlow and the Future of LLM Serving
May 19, 2026
Air Force One, Jensen Huang, and Anthropic's 2028 Memo
May 15, 2026
Deep Kernel Fusion for Transformer Decoding
May 15, 2026
FlashFuser and Hopper-Era FFN Kernel Fusion
May 15, 2026
Lossless Sparse Deltas for RL Networks
May 15, 2026
JANUS for Scalable MoE Inference
May 15, 2026
Scaling Laws for Multilingual Code Pretraining
May 15, 2026
Causal-JEPA for Object-Level World Models
May 15, 2026
Ministral 3: Cascade Distillation for Long-Context Multimodal Models
May 15, 2026
When Many-Shot CoT Becomes Test-Time Learning
May 15, 2026
Agentic AI as a Path to AGI
May 15, 2026
Trace Rewriting Against Unauthorized LLM Distillation
May 15, 2026
TMAS: Scaling Test-Time Compute with Multi-Agent Synergy
May 14, 2026
AI Co-Mathematician for Mathematical Research
May 14, 2026
δ-mem and Online Memory for LLMs
May 13, 2026
Qwen-Image-2.0 for Unified Generation and Editing
May 13, 2026
MiA-Signature and Global Activation for Long Context
May 13, 2026
Long Context Pre-Training with Lighthouse Attention
May 13, 2026
MELT: Decoupling Compute From Memory
May 13, 2026
ELF and Continuous Language Diffusion
May 12, 2026
ForkKV for Multi-LoRA Agent Serving
May 12, 2026
Metacognition Against Confident Hallucinations
May 12, 2026
TIDE and the Rare Token Problem
May 12, 2026
Agentic Discovery for Test-Time Scaling
May 12, 2026
Vistara Brings CXL Memory to Hyperscale
May 11, 2026
Split Personality Training Reveals Latent Knowledge
May 10, 2026
Why Transformers Fail at Counting
May 10, 2026
Optimization, Credit Assignment, and Consciousness
May 10, 2026
Reasoning Theater and Unfaithful Chain-of-Thought
May 10, 2026
RAPTOR: Stable Concept Directions From Logistic Probes
May 10, 2026
When LLM Judges Become Coin Flips
May 09, 2026
LAPS for Length-Aware LLM Serving
May 09, 2026
Generative Modeling via Drifting in One Step
May 09, 2026
EverMemOS for Long-Horizon Agent Memory
May 09, 2026
Explicit Information Transmission for Context Compression
May 09, 2026
Why LLM Serving Needs Mathematical Optimization
May 09, 2026
Can Models Learn from Long Context?
May 08, 2026
How Models Detect Hidden Activation Steering
May 08, 2026
What the Frog’s Eye Tells the Brain
May 08, 2026
Learning in Random Nets and Generalization
May 08, 2026
Backpropagation Through Time Explained
May 08, 2026
SGLang for Faster Structured LLM Programs
May 08, 2026
TensorFlow for Distributed Machine Learning Systems
May 08, 2026
Machine Learning Self-Calibrated FPGA Time-to-Digital Converter
May 08, 2026
Why LightGBM Made Boosted Trees Fast
May 07, 2026
Fast FPGA BDT Inference for LHC Triggers
May 07, 2026
Fast FPGA Inference for LHC Triggers
May 07, 2026
Boosted Decision Trees for CMS Muon Triggers
May 07, 2026
Automating DNN Compilation for FPGA Accelerators
May 07, 2026
Synchronous Data Flow for Signal Processing
May 07, 2026
Caffeine: A Unified FPGA for CNNs
May 07, 2026
Caffe and the Rise of CNN Frameworks
May 07, 2026
RFNoC SISO Processor via High-Level Synthesis
May 06, 2026
Automating CNN Mapping on Embedded FPGAs
May 06, 2026
Training Million-Token LLMs Beyond the Memory Barrier
May 04, 2026
Training LLMs for Divide-and-Conquer Reasoning
May 04, 2026
Reinforcement Learning in 2025: An Overview
May 04, 2026
PackKV Lossy Compression for KV Caches
May 04, 2026
Geometric Memory in Deep Sequence Models
May 03, 2026
Selective Classification with Deep Neural Networks
May 03, 2026
Self-Improving Pretraining With Post-Trained Models
May 03, 2026
How Induction Heads Emerge in Transformers
May 03, 2026
node2vec and Learning Graph Embeddings
May 03, 2026
DeepWalk and the Rise of Graph Embeddings
May 03, 2026
Fast Speech Recognition by Transcript Editing
May 02, 2026
Discrete Representations for Continual Reinforcement Learning
May 02, 2026
Recursive Multi-Agent Systems in Latent Space
May 02, 2026
A Practical Review of Mechanistic Interpretability
May 02, 2026
DeltaKV: Compressing KV Caches for Long Context
May 02, 2026
CacheFlow and 3D-Parallel KV Cache Restoration
May 02, 2026
Do Language Models Know Their Limits
May 02, 2026
Hopfield Networks and Transformer Attention as Memory
May 01, 2026
Sequence to Sequence Learning with Neural Networks
May 01, 2026
Bahdanau Attention for Neural Machine Translation
May 01, 2026
Convolutional Sequence to Sequence Learning for Translation
May 01, 2026
Teaching Language Models to Verbalize Uncertainty
May 01, 2026
Can LLMs Judge Their Own Capabilities?
May 01, 2026
ChartNet for Robust Multimodal Chart Understanding
May 01, 2026
Scaling Test-Time Compute for Reasoning Models
Apr 30, 2026
Breaking the Prefix Barrier with Shared KV Cache
Apr 30, 2026
Reverse-Mode Differentiation Across AD and Neural Nets
Apr 30, 2026
Stabilizing Efficient Reasoning with Step-Level Advantage Selection
Apr 29, 2026
Stochastic KV Routing for Cache Sharing
Apr 29, 2026
Deep Learning in Spiking Neural Networks
Apr 28, 2026
FPGA Neural Network Accelerators for Space
Apr 28, 2026
World-R1 Improves 3D Consistency in Text-to-Video
Apr 28, 2026
Long Short-Term Memory and Vanishing Gradients
Apr 27, 2026
Compressed Convolutional Attention in Latent Space
Apr 27, 2026
AgenticQwen and Small Industrial Tool Agents
Apr 27, 2026
Experience-Based Learning Beyond Human Data
Apr 26, 2026
Breiman's Two Cultures of Statistical Modeling
Apr 26, 2026
Muon Is Scalable for LLM Training
Apr 25, 2026
Kimi K2.5 and Visual Agent Swarms
Apr 25, 2026
ScoutAttention for Efficient KV Cache Offloading
Apr 25, 2026
DeepSeek-V4 and Practical Million-Token Context
Apr 25, 2026
Agentic Aggregation for Long-Horizon AI Tasks
Apr 23, 2026
TokenDance for Multi-Agent KV Cache Sharing
Apr 23, 2026
RetrievalAttention for Long-Context LLM Inference
Apr 22, 2026
DreamerV3 World Models Across 150 Tasks
Apr 22, 2026
Program Synthesis with Large Language Models
Apr 22, 2026
Test-time Scaling for Multi-Agent Collaborative Reasoning
Apr 22, 2026
TUMIX Multi-Agent Test-Time Scaling with Tools
Apr 22, 2026
Benchmarking Test-Time Scaling for General LLM Agents
Apr 22, 2026
Distilling Multi-Agent Reasoning into a Single LLM
Apr 22, 2026
Efficient KV Cache Sharing for Multi-LoRA Agents
Apr 22, 2026
CacheBlend for Fast RAG Serving
Apr 22, 2026
Directly Trained Spiking DQNs for Atari
Apr 22, 2026
Qwen3.5-Omni Thinker-Talker for Omnimodal Streaming
Apr 21, 2026
The Abstraction Fallacy and AI Consciousness
Apr 20, 2026
ContiguousKV for Faster LLM Prefill KV Reuse
Apr 20, 2026
Gated Delta Networks for Long-Context Retrieval
Apr 20, 2026
Prefill-as-a-Service for Cross-Datacenter KV Cache
Apr 20, 2026
Nemotron 3 Super Hybrid Mamba-Transformer MoE
Apr 20, 2026
Parallelizing DeltaNet Linear Transformers over Sequence Length
Apr 19, 2026
Gated Linear Attention for Efficient Long Sequences
Apr 19, 2026
Defining AGI as Adaptive Artificial Scientist
Apr 19, 2026
Mamba-3 for Efficient Sequence Modeling
Apr 16, 2026
Linear Classifier Probes for Intermediate Layers
Apr 16, 2026
KVSwap for Disk-Aware Long-Context On-Device Inference
Apr 16, 2026
Neural Chameleons and Evading Activation Monitors
Apr 15, 2026
SkillsBench for Evaluating Agent Skills
Apr 15, 2026
Experimental Comparison of Agentic and Enhanced RAG
Apr 14, 2026
Learning to Reason with 13 Parameters
Apr 14, 2026
Memory Intelligence Agents for Deep Research
Apr 13, 2026
GPU-Accelerated Dynamic Quantized ANNS Graph Search
Apr 13, 2026
Latent Space as a New Computational Paradigm
Apr 12, 2026
Learning Latent Action World Models from Video
Apr 12, 2026
KV Cache TTL for Multi-Turn Agent Scheduling
Apr 12, 2026
VL-JEPA for Vision-Language Semantic Prediction
Apr 12, 2026
ClawBench for Real-World Online AI Agents
Apr 12, 2026
FengHuang for Rack-Scale LLM Inference Memory
Apr 12, 2026
ASI-Evolve for Data, Architectures, and RL
Apr 12, 2026
In-Place Test-Time Training for Transformers
Apr 11, 2026
Neural Computers as Learned Latent Runtimes
Apr 11, 2026
DRAM-Free In-Flash Computing for LLM Inference
Apr 10, 2026
Computation-Bandwidth-Memory Trade-offs for AI Infrastructure
Apr 10, 2026
TriAttention for Efficient Long-Context KV Compression
Apr 08, 2026
When Spectral Gradient Updates Help Deep Learning
Apr 08, 2026
Memory Sparse Attention for 100M-Token Scaling
Apr 08, 2026
Cache Mechanism for Agent RAG Systems
Apr 08, 2026
Speculative Decoding in Real vLLM Serving
Apr 07, 2026
Real Context Size and Context Rot
Apr 07, 2026
MEMSEARCHER: Reinforcement Learning for LLM Memory Management
Apr 04, 2026
Recursive Language Models for Arbitrarily Long Prompts
Apr 04, 2026
Internal Safety Collapse in Frontier LLMs
Apr 04, 2026
Kosmos AI Scientist for Autonomous Discovery
Apr 04, 2026
QVCache for Semantic Caching in ANN Search
Apr 04, 2026
FlatAttention for Tile-Based Accelerator Inference
Apr 04, 2026
IMO-Bench for Robust Mathematical Reasoning
Apr 04, 2026
Batch-Aware Expert Routing for Faster MoE Decoding
Apr 03, 2026
CXL Computational Memory Offloading for Lower Runtime
Apr 03, 2026
OOD Shifts Make LLM Representations Sparser
Apr 02, 2026
Meta-Harness and the Power of LLM Plumbing
Apr 02, 2026
Emergent Social Risks in Multi-Agent Systems
Apr 02, 2026
AI Agent Traps and Prompt Injection
Apr 02, 2026
Simple Self-Distillation for Better Code Generation
Apr 02, 2026
MetaClaw: Just Talk and Continual Agent Adaptation
Mar 31, 2026
MAML and the Basics of Meta-Learning
Mar 30, 2026
Doc-to-LoRA: Internalizing Context as LoRA
Mar 30, 2026
Agentic AI and the Next Intelligence Explosion
Mar 29, 2026
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Mar 28, 2026
Splitwise: Phase-Split LLM Inference
Mar 28, 2026
From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
Mar 27, 2026
HyperAgents and Metacognitive Self-Improvement
Mar 27, 2026
LeWorldModel: Stable Joint-Embedding World Models from Pixels
Mar 26, 2026
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
Mar 26, 2026
Language Models are Injective and Hence Invertible
Mar 25, 2026
Jet-Nemotron and PostNAS for Faster Long Context
Mar 25, 2026
Speculative Speculative Decoding
Mar 25, 2026
Lookahead Q-Cache for Consistent KV Eviction
Mar 25, 2026
New Site Features: Conferences Tracking and How to Support the Show
Mar 20, 2026
Accelerating LLM Cold Starts with Programmable Page Cache
Mar 19, 2026
SolidAttention: Co-Designing Sparse Attention and SSD I/O
Mar 19, 2026
Generative File Systems: Replacing Code with Formal Specifications
Mar 19, 2026
Xerxes: CXL 3.0 Simulation for Scalable Memory Systems
Mar 19, 2026
Optimizing Mixture of Block Attention Through Statistical Theory
Mar 19, 2026
Predicting Task Performance with Context-aware Scaling Laws
Mar 17, 2026
Model-Aware Tokenizer Transfer for Multilingual LLMs
Mar 17, 2026
xLLM: Co-Locating Online and Offline LLM Inference
Mar 17, 2026
Memory in the Age of AI Agents: Forms, Functions, Dynamics
Mar 17, 2026
Qwen3Guard: Streaming Three-Way Safety Classification for LLMs
Mar 17, 2026
Bidaw: Computation-Storage Aware KV Caching for LLMs
Mar 17, 2026
New Voices, Same Nerds: The Kokoro TTS Episode
Mar 17, 2026
Rethinking Residual Connections: Learned Attention Across Transformer Depth
Mar 17, 2026
CacheSlide: Position-Aware KV Cache Reuse for Agent LLMs
Mar 17, 2026
The Art of Scaling Reinforcement Learning Compute for LLMs
Mar 16, 2026
LLM Agents Reason About Code Without Running It
Mar 15, 2026
Emergent Cooperation in Self-Interested Multi-Agent AI
Mar 13, 2026
50x KV Cache Compression in Seconds via Attention Matching
Mar 12, 2026
Gradient Descent at Inference Time for LLM Reasoning
Mar 11, 2026
MalGEN: Multi-Agent AI for Red Teaming Malware
Mar 08, 2026
Structured State Space Duality Unifies Transformers and SSMs
Mar 07, 2026
DualPath Breaks Storage Bandwidth Bottleneck in Agentic Inference
Mar 07, 2026
Why CARTRIDGE Works: Keys as Routers in KV Caches
Mar 07, 2026
Systematic Characterization of LLM Inference on GPUs
Mar 07, 2026
Tokenization Bias: The Hidden Flaw Breaking Language Models
Mar 07, 2026
NVIDIA Nemotron 3 Hybrid SSM Transformer Architecture
Mar 07, 2026
We're Open Source! New Home, Visualizations, and How to Shape Our Queue
Mar 06, 2026
FlashAttention-4 Conquers Asymmetric GPU Hardware Scaling
Mar 06, 2026
Parallel Token Prediction: From ProphetNet to Dependent Multi-Token Generation
Mar 04, 2026
FlashOptim: Optimizers for Memory Efficient Training
Mar 02, 2026
Cognizant - New Work, New World 2026
Mar 01, 2026
Regular Fourier Features for Nonstationary Gaussian Processes
Mar 01, 2026
Unified Latents (UL): How to train your latents
Feb 28, 2026
QuantSpec: Hierarchical KV Cache for Self-Speculative Decoding
Feb 28, 2026
MatFormer: Nested Transformer for Elastic Inference
Feb 28, 2026
MagicDec: Breaking Latency-Throughput Tradeoffs via KV-Compressed Speculative Decoding
Feb 28, 2026
KV selection algorithms: static (SnapKV) Vs dynamic (PQCache)
Feb 28, 2026
Fast Inference from Transformers via Speculative Decoding
Feb 28, 2026
EAGLE: Evolution of Lossless Acceleration for LLM Inference
Feb 28, 2026
CXL-SpecKV: Bridging the LLM Memory Wall with Speculative FPGA Disaggregation
Feb 28, 2026
Building Production-Ready Speculative Decoding with TensorRT-LLM
Feb 28, 2026
Apple's Speculative Streaming: Fast LLM Inference without Auxiliary Models
Feb 28, 2026
Apple's Mirror Speculative Decoding: Parallel LLM Inference via Heterogeneous Accelerators
Feb 28, 2026
Adaptive Control for Batched Speculative Decoding in LLM Serving
Feb 28, 2026
Taming the Long-Tail: Efficient Reasoning RL with Adaptive Drafters
Feb 26, 2026
Optimizing Verification and Efficiency in Multi-Draft Speculative Decoding
Feb 26, 2026
MEDUSA: Parallel Decoding Heads for Accelerated LLM Inference
Feb 26, 2026
Measuring LLM Reasoning Effort via Deep-Thinking Tokens
Feb 26, 2026
Measuring AI Ability to Complete Long Tasks
Feb 26, 2026
FastGRPO: Concurrency-Aware Speculative Decoding for Policy Optimization
Feb 26, 2026
Evaluating Collective Behaviour of Hundreds of LLM Agents
Feb 26, 2026
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
Feb 26, 2026
Deep Learning Frameworks for Robust Quadrupedal Locomotion
Feb 26, 2026
Advancements in Efficient KV Cache Quantization and Management
Feb 26, 2026
Accelerating Large Language Model Decoding with Speculative Sampling
Feb 26, 2026
NeurIPS 2025: Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
Feb 25, 2026
FAST26: Bidaw: Enhancing Key-Value Caching for Interactive LLM Serving via Bidirectional
Feb 25, 2026
FAST26: CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving
Feb 25, 2026
Cortex: Semantic Knowledge Caching for Low-Latency LLM Agents
Feb 25, 2026
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Feb 25, 2026
1.3 Billion Agents by 2028: The $50 Billion Boom and the Hidden Enterprise Crisis
Feb 25, 2026
Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts
Feb 24, 2026
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Feb 24, 2026
PPDS: Achieving Persona Consistency Through Large-Scale Dialogue Data Engineering
Feb 24, 2026
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Feb 24, 2026
Personalized Dialogue Generation via Persona-Adaptive Attention
Feb 24, 2026
PersonaPKT: Parameter-Efficient Knowledge Transfer for Personalized Dialogue Agents
Feb 24, 2026
Persona Vectors: monitoring and controlling character traits on LLMs
Feb 24, 2026
Machine Learning for Electrophysiological Phenotyping of Schizophrenia and Bipolar Disorder
Feb 24, 2026
Evaluating LLM Embeddings for Psychometric Personality Prediction
Feb 24, 2026
Bloom: an open source tool for automated behavioral evaluations
Feb 24, 2026
AI and the Decline of Entry-Level Employment
Feb 24, 2026
2019 UNILM: Unified Language Model Pre-training for NLU and NLG
Feb 24, 2026
Procgen Benchmark: Measuring Generalization in Reinforcement Learning
Feb 20, 2026
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
Feb 20, 2026
GLM-5: Transitioning from Vibe Coding to Agentic Engineering
Feb 20, 2026
Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training
Feb 20, 2026
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Feb 20, 2026
A 2024 Survey Analyzing Generalization in Deep Reinforcement Learning
Feb 20, 2026
Voxtral Realtime: Native Streaming ASR with Sub-Second Latency
Feb 17, 2026
The Endless Gym: Training Terminal Agents
Feb 17, 2026
Teaching Models to Teach Themselves via Stepping Stone Curricula
Feb 17, 2026
Moltbook: The Heartbeat of Autonomy: Fingerprinting Human Influence in AI Societies
Feb 17, 2026
Jet-RL: Stable On-Policy Reinforcement Learning with Unified FP8 Flow
Feb 17, 2026
Information Bottleneck-based Causal Attention for Medical Image Recognition
Feb 17, 2026
Intelligent AI Delegation
Feb 17, 2026
DeepVerifier: Self-Evolving Research Agents via Rubric-Guided Verification
Feb 17, 2026
Advancing Mechanistic Interpretability with Sparse Autoencoders
Feb 17, 2026
Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
Feb 17, 2026
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
Feb 13, 2026
Sapient Intelligence: Hierarchical Reasoning Model
Feb 11, 2026
Reinforced Attention Learning
Feb 11, 2026
LongCat: Scaling Embeddings Outperforms Scaling Experts in Language Models
Feb 11, 2026
DR. KERNEL: Reinforcement Learning for Optimized Triton Kernel Generation
Feb 11, 2026
Dario Amodei: The Adolescence of Technology
Feb 11, 2026
Dario Amodei: Machines of Loving Grace
Feb 11, 2026
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Feb 11, 2026
Advances in Attention Distillation for Efficient Transformer Models
Feb 11, 2026
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Feb 11, 2026
Towards a Science of Scaling Agent Systems
Feb 09, 2026
Uncertainty-aware genomic deep learning with knowledge distillation
Feb 06, 2026
Reinforcement Learning via Self-Distillation
Feb 06, 2026
On-Policy Self-Distillation for Advanced LLM Reasoning
Feb 06, 2026
Moloch’s Bargain: Market Incentives and the Rise of AI Misalignment
Feb 06, 2026
Knowledge distillation to context distillation
Feb 06, 2026
Distilling GNN Knowledge into Non-Neural Cell Graph Student Models
Feb 06, 2026
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
Feb 06, 2026
Claude Opus 4.6 Technical Report and Agent Capabilities
Feb 06, 2026
Advancing regulatory variant effect prediction with AlphaGenome
Feb 06, 2026
2006 Model Compression: Ensembles
Feb 06, 2026
2015: Distilling the Knowledge in a Neural Network
Feb 06, 2026
Reasoning Models Generate Societies of Thought
Jan 28, 2026
NeurIPS 2025: L2M: Mutual Information Scaling Law for Long-Context Language Modeling
Jan 28, 2026
Long context: Dichotomy of Findings & Status of Research
Jan 28, 2026
Keel: Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Jan 28, 2026
SLDAgent: Evolutionary Discovery of Superhuman AI Scaling Laws
Jan 26, 2026
Sequoia Capital: AGI is here
Jan 24, 2026
OpenAI: Scaling PostgreSQL to 800 Million ChatGPT Users
Jan 24, 2026
Agentic Reasoning for Large Language Models: A Comprehensive Roadmap
Jan 24, 2026
MEMRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic
Jan 23, 2026
Google: R&D inference value on HBF + PNM + low latency interconnect
Jan 23, 2026
Storage-next: Do We Need New Hardware for AI Storage, or Just Better Layouts?
Jan 21, 2026
Meta's solution to massive DLRM inference through software defined memory
Jan 21, 2026
LeCun's AMI Energy-Based Models and the Path to Autonomous Intelligence
Jan 21, 2026
Process Reward Learning for LLM Reasoning Optimization
Jan 19, 2026
Parallel Context-of-Experts Decoding for Efficient RAG reasoning
Jan 19, 2026
OpenRouter 2025 Report: Analysis of Global LLM Usage Patterns
Jan 19, 2026
Multidimensional Safety Evaluation of Frontier AI Models
Jan 19, 2026
MemoBrain: Executive Memory for Tool-Augmented Reasoning Agents
Jan 19, 2026
MATTRL: Collaborative Test-Time Reinforcement Learning for Multi-Agent Reasoning
Jan 19, 2026
How to make LLMs better at sci-fi writing
Jan 19, 2026
H-net: End-to-End Hierarchical Sequence Modeling via Dynamic Chunking
Jan 19, 2026
GLM-Image: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Jan 19, 2026
CompassMem: Event-Centric Logic Maps for Agent Memory
Jan 19, 2026
A Chain of Thought reasoning academic brawl
Jan 19, 2026
Squisher: Approximating the Fisher Information Matrix and use cases
Jan 17, 2026
Scaling laws: long context length and in context learning
Jan 17, 2026
NVIDIA: TTT-E2E: Unlocking Long-Context Learning via End-to-End Test-Time Training
Jan 17, 2026
Attention with a bias
Jan 17, 2026
DeepSeek Engram: Scaling Large Language Models via Conditional Memory Lookup
Jan 14, 2026
PageANN: Scalable Disk ANNS with Page-Aligned Graphs
Dec 07, 2025
NeurIPS 2025: Homogeneous Keys, Heterogeneous Values
Dec 04, 2025
NeurIPS 2025: MoBA: Mixture of Block Attention for Long-Context LLMs
Nov 29, 2025
NeurIPS 2025: Reward Reasoning Model
Nov 29, 2025
NeurIPS 2025: Parallel Scaling Law for Language Models
Nov 29, 2025
NeurIPS 2025: Thinkless: LLM Learns When to Think
Nov 29, 2025
NeurIPS 2025: SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
Nov 29, 2025
NeurIPS 2025: Self-Adapting Language Models
Nov 29, 2025
NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Nov 29, 2025
NeurIPS 2025: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
Nov 29, 2025
NeurIPS 2025: Large Language Diffusion Models
Nov 29, 2025
NeurIPS 2025: DYNAACT: Large Language Model Reasoning with Dynamic Action Spaces
Nov 29, 2025
NeurIPS 2025: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Nov 29, 2025
NeurIPS 2025: FlashBias: Fast Computation of Attention with Bias
Nov 29, 2025
NeurIPS 2025: A-Mem: Agentic Memory for LLM Agents
Nov 29, 2025
Anthropic: Disrupting the First AI-Orchestrated Cyber Espionage Campaign
Nov 27, 2025
Neuromorphic computing: Brain-Inspired AI and Hardware
Nov 22, 2025
DeepSeek-OCR: Contexts Optical Compression
Nov 22, 2025
Anthropic: reward hacking & misalignment & sabotage
Nov 22, 2025
Meta: SAM 3
Nov 20, 2025
vAttention Vs Strata: advanced GPU memory management
Nov 19, 2025
MLP Mixer Models
Nov 19, 2025
Mixture-of-Depths: Dynamic Compute Allocation in Transformers
Nov 19, 2025
Mamba-360: State Space Models for Long Sequence Modeling
Nov 19, 2025
Marin: Open LLM Optimization & Diagnostics
Nov 19, 2025
AMD: Instella: Fully Open Language Models with Stellar Performance
Nov 16, 2025
Mechanistic interpretability: Decoding the AI's Inner Logic: Circuits and Sparse Features
Nov 15, 2025
Vidur: Simulation for Efficient LLM Inference Deployment
Nov 10, 2025
SYMPHONY: Memory Management for LLM Multi-Turn Inference
Nov 10, 2025
Tempo: SLO-Aware LLM Serving Maximizing Service Gain
Nov 10, 2025
Spectral Gap: Analysis of Attention Layers and Graph Transformers
Nov 10, 2025
Random Walk Methods for Graph Learning and Networks
Nov 10, 2025
Metacognition and Skill Discovery in LLM Math Reasoning
Nov 10, 2025
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow
Nov 10, 2025
DSPy and TextGrad: Compiling Language Model Systems
Nov 10, 2025
Doubly Stochastic Attention for Transformers
Nov 10, 2025
Continuous Autoregressive Language Models: CALM
Nov 10, 2025
Context Distillation for Language Models
Nov 10, 2025
Confucius: Intent-Driven Network Management with Multi-Agent LLMs
Nov 10, 2025
CARTRIDGE: Efficient In-Context Learning via Distillation
Nov 10, 2025
AlphaEvolve: Mathematical Discovery at Scale
Nov 10, 2025
AdaFlow: Variance-Adaptive Flow-Based Imitation Learning
Nov 10, 2025
A Framework for LLM Application Safety Evaluation
Nov 10, 2025
SuperBPE: Space Travel for Language Models
Nov 04, 2025
Samsung: zFLoRA: Zero-Latency Fused Low-Rank Adapters
Nov 04, 2025
PolicySmith: Automated Systems Heuristic Generation via LLMs
Nov 04, 2025
MorphKV: Constant-Sized KV Caches for LLM Inference
Nov 04, 2025
HALoS: Hierarchical Asynchronous LLM Training over Slow Networks
Nov 04, 2025
Gumbel-Softmax for Differentiable Categorical Reparameterization and Selective Networks
Nov 04, 2025
Google: Supervised Reinforcement Learning for Step-wise Reasoning in LLMs
Nov 04, 2025
Anchored Diffusion Language Model: Superior Generation and Reasoning
Nov 04, 2025
RetNet: Retentive Networks: Transformer Successor for Large Language Models
Nov 02, 2025
Kimi Linear: Efficient Expressive Attention Architecture
Nov 02, 2025
ALiBi: Attention with Linear Biases Enables Length Extrapolation
Nov 01, 2025
Small Versus Large Models for Requirements Classification
Oct 31, 2025
Quest: Query-Aware Sparsity for Efficient LLM Inference
Oct 31, 2025
Hyper-Scaling LLM Inference with KV Cache Compression
Oct 31, 2025
Flash-LLM: Efficient LLM Inference with Unstructured Sparsity on Tensor Cores
Oct 31, 2025
ELASTIC: Linear Attention for Sequential Interest Compression
Oct 31, 2025
Architectural Scaling Laws for Efficient LLMs
Oct 31, 2025
Anthropic: Introspective Awareness in LLMs
Oct 31, 2025
TxGNN: Foundation Model for Zero-Shot Drug Repurposing
Oct 29, 2025
Sentence-BERT: Siamese Networks for Sentence Embeddings
Oct 29, 2025
ATTENTION2D and lean attention: Distributed Self-Attention
Oct 29, 2025
The Free Transformer: VAE Extension for Decoders
Oct 26, 2025
Survey of Emerging Topics in AI and Robotics
Oct 26, 2025
Stuck in the Matrix: LLM Spatial Reasoning
Oct 26, 2025
Structural Understanding of LLM Overthinking
Oct 26, 2025
Strata: Efficient Hierarchical Context Caching for LLM Serving
Oct 26, 2025
STAR: Sub-Entry Sharing TLB for Multi-Instance GPU Efficiency
Oct 26, 2025
Ring-linear: Efficient Hybrid Architecture for Long-Context Reasoning
Oct 26, 2025
Open-o3 Video: Spatio-Temporal Grounded Reasoning
Oct 26, 2025
MASA: Meta-Awareness via Self-Alignment Reinforcement Learning
Oct 26, 2025
Lp-Reg: Low-Probability Tokens Sustain RL Exploration
Oct 26, 2025
LLMs Learning from Verbal Feedback Without Scalar Rewards
Oct 26, 2025
LithOS: Operating System for Efficient GPU Machine Learning
Oct 26, 2025
LLM-Empowered Knowledge Graph Construction: A Survey
Oct 26, 2025
LFM2-8B-A1B: Efficient On-Device Mixture-of-Experts
Oct 26, 2025
Latent Constituency in Humans and LLMs
Oct 26, 2025
Introducing MTEB v2: Multimodal Embedding Evaluation
Oct 26, 2025
Internal Mechanisms of a Large Language Model
Oct 26, 2025
GigaBrain-0: World Model-Powered Generalist Robots
Oct 26, 2025
FlashAttention: IO-Aware Fast and Memory-Efficient Attention
Oct 26, 2025
Cognitive Impact of AI and Search on Essay Writing
Oct 26, 2025
Cattell–Horn–Carroll Theory of Intelligence
Oct 26, 2025
Structural Understanding of LLM Overthinking
Oct 22, 2025
RoBERTa: Robustly Optimized BERT Pretraining Approach
Oct 22, 2025
REFRAG: v2 paper: Efficient RAG Decoding via Context Compression
Oct 22, 2025
RAG-Anything: Unified Multimodal Knowledge Retrieval Framework
Oct 22, 2025
LLM-Guided Hierarchical Retrieval: The LATTICE Framework
Oct 22, 2025
LightMem: Lightweight Efficient Memory-Augmented Generation
Oct 22, 2025
Inheritune: Efficient LLM Training via Attention Collapse
Oct 22, 2025
In-Context Learning as Implicit Learning Algorithms
Oct 22, 2025
EssenceBench: Compressing LLM Benchmarks via Redundancy and Genetic Algorithm
Oct 22, 2025
Elastic-Cache: Adaptive KV Caching for Diffusion LLMs
Oct 22, 2025
Dr.LLM: Dynamic Layer Routing in LLMs
Oct 22, 2025
A Psychometric Framework for Artificial General Intelligence
Oct 22, 2025
Mojo: Performance-Portable HPC Kernels on GPUs
Oct 18, 2025
Geometric Flows of Logic in LLM Representation Space
Oct 18, 2025
Scaling Reinforcement Learning Compute for LLMs
Oct 17, 2025
HBF: High Bandwidth Flash for AI Inferencing
Oct 15, 2025
COPA: Composable On-Package GPU Architecture for Domain Specialization
Oct 15, 2025
Architectural Migration to Multi-head Latent Attention
Oct 15, 2025
UniVideo: Unified Video Understanding, Generation, and Editing
Oct 11, 2025
Training-Free GRPO: Policy Optimization via Context Space
Oct 11, 2025
RAND: Securing AI Model Weights: Preventing Theft and Misuse
Oct 11, 2025
Performance of Confidential Computing for Large Language Models
Oct 11, 2025
Multi-Agent Tool-Integrated Policy Optimization (MATPO)
Oct 11, 2025
Google: Confidential Computing with Accelerated AI Workloads on GCE
Oct 11, 2025
AWS: Nitro System: Security, Enclaves, and Generative AI
Oct 11, 2025
Anthropic: Confidential Inference via Trusted Virtual Machines
Oct 11, 2025
Samsung: Less is More: Recursive Reasoning with Tiny Networks
Oct 10, 2025
Petri: Accelerating AI Safety Auditing
Oct 10, 2025
Low-Precision Transformer Failure in Flash Attention
Oct 10, 2025
Early Experience for Language Agent Improvement
Oct 10, 2025
Dragon Hatchling: Brain-Inspired AI Architecture
Oct 10, 2025
CLUE: Hidden-State Clustering for Non-parametric Verification
Oct 10, 2025
Agentic Context Engineering: Evolving Contexts for Self-Improving LLMs
Oct 10, 2025
AGENTFLOW: In-the-Flow Agentic System Optimization
Oct 10, 2025
Test-Time Reinforcement Learning for LLMs
Oct 08, 2025
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Oct 08, 2025
Reactive Transformer: Stateful Real-Time Language Models
Oct 08, 2025
RECAP: Safety Alignment via Counter-Aligned Prefilling
Oct 08, 2025
Paris: Decentralized Open-Weight Diffusion Model
Oct 08, 2025
ONNX Ecosystem, Optimization, and Deployment
Oct 08, 2025
NIST Evaluation of DeepSeek AI Models
Oct 08, 2025
MotionRAG: Retrieval-Augmented Image-to-Video Generation
Oct 08, 2025
LongCodeZip: Compress Long Code Context for LLMs
Oct 08, 2025
Imperceptible Jailbreaking Against Large Language Models
Oct 08, 2025
Implicit Dynamics of In-Context Learning
Oct 08, 2025
GNN101: Visual Learning of Graph Neural Networks
Oct 08, 2025
Emergent Abilities of Large Language Models
Oct 08, 2025
DC-VideoGen: Efficient Video Generation with Deep Compression
Oct 08, 2025
Contextual Blocks: Implicit Weight Updates and Federated Learning
Oct 08, 2025
CoDA: Collaborative Multi-Agent Data Visualization
Oct 08, 2025
Analog In-Memory Attention for Energy-Efficient LLMs
Oct 08, 2025
ACON: Optimizing Context Compression for LLM Agents
Oct 08, 2025
Regression Language Models for Code Metrics
Oct 03, 2025
Introducing RTEB: Retrieval Embedding Benchmark
Oct 03, 2025
CUDA Unified Memory and Heterogeneous Memory Management
Oct 02, 2025
Moravec's Paradox and AI Automation Limits
Oct 01, 2025
Characterizing LLM KV Cache Workloads in Production
Oct 01, 2025
BurstGPT: A Real-World LLM Serving Workload Dataset
Oct 01, 2025
Qwen3-Next & Qwen3-Omni technical report
Sep 30, 2025
Variational Reasoning Framework for Language Models
Sep 29, 2025
Schoenfeld Theory Applied to Large Reasoning Models
Sep 27, 2025
Federated Learning with Soft Embeddings for Retrieval
Sep 27, 2025
CWM: Code Generation with World Models
Sep 27, 2025
Tree-based Group Policy Optimization for LLM Agents
Sep 26, 2025
Seedream 4.0: Multimodal Image Generation System
Sep 26, 2025
GDPval: Measuring AI Performance on Real-World Work
Sep 26, 2025
EmbeddingGemma: Powerful Lightweight Text Representations
Sep 26, 2025
CE-GPPO: Controlling Entropy via Gradient-Preserving Policy Optimization
Sep 26, 2025
LLM-I: Interleaved Multimodal Creators via Tool-Use
Sep 20, 2025
Adaptive Compression Techniques for Efficient LLM Inference
Sep 20, 2025
THOR: Hierarchical RL for Mathematical Reasoning
Sep 19, 2025
The Uneven Diffusion of AI Adoption
Sep 19, 2025
Single-stream Policy Optimization for LLMs
Sep 19, 2025
SearchInstruct: Instruction Tuning with Dynamic Retrieval
Sep 19, 2025
FlowRL: Distribution Matching for LLM Reasoning
Sep 19, 2025
Evolving Language Models Without Labels: EVOL-RL
Sep 19, 2025
REFRAG: Rethinking RAG-based Decoding
Sep 18, 2025
Pre-computing & reusing KV caches to accelerate RAG inference
Sep 18, 2025
DeepSeek-R1: Reinforcing LLM Reasoning Through Self-Evolution
Sep 18, 2025
WebSailor-V2: Bridging Proprietary Agents with Synthetic Data and RL
Sep 17, 2025
TailorKV: Hybrid KV Cache Compression for LLMs
Sep 17, 2025
ShadowKV: High-Throughput Long-Context LLM Inference
Sep 17, 2025
QuantAgent: Multi-Agent LLM for High-Frequency Trading
Sep 17, 2025
MIRAGE: Optimizing LLM KV Cache with Parameter Remapping
Sep 17, 2025
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning
Sep 17, 2025
Infini-gram: Scaling Unbounded N-gram Language Models
Sep 17, 2025
Dynamic Chunking for Hierarchical Sequence Modeling
Sep 17, 2025
UQ: Unsolved Questions for Language Models
Sep 16, 2025
PETALS: Collaborative Large Language Model Inference and Fine-tuning
Sep 16, 2025
Non-Penetrative Tensor Partitioning for Collaborative AIoT Inference
Sep 16, 2025
Native Sparse Attention: Efficient Long-Context LLMs
Sep 16, 2025
Janus-Pro: Unified Multimodal AI with Scaled Improvements
Sep 16, 2025
Hierarchical Reasoning Model: Brain-Inspired AI for Complex Tasks
Sep 16, 2025
Generalist Reward Modeling with Inference-Time Scaling
Sep 16, 2025
Federated Post-Training LLMs: An Accessibility and Efficiency Survey
Sep 16, 2025
Collaborative Edge Inference with Dynamic Task Offloading and Early Exiting
Sep 16, 2025
CodeI/O: Reasoning Patterns Through Code Input-Output Prediction
Sep 16, 2025
Adaptive LLM Partitioning for Edge Inference
Sep 16, 2025
The Illusion of Diminishing Returns in LLM Execution
Sep 15, 2025
PyTorch FSDP: Scaling Fully Sharded Data Parallel
Sep 15, 2025
MetaGraph: knowledge graphs from financial NLP
Sep 15, 2025
Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models
Sep 15, 2025
HybridServe: Efficient LLM Inference with Hybrid Caching
Sep 15, 2025
GraphSAGE: Inductive Representation Learning on Large Graphs
Sep 15, 2025
FlexGen: High-Throughput LLM Inference on a Single GPU
Sep 15, 2025
AWQ: On-Device LLM Compression and Acceleration
Sep 15, 2025
Llama 3: Architecture, Capabilities, and Safety
Sep 14, 2025
Graph Patterns of Knowledge in Large Language Models
Sep 14, 2025
Survey of Reinforcement Learning for Large Reasoning Models
Sep 13, 2025
Statistical Methods for Generative AI Reliability
Sep 13, 2025
SpikingBrain: Brain-Inspired LLMs for Efficient Long-Context Processing
Sep 13, 2025
EntiGraph: Scaling Language Models with Synthetic Pretraining
Sep 13, 2025
All for One: LLMs Solve Mental Math at the Last Token
Sep 13, 2025
Parallel-R1: Reinforcement Learning for Parallel Thinking in LLMs
Sep 12, 2025
NOVELTYBENCH: Evaluating Language Model Diversity
Sep 12, 2025
HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization
Sep 12, 2025
Explaining AI for Digital Advertising with LLMs
Sep 11, 2025
ByteCheckpoint: A Unified LLM Checkpointing System
Sep 11, 2025
AdLlama: Boosting Ad Performance with Reinforcement Learning
Sep 11, 2025
Mini-o3: Scaling Reasoning for Visual Search
Sep 10, 2025
Masked Diffusion Models: Performance and Theory
Sep 10, 2025
K2-Think: A Parameter-Efficient Reasoning System
Sep 10, 2025
INF2: Near-Storage LLM Inference for High Throughput
Sep 10, 2025
Darling: Reinforcing Diversity and Quality in Language Models
Sep 10, 2025
BLEU: Automatic Machine Translation Evaluation
Sep 10, 2025
AlphaEvolve: AI for Scientific and Algorithmic Discovery
Sep 10, 2025
TraceRL: Reinforcement Learning for Diffusion Language Models
Sep 09, 2025
LLM Benchmark Robustness to Linguistic Variation
Sep 09, 2025
Behavioral Fingerprinting of Large Language Models
Sep 09, 2025
Offloading LLM Models and KV Caches to NVMe SSDs
Sep 08, 2025
SGLang: Efficient Language Model Program Execution
Sep 07, 2025
OpenELM: Apple's Open Language Model Family
Sep 07, 2025
GPT-NeoX: Large-Scale Autoregressive Language Modeling in PyTorch
Sep 07, 2025
FineVision: Open Data for Computer Vision
Sep 07, 2025
Evaluating Large Language Models Trained on Code
Sep 07, 2025
Eleuther: evaluating LLMs
Sep 07, 2025
Democratizing AI Compute: The Modular Vision
Sep 07, 2025
SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence
Sep 06, 2025
Limitations of Embedding-Based Retrieval
Sep 06, 2025
MTEB & MMTEB: The Massive Text Embedding Benchmark
Sep 05, 2025
Inverse IFEval: Unlearning LLM Cognitive Inertia
Sep 05, 2025
EmbeddingGemma: On-Device AI for High-Quality Embeddings
Sep 05, 2025
DeepResearch Arena: Benchmarking LLMs' Research Abilities
Sep 05, 2025
The Rise of Physical Neural Networks
Sep 04, 2025
Supervised Learning in DNA Neural Networks
Sep 04, 2025
FastVLM: Efficient Vision Encoding for Language Models
Sep 04, 2025
Apertus Tech Report Overview
Sep 04, 2025
Scientific LLMs: A Data-Centric Survey and Roadmap
Sep 03, 2025
rStar2-Agent: Smarter Math Reasoning Through Agentic RL
Sep 03, 2025
FusionANNS: Billion-Scale ANNS with SSD and GPU
Sep 03, 2025
Training Recurrent Neural Networks: Vanishing and Exploding Gradients
Aug 27, 2025
SPAM: Stabilizing LLM Training with Spike-Aware Optimization
Aug 27, 2025
Pimba: Processing-in-Memory for LLM Serving
Aug 27, 2025
Oaken: Fast, Efficient LLM Serving with Hybrid KV Cache Quantization
Aug 27, 2025
AdamW: Decoupled Weight Decay Regularization for Adaptive Gradient Algorithms
Aug 27, 2025
Adafactor: Memory-Efficient Adaptive Learning Rates
Aug 27, 2025
Quantizing Diffusion LLMs: A Systematic Study
Aug 26, 2025
ODYSSEY: Unified Mobile Manipulation for Agile Quadruped Robots
Aug 26, 2025
Google: Measuring AI's Environmental Impact at Scale
Aug 26, 2025
ComoRAG: Cognitively Inspired Narrative Reasoning
Aug 26, 2025
GPT-5 Spatial Intelligence: An Empirical Study
Aug 24, 2025
DeepSeek-V3.1: A Hybrid AI Model with Enhanced Reasoning
Aug 23, 2025
Compressed Experts: Efficient MoE Model Editing
Aug 23, 2025
Genie 3: A New Frontier for World Models
Aug 22, 2025
Los Alamos: overcoming the memory wall fighting sparse memory access
Aug 21, 2025
Switch Transformers: Trillion Parameter Models with Sparsity
Aug 20, 2025
Speed Always Wins: Efficient Large Language Model Architectures
Aug 20, 2025
Linear Transformers: Faster Than RNNs
Aug 20, 2025
Self-Search Reinforcement Learning for LLMs
Aug 18, 2025
Diffusion Language Models: Principles, Techniques, and Applications
Aug 18, 2025
Continuous Batching for LLM Inference: Throughput and Latency Gains
Aug 18, 2025
Atom: Low-Bit Quantization for LLM Serving
Aug 18, 2025
The Mapped Memory Mistake: Why DBMSs Should Avoid MMAP
Aug 13, 2025
Scaling PostgreSQL at OpenAI: Read-Heavy Workloads and Optimizations - PGConf.dev 2025
Aug 13, 2025
pNFS Flex Files
Aug 13, 2025
NVMe Offload on Colossal AI: Breaking the GPU Memory Wall
Aug 13, 2025
NVIDIA GDS, BaM vs ROCm solutions
Aug 13, 2025
fMoE: Fine-Grained Expert Offloading for MoE Serving
Aug 13, 2025
ELMo-Tune-V2: LLM-Assisted Auto-Tuning for Key-Value Stores
Aug 13, 2025
AiSAQ: DRAM-free ANNS with Product Quantization
Aug 13, 2025
Qwen-Image: Generation and Editing with Precision
Aug 12, 2025
Mem0: Scalable Long-Term Memory for AI Agents
Aug 12, 2025
Chain-of-Thought Reasoning: A Brittle Mirage?
Aug 11, 2025
Xavier Initialization: Deep Feedforward Networks: Training Difficulties and Solutions
Aug 08, 2025
ZeRO-Offload: Democratizing Billion-Scale Model Training
Aug 08, 2025
vLLM & PagedAttention: Efficient LLM Serving with Virtual Memory
Aug 08, 2025
TierTrain: Proactive Memory Tiering for DNN Training
Aug 08, 2025
Test-Time Scaling
Aug 08, 2025
The Elements of Differentiable Programming
Aug 08, 2025
Teraio: Cost-Efficient LLM Training via Lifetime-Aware Tensor Offloading
Aug 08, 2025
MoE Offloaded
Aug 08, 2025
Shared Virtual Memory: Design and Performance Implications
Aug 08, 2025
Reinforcement Learning
Aug 08, 2025
Reinforcement Pre-Training for Language Models
Aug 08, 2025
PartitionedVC: Optimizing External Memory Graph Analytics
Aug 08, 2025
Multiagent Debate Improves Language Model Reasoning
Aug 08, 2025
Multi Query Attention: PaLM: Scaling Language Modeling with Pathways
Aug 08, 2025
MUVERA: Efficient Multi-Vector Information Retrieval
Aug 08, 2025
Mu: The Windows Settings Agent Language Model
Aug 08, 2025
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Aug 08, 2025
Mistral 7B: Superior Performance in a Smaller Package
Aug 08, 2025
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts
Aug 08, 2025
Microscaling Quantization for Large Language Models
Aug 08, 2025
Lost in the Middle: How Language Models Use Long Contexts
Aug 08, 2025
MEGABYTE: Multiscale Transformers for Million-byte Sequences
Aug 08, 2025
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Aug 08, 2025
LoRA: Low-Rank Adaptation of Large Language Models
Aug 08, 2025
LMCache: Supercharging LLM Performance with KV Cache Management
Aug 08, 2025
KVQuant: LLM Inference with KV Cache Quantization
Aug 08, 2025
Kaiming Initialization and PReLU
Aug 08, 2025
Gemma: Google DeepMind's Open Language Models
Aug 08, 2025
Fast Learning for Deep Belief Networks
Aug 08, 2025
FP8 Quantization
Aug 08, 2025
FlashAttention-2: Faster Attention with Better Parallelism
Aug 08, 2025
DyNN-Offload: Efficient Memory for Dynamic Neural Networks
Aug 08, 2025
Dynamic Tanh: Transformers Without Normalization
Aug 08, 2025
DroidSpeak: Cross-LLM KV Cache Sharing
Aug 08, 2025
DiMSUM: Image Generation with Diffusion Mamba
Aug 08, 2025
DeepSeek Safety Concerns
Aug 08, 2025
Diffusion Probabilistic Models: Deep Unsupervised Learning Through Diffusion Processes
Aug 08, 2025
Demystifying Mamba: Architecture and Capabilities
Aug 08, 2025
DeepSeek-R1: Incentivizing Reasoning in LLMs
Aug 08, 2025
DeepSeek-V3: A Technical Report
Aug 08, 2025
DeepSeekMoE: Scalable Mixture-of-Experts Language Models
Aug 08, 2025
DeepSeek-R1 Dynamic 1.58-bit Quantization: A Performance Analysis
Aug 08, 2025
Concept Drift
Aug 08, 2025
ColBERT and ColBERT v2
Aug 08, 2025
CODEGEN: Open Language Model for Code Synthesis
Aug 08, 2025
Chain of thought
Aug 08, 2025
AI and the Memory Wall: Overcoming Bottlenecks
Aug 08, 2025
AdamW: Decoupled Weight Decay Regularization for Adaptive Gradient Algorithms
Aug 08, 2025
Adam: A Method for Stochastic Optimization
Aug 08, 2025
Transformer Scaling
Aug 07, 2025
Scaling Laws
Aug 07, 2025
ResNets - residual block
Aug 07, 2025
RoPE
Aug 07, 2025
LSTM: the forget gate
Aug 07, 2025
Longformer: A Transformer for Long Documents
Aug 07, 2025
Linformer: Efficient Self-Attention with Linear Complexity
Aug 07, 2025
LeNet-5: Convolutional Networks for Character Recognition
Aug 07, 2025
Layer Normalization and Dual Patch Normalization
Aug 07, 2025
Learning from repeated data
Aug 07, 2025
GPT-4 Technical Report
Aug 07, 2025
GQA: Grouped Query Attention
Aug 07, 2025
GPT-3
Aug 07, 2025
GPT-2
Aug 07, 2025
GELU
Aug 07, 2025
Dropout
Aug 07, 2025
Constitutional AI: Harmlessness Through Self-Improvement
Aug 07, 2025
Chinchilla: Optimal Language Model Scaling
Aug 07, 2025
Chamfer Matching: Image Registration and Medical Applications
Aug 07, 2025
BART
Aug 07, 2025
BERT
Aug 07, 2025
Batch Normalization
Aug 07, 2025
Attention Is All You Need
Aug 07, 2025