To listen to a podcast, please open the Podcast Republic app, available on the Google Play Store and the Apple App Store.
Episode | Date |
---|---|
[QA] Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Apr 30, 2025 |
Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Apr 30, 2025 |
[QA] ReasonIR: Training Retrievers for Reasoning Tasks | Apr 30, 2025 |
ReasonIR: Training Retrievers for Reasoning Tasks | Apr 30, 2025 |
[QA] Scaling Laws For Scalable Oversight | Apr 28, 2025 |
Scaling Laws For Scalable Oversight | Apr 28, 2025 |
[QA] Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models | Apr 28, 2025 |
Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models | Apr 28, 2025 |
[QA] Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity | Apr 27, 2025 |
Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity | Apr 27, 2025 |
[QA] Learning Adaptive Parallel Reasoning with Language Models | Apr 26, 2025 |
Learning Adaptive Parallel Reasoning with Language Models | Apr 26, 2025 |
[QA] Boosting Generative Image Modeling via Joint Image-Feature Synthesis | Apr 26, 2025 |
Boosting Generative Image Modeling via Joint Image-Feature Synthesis | Apr 26, 2025 |
[QA] Step1X-Edit: A Practical Framework for General Image Editing | Apr 25, 2025 |
Step1X-Edit: A Practical Framework for General Image Editing | Apr 25, 2025 |
[QA] Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models | Apr 25, 2025 |
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models | Apr 25, 2025 |
[QA] Exploring How LLMs Capture and Represent Domain-Specific Knowledge | Apr 24, 2025 |
Exploring How LLMs Capture and Represent Domain-Specific Knowledge | Apr 24, 2025 |
[QA] I-Con: A Unifying Framework for Representation Learning | Apr 24, 2025 |
I-Con: A Unifying Framework for Representation Learning | Apr 24, 2025 |
[QA] Tina: Tiny Reasoning Models via LoRA | Apr 23, 2025 |
Tina: Tiny Reasoning Models via LoRA | Apr 23, 2025 |
[QA] LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Apr 23, 2025 |
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities | Apr 23, 2025 |
[QA] UFO2: The Desktop AgentOS | Apr 22, 2025 |
UFO2: The Desktop AgentOS | Apr 22, 2025 |
[QA] NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning | Apr 22, 2025 |
NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning | Apr 22, 2025 |
[QA] Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Apr 21, 2025 |
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Apr 21, 2025 |
[QA] Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model | Apr 21, 2025 |
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model | Apr 21, 2025 |
[QA] Reasoning Models Can Be Effective Without Thinking | Apr 20, 2025 |
Reasoning Models Can Be Effective Without Thinking | Apr 20, 2025 |
[QA] A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Apr 20, 2025 |
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | Apr 20, 2025 |
[QA] CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training | Apr 19, 2025 |
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training | Apr 19, 2025 |
[QA] Antidistillation Sampling | Apr 18, 2025 |
Antidistillation Sampling | Apr 18, 2025 |
[QA] Position: The Most Expensive Part of an LLM should be its Training Data | Apr 18, 2025 |
Position: The Most Expensive Part of an LLM should be its Training Data | Apr 18, 2025 |
[QA] Activated LoRA: Fine-tuned LLMs for Intrinsics | Apr 18, 2025 |
Activated LoRA: Fine-tuned LLMs for Intrinsics | Apr 18, 2025 |
[QA] COLORBENCH: Can VLMs See and Understand the Colorful World? | Apr 17, 2025 |
COLORBENCH: Can VLMs See and Understand the Colorful World? | Apr 17, 2025 |
[QA] ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | Apr 17, 2025 |
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | Apr 17, 2025 |
[QA] Looking beyond the next token | Apr 16, 2025 |
Looking beyond the next token | Apr 16, 2025 |
[QA] How to Predict Best Pretraining Data with Small Experiments | Apr 16, 2025 |
How to Predict Best Pretraining Data with Small Experiments | Apr 16, 2025 |
[QA] Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability | Apr 15, 2025 |
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability | Apr 15, 2025 |
[QA] DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training | Apr 15, 2025 |
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training | Apr 15, 2025 |
[QA] Steering CLIP's vision transformer with sparse autoencoders | Apr 14, 2025 |
Steering CLIP's vision transformer with sparse autoencoders | Apr 14, 2025 |
[QA] Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning | Apr 14, 2025 |
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning | Apr 14, 2025 |
[QA] Rethinking Reflection in Pre-Training | Apr 13, 2025 |
Rethinking Reflection in Pre-Training | Apr 13, 2025 |
[QA] Self-Steering Language Models | Apr 13, 2025 |
Self-Steering Language Models | Apr 13, 2025 |
[QA] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? | Apr 12, 2025 |
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? | Apr 12, 2025 |
DDT: Decoupled Diffusion Transformer | Apr 12, 2025 |
DDT: Decoupled Diffusion Transformer | Apr 12, 2025 |
[QA] Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Apr 11, 2025 |
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Apr 11, 2025 |
[QA] Scaling Laws for Native Multimodal Models | Apr 11, 2025 |
Scaling Laws for Native Multimodal Models | Apr 11, 2025 |
[QA] OLMOTRACE: Tracing Language Model Outputs Back to Trillions of Training Tokens | Apr 10, 2025 |
OLMOTRACE: Tracing Language Model Outputs Back to Trillions of Training Tokens | Apr 10, 2025 |
[QA] Wanting to be Understood | Apr 10, 2025 |
Wanting to be Understood | Apr 10, 2025 |
[QA] A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | Apr 10, 2025 |
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | Apr 10, 2025 |
[QA] From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | Apr 09, 2025 |
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | Apr 09, 2025 |
[QA] Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Apr 09, 2025 |
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | Apr 09, 2025 |
[QA] Can ChatGPT Learn My Life From a Week of First-Person Video? | Apr 08, 2025 |
Can ChatGPT Learn My Life From a Week of First-Person Video? | Apr 08, 2025 |
[QA] Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs | Apr 08, 2025 |
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs | Apr 08, 2025 |
[QA] Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models | Apr 07, 2025 |
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models | Apr 07, 2025 |
[QA] Agentic Knowledgeable Self-awareness | Apr 07, 2025 |
Agentic Knowledgeable Self-awareness | Apr 07, 2025 |
[QA] Inference-Time Scaling for Generalist Reward Modeling | Apr 05, 2025 |
Inference-Time Scaling for Generalist Reward Modeling | Apr 05, 2025 |
[QA] Multi-Token Attention | Apr 05, 2025 |
Multi-Token Attention | Apr 05, 2025 |
[QA] Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting | Mar 30, 2025 |
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting | Mar 30, 2025 |
[QA] Wan: Open and Advanced Large-Scale Video Generative Models | Mar 29, 2025 |
Wan: Open and Advanced Large-Scale Video Generative Models | Mar 29, 2025 |
[QA] UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Mar 29, 2025 |
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | Mar 29, 2025 |
[QA] SWI: Speaking with Intent in Large Language Models | Mar 28, 2025 |
SWI: Speaking with Intent in Large Language Models | Mar 28, 2025 |
[QA] Unified Multimodal Discrete Diffusion | Mar 28, 2025 |
Unified Multimodal Discrete Diffusion | Mar 28, 2025 |
[QA] Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals | Mar 27, 2025 |
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals | Mar 27, 2025 |
[QA] Open Deep Search: Democratizing Search with Open-source Reasoning Agents | Mar 27, 2025 |
Open Deep Search: Democratizing Search with Open-source Reasoning Agents | Mar 27, 2025 |
[QA] LookAhead Tuning: Safer Language Models via Partial Answer Previews | Mar 26, 2025 |
LookAhead Tuning: Safer Language Models via Partial Answer Previews | Mar 26, 2025 |
[QA] ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | Mar 26, 2025 |
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | Mar 26, 2025 |
[QA] FFN Fusion: Rethinking Sequential Computation in Large Language Models | Mar 25, 2025 |
FFN Fusion: Rethinking Sequential Computation in Large Language Models | Mar 25, 2025 |
[QA] Modifying Large Language Model Post-Training for Diverse Creative Writing | Mar 24, 2025 |
Modifying Large Language Model Post-Training for Diverse Creative Writing | Mar 24, 2025 |
[QA] Users Favor LLM-Generated Content—Until They Know It's AI | Mar 24, 2025 |
Users Favor LLM-Generated Content—Until They Know It's AI | Mar 24, 2025 |
[QA] Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs | Mar 24, 2025 |
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs | Mar 24, 2025 |
[QA] DAPO: An Open-Source LLM Reinforcement Learning System at Scale | Mar 23, 2025 |
DAPO: An Open-Source LLM Reinforcement Learning System at Scale | Mar 23, 2025 |
[QA] SynCity: Training-Free Generation of 3D Worlds | Mar 23, 2025 |
SynCity: Training-Free Generation of 3D Worlds | Mar 23, 2025 |
[QA] TULIP: Towards Unified Language-Image Pretraining | Mar 22, 2025 |
TULIP: Towards Unified Language-Image Pretraining | Mar 22, 2025 |
[QA] Causal Emergence 2.0: Quantifying emergent complexity | Mar 22, 2025 |
Causal Emergence 2.0: Quantifying emergent complexity | Mar 22, 2025 |
[QA] Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | Mar 21, 2025 |
Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | Mar 21, 2025 |
[QA] Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do | Mar 21, 2025 |
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do | Mar 21, 2025 |
[QA] Cube: A Roblox View of 3D Intelligence | Mar 20, 2025 |
Cube: A Roblox View of 3D Intelligence | Mar 20, 2025 |
[QA] SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks | Mar 20, 2025 |
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks | Mar 20, 2025 |
[QA] Measuring AI Ability to Complete Long Tasks | Mar 19, 2025 |
Measuring AI Ability to Complete Long Tasks | Mar 19, 2025 |
[QA] Impossible Videos | Mar 19, 2025 |
Impossible Videos | Mar 19, 2025 |
[QA] SuperBPE: Space Travel for Language Models | Mar 18, 2025 |
SuperBPE: Space Travel for Language Models | Mar 18, 2025 |
[QA] xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Mar 18, 2025 |
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Mar 18, 2025 |
[QA] PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity | Mar 17, 2025 |
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity | Mar 17, 2025 |
[QA] Auditing language models for hidden objectives | Mar 17, 2025 |
Auditing language models for hidden objectives | Mar 17, 2025 |
[QA] Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space | Mar 16, 2025 |
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space | Mar 16, 2025 |
[QA] New Trends for Modern Machine Translation with Large Reasoning Models | Mar 16, 2025 |
New Trends for Modern Machine Translation with Large Reasoning Models | Mar 16, 2025 |
[QA] Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Mar 15, 2025 |
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Mar 15, 2025 |
[QA] Long Context Tuning for Video Generation | Mar 15, 2025 |
Long Context Tuning for Video Generation | Mar 15, 2025 |
[QA] Transformers without Normalization | Mar 14, 2025 |
Transformers without Normalization | Mar 14, 2025 |
[QA] Charting and Navigating Hugging Face's Model Atlas | Mar 14, 2025 |
Charting and Navigating Hugging Face's Model Atlas | Mar 14, 2025 |
[QA] I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? | Mar 13, 2025 |
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? | Mar 13, 2025 |
[QA] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | Mar 13, 2025 |
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | Mar 13, 2025 |
[QA] Gemini Embedding: Generalizable Embeddings from Gemini | Mar 12, 2025 |
Gemini Embedding: Generalizable Embeddings from Gemini | Mar 12, 2025 |
[QA] Inductive Moment Matching | Mar 11, 2025 |
Inductive Moment Matching | Mar 11, 2025 |
[QA] Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | Mar 11, 2025 |
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | Mar 11, 2025 |
[QA] Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | Mar 10, 2025 |
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | Mar 10, 2025 |
[QA] Continual Pre-training of MoEs: How robust is your router? | Mar 10, 2025 |
Continual Pre-training of MoEs: How robust is your router? | Mar 10, 2025 |
[QA] HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs | Mar 09, 2025 |
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs | Mar 09, 2025 |
[QA] Boosting Blockchain Throughput: Parallel EVM Execution with Asynchronous Storage for Reddio | Mar 09, 2025 |
Boosting Blockchain Throughput: Parallel EVM Execution with Asynchronous Storage for Reddio | Mar 09, 2025 |
[QA] Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers | Mar 08, 2025 |
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers | Mar 08, 2025 |
[QA] L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Mar 08, 2025 |
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Mar 08, 2025 |
[QA] PokéChamp: an Expert-level Minimax Language Agent | Mar 07, 2025 |
PokéChamp: an Expert-level Minimax Language Agent | Mar 07, 2025 |
[QA] Token-Efficient Long Video Understanding for Multimodal LLMs | Mar 07, 2025 |
Token-Efficient Long Video Understanding for Multimodal LLMs | Mar 07, 2025 |
[QA] Position: Model Collapse Does Not Mean What You Think | Mar 06, 2025 |
Position: Model Collapse Does Not Mean What You Think | Mar 06, 2025 |
[QA] Towards Understanding Distilled Reasoning Models: A Representational Approach | Mar 06, 2025 |
Towards Understanding Distilled Reasoning Models: A Representational Approach | Mar 06, 2025 |
[QA] Weak-to-Strong Generalization Even in Random Feature Networks, Provably | Mar 05, 2025 |
Weak-to-Strong Generalization Even in Random Feature Networks, Provably | Mar 05, 2025 |
[QA] Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training | Mar 05, 2025 |
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training | Mar 05, 2025 |
[QA] Chain of Draft: Thinking Faster by Writing Less | Mar 03, 2025 |
Chain of Draft: Thinking Faster by Writing Less | Mar 03, 2025 |
[QA] Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids | Mar 03, 2025 |
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids | Mar 03, 2025 |
[QA] Implicit Search via Discrete Diffusion: A Study on Chess | Mar 02, 2025 |
Implicit Search via Discrete Diffusion: A Study on Chess | Mar 02, 2025 |
[QA] Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars | Mar 02, 2025 |
Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars | Mar 02, 2025 |
[QA] LightThinker: Thinking Step-by-Step Compression | Mar 01, 2025 |
LightThinker: Thinking Step-by-Step Compression | Mar 01, 2025 |
[QA] LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers | Mar 01, 2025 |
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers | Mar 01, 2025 |
[QA] Self-rewarding correction for mathematical reasoning | Feb 28, 2025 |
Self-rewarding correction for mathematical reasoning | Feb 28, 2025 |
[QA] I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning | Feb 27, 2025 |
I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning | Feb 27, 2025 |
[QA] Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation | Feb 27, 2025 |
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation | Feb 27, 2025 |
[QA] DeepSeek vs. ChatGPT: A Comparative Study for Scientific Computing and Scientific Machine Learning Tasks | Feb 26, 2025 |
DeepSeek vs. ChatGPT: A Comparative Study for Scientific Computing and Scientific Machine Learning Tasks | Feb 26, 2025 |
[QA] Yes, Q-learning Helps Offline In-Context RL | Feb 26, 2025 |
Yes, Q-learning Helps Offline In-Context RL | Feb 26, 2025 |
[QA] Fractal Generative Models | Feb 25, 2025 |
Fractal Generative Models | Feb 25, 2025 |
[QA] Improving the Scaling Laws of Synthetic Data with Deliberate Practice | Feb 24, 2025 |
Improving the Scaling Laws of Synthetic Data with Deliberate Practice | Feb 24, 2025 |
[QA] Idiosyncrasies in Large Language Models | Feb 23, 2025 |
Idiosyncrasies in Large Language Models | Feb 23, 2025 |
[QA] SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 22, 2025 |
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 22, 2025 |
[QA] Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks | Feb 21, 2025 |
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks | Feb 21, 2025 |
[QA] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | Feb 21, 2025 |
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | Feb 21, 2025 |
[QA] Autellix: An Efficient Serving Engine for LLM Agents as General Programs | Feb 20, 2025 |
Autellix: An Efficient Serving Engine for LLM Agents as General Programs | Feb 20, 2025 |
[QA] Small Models Struggle to Learn from Strong Reasoners | Feb 20, 2025 |
Small Models Struggle to Learn from Strong Reasoners | Feb 20, 2025 |
[QA] TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Feb 19, 2025 |
TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Feb 19, 2025 |
[QA] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | Feb 19, 2025 |
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | Feb 19, 2025 |
[QA] (How) Can Transformers Predict Pseudo-Random Numbers? | Feb 17, 2025 |
(How) Can Transformers Predict Pseudo-Random Numbers? | Feb 17, 2025 |
[QA] Do Large Language Models Reason Causally Like Us? Even Better? | Feb 17, 2025 |
Do Large Language Models Reason Causally Like Us? Even Better? | Feb 17, 2025 |
[QA] Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Feb 16, 2025 |
Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Feb 16, 2025 |
[QA] Fino1: On the Transferability of Reasoning‑Enhanced LLMs to Finance | Feb 16, 2025 |
Fino1: On the Transferability of Reasoning‑Enhanced LLMs to Finance | Feb 16, 2025 |
[QA] The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models | Feb 15, 2025 |
The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models | Feb 15, 2025 |
[QA] SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models | Feb 14, 2025 |
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models | Feb 14, 2025 |
[QA] LLM Pretraining with Continuous Concepts | Feb 13, 2025 |
LLM Pretraining with Continuous Concepts | Feb 13, 2025 |
[QA] Distillation Scaling Laws | Feb 13, 2025 |
Distillation Scaling Laws | Feb 13, 2025 |
[QA] Competitive Programming with Large Reasoning Models | Feb 12, 2025 |
Competitive Programming with Large Reasoning Models | Feb 12, 2025 |
[QA] Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Feb 12, 2025 |
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Feb 12, 2025 |
[QA] DeepCrossAttention: Supercharging Transformer Residual Connections | Feb 11, 2025 |
DeepCrossAttention: Supercharging Transformer Residual Connections | Feb 11, 2025 |
[QA] Matryoshka Quantization | Feb 11, 2025 |
Matryoshka Quantization | Feb 11, 2025 |
[QA] When One LLM Drools, Multi-LLM Collaboration Rules | Feb 10, 2025 |
When One LLM Drools, Multi-LLM Collaboration Rules | Feb 10, 2025 |
[QA] Self-Regulation and Requesting Interventions | Feb 10, 2025 |
Self-Regulation and Requesting Interventions | Feb 10, 2025 |
[QA] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | Feb 10, 2025 |
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | Feb 10, 2025 |
[QA] Value-Based Deep RL Scales Predictably | Feb 09, 2025 |
Value-Based Deep RL Scales Predictably | Feb 09, 2025 |
[QA] Demystifying Long Chain-of-Thought Reasoning in LLMs | Feb 09, 2025 |
Demystifying Long Chain-of-Thought Reasoning in LLMs | Feb 09, 2025 |
[QA] ULTRAIF: Advancing Instruction Following from the Wild | Feb 08, 2025 |
ULTRAIF: Advancing Instruction Following from the Wild | Feb 08, 2025 |
[QA] Analyze Feature Flow to Enhance Interpretation and Steering in Language Models | Feb 08, 2025 |
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models | Feb 08, 2025 |
[QA] Examining Two Hop Reasoning Through Information Content Scaling | Feb 07, 2025 |
Examining Two Hop Reasoning Through Information Content Scaling | Feb 07, 2025 |
[QA] Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 | Feb 07, 2025 |
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 | Feb 07, 2025 |
[QA] Do Large Language Model Benchmarks Test Reliability? | Feb 06, 2025 |
Do Large Language Model Benchmarks Test Reliability? | Feb 06, 2025 |
Detecting Strategic Deception Using Linear Probes | Feb 06, 2025 |
[QA] Evaluation of Large Language Models via Coupled Token Generation | Feb 05, 2025 |
Evaluation of Large Language Models via Coupled Token Generation | Feb 05, 2025 |
[QA] Learning the RoPEs: Better 2D and 3D Position Encodings with STRING | Feb 05, 2025 |
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING | Feb 05, 2025 |
[QA] Should You Use Your Large Language Model to Explore or Exploit? | Feb 04, 2025 |
Should You Use Your Large Language Model to Explore or Exploit? | Feb 04, 2025 |
[QA] Harmonic Loss Trains Interpretable AI Models | Feb 04, 2025 |
Harmonic Loss Trains Interpretable AI Models | Feb 04, 2025 |
[QA] Trading inference-time compute for adversarial robustness. | Feb 03, 2025 |
Trading inference-time compute for adversarial robustness. | Feb 03, 2025 |
[QA] LLMs can see and hear without any training | Feb 02, 2025 |
LLMs can see and hear without any training | Feb 02, 2025 |
[QA] o3-mini vs DeepSeek-R1: Which One is Safer? | Feb 02, 2025 |
o3-mini vs DeepSeek-R1: Which One is Safer? | Feb 02, 2025 |
[QA] Optimizing Large Language Model Training Using FP4 Quantization | Feb 01, 2025 |
Optimizing Large Language Model Training Using FP4 Quantization | Feb 01, 2025 |
[QA] People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text | Feb 01, 2025 |
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text | Feb 01, 2025 |
[QA] Large Language Models Think Too Fast To Explore Effectively | Jan 31, 2025 |
Large Language Models Think Too Fast To Explore Effectively | Jan 31, 2025 |
[QA] Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | Jan 31, 2025 |
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | Jan 31, 2025 |
[QA] Early External Safety Testing of OpenAI’s o3-mini: Insights from Pre-Deployment Evaluation | Jan 30, 2025 |
Early External Safety Testing of OpenAI’s o3-mini: Insights from Pre-Deployment Evaluation | Jan 30, 2025 |
[QA] Dynamics of Transient Structure in In-Context Linear Regression Transformers | Jan 30, 2025 |
Dynamics of Transient Structure in In-Context Linear Regression Transformers | Jan 30, 2025 |
[QA] Context is Key for Agent Security | Jan 29, 2025 |
Context is Key for Agent Security | Jan 29, 2025 |
[QA] AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders | Jan 29, 2025 |
AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders | Jan 29, 2025 |
[QA] Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models | Jan 28, 2025 |
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models | Jan 28, 2025 |
[QA] Feasible Learning | Jan 28, 2025 |
Feasible Learning | Jan 28, 2025 |
[QA] Humanity's Last Exam | Jan 27, 2025 |
Humanity's Last Exam | Jan 27, 2025 |
[QA] GaussMark: A Practical Approach for Structural Watermarking of Language Models | Jan 27, 2025 |
GaussMark: A Practical Approach for Structural Watermarking of Language Models | Jan 27, 2025 |
[QA] Kimi k1.5: Scaling Reinforcement Learning with LLMs | Jan 26, 2025 |
Kimi k1.5: Scaling Reinforcement Learning with LLMs | Jan 26, 2025 |
[QA] Can We Generate Images with CoT? | Jan 26, 2025 |
Can We Generate Images with CoT? | Jan 26, 2025 |
[QA] Physics of Skill Learning | Jan 25, 2025 |
Physics of Skill Learning | Jan 25, 2025 |
[QA] Hallucinations Can Improve Large Language Models in Drug Discovery | Jan 25, 2025 |
Hallucinations Can Improve Large Language Models in Drug Discovery | Jan 25, 2025 |
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding | Jan 24, 2025 |
[QA] Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models | Jan 24, 2025 |
Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models | Jan 24, 2025 |
[QA] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | Jan 23, 2025 |
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | Jan 23, 2025 |
[QA] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Jan 23, 2025 |
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Jan 23, 2025 |
[QA] Reasoning Language Models: A Blueprint | Jan 22, 2025 |
Reasoning Language Models: A Blueprint | Jan 22, 2025 |
[QA] Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training | Jan 22, 2025 |
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training | Jan 22, 2025 |
[QA] Evolving Deeper LLM Thinking | Jan 20, 2025 |
Evolving Deeper LLM Thinking | Jan 20, 2025 |
PaSa: An LLM Agent for Comprehensive Academic Paper Search | Jan 20, 2025 |
[QA] Enhancing Generalization in Chain of Thought Reasoning for Smaller Models | Jan 20, 2025 |
[QA] How GPT Learns Layer by Layer | Jan 19, 2025 |
How GPT Learns Layer by Layer | Jan 19, 2025 |
[QA] FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors | Jan 19, 2025 |
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors | Jan 19, 2025 |
[QA] Learnings from Scaling Visual Tokenizers for Reconstruction and Generation | Jan 18, 2025 |
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation | Jan 18, 2025 |
[QA] LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs | Jan 18, 2025 |
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs | Jan 18, 2025 |
[QA] Towards Understanding Extrapolation: a Causal Lens | Jan 17, 2025 |
Towards Understanding Extrapolation: a Causal Lens | Jan 17, 2025 |
[QA] Do generative video models learn physical principles from watching videos? | Jan 17, 2025 |
Do generative video models learn physical principles from watching videos? | Jan 17, 2025 |
[QA] Joint Learning of Depth and Appearance for Portrait Image Animation | Jan 16, 2025 |
Joint Learning of Depth and Appearance for Portrait Image Animation | Jan 16, 2025 |
[QA] Dissecting a Small Artificial Neural Network | Jan 16, 2025 |
Dissecting a Small Artificial Neural Network | Jan 16, 2025 |
[QA] Diffusion Adversarial Post-Training for One-Step Video Generation | Jan 15, 2025 |
Diffusion Adversarial Post-Training for One-Step Video Generation | Jan 15, 2025 |
[QA] Inference-Time-Compute: More Faithful? A Research Note | Jan 15, 2025 |
Inference-Time-Compute: More Faithful? A Research Note | Jan 15, 2025 |
[QA] Transformer²: Self-adaptive LLMs | Jan 14, 2025 |
Transformer²: Self-adaptive LLMs | Jan 14, 2025 |
[QA] Soup to go: mitigating forgetting during continual learning with model averaging | Jan 13, 2025 |
Soup to go: mitigating forgetting during continual learning with model averaging | Jan 13, 2025 |
[QA] Emergent Symbol-like Number Variables in Artificial Neural Networks | Jan 13, 2025 |
Emergent Symbol-like Number Variables in Artificial Neural Networks | Jan 13, 2025 |
[QA] Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders | Jan 11, 2025 |
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders | Jan 11, 2025 |
[QA] Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Jan 11, 2025 |
Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Jan 11, 2025 |
[QA] Uncertainty-aware Knowledge Tracing | Jan 10, 2025 |
Uncertainty-aware Knowledge Tracing | Jan 10, 2025 |
[QA] The GAN is dead; long live the GAN! A Modern Baseline GAN | Jan 10, 2025 |
The GAN is dead; long live the GAN! A Modern Baseline GAN | Jan 10, 2025 |
[QA] Supervision-free Vision-Language Alignment | Jan 09, 2025 |
Supervision-free Vision-Language Alignment | Jan 09, 2025 |
[QA] Grokking at the Edge of Numerical Stability | Jan 09, 2025 |
Grokking at the Edge of Numerical Stability | Jan 09, 2025 |
[QA] ComMer: a Framework for Compressing and Merging User Data for Personalization | Jan 08, 2025 |
ComMer: a Framework for Compressing and Merging User Data for Personalization | Jan 08, 2025 |
[QA] Entropy-Guided Attention for Private LLMs | Jan 08, 2025 |
Entropy-Guided Attention for Private LLMs | Jan 08, 2025 |
[QA] Easing Optimization Paths: a Circuit Perspective | Jan 07, 2025 |
Easing Optimization Paths: a Circuit Perspective | Jan 07, 2025 |
[QA] Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs | Jan 07, 2025 |
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs | Jan 07, 2025 |
[QA] Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search | Jan 06, 2025 |
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search | Jan 06, 2025 |
[QA] Predicting the Performance of Black-box LLMs through Self-Queries | Jan 06, 2025 |
Predicting the Performance of Black-box LLMs through Self-Queries | Jan 06, 2025 |
[QA] On Unifying Video Generation and Camera Pose Estimation | Jan 04, 2025 |
On Unifying Video Generation and Camera Pose Estimation | Jan 04, 2025 |
[QA] An analytic theory of creativity in convolutional diffusion models | Jan 04, 2025 |
An analytic theory of creativity in convolutional diffusion models | Jan 04, 2025 |
[QA] Finding Missed Code Size Optimizations in Compilers using LLMs | Jan 03, 2025 |
Finding Missed Code Size Optimizations in Compilers using LLMs | Jan 03, 2025 |
[QA] Titans: Learning to Memorize at Test Time | Jan 03, 2025 |
Titans: Learning to Memorize at Test Time | Jan 03, 2025 |
[QA] Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism | Dec 31, 2024 |
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism | Dec 31, 2024 |
[QA] Functional Risk Minimization | Dec 31, 2024 |
Functional Risk Minimization | Dec 31, 2024 |
[QA] HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | Dec 30, 2024 |
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | Dec 30, 2024 |
InfAlign: Inference-aware language model alignment | Dec 30, 2024 |
[QA] Consistency Checks for Language Model Forecasters | Dec 25, 2024 |
Consistency Checks for Language Model Forecasters | Dec 25, 2024 |
[QA] Deliberation in Latent Space via Differentiable Cache Augmentation | Dec 24, 2024 |
Deliberation in Latent Space via Differentiable Cache Augmentation | Dec 24, 2024 |
[QA] Automating the Search for Artificial Life with Foundation Models | Dec 24, 2024 |
Automating the Search for Artificial Life with Foundation Models | Dec 24, 2024 |
[QA] LLMs for Literature Review: Are we there yet? | Dec 23, 2024 |
LLMs for Literature Review: Are we there yet? | Dec 23, 2024 |
[QA] WebLLM: A High-Performance In-Browser LLM Inference Engine | Dec 23, 2024 |
WebLLM: A High-Performance In-Browser LLM Inference Engine | Dec 23, 2024 |
[QA] A Survey on LLM Inference-Time Self-Improvement | Dec 22, 2024 |
A Survey on LLM Inference-Time Self-Improvement | Dec 22, 2024 |
[QA] Tokenisation is NP-Complete | Dec 22, 2024 |
Tokenisation is NP-Complete | Dec 22, 2024 |
[QA] The Open-Source Advantage in Large Language Models (LLMs) | Dec 21, 2024 |
The Open-Source Advantage in Large Language Models (LLMs) | Dec 21, 2024 |
[QA] Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference | Dec 21, 2024 |
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference | Dec 21, 2024 |
[QA] DriveGPT: Scaling Autoregressive Behavior Models for Driving | Dec 20, 2024 |
DriveGPT: Scaling Autoregressive Behavior Models for Driving | Dec 20, 2024 |
[QA] MetaMorph: Multimodal Understanding and Generation via Instruction Tuning | Dec 20, 2024 |
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning | Dec 20, 2024 |
[QA] Byte Latent Transformer: Patches Scale Better Than Tokens | Dec 18, 2024 |
Byte Latent Transformer: Patches Scale Better Than Tokens | Dec 18, 2024 |
[QA] Transformers Struggle to Learn to Search | Dec 09, 2024 |
Transformers Struggle to Learn to Search | Dec 09, 2024 |
[QA] Navigation World Models | Dec 07, 2024 |
Navigation World Models | Dec 07, 2024 |
[QA] Motion Prompting: Controlling Video Generation with Motion Trajectories | Dec 07, 2024 |
Motion Prompting: Controlling Video Generation with Motion Trajectories | Dec 07, 2024 |
[QA] Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | Dec 06, 2024 |
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | Dec 06, 2024 |
[QA] NVILA: Efficient Frontier Visual Language Models | Dec 06, 2024 |
NVILA: Efficient Frontier Visual Language Models | Dec 06, 2024 |
[QA] o1-Coder: an o1 Replication for Coding | Dec 03, 2024 |
o1-Coder: an o1 Replication for Coding | Dec 03, 2024 |
[QA] Efficient Track Anything | Dec 03, 2024 |
Efficient Track Anything | Dec 03, 2024 |
[QA] Reverse Thinking Makes LLMs Stronger Reasoners | Dec 02, 2024 |
Reverse Thinking Makes LLMs Stronger Reasoners | Dec 02, 2024 |
[QA] Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM’s Reasoning Capability | Dec 02, 2024 |
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM’s Reasoning Capability | Dec 02, 2024 |
[QA] JetFormer: an autoregressive generative model of raw images and text | Dec 02, 2024 |
JetFormer: an autoregressive generative model of raw images and text | Dec 02, 2024 |
[QA] CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | Nov 29, 2024 |
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | Nov 29, 2024 |
Attamba: Attending To Multi-Token States | Nov 28, 2024 |
[QA] Star Attention: Efficient LLM Inference over Long Sequences | Nov 28, 2024 |
Star Attention: Efficient LLM Inference over Long Sequences | Nov 28, 2024 |
[QA] ROICtrl: Boosting Instance Control for Visual Generation | Nov 28, 2024 |
ROICtrl: Boosting Instance Control for Visual Generation | Nov 28, 2024 |
[QA] Solaris: A Foundation Model of the Sun | Nov 26, 2024 |
Solaris: A Foundation Model of the Sun | Nov 26, 2024 |
[QA] Even Sparser Graph Transformers | Nov 26, 2024 |
Even Sparser Graph Transformers | Nov 26, 2024 |
[QA] Understanding LLM Embeddings for Regression | Nov 25, 2024 |
Understanding LLM Embeddings for Regression | Nov 25, 2024 |
[QA] Loss-to-Loss Prediction: Scaling Laws for All Datasets | Nov 24, 2024 |
Loss-to-Loss Prediction: Scaling Laws for All Datasets | Nov 24, 2024 |
[QA] When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Nov 23, 2024 |
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Nov 23, 2024 |
[QA] SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | Nov 23, 2024 |
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | Nov 23, 2024 |
[QA] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Nov 22, 2024 |
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Nov 22, 2024 |
[QA] Hymba: A Hybrid-head Architecture for Small Language Models | Nov 22, 2024 |
Hymba: A Hybrid-head Architecture for Small Language Models | Nov 22, 2024 |
[QA] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents | Nov 21, 2024 |
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents | Nov 21, 2024 |
[QA] Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models | Nov 20, 2024 |
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models | Nov 20, 2024 |
[QA] Introducing Milabench: Benchmarking Accelerators for AI | Nov 20, 2024 |
Introducing Milabench: Benchmarking Accelerators for AI | Nov 20, 2024 |
[QA] Does Prompt Formatting Have Any Impact on LLM Performance? | Nov 19, 2024 |
Does Prompt Formatting Have Any Impact on LLM Performance? | Nov 19, 2024 |
[QA] Steering Language Model Refusal with Sparse Autoencoders | Nov 19, 2024 |
Steering Language Model Refusal with Sparse Autoencoders | Nov 19, 2024 |
[QA] LLaVA-o1: Let Vision Language Models Reason Step-by-Step | Nov 18, 2024 |
LLaVA-o1: Let Vision Language Models Reason Step-by-Step | Nov 18, 2024 |
[QA] Refusal in LLMs is an Affine Function | Nov 15, 2024 |
Refusal in LLMs is an Affine Function | Nov 15, 2024 |
[QA] Cut Your Losses in Large-Vocabulary Language Models | Nov 15, 2024 |
Cut Your Losses in Large-Vocabulary Language Models | Nov 15, 2024 |
[QA] Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models | Nov 12, 2024 |
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models | Nov 12, 2024 |
[QA] Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models | Nov 12, 2024 |
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models | Nov 12, 2024 |
[QA] Aioli: A unified optimization framework for language model data mixing | Nov 11, 2024 |
Aioli: A unified optimization framework for language model data mixing | Nov 11, 2024 |
[QA] BALANCING PIPELINE PARALLELISM WITH VOCABULARY PARALLELISM | Nov 11, 2024 |
BALANCING PIPELINE PARALLELISM WITH VOCABULARY PARALLELISM | Nov 11, 2024 |
[QA] Can Transformers Smell Like Humans? | Nov 10, 2024 |
Can Transformers Smell Like Humans? | Nov 10, 2024 |
[QA] Mixtures of In-Context Learners | Nov 10, 2024 |
Mixtures of In-Context Learners | Nov 10, 2024 |
[QA] How Far Is Video Generation from World Model: A Physical Law Perspective | Nov 09, 2024 |
How Far Is Video Generation from World Model: A Physical Law Perspective | Nov 09, 2024 |
[QA] ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate | Nov 09, 2024 |
ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate | Nov 09, 2024 |
[QA] Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? | Nov 08, 2024 |
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? | Nov 08, 2024 |
[QA] Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Nov 08, 2024 |
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Nov 08, 2024 |
[QA] Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex | Nov 07, 2024 |
Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex | Nov 07, 2024 |
[QA] How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | Nov 07, 2024 |
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | Nov 07, 2024 |
[QA] Discovering Data Structures: Nearest Neighbor Search and Beyond | Nov 06, 2024 |
Discovering Data Structures: Nearest Neighbor Search and Beyond | Nov 06, 2024 |
[QA] BrainBits: How Much of the Brain are Generative Reconstruction Methods Using? | Nov 06, 2024 |
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using? | Nov 06, 2024 |
[QA] Adapting Language Models via Token Translation | Nov 04, 2024 |
Adapting Language Models via Token Translation | Nov 04, 2024 |
[QA] Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models | Nov 04, 2024 |
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models | Nov 04, 2024 |
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters | Nov 02, 2024 |
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters | Nov 02, 2024 |
[QA] $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources | Nov 02, 2024 |
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources | Nov 02, 2024 |
[QA] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Nov 01, 2024 |
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Nov 01, 2024 |
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters | Oct 31, 2024 |
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters | Oct 31, 2024 |
[QA] Where Do Large Learning Rates Lead Us? | Oct 30, 2024 |
Where Do Large Learning Rates Lead Us? | Oct 30, 2024 |
[QA] Fourier Head: Helping Large Language Models Learn Complex Probability Distributions | Oct 30, 2024 |
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions | Oct 30, 2024 |
[QA] LoRA vs Full Fine-tuning: An Illusion of Equivalence | Oct 29, 2024 |
LoRA vs Full Fine-tuning: An Illusion of Equivalence | Oct 29, 2024 |
[QA] Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Oct 28, 2024 |
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Oct 28, 2024 |
[QA] Computational Bottlenecks of Training Small-scale Large Language Models | Oct 28, 2024 |
Computational Bottlenecks of Training Small-scale Large Language Models | Oct 28, 2024 |
[QA] Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees | Oct 26, 2024 |
Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees | Oct 26, 2024 |
[QA] A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration | Oct 26, 2024 |
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration | Oct 26, 2024 |
[QA] LEGO: Language Model Building Blocks | Oct 25, 2024 |
LEGO: Language Model Building Blocks | Oct 25, 2024 |
[QA] Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data | Oct 25, 2024 |
Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data | Oct 25, 2024 |
[QA] Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers | Oct 24, 2024 |
Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers | Oct 24, 2024 |
[QA] ALTA: Compiler-Based Analysis of Transformers | Oct 24, 2024 |
ALTA: Compiler-Based Analysis of Transformers | Oct 24, 2024 |
[QA] UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs | Oct 23, 2024 |
UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs | Oct 23, 2024 |
[QA] Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing | Oct 23, 2024 |
Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing | Oct 23, 2024 |
[QA] Generative Reward Models | Oct 22, 2024 |
Generative Reward Models | Oct 22, 2024 |
[QA] Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks | Oct 21, 2024 |
Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks | Oct 21, 2024 |
[QA] Decomposing The Dark Matter of Sparse Autoencoders | Oct 21, 2024 |
Decomposing The Dark Matter of Sparse Autoencoders | Oct 21, 2024 |
[QA] A Hitchhiker's Guide to Scaling Law Estimation | Oct 20, 2024 |
A Hitchhiker's Guide to Scaling Law Estimation | Oct 20, 2024 |
[QA] Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations | Oct 20, 2024 |
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations | Oct 20, 2024 |
[QA] Looking Inward: Language Models Can Learn About Themselves by Introspection | Oct 19, 2024 |
Looking Inward: Language Models Can Learn About Themselves by Introspection | Oct 19, 2024 |
[QA] Thinking LLMs: General Instruction Following with Thought Generation | Oct 19, 2024 |
Thinking LLMs: General Instruction Following with Thought Generation | Oct 19, 2024 |
[QA] Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Oct 18, 2024 |
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Oct 18, 2024 |
[QA] MOVIE GEN: A Cast of Media Foundation Models | Oct 18, 2024 |
MOVIE GEN: A Cast of Media Foundation Models | Oct 18, 2024 |
[QA] One Step Diffusion via Shortcut Models | Oct 17, 2024 |
One Step Diffusion via Shortcut Models | Oct 17, 2024 |
[QA] Inference Scaling for Long-Context Retrieval Augmented Generation | Oct 17, 2024 |
Inference Scaling for Long-Context Retrieval Augmented Generation | Oct 17, 2024 |
[QA] What Matters in Transformers? Not All Attention is Needed | Oct 16, 2024 |
What Matters in Transformers? Not All Attention is Needed | Oct 16, 2024 |
[QA] Language Models Encode Numbers Using Digit Representations in Base 10 | Oct 16, 2024 |
Language Models Encode Numbers Using Digit Representations in Base 10 | Oct 16, 2024 |
[QA] Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs | Oct 14, 2024 |
Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs | Oct 14, 2024 |
[QA] Do Unlearning Methods Remove Information from Language Model Weights? | Oct 14, 2024 |
Do Unlearning Methods Remove Information from Language Model Weights? | Oct 14, 2024 |
[QA] MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Oct 13, 2024 |
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Oct 13, 2024 |
[QA] Pixtral 12B | Oct 13, 2024 |
Pixtral 12B | Oct 13, 2024 |
[QA] Differential Transformer | Oct 12, 2024 |
Differential Transformer | Oct 12, 2024 |
[QA] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Oct 12, 2024 |
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Oct 12, 2024 |
[QA] Efficient Dictionary Learning with Switch Sparse Autoencoders | Oct 11, 2024 |
Efficient Dictionary Learning with Switch Sparse Autoencoders | Oct 11, 2024 |
[QA] Visual Scratchpads: Enabling Global Reasoning in Vision | Oct 11, 2024 |
Visual Scratchpads: Enabling Global Reasoning in Vision | Oct 11, 2024 |
[QA] RL, but don't do anything I wouldn't do | Oct 10, 2024 |
RL, but don't do anything I wouldn't do | Oct 10, 2024 |
[QA] Restructuring Vector Quantization with the Rotation Trick | Oct 10, 2024 |
Restructuring Vector Quantization with the Rotation Trick | Oct 10, 2024 |
[QA] EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? | Oct 08, 2024 |
EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? | Oct 08, 2024 |
[QA] Density estimation with LLMs: a geometric investigation of in-context learning trajectories | Oct 08, 2024 |
Density estimation with LLMs: a geometric investigation of in-context learning trajectories | Oct 08, 2024 |
[QA] Teaching Transformers Modular Arithmetic at Scale | Oct 07, 2024 |
Teaching Transformers Modular Arithmetic at Scale | Oct 07, 2024 |
[QA] What Matters for Model Merging at Scale? | Oct 07, 2024 |
What Matters for Model Merging at Scale? | Oct 07, 2024 |
[QA] Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Oct 05, 2024 |
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Oct 05, 2024 |
[QA] Were RNNs All We Needed? | Oct 05, 2024 |
Were RNNs All We Needed? | Oct 05, 2024 |
[QA] OOD-CHAMELEON: Is Algorithm Selection for OOD Generalization Learnable? | Oct 04, 2024 |
OOD-CHAMELEON: Is Algorithm Selection for OOD Generalization Learnable? | Oct 04, 2024 |
[QA] Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Oct 04, 2024 |
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Oct 04, 2024 |
[QA] Automated Red Teaming with GOAT: the Generative Offensive Agent Tester | Oct 03, 2024 |
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester | Oct 03, 2024 |
[QA] Not All LLM Reasoners Are Created Equal | Oct 03, 2024 |
Not All LLM Reasoners Are Created Equal | Oct 03, 2024 |
[QA] Law of the Weakest Link: Cross Capabilities of Large Language Models | Oct 02, 2024 |
Law of the Weakest Link: Cross Capabilities of Large Language Models | Oct 02, 2024 |
[QA] Realistic Evaluation of Model Merging for Compositional Generalization | Oct 01, 2024 |
Realistic Evaluation of Model Merging for Compositional Generalization | Oct 01, 2024 |
[QA] Emu3: Next-Token Prediction is All You Need | Sep 30, 2024 |
Emu3: Next-Token Prediction is All You Need | Sep 30, 2024 |
[QA] MIO: A Foundation Model on Multimodal Tokens | Sep 30, 2024 |
MIO: A Foundation Model on Multimodal Tokens | Sep 30, 2024 |
[QA] A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor ? | Sep 29, 2024 |
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor ? | Sep 29, 2024 |
[QA] Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | Sep 29, 2024 |
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | Sep 29, 2024 |
[QA] Making Text Embedders Few-Shot Learners | Sep 28, 2024 |
Making Text Embedders Few-Shot Learners | Sep 28, 2024 |
[QA] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale | Sep 28, 2024 |
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale | Sep 28, 2024 |
[QA] Infer Human's Intentions Before Following Natural Language Instruction | Sep 27, 2024 |
Infer Human's Intentions Before Following Natural Language Instruction | Sep 27, 2024 |
[QA] MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | Sep 27, 2024 |
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | Sep 27, 2024 |
[QA] Counterfactual Token Generation in Large Language Models | Sep 26, 2024 |
Counterfactual Token Generation in Large Language Models | Sep 26, 2024 |
[QA] Characterizing stable regions in the residual stream of LLMs | Sep 26, 2024 |
Characterizing stable regions in the residual stream of LLMs | Sep 26, 2024 |
[QA] Watch Your Steps: Observable and Modular Chains of Thought | Sep 25, 2024 |
Watch Your Steps: Observable and Modular Chains of Thought | Sep 25, 2024 |
[QA] Seeing Faces in Things: A Model and Dataset for Pareidolia | Sep 25, 2024 |
Seeing Faces in Things: A Model and Dataset for Pareidolia | Sep 25, 2024 |
[QA] Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts | Sep 24, 2024 |
Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts | Sep 24, 2024 |
[QA] Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Sep 24, 2024 |
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Sep 24, 2024 |
[QA] LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models | Sep 23, 2024 |
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models | Sep 23, 2024 |
[QA] Embedding Geometries of Contrastive Language-Image Pre-Training | Sep 23, 2024 |
Embedding Geometries of Contrastive Language-Image Pre-Training | Sep 23, 2024 |
[QA] Kolmogorov–Arnold Transformer | Sep 21, 2024 |
Kolmogorov–Arnold Transformer | Sep 21, 2024 |
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Sep 21, 2024 |
[QA] Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Sep 21, 2024 |
[QA] Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm | Sep 20, 2024 |
Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm | Sep 20, 2024 |
[QA] Is Tokenization Needed for Masked Particle Modelling? | Sep 20, 2024 |
Is Tokenization Needed for Masked Particle Modelling? | Sep 20, 2024 |
[QA] Finetuning Language Models to Emit Linguistic Expressions of Uncertainty | Sep 19, 2024 |
Finetuning Language Models to Emit Linguistic Expressions of Uncertainty | Sep 19, 2024 |
[QA] To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Sep 19, 2024 |
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Sep 19, 2024 |
[QA] On the limits of agency in agent-based models | Sep 18, 2024 |
On the limits of agency in agent-based models | Sep 18, 2024 |
[QA] Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models | Sep 18, 2024 |
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models | Sep 18, 2024 |
[QA] Finetuning CLIP to Reason about Pairwise Differences | Sep 17, 2024 |
Finetuning CLIP to Reason about Pairwise Differences | Sep 17, 2024 |
[QA] Think Twice Before You Act: Improving Inverse Problem Solving With MCMC | Sep 16, 2024 |
Think Twice Before You Act: Improving Inverse Problem Solving With MCMC | Sep 16, 2024 |
[QA] Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 16, 2024 |
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 16, 2024 |
LLMs Will Always Hallucinate, and We Need to Live With This | Sep 15, 2024 |
[QA] PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation | Sep 15, 2024 |
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation | Sep 15, 2024 |
[QA] LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Sep 14, 2024 |
LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Sep 14, 2024 |
[QA] WINDOWS AGENT ARENA: Evaluating Multi-Modal OS Agents at Scale | Sep 14, 2024 |
WINDOWS AGENT ARENA: Evaluating Multi-Modal OS Agents at Scale | Sep 14, 2024 |
[QA] What Makes a Maze Look Like a Maze? | Sep 13, 2024 |
What Makes a Maze Look Like a Maze? | Sep 13, 2024 |
[QA] Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Sep 13, 2024 |
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Sep 13, 2024 |
[QA] Synthetic continued pretraining | Sep 12, 2024 |
Synthetic continued pretraining | Sep 12, 2024 |
[QA] Agent Workflow Memory | Sep 12, 2024 |
Agent Workflow Memory | Sep 12, 2024 |
[QA] Programming Refusal with Conditional Activation Steering | Sep 11, 2024 |
Programming Refusal with Conditional Activation Steering | Sep 11, 2024 |
[QA] Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | Sep 11, 2024 |
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | Sep 11, 2024 |
[QA] LoCa: Logit Calibration for Knowledge Distillation | Sep 10, 2024 |
LoCa: Logit Calibration for Knowledge Distillation | Sep 10, 2024 |
[QA] Theory, Analysis, and Best Practices for Sigmoid Self-Attention | Sep 09, 2024 |
Theory, Analysis, and Best Practices for Sigmoid Self-Attention | Sep 09, 2024 |
[QA] Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | Sep 09, 2024 |
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | Sep 09, 2024 |
[QA] xLAM: A Family of Large Action Models to Empower AI Agent Systems | Sep 07, 2024 |
xLAM: A Family of Large Action Models to Empower AI Agent Systems | Sep 07, 2024 |
[QA] In Defense of RAG in the Era of Long-Context Language Models | Sep 07, 2024 |
In Defense of RAG in the Era of Long-Context Language Models | Sep 07, 2024 |
[QA] Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 07, 2024 |
Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 07, 2024 |
[QA] Attention Heads of Large Language Models: A Survey | Sep 07, 2024 |
Attention Heads of Large Language Models: A Survey | Sep 07, 2024 |
[QA] The AdEMAMix Optimizer: Better, Faster, Older | Sep 06, 2024 |
The AdEMAMix Optimizer: Better, Faster, Older | Sep 06, 2024 |
[QA] Planning In Natural Language Improves LLM Search For Code Generation | Sep 06, 2024 |
Planning In Natural Language Improves LLM Search For Code Generation | Sep 06, 2024 |
[QA] MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | Sep 05, 2024 |
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | Sep 05, 2024 |
[QA] Sample what you can't compress | Sep 05, 2024 |
Sample what you can't compress | Sep 05, 2024 |
[QA] CONTEXTCITE: Attributing Model Generation to Context | Sep 04, 2024 |
CONTEXTCITE: Attributing Model Generation to Context | Sep 04, 2024 |
[QA] FLUX that Plays Music | Sep 04, 2024 |
FLUX that Plays Music | Sep 04, 2024 |
[QA] Modularity in Transformers: Investigating Neuron Separability & Specialization
|
Sep 03, 2024 |
Modularity in Transformers: Investigating Neuron Separability & Specialization
|
Sep 03, 2024 |
[QA] Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
|
Sep 02, 2024 |
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
|
Sep 02, 2024 |
[QA] Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
|
Aug 29, 2024 |
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
|
Aug 29, 2024 |
[QA] CycleGAN with Better Cycles
|
Aug 29, 2024 |
CycleGAN with Better Cycles
|
Aug 29, 2024 |
[QA] The Mamba in the Llama: Distilling and Accelerating Hybrid Models
|
Aug 28, 2024 |
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
|
Aug 28, 2024 |
[QA] Generative Verifiers: Reward Modeling as Next-Token Prediction
|
Aug 28, 2024 |
Generative Verifiers: Reward Modeling as Next-Token Prediction
|
Aug 28, 2024 |
[QA] Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
|
Aug 27, 2024 |
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
|
Aug 27, 2024 |
[QA] A Law of Next-Token Prediction in Large Language Models
|
Aug 27, 2024 |
A Law of Next-Token Prediction in Large Language Models
|
Aug 27, 2024 |
[QA] SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
|
Aug 26, 2024 |
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
|
Aug 26, 2024 |
[QA] How Diffusion Models Learn to Factorize and Compose
|
Aug 26, 2024 |
How Diffusion Models Learn to Factorize and Compose
|
Aug 26, 2024 |
[QA] FERRET: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
|
Aug 25, 2024 |
FERRET: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
|
Aug 25, 2024 |
[QA] Scalable Autoregressive Image Generation with Mamba
|
Aug 25, 2024 |
Scalable Autoregressive Image Generation with Mamba
|
Aug 25, 2024 |
[QA] TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
|
Aug 25, 2024 |
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
|
Aug 25, 2024 |
[QA] FocusLLM: Scaling LLM's Context by Parallel Decoding
|
Aug 25, 2024 |
FocusLLM: Scaling LLM's Context by Parallel Decoding
|
Aug 25, 2024 |
[QA] Sapiens: Foundation for Human Vision Models
|
Aug 24, 2024 |
Sapiens: Foundation for Human Vision Models
|
Aug 24, 2024 |
[QA] Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
|
Aug 24, 2024 |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
|
Aug 24, 2024 |
[QA] Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
|
Aug 23, 2024 |
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
|
Aug 23, 2024 |
[QA] Hermes 3 Technical Report
|
Aug 23, 2024 |
Hermes 3 Technical Report
|
Aug 23, 2024 |
[QA] LLM Pruning and Distillation in Practice: The Minitron Approach
|
Aug 22, 2024 |
LLM Pruning and Distillation in Practice: The Minitron Approach
|
Aug 22, 2024 |
[QA] Approaching Deep Learning through the Spectral Dynamics of Weights
|
Aug 22, 2024 |
Approaching Deep Learning through the Spectral Dynamics of Weights
|
Aug 22, 2024 |
[QA] Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
|
Aug 21, 2024 |
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
|
Aug 21, 2024 |
[QA] Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
|
Aug 21, 2024 |
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
|
Aug 21, 2024 |
[QA] Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
|
Aug 20, 2024 |
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
|
Aug 20, 2024 |
[QA] JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
|
Aug 19, 2024 |
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
|
Aug 19, 2024 |
[QA] TextCAVs: Debugging vision models using text
|
Aug 19, 2024 |
TextCAVs: Debugging vision models using text
|
Aug 19, 2024 |
[QA] Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
|
Aug 18, 2024 |
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
|
Aug 18, 2024 |
[QA] Towards flexible perception with visual memory
|
Aug 17, 2024 |
Towards flexible perception with visual memory
|
Aug 17, 2024 |
[QA] I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm
|
Aug 17, 2024 |
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm
|
Aug 17, 2024 |
[QA] BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
|
Aug 16, 2024 |
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
|
Aug 16, 2024 |
[QA] Can Large Language Models Understand Symbolic Graphics Programs?
|
Aug 16, 2024 |
Can Large Language Models Understand Symbolic Graphics Programs?
|
Aug 16, 2024 |
[QA] Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
|
Aug 15, 2024 |
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
|
Aug 15, 2024 |
[QA] Generative Photomontage
|
Aug 15, 2024 |
Generative Photomontage
|
Aug 15, 2024 |
[QA] Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
|
Aug 14, 2024 |
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
|
Aug 14, 2024 |
[QA] Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction
|
Aug 14, 2024 |
Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction
|
Aug 14, 2024 |
[QA] The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
|
Aug 13, 2024 |
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
|
Aug 13, 2024 |
[QA] Body Transformer: Leveraging Robot Embodiment for Policy Learning
|
Aug 13, 2024 |
Body Transformer: Leveraging Robot Embodiment for Policy Learning
|
Aug 13, 2024 |
[QA] VITA: Towards Open-Source Interactive Omni Multimodal LLM
|
Aug 12, 2024 |
VITA: Towards Open-Source Interactive Omni Multimodal LLM
|
Aug 12, 2024 |
[QA] Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
|
Aug 12, 2024 |
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
|
Aug 12, 2024 |
[QA] Diffusion Guided Language Modeling
|
Aug 11, 2024 |
Diffusion Guided Language Modeling
|
Aug 11, 2024 |
[QA] Conversational Prompt Engineering
|
Aug 11, 2024 |
Conversational Prompt Engineering
|
Aug 11, 2024 |
[QA] Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
|
Aug 10, 2024 |
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
|
Aug 10, 2024 |
[QA] Better Alignment with Instruction Back-and-Forth Translation
|
Aug 10, 2024 |
Better Alignment with Instruction Back-and-Forth Translation
|
Aug 10, 2024 |
[QA] The Ungrounded Alignment Problem
|
Aug 09, 2024 |
The Ungrounded Alignment Problem
|
Aug 09, 2024 |
[QA] Prioritize Alignment in Dataset Distillation
|
Aug 08, 2024 |
Prioritize Alignment in Dataset Distillation
|
Aug 08, 2024 |
[QA] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
|
Aug 08, 2024 |
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
|
Aug 08, 2024 |
[QA] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
|
Aug 07, 2024 |
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
|
Aug 07, 2024 |
[QA] Language Model Can Listen While Speaking
|
Aug 06, 2024 |
Language Model Can Listen While Speaking
|
Aug 06, 2024 |
[QA] Self-Taught Evaluators
|
Aug 06, 2024 |
Self-Taught Evaluators
|
Aug 06, 2024 |
[QA] Conditional LoRA Parameter Generation
|
Aug 05, 2024 |
Conditional LoRA Parameter Generation
|
Aug 05, 2024 |
[QA] Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
|
Aug 05, 2024 |
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
|
Aug 05, 2024 |
[QA] Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
|
Aug 04, 2024 |
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
|
Aug 04, 2024 |
[QA] MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
|
Aug 04, 2024 |
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
|
Aug 04, 2024 |
[QA] Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
|
Aug 03, 2024 |
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
|
Aug 03, 2024 |
[QA] Gemma 2: Improving Open Language Models at a Practical Size
|
Aug 03, 2024 |
Gemma 2: Improving Open Language Models at a Practical Size
|
Aug 03, 2024 |
[QA] AI-Assisted Generation of Difficult Math Questions
|
Aug 02, 2024 |
AI-Assisted Generation of Difficult Math Questions
|
Aug 02, 2024 |
[QA] SAM 2: Segment Anything in Images and Videos
|
Aug 02, 2024 |
SAM 2: Segment Anything in Images and Videos
|
Aug 02, 2024 |
[QA] MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
|
Aug 01, 2024 |
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
|
Aug 01, 2024 |
[QA] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
|
Aug 01, 2024 |
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
|
Aug 01, 2024 |
[QA] Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
|
Jul 31, 2024 |
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
|
Jul 31, 2024 |
[QA] How to Measure the Intelligence of Large Language Models?
|
Jul 31, 2024 |
How to Measure the Intelligence of Large Language Models?
|
Jul 31, 2024 |
[QA] CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
|
Jul 28, 2024 |
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
|
Jul 28, 2024 |
[QA] Recursive Introspection: Teaching Language Model Agents How to Self-Improve
|
Jul 28, 2024 |
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
|
Jul 28, 2024 |
[QA] Course-Correction: Safety Alignment Using Synthetic Preferences
|
Jul 27, 2024 |
Course-Correction: Safety Alignment Using Synthetic Preferences
|
Jul 27, 2024 |
[QA] Solving The Travelling Salesman Problem Using A Single Qubit
|
Jul 27, 2024 |
Solving The Travelling Salesman Problem Using A Single Qubit
|
Jul 27, 2024 |
[QA] Exploring Scaling Trends in LLM Robustness
|
Jul 26, 2024 |
Exploring Scaling Trends in LLM Robustness
|
Jul 26, 2024 |
[QA] VILA2: VILA Augmented VILA
|
Jul 25, 2024 |
VILA2: VILA Augmented VILA
|
Jul 25, 2024 |
[QA] OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
|
Jul 25, 2024 |
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
|
Jul 25, 2024 |
[QA] MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
|
Jul 24, 2024 |
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
|
Jul 24, 2024 |
[QA] KAN or MLP: A Fairer Comparison
|
Jul 24, 2024 |
KAN or MLP: A Fairer Comparison
|
Jul 24, 2024 |
[QA] Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
|
Jul 23, 2024 |
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
|
Jul 23, 2024 |
[QA] BOND: Aligning LLMs with Best-of-N Distillation
|
Jul 23, 2024 |
BOND: Aligning LLMs with Best-of-N Distillation
|
Jul 23, 2024 |
[QA] Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
|
Jul 23, 2024 |
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
|
Jul 23, 2024 |
[QA] LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
|
Jul 23, 2024 |
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
|
Jul 23, 2024 |
[QA] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
|
Jul 21, 2024 |
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
|
Jul 21, 2024 |
[QA] NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
|
Jul 21, 2024 |
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
|
Jul 21, 2024 |
[QA] Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
|
Jul 20, 2024 |
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
|
Jul 20, 2024 |
[QA] Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
|
Jul 20, 2024 |
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
|
Jul 20, 2024 |
[QA] Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
|
Jul 19, 2024 |
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
|
Jul 19, 2024 |
[QA] Beyond KV Caching: Shared Attention for Efficient LLMs
|
Jul 19, 2024 |
Beyond KV Caching: Shared Attention for Efficient LLMs
|
Jul 19, 2024 |
[QA] Private prediction for large-scale synthetic text generation
|
Jul 18, 2024 |
Private prediction for large-scale synthetic text generation
|
Jul 18, 2024 |
[QA] Chip Placement with Diffusion
|
Jul 18, 2024 |
Chip Placement with Diffusion
|
Jul 18, 2024 |
[QA] Unraveling the Truth: Do LLMs really Understand Charts? A Deep Dive into Consistency and Robustness
|
Jul 17, 2024 |
Unraveling the Truth: Do LLMs really Understand Charts? A Deep Dive into Consistency and Robustness
|
Jul 17, 2024 |
[QA] Does Refusal Training in LLMs Generalize to the Past Tense?
|
Jul 17, 2024 |
Does Refusal Training in LLMs Generalize to the Past Tense?
|
Jul 17, 2024 |
[QA] No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
|
Jul 16, 2024 |
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
|
Jul 16, 2024 |
[QA] LLM Circuit Analyses Are Consistent Across Training and Scale
|
Jul 16, 2024 |
LLM Circuit Analyses Are Consistent Across Training and Scale
|
Jul 16, 2024 |
[QA] SPREADSHEETLLM: Encoding Spreadsheets for Large Language Models
|
Jul 15, 2024 |
SPREADSHEETLLM: Encoding Spreadsheets for Large Language Models
|
Jul 15, 2024 |
[QA] Transformer Layers as Painters
|
Jul 15, 2024 |
Transformer Layers as Painters
|
Jul 15, 2024 |
[QA] Lynx: An Open Source Hallucination Evaluation Model
|
Jul 14, 2024 |
Lynx: An Open Source Hallucination Evaluation Model
|
Jul 14, 2024 |
[QA] Deconstructing What Makes a Good Optimizer for Language Models
|
Jul 14, 2024 |
Deconstructing What Makes a Good Optimizer for Language Models
|
Jul 14, 2024 |
[QA] Distilling System 2 into System 1
|
Jul 13, 2024 |
Distilling System 2 into System 1
|
Jul 13, 2024 |
[QA] Video Diffusion Alignment via Reward Gradients
|
Jul 13, 2024 |
Video Diffusion Alignment via Reward Gradients
|
Jul 13, 2024 |
[QA] Gradient Boosting Reinforcement Learning
|
Jul 12, 2024 |
Gradient Boosting Reinforcement Learning
|
Jul 12, 2024 |
[QA] PaliGemma: A versatile 3B VLM for transfer
|
Jul 12, 2024 |
PaliGemma: A versatile 3B VLM for transfer
|
Jul 12, 2024 |
[QA] Transformer Alignment in Large Language Models
|
Jul 11, 2024 |
Transformer Alignment in Large Language Models
|
Jul 11, 2024 |
[QA] Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers
|
Jul 11, 2024 |
Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers
|
Jul 11, 2024 |
[QA] Composable Interventions for Language Models
|
Jul 10, 2024 |
Composable Interventions for Language Models
|
Jul 10, 2024 |
[QA] Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
|
Jul 10, 2024 |
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
|
Jul 10, 2024 |
[QA] Stepping on the Edge: Curvature Aware Learning Rate Tuners
|
Jul 09, 2024 |
Stepping on the Edge: Curvature Aware Learning Rate Tuners
|
Jul 09, 2024 |
[QA] Unveiling Encoder-Free Vision-Language Models
|
Jul 09, 2024 |
Unveiling Encoder-Free Vision-Language Models
|
Jul 09, 2024 |
[QA] Learning to (Learn at Test Time): RNNs with Expressive Hidden States
|
Jul 08, 2024 |
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
|
Jul 08, 2024 |
[QA] On scalable oversight with weak LLMs judging strong LLMs
|
Jul 08, 2024 |
On scalable oversight with weak LLMs judging strong LLMs
|
Jul 08, 2024 |
[QA] Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
|
Jul 07, 2024 |
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
|
Jul 07, 2024 |
[QA] Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
|
Jul 07, 2024 |
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
|
Jul 07, 2024 |
[QA] InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
|
Jul 05, 2024 |
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
|
Jul 05, 2024 |
[QA] Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
|
Jul 05, 2024 |
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
|
Jul 05, 2024 |
[QA] LLM-Select: Feature Selection with Large Language Models
|
Jul 04, 2024 |
LLM-Select: Feature Selection with Large Language Models
|
Jul 04, 2024 |
[QA] Universal Length Generalization with Turing Programs
|
Jul 04, 2024 |
Universal Length Generalization with Turing Programs
|
Jul 04, 2024 |
[QA] Searching for Best Practices in Retrieval-Augmented Generation
|
Jul 03, 2024 |
Searching for Best Practices in Retrieval-Augmented Generation
|
Jul 03, 2024 |
[QA] AI Agents That Matter
|
Jul 02, 2024 |
AI Agents That Matter
|
Jul 02, 2024 |
[QA] Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
|
Jul 01, 2024 |
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
|
Jul 01, 2024 |
[QA] Scaling Synthetic Data Creation with 1,000,000,000 Personas
|
Jul 01, 2024 |
Scaling Synthetic Data Creation with 1,000,000,000 Personas
|
Jul 01, 2024 |
[QA] Is Programming by Example solved by LLMs?
|
Jun 30, 2024 |
Is Programming by Example solved by LLMs?
|
Jun 30, 2024 |
[QA] Can LLMs Learn by Teaching? A Preliminary Study
|
Jun 30, 2024 |
Can LLMs Learn by Teaching? A Preliminary Study
|
Jun 30, 2024 |
[QA] Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
|
Jun 29, 2024 |
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
|
Jun 29, 2024 |
[QA] Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction
|
Jun 29, 2024 |
Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction
|
Jun 29, 2024 |
[QA] REVISION MATTERS: Generative Design Guided by Revision Edits
|
Jun 28, 2024 |
REVISION MATTERS: Generative Design Guided by Revision Edits
|
Jun 28, 2024 |
[QA] The Remarkable Robustness of LLMs: Stages of Inference?
|
Jun 28, 2024 |
The Remarkable Robustness of LLMs: Stages of Inference?
|
Jun 28, 2024 |
[QA] Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
|
Jun 27, 2024 |
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
|
Jun 27, 2024 |
[QA] Data curation via joint example selection further accelerates multimodal learning
|
Jun 27, 2024 |
Data curation via joint example selection further accelerates multimodal learning
|
Jun 27, 2024 |
[QA] Large Language Models are Interpretable Learners
|
Jun 26, 2024 |
Large Language Models are Interpretable Learners
|
Jun 26, 2024 |
[QA] Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
|
Jun 26, 2024 |
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
|
Jun 26, 2024 |
[QA] Adam-mini: Use Fewer Learning Rates To Gain More
|
Jun 25, 2024 |
Adam-mini: Use Fewer Learning Rates To Gain More
|
Jun 25, 2024 |
[QA] Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
|
Jun 25, 2024 |
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
|
Jun 25, 2024 |
[QA] Evaluating Numerical Reasoning in Text-to-Image Models
|
Jun 24, 2024 |
Evaluating Numerical Reasoning in Text-to-Image Models
|
Jun 24, 2024 |
[QA] Advantage Alignment Algorithms
|
Jun 24, 2024 |
Advantage Alignment Algorithms
|
Jun 24, 2024 |
[QA] Transcendence: Generative Models Can Outperform The Experts That Train Them
|
Jun 23, 2024 |
Transcendence: Generative Models Can Outperform The Experts That Train Them
|
Jun 23, 2024 |
[QA] Refusal in Language Models Is Mediated by a Single Direction
|
Jun 23, 2024 |
Refusal in Language Models Is Mediated by a Single Direction
|
Jun 23, 2024 |
[QA] Instruction Pre-Training: Language Models are Supervised Multitask Learners
|
Jun 22, 2024 |
Instruction Pre-Training: Language Models are Supervised Multitask Learners
|
Jun 22, 2024 |
[QA] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
|
Jun 22, 2024 |
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
|
Jun 22, 2024 |
[QA] RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
|
Jun 21, 2024 |
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
|
Jun 21, 2024 |
[QA] Consistency Models Made Easy
|
Jun 21, 2024 |
Consistency Models Made Easy
|
Jun 21, 2024 |
[QA] What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
|
Jun 19, 2024 |
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
|
Jun 19, 2024 |
[QA] Adversarial Attacks on Multimodal Agents
|
Jun 19, 2024 |
Adversarial Attacks on Multimodal Agents
|
Jun 19, 2024 |
[QA] Can Go AIs be adversarially robust?
|
Jun 19, 2024 |
Can Go AIs be adversarially robust?
|
Jun 19, 2024 |
[QA] Autoregressive Image Generation without Vector Quantization
|
Jun 18, 2024 |
Autoregressive Image Generation without Vector Quantization
|
Jun 18, 2024 |
[QA] Measuring memorization in RLHF for code completion
|
Jun 18, 2024 |
Measuring memorization in RLHF for code completion
|
Jun 18, 2024 |
[QA] Bootstrapping Language Models with DPO Implicit Rewards
|
Jun 17, 2024 |
Bootstrapping Language Models with DPO Implicit Rewards
|
Jun 17, 2024 |
[QA] Ad Auctions for LLMs via Retrieval Augmented Generation
|
Jun 17, 2024 |
Ad Auctions for LLMs via Retrieval Augmented Generation
|
Jun 17, 2024 |
[QA] An Empirical Study of Mamba-based Language Models
|
Jun 16, 2024 |
An Empirical Study of Mamba-based Language Models
|
Jun 16, 2024 |
[QA] Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
|
Jun 16, 2024 |
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
|
Jun 16, 2024 |
[QA] What If We Recaption Billions of Web Images with LLaMA-3?
|
Jun 15, 2024 |
What If We Recaption Billions of Web Images with LLaMA-3?
|
Jun 15, 2024 |
[QA] SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
|
Jun 15, 2024 |
SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
|
Jun 15, 2024 |
[QA] Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
|
Jun 14, 2024 |
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
|
Jun 14, 2024 |
[QA] An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels
|
Jun 14, 2024 |
An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels
|
Jun 14, 2024 |
[QA] Large Language Models Must Be Taught to Know What They Don't Know
|
Jun 13, 2024 |
Large Language Models Must Be Taught to Know What They Don't Know
|
Jun 13, 2024 |
[QA] State Soup: In-Context Skill Learning, Retrieval and Mixing
|
Jun 13, 2024 |
State Soup: In-Context Skill Learning, Retrieval and Mixing
|
Jun 13, 2024 |
[QA] Estimating the Hallucination Rate of Generative AI
|
Jun 12, 2024 |
Estimating the Hallucination Rate of Generative AI
|
Jun 12, 2024 |
[QA] Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
|
Jun 12, 2024 |
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
|
Jun 12, 2024 |
Attention as a Hypernetwork
|
Jun 11, 2024 |
[QA] Distributional Preference Alignment of LLMs via Optimal Transport
|
Jun 11, 2024 |
Distributional Preference Alignment of LLMs via Optimal Transport
|
Jun 11, 2024 |
[QA] How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
|
Jun 11, 2024 |
How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
|
Jun 11, 2024 |
[QA] Mixture-of-Agents Enhances Large Language Model Capabilities
|
Jun 10, 2024 |
Mixture-of-Agents Enhances Large Language Model Capabilities
|
Jun 10, 2024 |
[QA] Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
|
Jun 10, 2024 |
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
|
Jun 10, 2024 |
[QA] Improving Alignment and Robustness with Short Circuiting
|
Jun 09, 2024 |
Improving Alignment and Robustness with Short Circuiting
|
Jun 09, 2024 |
[QA] Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
|
Jun 09, 2024 |
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
|
Jun 09, 2024 |
[QA] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
|
Jun 08, 2024 |
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
|
Jun 08, 2024 |
[QA] Block Transformer: Global-to-Local Language Modeling for Fast Inference
|
Jun 08, 2024 |
Block Transformer: Global-to-Local Language Modeling for Fast Inference
|
Jun 08, 2024 |
[CHAT] The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
|
Jun 07, 2024 |
[QA] The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
|
Jun 07, 2024 |
The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
|
Jun 07, 2024 |
[CHAT] Verbalized Machine Learning: Revisiting Machine Learning with Language Models
|
Jun 07, 2024 |
[QA] Verbalized Machine Learning: Revisiting Machine Learning with Language Models
|
Jun 07, 2024 |
Verbalized Machine Learning: Revisiting Machine Learning with Language Models
|
Jun 07, 2024 |
[QA] How Truncating Weights Improves Reasoning in Language Models
|
Jun 06, 2024 |
How Truncating Weights Improves Reasoning in Language Models
|
Jun 06, 2024 |
[QA] Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
|
Jun 06, 2024 |
Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
|
Jun 06, 2024 |
[QA] Does your data spark joy? Performance gains from domain upsampling at the end of training
|
Jun 06, 2024 |
Does your data spark joy? Performance gains from domain upsampling at the end of training
|
Jun 06, 2024 |
[QA] Guiding a Diffusion Model with a Bad Version of Itself
|
Jun 05, 2024 |
Guiding a Diffusion Model with a Bad Version of Itself
|
Jun 05, 2024 |
[QA] To Believe or Not to Believe Your LLM
|
Jun 05, 2024 |
To Believe or Not to Believe Your LLM
|
Jun 05, 2024 |
[QA] Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
|
Jun 05, 2024 |
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
|
Jun 05, 2024 |
[QA] Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
|
Jun 04, 2024 |
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
|
Jun 04, 2024 |
[QA] What makes unlearning hard and what to do about it
|
Jun 04, 2024 |
What makes unlearning hard and what to do about it
|
Jun 04, 2024 |
[QA] Learning to Play Atari in a World of Tokens
|
Jun 04, 2024 |
Learning to Play Atari in a World of Tokens
|
Jun 04, 2024 |
[QA] Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
|
Jun 03, 2024 |
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
|
Jun 03, 2024 |
[QA] Contextual Position Encoding: Learning to Count What's Important
|
Jun 02, 2024 |
Contextual Position Encoding: Learning to Count What's Important
|
Jun 02, 2024 |
[QA] JINA CLIP: Your CLIP Model Is Also Your Text Retriever
|
Jun 02, 2024 |
JINA CLIP: Your CLIP Model Is Also Your Text Retriever
|
Jun 02, 2024 |
[QA] Large Language Models Can Self-Improve At Web Agent Tasks
|
Jun 01, 2024 |
Large Language Models Can Self-Improve At Web Agent Tasks
|
Jun 01, 2024 |
[QA] Is In-Context Learning Sufficient for Instruction Following in LLMs?
|
Jun 01, 2024 |
Is In-Context Learning Sufficient for Instruction Following in LLMs?
|
Jun 01, 2024 |
[QA] Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
|
May 31, 2024 |
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
|
May 31, 2024 |
[QA] COSY: Evaluating Textual Explanations of Neurons
|
May 31, 2024 |
COSY: Evaluating Textual Explanations of Neurons
|
May 31, 2024 |
[QA] Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
|
May 30, 2024 |
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
|
May 30, 2024 |
[QA] Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
|
May 30, 2024 |
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
|
May 30, 2024 |
[QA] Phased Consistency Model
|
May 29, 2024 |
Phased Consistency Model
|
May 29, 2024 |
[QA] Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
|
May 29, 2024 |
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
|
May 29, 2024 |
[QA] On the Origin of Llamas: Model Tree Heritage Recovery
|
May 29, 2024 |
On the Origin of Llamas: Model Tree Heritage Recovery
|
May 29, 2024 |
[QA] Transformers Can Do Arithmetic with the Right Embeddings
|
May 28, 2024 |
Transformers Can Do Arithmetic with the Right Embeddings
|
May 28, 2024 |
[QA] EM Distillation for One-step Diffusion Models
|
May 28, 2024 |
EM Distillation for One-step Diffusion Models
|
May 28, 2024 |
[QA] Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
|
May 27, 2024 |
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
|
May 27, 2024 |
[QA] Are Long-LLMs A Necessity For Long-Context Tasks?
|
May 27, 2024 |
Are Long-LLMs A Necessity For Long-Context Tasks?
|
May 27, 2024 |
[QA] AGILE: A Novel Framework of LLM Agents
|
May 26, 2024 |
AGILE: A Novel Framework of LLM Agents
|
May 26, 2024 |
[QA] Thermodynamic Natural Gradient Descent
|
May 26, 2024 |
Thermodynamic Natural Gradient Descent
|
May 26, 2024 |
[QA] DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
|
May 26, 2024 |
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
|
May 26, 2024 |
[QA] Lessons from the Trenches on Reproducible Evaluation of Language Models
|
May 25, 2024 |
Lessons from the Trenches on Reproducible Evaluation of Language Models
|
May 25, 2024 |
[QA] Not All Language Model Features Are Linear
|
May 24, 2024 |
Not All Language Model Features Are Linear
|
May 24, 2024 |
[QA] Implicit In-context Learning
|
May 24, 2024 |
Implicit In-context Learning
|
May 24, 2024 |
[QA] (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
|
May 23, 2024 |
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
|
May 23, 2024 |
[QA] MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
|
May 23, 2024 |
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
|
May 23, 2024 |
[QA] Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
|
May 22, 2024 |
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
|
May 22, 2024 |
[QA] Your Transformer is Secretly Linear
|
May 22, 2024 |
Your Transformer is Secretly Linear
|
May 22, 2024 |
[QA] Training Data Attribution via Approximate Unrolled Differentiation
|
May 21, 2024 |
Training Data Attribution via Approximate Unrolled Differentiation
|
May 21, 2024 |
[QA] Information Leakage from Embedding in Large Language Models
|
May 21, 2024 |
Information Leakage from Embedding in Large Language Models
|
May 21, 2024 |
[QA] Layer-Condensed KV Cache for Efficient Inference of Large Language Models
|
May 20, 2024 |
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
|
May 20, 2024 |
[QA] Observational Scaling Laws and the Predictability of Language Model Performance
|
May 20, 2024 |
Observational Scaling Laws and the Predictability of Language Model Performance
|
May 20, 2024 |
[QA] Zero-Shot Tokenizer Transfer
|
May 18, 2024 |
Zero-Shot Tokenizer Transfer
|
May 18, 2024 |
[QA] Many-Shot In-Context Learning in Multimodal Foundation Models
|
May 18, 2024 |
Many-Shot In-Context Learning in Multimodal Foundation Models
|
May 18, 2024 |
[QA] Chameleon: Mixed-Modal Early-Fusion Foundation Models
|
May 17, 2024 |
Chameleon: Mixed-Modal Early-Fusion Foundation Models
|
May 17, 2024 |
[QA] LoRA Learns Less and Forgets Less
|
May 17, 2024 |
LoRA Learns Less and Forgets Less
|
May 17, 2024 |
[QA] The Platonic Representation Hypothesis
|
May 16, 2024 |
The Platonic Representation Hypothesis
|
May 16, 2024 |
[QA] Improving Transformers using Faithful Positional Encoding
|
May 16, 2024 |
Improving Transformers using Faithful Positional Encoding
|
May 16, 2024 |
[QA] Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
|
May 15, 2024 |
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
|
May 15, 2024 |
[QA] Energy-based Hopfield Boosting for Out-of-Distribution Detection
|
May 15, 2024 |
Energy-based Hopfield Boosting for Out-of-Distribution Detection
|
May 15, 2024 |
[QA] RLHF Workflow: From Reward Modeling to Online RLHF
|
May 14, 2024 |
RLHF Workflow: From Reward Modeling to Online RLHF
|
May 14, 2024 |
[QA] SUTRA: Scalable Multilingual Language Model Architecture
|
May 14, 2024 |
SUTRA: Scalable Multilingual Language Model Architecture
|
May 14, 2024 |
[QA] Memory Mosaics
|
May 13, 2024 |
Memory Mosaics
|
May 13, 2024 |
[QA] Linearizing Large Language Models
|
May 13, 2024 |
Linearizing Large Language Models
|
May 13, 2024 |
[QA] From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control
|
May 12, 2024 |
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control
|
May 12, 2024 |
[QA] Distilling Diffusion Models into Conditional GANs
|
May 12, 2024 |
Distilling Diffusion Models into Conditional GANs
|
May 12, 2024 |
[QA] AlphaMath Almost Zero: process Supervision without process
|
May 11, 2024 |
AlphaMath Almost Zero: process Supervision without process
|
May 11, 2024 |
[QA] Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models
|
May 11, 2024 |
Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models
|
May 11, 2024 |
[QA] Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
|
May 10, 2024 |
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
|
May 10, 2024 |
[QA] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
|
May 10, 2024 |
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
|
May 10, 2024 |
[QA] Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
|
May 09, 2024 |
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
|
May 09, 2024 |
[QA] Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
|
May 09, 2024 |
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
|
May 09, 2024 |
[QA] Custom Gradient Estimators are Straight-Through Estimators in Disguise
|
May 09, 2024 |
Custom Gradient Estimators are Straight-Through Estimators in Disguise
|
May 09, 2024 |
[QA] The Curse of Diversity in Ensemble-Based Exploration
|
May 08, 2024 |
The Curse of Diversity in Ensemble-Based Exploration
|
May 08, 2024 |
[QA] ImageInWords: Unlocking Hyper-Detailed Image Descriptions
|
May 07, 2024 |
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
|
May 07, 2024 |
[QA] Why is SAM Robust to Label Noise?
|
May 07, 2024 |
Why is SAM Robust to Label Noise?
|
May 07, 2024 |
[QA] Is Flash Attention Stable?
|
May 07, 2024 |
Is Flash Attention Stable?
|
May 07, 2024 |
[QA] Understanding LLMs Requires More Than Statistical Generalization
|
May 06, 2024 |
Understanding LLMs Requires More Than Statistical Generalization
|
May 06, 2024 |
[QA] Mitigating LLM Hallucinations via Conformal Abstention
|
May 06, 2024 |
Mitigating LLM Hallucinations via Conformal Abstention
|
May 06, 2024 |
[QA] Structural Pruning of Pre-trained Language Models via Neural Architecture Search
|
May 06, 2024 |
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
|
May 06, 2024 |
[QA] Capabilities of Gemini Models in Medicine
|
May 05, 2024 |
Capabilities of Gemini Models in Medicine
|
May 05, 2024 |
[QA] StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
|
May 04, 2024 |
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
|
May 04, 2024 |
[QA] In-Context Learning with Long-Context Models: An In-Depth Exploration
|
May 04, 2024 |
In-Context Learning with Long-Context Models: An In-Depth Exploration
|
May 04, 2024 |
[QA] WILDCHAT: 1M ChatGPT Interaction Logs in the Wild
|
May 03, 2024 |
WILDCHAT: 1M ChatGPT Interaction Logs in the Wild
|
May 03, 2024 |
[QA] NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
|
May 03, 2024 |
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
|
May 03, 2024 |
[QA] PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models
|
May 03, 2024 |
PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models
|
May 03, 2024 |
[QA] A Careful Examination of Large Language Model Performance on Grade School Arithmetic
|
May 02, 2024 |
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
|
May 02, 2024 |
[QA] Self-Play Preference Optimization for Language Model Alignment
|
May 02, 2024 |
Self-Play Preference Optimization for Language Model Alignment
|
May 02, 2024 |
[QA] Is Bigger Edit Batch Size Always Better? - An Empirical Study on Model Editing with Llama-3
|
May 02, 2024 |
Is Bigger Edit Batch Size Always Better? - An Empirical Study on Model Editing with Llama-3
|
May 02, 2024 |
[QA] Iterative Reasoning Preference Optimization
|
May 01, 2024 |
Iterative Reasoning Preference Optimization
|
May 01, 2024 |
[QA] Harmonic LLMs are Trustworthy
|
May 01, 2024 |
Harmonic LLMs are Trustworthy
|
May 01, 2024 |
[QA] Better & Faster Large Language Models via Multi-token Prediction
|
May 01, 2024 |
Better & Faster Large Language Models via Multi-token Prediction
|
May 01, 2024 |
[QA] Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
|
Apr 30, 2024 |
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
|
Apr 30, 2024 |
[QA] Stylus: Automatic Adapter Selection for Diffusion Models
|
Apr 30, 2024 |
Stylus: Automatic Adapter Selection for Diffusion Models
|
Apr 30, 2024 |
[QA] DPO Meets PPO: Reinforced Token Optimization for RLHF
|
Apr 30, 2024 |
DPO Meets PPO: Reinforced Token Optimization for RLHF
|
Apr 30, 2024 |
[QA] Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
|
Apr 30, 2024 |
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
|
Apr 30, 2024 |
[QA] AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
|
Apr 29, 2024 |
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
|
Apr 29, 2024 |
[QA] Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
|
Apr 29, 2024 |
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
|
Apr 29, 2024 |
[QA] Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models
|
Apr 28, 2024 |
Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models
|
Apr 28, 2024 |
[QA] Retrieval Head Mechanistically Explains Long-Context Factuality
|
Apr 28, 2024 |
Retrieval Head Mechanistically Explains Long-Context Factuality
|
Apr 28, 2024 |
[QA] AUTOCRAWLER: A Progressive Understanding Web Agent for Web Crawler Generation
|
Apr 28, 2024 |
AUTOCRAWLER: A Progressive Understanding Web Agent for Web Crawler Generation
|
Apr 28, 2024 |
[QA] Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
|
Apr 27, 2024 |
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
|
Apr 27, 2024 |
[QA] LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
|
Apr 27, 2024 |
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
|
Apr 27, 2024 |
[QA] Make Your LLM Fully Utilize the Context
|
Apr 27, 2024 |
Make Your LLM Fully Utilize the Context
|
Apr 27, 2024 |
[QA] Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
|
Apr 26, 2024 |
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
|
Apr 26, 2024 |
[QA] AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models
|
Apr 26, 2024 |
AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models
|
Apr 26, 2024 |
[QA] Weak-to-Strong Extrapolation Expedites Alignment
|
Apr 26, 2024 |
Weak-to-Strong Extrapolation Expedites Alignment
|
Apr 26, 2024 |
[QA] Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts
|
Apr 25, 2024 |
Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts
|
Apr 25, 2024 |
[QA] Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
|
Apr 25, 2024 |
Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
|
Apr 25, 2024 |
[QA] OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
|
Apr 24, 2024 |
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
|
Apr 24, 2024 |
[QA] Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners
|
Apr 24, 2024 |
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners
|
Apr 24, 2024 |
[QA] SnapKV: LLM Knows What You are Looking for Before Generation
|
Apr 24, 2024 |
SnapKV: LLM Knows What You are Looking for Before Generation
|
Apr 24, 2024 |
[QA] Multi-Head Mixture-of-Experts
|
Apr 24, 2024 |
Multi-Head Mixture-of-Experts
|
Apr 24, 2024 |
[QA] The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
|
Apr 23, 2024 |
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
|
Apr 23, 2024 |
[QA] Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
|
Apr 23, 2024 |
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
|
Apr 23, 2024 |
[QA] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
|
Apr 23, 2024 |
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
|
Apr 23, 2024 |
[QA] Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction
|
Apr 22, 2024 |
Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction
|
Apr 22, 2024 |
[QA] HalluciBot: Is There No Such Thing as a Bad Question?
|
Apr 22, 2024 |
HalluciBot: Is There No Such Thing as a Bad Question?
|
Apr 22, 2024 |
[QA] Stronger Random Baselines for In-Context Learning
|
Apr 22, 2024 |
Stronger Random Baselines for In-Context Learning
|
Apr 22, 2024 |
[QA] Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
|
Apr 21, 2024 |
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
|
Apr 21, 2024 |
[QA] The Illusion of State in State-Space Models
|
Apr 21, 2024 |
The Illusion of State in State-Space Models
|
Apr 21, 2024 |
[QA] Chinchilla Scaling: A replication attempt
|
Apr 21, 2024 |
Chinchilla Scaling: A replication attempt
|
Apr 21, 2024 |
[QA] From r to Q*: Your Language Model is Secretly a Q-Function
|
Apr 20, 2024 |
From r to Q*: Your Language Model is Secretly a Q-Function
|
Apr 20, 2024 |
[QA] Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
|
Apr 20, 2024 |
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
|
Apr 20, 2024 |
[QA] Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
|
Apr 20, 2024 |
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
|
Apr 20, 2024 |
[QA] TRIFORCE: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
|
Apr 19, 2024 |
TRIFORCE: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
|
Apr 19, 2024 |
[QA] Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
|
Apr 19, 2024 |
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
|
Apr 19, 2024 |
[QA] BLINK: Multimodal Large Language Models Can See but Not Perceive
|
Apr 19, 2024 |
BLINK: Multimodal Large Language Models Can See but Not Perceive
|
Apr 19, 2024 |
[QA] Fewer Truncations Improve Language Modeling
|
Apr 18, 2024 |
Fewer Truncations Improve Language Modeling
|
Apr 18, 2024 |
[QA] Many-Shot In-Context Learning
|
Apr 18, 2024 |
Many-Shot In-Context Learning
|
Apr 18, 2024 |
[QA] Can Language Models Solve Olympiad Programming?
|
Apr 18, 2024 |
Can Language Models Solve Olympiad Programming?
|
Apr 18, 2024 |
[QA] Social Choice for AI Alignment: Dealing with Diverse Human Feedback
|
Apr 17, 2024 |
[short] Social Choice for AI Alignment: Dealing with Diverse Human Feedback
|
Apr 17, 2024 |
Social Choice for AI Alignment: Dealing with Diverse Human Feedback
|
Apr 17, 2024 |
[QA] Self-playing Adversarial Language Game Enhances LLM Reasoning
|
Apr 17, 2024 |
[short] Self-playing Adversarial Language Game Enhances LLM Reasoning
|
Apr 17, 2024 |
Self-playing Adversarial Language Game Enhances LLM Reasoning
|
Apr 17, 2024 |
[QA] Transformer FAM: Feedback attention is working memory
|
Apr 16, 2024 |
Transformer FAM: Feedback attention is working memory
|
Apr 16, 2024 |
[QA] Learn Your Reference Model for Real Good Alignment
|
Apr 16, 2024 |
Learn Your Reference Model for Real Good Alignment
|
Apr 16, 2024 |
[QA] Compression Represents Intelligence Linearly
|
Apr 16, 2024 |
Compression Represents Intelligence Linearly
|
Apr 16, 2024 |
[QA] Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
|
Apr 16, 2024 |
[short] Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
|
Apr 16, 2024 |
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
|
Apr 16, 2024 |
[QA] Reducing hallucination in structured outputs via Retrieval-Augmented Generation
|
Apr 16, 2024 |
[short] Reducing hallucination in structured outputs via Retrieval-Augmented Generation
|
Apr 16, 2024 |
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
|
Apr 16, 2024 |
[QA] Pre-training Small Base LMs with Fewer Tokens
|
Apr 15, 2024 |
Pre-training Small Base LMs with Fewer Tokens
|
Apr 15, 2024 |
[QA] Probing the 3D Awareness of Visual Foundation Models
|
Apr 15, 2024 |
Probing the 3D Awareness of Visual Foundation Models
|
Apr 15, 2024 |
[QA] Is ChatGPT Transforming Academics' Writing Style?
|
Apr 15, 2024 |
[short] Is ChatGPT Transforming Academics' Writing Style?
|
Apr 15, 2024 |
Is ChatGPT Transforming Academics' Writing Style?
|
Apr 15, 2024 |
[QA] JetMoE: Reaching Llama2 Performance with 0.1M Dollars
|
Apr 14, 2024 |
[short] JetMoE: Reaching Llama2 Performance with 0.1M Dollars
|
Apr 14, 2024 |
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
|
Apr 14, 2024 |
[QA] Explaining Explainability: Understanding Concept Activation Vectors
|
Apr 14, 2024 |
[short] Explaining Explainability: Understanding Concept Activation Vectors
|
Apr 14, 2024 |
Explaining Explainability: Understanding Concept Activation Vectors
|
Apr 14, 2024 |
[QA] Manipulating Large Language Models to Increase Product Visibility
|
Apr 14, 2024 |
[short] Manipulating Large Language Models to Increase Product Visibility
|
Apr 14, 2024 |
Manipulating Large Language Models to Increase Product Visibility
|
Apr 14, 2024 |
[QA] Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
|
Apr 13, 2024 |
[short] Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
|
Apr 13, 2024 |
Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
|
Apr 13, 2024 |
[QA] Adapting LLaMA Decoder to Vision Transformer
|
Apr 13, 2024 |
[short] Adapting LLaMA Decoder to Vision Transformer
|
Apr 13, 2024 |
Adapting LLaMA Decoder to Vision Transformer
|
Apr 13, 2024 |
[QA] RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
|
Apr 13, 2024 |
[short] RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
|
Apr 13, 2024 |
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
|
Apr 13, 2024 |
[QA] Best Practices and Lessons Learned on Synthetic Data for Language Models
|
Apr 12, 2024 |
[QA] ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
|
Apr 12, 2024 |
[short] ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
|
Apr 12, 2024 |
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
|
Apr 12, 2024 |
[QA] RHO-1: Not All Tokens Are What You Need
|
Apr 12, 2024 |
[short] RHO-1: Not All Tokens Are What You Need
|
Apr 12, 2024 |
RHO-1: Not All Tokens Are What You Need
|
Apr 12, 2024 |
[QA] LLoCO: Learning Long Contexts Offline
|
Apr 12, 2024 |
[short] LLoCO: Learning Long Contexts Offline
|
Apr 12, 2024 |
LLoCO: Learning Long Contexts Offline
|
Apr 12, 2024 |
[QA] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
|
Apr 11, 2024 |
[short] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
|
Apr 11, 2024 |
Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
|
Apr 11, 2024 |
[QA] Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
|
Apr 11, 2024 |
[short] Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
|
Apr 11, 2024 |
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
|
Apr 11, 2024 |
[QA] LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
|
Apr 10, 2024 |
[short] LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
|
Apr 10, 2024 |
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
|
Apr 10, 2024 |
[QA] Simultaneous linear connectivity of neural networks modulo permutation
|
Apr 10, 2024 |
[short] Simultaneous linear connectivity of neural networks modulo permutation
|
Apr 10, 2024 |
Simultaneous linear connectivity of neural networks modulo permutation
|
Apr 10, 2024 |
[QA] Does Transformer Interpretability Transfer to RNNs?
|
Apr 10, 2024 |
[short] Does Transformer Interpretability Transfer to RNNs?
|
Apr 10, 2024 |
Does Transformer Interpretability Transfer to RNNs?
|
Apr 10, 2024 |
[QA] Finding Visual Task Vectors
|
Apr 09, 2024 |
[short] Finding Visual Task Vectors
|
Apr 09, 2024 |
Finding Visual Task Vectors
|
Apr 09, 2024 |
[QA] PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
|
Apr 09, 2024 |
[short] PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
|
Apr 09, 2024 |
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
|
Apr 09, 2024 |
[QA] Stream of Search (SoS): Learning to Search in Language
|
Apr 08, 2024 |
[short] Stream of Search (SoS): Learning to Search in Language
|
Apr 08, 2024 |
Stream of Search (SoS): Learning to Search in Language
|
Apr 08, 2024 |
[QA] No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
|
Apr 08, 2024 |
[short] No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
|
Apr 08, 2024 |
No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
|
Apr 08, 2024 |
[QA] AIOS: LLM Agent Operating System
|
Apr 07, 2024 |
[short] AIOS: LLM Agent Operating System
|
Apr 07, 2024 |
AIOS: LLM Agent Operating System
|
Apr 07, 2024 |
[QA] Evaluating LLMs at Detecting Errors in LLM Responses
|
Apr 07, 2024 |
[short] Evaluating LLMs at Detecting Errors in LLM Responses
|
Apr 07, 2024 |
Evaluating LLMs at Detecting Errors in LLM Responses
|
Apr 07, 2024 |
[QA] AI and the Problem of Knowledge Collapse
|
Apr 06, 2024 |
[short] AI and the Problem of Knowledge Collapse
|
Apr 06, 2024 |
AI and the Problem of Knowledge Collapse
|
Apr 06, 2024 |
[QA] Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
|
Apr 06, 2024 |
[short] Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
|
Apr 06, 2024 |
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
|
Apr 06, 2024 |
[QA] Do Language Models Plan for Future Tokens?
|
Apr 06, 2024 |
[short] Do Language Models Plan for Future Tokens?
|
Apr 06, 2024 |
Do Language Models Plan for Future Tokens?
|
Apr 06, 2024 |
[QA] AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
|
Apr 05, 2024 |
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
|
Apr 05, 2024 |
[QA] LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
|
Apr 05, 2024 |
[short] LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
|
Apr 05, 2024 |
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
|
Apr 05, 2024 |
[QA] ReFT: Representation Finetuning for Language Models
|
Apr 05, 2024 |
[short] ReFT: Representation Finetuning for Language Models
|
Apr 05, 2024 |
ReFT: Representation Finetuning for Language Models
|
Apr 05, 2024 |
[QA] Linear Attention Sequence Parallelism
|
Apr 04, 2024 |
[short] Linear Attention Sequence Parallelism
|
Apr 04, 2024 |
Linear Attention Sequence Parallelism
|
Apr 04, 2024 |
[QA] Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
|
Apr 04, 2024 |
[short] Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
|
Apr 04, 2024 |
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
|
Apr 04, 2024 |
[QA] ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
|
Apr 04, 2024 |
[short] ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
|
Apr 04, 2024 |
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
|
Apr 04, 2024 |
[QA] Advancing LLM Reasoning Generalists with Preference Trees
|
Apr 03, 2024 |
[short] Advancing LLM Reasoning Generalists with Preference Trees
|
Apr 03, 2024 |
Advancing LLM Reasoning Generalists with Preference Trees
|
Apr 03, 2024 |
[QA] Long-context LLMs Struggle with Long In-context Learning
|
Apr 03, 2024 |
[short] Long-context LLMs Struggle with Long In-context Learning
|
Apr 03, 2024 |
Long-context LLMs Struggle with Long In-context Learning
|
Apr 03, 2024 |
[QA] Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
|
Apr 03, 2024 |
[short] Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
|
Apr 03, 2024 |
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
|
Apr 03, 2024 |
[QA] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution
|
Apr 02, 2024 |
[short] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution
|
Apr 02, 2024 |
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution
|
Apr 02, 2024 |
[QA] Localizing Paragraph Memorization in Language Models
|
Apr 02, 2024 |
[short] Localizing Paragraph Memorization in Language Models
|
Apr 02, 2024 |
Localizing Paragraph Memorization in Language Models
|
Apr 02, 2024 |
[QA] ReALM: Reference Resolution As Language Modeling
|
Apr 01, 2024 |
[short] ReALM: Reference Resolution As Language Modeling
|
Apr 01, 2024 |
ReALM: Reference Resolution As Language Modeling
|
Apr 01, 2024 |
[QA] Jamba: A Hybrid Transformer-Mamba Language Model
|
Apr 01, 2024 |
[short] Jamba: A Hybrid Transformer-Mamba Language Model
|
Apr 01, 2024 |
Jamba: A Hybrid Transformer-Mamba Language Model
|
Apr 01, 2024 |
[QA] Gecko: Versatile Text Embeddings Distilled from Large Language Models
|
Apr 01, 2024 |
[short] Gecko: Versatile Text Embeddings Distilled from Large Language Models
|
Apr 01, 2024 |
Gecko: Versatile Text Embeddings Distilled from Large Language Models
|
Apr 01, 2024 |
[QA] LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
|
Apr 01, 2024 |
[short] LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
|
Apr 01, 2024 |
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
|
Apr 01, 2024 |
[short] LLAMAFACTORY: Unified Efficient Fine-Tuning of 100+ Language Models
|
Mar 31, 2024 |
LLAMAFACTORY: Unified Efficient Fine-Tuning of 100+ Language Models
|
Mar 31, 2024 |
[short] MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
|
Mar 31, 2024 |
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
|
Mar 31, 2024 |
[short] A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
|
Mar 30, 2024 |
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
|
Mar 30, 2024 |
[short] ViTAR: Vision Transformer with Any Resolution
|
Mar 30, 2024 |
ViTAR: Vision Transformer with Any Resolution
|
Mar 30, 2024 |
sDPO: Don't Use Your Data All at Once
|
Mar 30, 2024 |
[short] LITA: Language Instructed Temporal-Localization Assistant
|
Mar 29, 2024 |
LITA: Language Instructed Temporal-Localization Assistant
|
Mar 29, 2024 |
[short] Model Stock: All we need is just a few fine-tuned models
|
Mar 29, 2024 |
Model Stock: All we need is just a few fine-tuned models
|
Mar 29, 2024 |
[short] BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
|
Mar 28, 2024 |
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
|
Mar 28, 2024 |
[short] Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
|
Mar 28, 2024 |
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
|
Mar 28, 2024 |
[short] Long-form factuality in large language models
|
Mar 28, 2024 |
Long-form factuality in large language models
|
Mar 28, 2024 |
[short] Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs
|
Mar 27, 2024 |
Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs
|
Mar 27, 2024 |
[short] The Unreasonable Ineffectiveness of the Deeper Layers
|
Mar 27, 2024 |
The Unreasonable Ineffectiveness of the Deeper Layers
|
Mar 27, 2024 |
[short] Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
|
Mar 26, 2024 |
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
|
Mar 26, 2024 |
[short] LLM Agent Operating System
|
Mar 26, 2024 |
LLM Agent Operating System
|
Mar 26, 2024 |
[short] FOLLOWIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
|
Mar 25, 2024 |
FOLLOWIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
|
Mar 25, 2024 |
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
|
Mar 25, 2024 |
[short] Can large language models explore in-context?
|
Mar 25, 2024 |
Can large language models explore in-context?
|
Mar 25, 2024 |
[short] Detoxifying Large Language Models via Knowledge Editing
|
Mar 24, 2024 |
Detoxifying Large Language Models via Knowledge Editing
|
Mar 24, 2024 |
[short] Evolutionary Optimization of Model Merging Recipes
|
Mar 23, 2024 |
Evolutionary Optimization of Model Merging Recipes
|
Mar 23, 2024 |
[short] Recourse for Reclamation: Chatting with Generative Language Models
|
Mar 23, 2024 |
Recourse for Reclamation: Chatting with Generative Language Models
|
Mar 23, 2024 |
[short] A Unified Framework for Model Editing
|
Mar 22, 2024 |
A Unified Framework for Model Editing
|
Mar 22, 2024 |
[short] Language Models Can Reduce Asymmetry in Information Markets
|
Mar 22, 2024 |
Language Models Can Reduce Asymmetry in Information Markets
|
Mar 22, 2024 |
[short] AI and Memory Wall
|
Mar 22, 2024 |
AI and Memory Wall
|
Mar 22, 2024 |
RewardBench: Evaluating Reward Models for Language Modeling
|
Mar 21, 2024 |
Evaluating Frontier Models for Dangerous Capabilities
|
Mar 21, 2024 |
Reverse Training to Nurse the Reversal Curse
|
Mar 21, 2024 |
[short] Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
|
Mar 20, 2024 |
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
|
Mar 20, 2024 |
[short] Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
|
Mar 20, 2024 |
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
|
Mar 20, 2024 |
[short] Yell At Your Robot: Improving On-the-Fly from Language Corrections
|
Mar 20, 2024 |
Yell At Your Robot: Improving On-the-Fly from Language Corrections
|
Mar 20, 2024 |
[short] Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
|
Mar 19, 2024 |
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
|
Mar 19, 2024 |
[short] Larimar: Large Language Models with Episodic Memory Control
|
Mar 19, 2024 |
Larimar: Large Language Models with Episodic Memory Control
|
Mar 19, 2024 |
[short] MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
|
Mar 19, 2024 |
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
|
Mar 19, 2024 |
[short] Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
|
Mar 18, 2024 |
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
|
Mar 18, 2024 |
[short] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
|
Mar 18, 2024 |
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
|
Mar 18, 2024 |
Monitoring AI-Modified Content at Scale
|
Mar 18, 2024 |
AutoDev: Automated AI-Driven Development
|
Mar 17, 2024 |
Is Cosine-Similarity of Embeddings Really About Similarity?
|
Mar 17, 2024 |
[short] AutoEval Done Right: Using Synthetic Data for Model Evaluation
|
Mar 17, 2024 |
AutoEval Done Right: Using Synthetic Data for Model Evaluation
|
Mar 17, 2024 |
Language Model Inversion
|
Mar 16, 2024 |
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
|
Mar 16, 2024 |
[short] Logits of API-Protected LLMs Leak Proprietary Information
|
Mar 16, 2024 |
Logits of API-Protected LLMs Leak Proprietary Information
|
Mar 16, 2024 |
[short] Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
|
Mar 15, 2024 |
Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
|
Mar 15, 2024 |
[short] MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
|
Mar 15, 2024 |
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
|
Mar 15, 2024 |
Gemma: Open Models Based on Gemini Research and Technology
|
Mar 14, 2024 |
[short] Language models scale reliably with over-training and on downstream tasks
|
Mar 14, 2024 |
Language models scale reliably with over-training and on downstream tasks
|
Mar 14, 2024 |
[short] Human Alignment of Large Language Models through Online Preference Optimisation
|
Mar 14, 2024 |
Human Alignment of Large Language Models through Online Preference Optimisation
|
Mar 14, 2024 |
[short] Synth²: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
|
Mar 13, 2024 |
Synth²: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
|
Mar 13, 2024 |
[short] Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
|
Mar 13, 2024 |
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
|
Mar 13, 2024 |
Multistep Consistency Models
|
Mar 12, 2024 |
[short] VideoMamba: State Space Model for Efficient Video Understanding
|
Mar 12, 2024 |
VideoMamba: State Space Model for Efficient Video Understanding
|
Mar 12, 2024 |
[short] Stealing Part of a Production Language Model
|
Mar 12, 2024 |
Stealing Part of a Production Language Model
|
Mar 12, 2024 |
[short] Stacking as Accelerated Gradient Descent
|
Mar 11, 2024 |
Stacking as Accelerated Gradient Descent
|
Mar 11, 2024 |
[short] ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
|
Mar 11, 2024 |
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
|
Mar 11, 2024 |
[short] Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
|
Mar 10, 2024 |
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
|
Mar 10, 2024 |
[short] GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
|
Mar 09, 2024 |
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
|
Mar 09, 2024 |
[short] Yi: Open Foundation Models by 01.AI
|
Mar 08, 2024 |
Yi: Open Foundation Models by 01.AI
|
Mar 08, 2024 |
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
|
Mar 08, 2024 |
[short] Teaching Large Language Models to Reason with Reinforcement Learning
|
Mar 08, 2024 |
Teaching Large Language Models to Reason with Reinforcement Learning
|
Mar 08, 2024 |
[short] Backtracing: Retrieving the Cause of the Query
|
Mar 07, 2024 |
Backtracing: Retrieving the Cause of the Query
|
Mar 07, 2024 |
[short] Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
|
Mar 07, 2024 |
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
|
Mar 07, 2024 |
[short] Design2Code: How Far Are We From Automating Front-End Engineering?
|
Mar 06, 2024 |
Design2Code: How Far Are We From Automating Front-End Engineering?
|
Mar 06, 2024 |
[short] Greed is All You Need: An Evaluation of Tokenizer Inference Methods
|
Mar 06, 2024 |
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
|
Mar 06, 2024 |
[short] Learning and Leveraging World Models in Visual Representation Learning
|
Mar 04, 2024 |
Learning and Leveraging World Models in Visual Representation Learning
|
Mar 04, 2024 |
[short] Simple linear attention language models balance the recall-throughput tradeoff
|
Mar 03, 2024 |
Simple linear attention language models balance the recall-throughput tradeoff
|
Mar 03, 2024 |
[short] Beyond Language Models: Byte Models are Digital World Simulators
|
Mar 03, 2024 |
Beyond Language Models: Byte Models are Digital World Simulators
|
Mar 03, 2024 |
[short] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
|
Mar 02, 2024 |
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
|
Mar 02, 2024 |
[short] Do Transformer World Models Give Better Policy Gradients?
|
Mar 02, 2024 |
Do Transformer World Models Give Better Policy Gradients?
|
Mar 02, 2024 |
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
|
Mar 01, 2024 |
[short] Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
|
Feb 29, 2024 |
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
|
Feb 29, 2024 |
[short] Approaching Human-Level Forecasting with Language Models
|
Feb 29, 2024 |
Approaching Human-Level Forecasting with Language Models
|
Feb 29, 2024 |
[short] Massive Activations in Large Language Models
|
Feb 28, 2024 |
Massive Activations in Large Language Models
|
Feb 28, 2024 |
[short] Video as the New Language for Real-World Decision Making
|
Feb 28, 2024 |
Video as the New Language for Real-World Decision Making
|
Feb 28, 2024 |
[short] ChatMusician: Understanding and Generating Music Intrinsically with LLM
|
Feb 27, 2024 |
ChatMusician: Understanding and Generating Music Intrinsically with LLM
|
Feb 27, 2024 |
[short] Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding
|
Feb 27, 2024 |
Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding
|
Feb 27, 2024 |
[short] Watermarking Makes Language Models Radioactive
|
Feb 26, 2024 |
Watermarking Makes Language Models Radioactive
|
Feb 26, 2024 |
[short] Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
|
Feb 26, 2024 |
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
|
Feb 26, 2024 |
[short] Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
|
Feb 26, 2024 |
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
|
Feb 26, 2024 |
[short] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
|
Feb 25, 2024 |
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
|
Feb 25, 2024 |
[short] Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
|
Feb 24, 2024 |
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
|
Feb 24, 2024 |
[short] OmniPred: Language Models as Universal Regressors
|
Feb 23, 2024 |
OmniPred: Language Models as Universal Regressors
|
Feb 23, 2024 |
[short] T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
|
Feb 23, 2024 |
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
|
Feb 23, 2024 |
[short] Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
|
Feb 23, 2024 |
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
|
Feb 23, 2024 |
[short] In deep reinforcement learning, a pruned network is a good network
|
Feb 22, 2024 |
In deep reinforcement learning, a pruned network is a good network
|
Feb 22, 2024 |
[short] Coercing LLMs to do and reveal (almost) anything
|
Feb 22, 2024 |
Coercing LLMs to do and reveal (almost) anything
|
Feb 22, 2024 |
[short] Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
|
Feb 21, 2024 |
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
|
Feb 21, 2024 |
[short] Neural Network Diffusion
|
Feb 21, 2024 |
Neural Network Diffusion
|
Feb 21, 2024 |
[short] SEQUOIA: Scalable, Robust, and Hardware-aware Speculative Decoding
|
Feb 20, 2024 |
SEQUOIA: Scalable, Robust, and Hardware-aware Speculative Decoding
|
Feb 20, 2024 |
[short] Reformatted Alignment
|
Feb 20, 2024 |
Reformatted Alignment
|
Feb 20, 2024 |
[short] PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
|
Feb 19, 2024 |
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
|
Feb 19, 2024 |
[short] In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
|
Feb 19, 2024 |
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
|
Feb 19, 2024 |
[short] DoRA: Weight-Decomposed Low-Rank Adaptation
|
Feb 18, 2024 |
DoRA: Weight-Decomposed Low-Rank Adaptation
|
Feb 18, 2024 |
[short] A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
|
Feb 17, 2024 |
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
|
Feb 17, 2024 |
[short] On Limitations of the Transformer Architecture
|
Feb 17, 2024 |
On Limitations of the Transformer Architecture
|
Feb 17, 2024 |
[short] Chain-of-Thought Reasoning Without Prompting
|
Feb 16, 2024 |
Chain-of-Thought Reasoning Without Prompting
|
Feb 16, 2024 |
[short] How to Train Data-Efficient LLMs
|
Feb 16, 2024 |
How to Train Data-Efficient LLMs
|
Feb 16, 2024 |
[short] Data Engineering for Scaling Language Models to 128K Context
|
Feb 16, 2024 |
Data Engineering for Scaling Language Models to 128K Context
|
Feb 16, 2024 |
[short] Premise Order Matters in Reasoning with Large Language Models
|
Feb 15, 2024 |
Premise Order Matters in Reasoning with Large Language Models
|
Feb 15, 2024 |
[short] HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
|
Feb 15, 2024 |
HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
|
Feb 15, 2024 |
[short] Transformers Can Achieve Length Generalization But Not Robustly
|
Feb 15, 2024 |
Transformers Can Achieve Length Generalization But Not Robustly
|
Feb 15, 2024 |
Tandem Transformers for Inference Efficient LLMs
|
Feb 14, 2024 |
Mixtures of Experts Unlock Parameter Scaling for Deep RL
|
Feb 14, 2024 |
[short] A Tale of Tails: Model Collapse as a Change of Scaling Laws
|
Feb 13, 2024 |
A Tale of Tails: Model Collapse as a Change of Scaling Laws
|
Feb 13, 2024 |
[short] Scaling Laws for Fine-Grained Mixture of Experts
|
Feb 13, 2024 |
Scaling Laws for Fine-Grained Mixture of Experts
|
Feb 13, 2024 |
[short] Suppressing Pink Elephants with Direct Principle Feedback
|
Feb 13, 2024 |
Suppressing Pink Elephants with Direct Principle Feedback
|
Feb 13, 2024 |
V-STaR: Training Verifiers for Self-Taught Reasoners
|
Feb 12, 2024 |
[short] Feedback Loops With Language Models Drive In-Context Reward Hacking
|
Feb 12, 2024 |
Feedback Loops With Language Models Drive In-Context Reward Hacking
|
Feb 12, 2024 |
The boundary of neural network trainability is fractal
|
Feb 12, 2024 |
[short] More Agents Is All You Need
|
Feb 11, 2024 |
More Agents Is All You Need
|
Feb 11, 2024 |
[short] Buffer Overflow in Mixture of Experts
|
Feb 11, 2024 |
Buffer Overflow in Mixture of Experts
|
Feb 11, 2024 |
[short] Grandmaster-Level Chess Without Search
|
Feb 10, 2024 |
Grandmaster-Level Chess Without Search
|
Feb 10, 2024 |
[short] Driving Everywhere with Large Language Model Policy Adaptation
|
Feb 10, 2024 |
Driving Everywhere with Large Language Model Policy Adaptation
|
Feb 10, 2024 |
[short] An Interactive Agent Foundation Model
|
Feb 09, 2024 |
An Interactive Agent Foundation Model
|
Feb 09, 2024 |
[short] Learning to Route Among Specialized Experts for Zero-Shot Generalization
|
Feb 09, 2024 |
Learning to Route Among Specialized Experts for Zero-Shot Generalization
|
Feb 09, 2024 |
[short] Hydragen: High-Throughput LLM Inference with Shared Prefixes
|
Feb 08, 2024 |
Hydragen: High-Throughput LLM Inference with Shared Prefixes
|
Feb 08, 2024 |
[short] Direct Language Model Alignment from Online AI Feedback
|
Feb 08, 2024 |
Direct Language Model Alignment from Online AI Feedback
|
Feb 08, 2024 |
[short] Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
|
Feb 08, 2024 |
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
|
Feb 08, 2024 |
[short] SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
|
Feb 07, 2024 |
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
|
Feb 07, 2024 |
[short] Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
|
Feb 06, 2024 |
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
|
Feb 06, 2024 |
[short] Decoding-time Realignment of Language Models
|
Feb 06, 2024 |
Decoding-time Realignment of Language Models
|
Feb 06, 2024 |
[short] Specialized Language Models with Cheap Inference from Limited Domain Data
|
Feb 05, 2024 |
Specialized Language Models with Cheap Inference from Limited Domain Data
|
Feb 05, 2024 |
[short] Repeat After Me: Transformers are Better than State Space Models at Copying
|
Feb 05, 2024 |
Repeat After Me: Transformers are Better than State Space Models at Copying
|
Feb 05, 2024 |
[short] Unlearnable Algorithms for In-context Learning
|
Feb 04, 2024 |
Unlearnable Algorithms for In-context Learning
|
Feb 04, 2024 |
[short] Can Large Language Models Understand Context?
|
Feb 03, 2024 |
Can Large Language Models Understand Context?
|
Feb 03, 2024 |
[short] Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI
|
Feb 03, 2024 |
Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI
|
Feb 03, 2024 |
[short] Transforming and Combining Rewards for Aligning Large Language Models
|
Feb 02, 2024 |
Transforming and Combining Rewards for Aligning Large Language Models
|
Feb 02, 2024 |
[short] Efficient Exploration for LLMs
|
Feb 02, 2024 |
Efficient Exploration for LLMs
|
Feb 02, 2024 |
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
|
Feb 02, 2024 |
[short] Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
|
Feb 01, 2024 |
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
|
Feb 01, 2024 |
[short] Arrows of Time for Large Language Models
|
Feb 01, 2024 |
Arrows of Time for Large Language Models
|
Feb 01, 2024 |
[short] WEAVER: Foundation Models for Creative Writing
|
Jan 31, 2024 |
WEAVER: Foundation Models for Creative Writing
|
Jan 31, 2024 |
[short] H2O-Danube-1.8B Technical Report
|
Jan 31, 2024 |
H2O-Danube-1.8B Technical Report
|
Jan 31, 2024 |
[short] Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
|
Jan 30, 2024 |
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
|
Jan 30, 2024 |
[short] Synchformer: Efficient Synchronization from Sparse Cues
|
Jan 30, 2024 |
Synchformer: Efficient Synchronization from Sparse Cues
|
Jan 30, 2024 |
[short] MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
|
Jan 30, 2024 |
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
|
Jan 30, 2024 |
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
|
Jan 29, 2024 |
[short] From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
|
Jan 29, 2024 |
[short] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
|
Jan 29, 2024 |
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
|
Jan 29, 2024 |
[short] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
|
Jan 28, 2024 |
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
|
Jan 28, 2024 |
Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research Visibility
|
Jan 26, 2024 |
[short] Rethinking Patch Dependence for Masked Autoencoders
|
Jan 26, 2024 |
Rethinking Patch Dependence for Masked Autoencoders
|
Jan 26, 2024 |
[short] Deconstructing Denoising Diffusion Models for Self-Supervised Learning
|
Jan 26, 2024 |
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
|
Jan 26, 2024 |
[short] MambaByte: Token-free Selective State Space Model
|
Jan 25, 2024 |
MambaByte: Token-free Selective State Space Model
|
Jan 25, 2024 |
[short] Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
|
Jan 24, 2024 |
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
|
Jan 24, 2024 |
[short] Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
|
Jan 24, 2024 |
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
|
Jan 24, 2024 |
[short] WARM: On the Benefits of Weight Averaged Reward Models
|
Jan 23, 2024 |
WARM: On the Benefits of Weight Averaged Reward Models
|
Jan 23, 2024 |
[short] Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
|
Jan 23, 2024 |
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
|
Jan 23, 2024 |
Zero Bubble Pipeline Parallelism
|
Jan 22, 2024 |
[short] MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
|
Jan 22, 2024 |
MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
|
Jan 22, 2024 |
[short] Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
|
Jan 22, 2024 |
Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
|
Jan 22, 2024 |
Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability
|
Jan 21, 2024 |
VMamba: Visual State Space Model
|
Jan 21, 2024 |
[short] Training Neural Networks is NP-Hard in Fixed Dimension
|
Jan 20, 2024 |
Training Neural Networks is NP-Hard in Fixed Dimension
|
Jan 20, 2024 |
[short] Rethinking FID: Towards a Better Evaluation Metric for Image Generation
|
Jan 20, 2024 |
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
|
Jan 20, 2024 |
[short] ChatQA: Building GPT-4 Level Conversational QA Models
|
Jan 19, 2024 |
ChatQA: Building GPT-4 Level Conversational QA Models
|
Jan 19, 2024 |
[short] Improving fine-grained understanding in image-text pre-training
|
Jan 19, 2024 |
Improving fine-grained understanding in image-text pre-training
|
Jan 19, 2024 |
[short] Self-Rewarding Language Models
|
Jan 19, 2024 |
Self-Rewarding Language Models
|
Jan 19, 2024 |
[short] REFT: Reasoning with REinforced Fine-Tuning
|
Jan 18, 2024 |
REFT: Reasoning with REinforced Fine-Tuning
|
Jan 18, 2024 |
[short] Asynchronous Local-SGD Training for Language Modeling
|
Jan 18, 2024 |
Asynchronous Local-SGD Training for Language Modeling
|
Jan 18, 2024 |
[short] DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
|
Jan 18, 2024 |
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
|
Jan 18, 2024 |
[short] Tuning Language Models by Proxy
|
Jan 17, 2024 |
Tuning Language Models by Proxy
|
Jan 17, 2024 |
[short] Scalable Pre-training of Large Autoregressive Image Models
|
Jan 17, 2024 |
Scalable Pre-training of Large Autoregressive Image Models
|
Jan 17, 2024 |
[short] A Closer Look at AUROC and AUPRC under Class Imbalance
|
Jan 16, 2024 |
A Closer Look at AUROC and AUPRC under Class Imbalance
|
Jan 16, 2024 |
[short] Nash Learning from Human Feedback
|
Jan 16, 2024 |
Nash Learning from Human Feedback
|
Jan 16, 2024 |
[short] The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
|
Jan 15, 2024 |
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
|
Jan 15, 2024 |
[short] AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
|
Jan 15, 2024 |
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
|
Jan 15, 2024 |
PALP: Prompt Aligned Personalization of Text-to-Image Models
|
Jan 14, 2024 |
Secrets of RLHF in Large Language Models Part II: Reward Modeling
|
Jan 14, 2024 |
[short] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
|
Jan 12, 2024 |
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
|
Jan 12, 2024 |
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
|
Jan 12, 2024 |
[short] Towards Conversational Diagnostic AI
|
Jan 12, 2024 |
Towards Conversational Diagnostic AI
|
Jan 12, 2024 |
[short] Transformers are Multi-State RNNs
|
Jan 12, 2024 |
Transformers are Multi-State RNNs
|
Jan 12, 2024 |
[short] PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
|
Jan 11, 2024 |
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
|
Jan 11, 2024 |
[short] The Impact of Reasoning Step Length on Large Language Models
|
Jan 11, 2024 |
The Impact of Reasoning Step Length on Large Language Models
|
Jan 11, 2024 |
[short] Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
|
Jan 10, 2024 |
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
|
Jan 10, 2024 |
[short] Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
|
Jan 09, 2024 |
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
|
Jan 09, 2024 |
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
|
Jan 09, 2024 |
Mixtral of Experts
|
Jan 09, 2024 |
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
|
Jan 09, 2024 |
[short] Progressive Knowledge Distillation of Stable Diffusion XL using Layer Level Loss
|
Jan 08, 2024 |
Progressive Knowledge Distillation of Stable Diffusion XL using Layer Level Loss
|
Jan 08, 2024 |
[short] Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
|
Jan 08, 2024 |
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
|
Jan 08, 2024 |
[short] Can AI Be as Creative as Humans?
|
Jan 06, 2024 |
Can AI Be as Creative as Humans?
|
Jan 06, 2024 |
[short] Improving Text Embeddings with Large Language Models
|
Jan 06, 2024 |
Improving Text Embeddings with Large Language Models
|
Jan 06, 2024 |
[short] LLAMA PRO: Progressive LLaMA with Block Expansion
|
Jan 06, 2024 |
LLAMA PRO: Progressive LLaMA with Block Expansion
|
Jan 06, 2024 |
[short] Instruct-Imagen: Image Generation with Multi-modal Instruction
|
Jan 05, 2024 |
Instruct-Imagen: Image Generation with Multi-modal Instruction
|
Jan 05, 2024 |
[short] LLM Augmented LLMs: Expanding Capabilities through Composition
|
Jan 05, 2024 |
LLM Augmented LLMs: Expanding Capabilities through Composition
|
Jan 05, 2024 |
[short] TinyLlama: An Open-Source Small Language Model
|
Jan 05, 2024 |
TinyLlama: An Open-Source Small Language Model
|
Jan 05, 2024 |
[short] GPT-4V(ision) is a Generalist Web Agent, if Grounded
|
Jan 04, 2024 |
GPT-4V(ision) is a Generalist Web Agent, if Grounded
|
Jan 04, 2024 |
[short] aMUSEd: An open MUSE reproduction
|
Jan 04, 2024 |
aMUSEd: An open MUSE reproduction
|
Jan 04, 2024 |
[short] Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
|
Jan 03, 2024 |
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
|
Jan 03, 2024 |
[short] Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
|
Jan 02, 2024 |
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
|
Jan 02, 2024 |
[short] MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
|
Jan 02, 2024 |
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
|
Jan 02, 2024 |
[short] Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
|
Jan 01, 2024 |
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
|
Jan 01, 2024 |
[short] Fast Inference of Mixture-of-Experts Language Models with Offloading
|
Dec 30, 2023 |
Fast Inference of Mixture-of-Experts Language Models with Offloading
|
Dec 30, 2023 |
[short] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
|
Dec 29, 2023 |
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
|
Dec 29, 2023 |
[short] From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
|
Dec 28, 2023 |
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
|
Dec 28, 2023 |
[short] Supervised Knowledge Makes Large Language Models Better In-context Learners
|
Dec 27, 2023 |
Supervised Knowledge Makes Large Language Models Better In-context Learners
|
Dec 27, 2023 |
[short] Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
|
Dec 26, 2023 |
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
|
Dec 26, 2023 |
[short] The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
|
Dec 25, 2023 |
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
|
Dec 25, 2023 |
[short] LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
|
Dec 24, 2023 |
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
|
Dec 24, 2023 |
[short] Time is Encoded in the Weights of Finetuned Language Models
|
Dec 23, 2023 |
Time is Encoded in the Weights of Finetuned Language Models
|
Dec 23, 2023 |
[short] DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
|
Dec 22, 2023 |
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
|
Dec 22, 2023 |
[short] Mini-GPTs: Efficient Large Language Models through Contextual Pruning
|
Dec 21, 2023 |
Mini-GPTs: Efficient Large Language Models through Contextual Pruning
|
Dec 21, 2023 |
[short] LLM in a flash: Efficient Large Language Model Inference with Limited Memory
|
Dec 20, 2023 |
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
|
Dec 20, 2023 |
[short] G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
|
Dec 19, 2023 |
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
|
Dec 19, 2023 |
[short] Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
|
Dec 18, 2023 |
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
|
Dec 18, 2023 |
[short] Self-Evaluation Improves Selective Generation in Large Language Models
|
Dec 18, 2023 |
Self-Evaluation Improves Selective Generation in Large Language Models
|
Dec 18, 2023 |
[short] Weight Subcloning: Direct Initialization of Transformers Using Larger Pretrained Ones
|
Dec 18, 2023 |
Weight Subcloning: Direct Initialization of Transformers Using Larger Pretrained Ones
|
Dec 18, 2023 |
[short] TinyGSM: achieving >80% on GSM8k with small language models
|
Dec 16, 2023 |
TinyGSM: achieving >80% on GSM8k with small language models
|
Dec 16, 2023 |
[short] Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
|
Dec 15, 2023 |
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
|
Dec 15, 2023 |
[short] Vision-Language Models as a Source of Rewards
|
Dec 15, 2023 |
Vision-Language Models as a Source of Rewards
|
Dec 15, 2023 |
[short] SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
|
Dec 14, 2023 |
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
|
Dec 14, 2023 |
[short] Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
|
Dec 13, 2023 |
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
|
Dec 13, 2023 |
[short] Steering Llama 2 via Contrastive Activation Addition
|
Dec 13, 2023 |
Steering Llama 2 via Contrastive Activation Addition
|
Dec 13, 2023 |
[short] LLM360: Towards Fully Transparent Open-Source LLMs
|
Dec 12, 2023 |
LLM360: Towards Fully Transparent Open-Source LLMs
|
Dec 12, 2023 |
[short] Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models
|
Dec 12, 2023 |
Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models
|
Dec 12, 2023 |
[short] Context Tuning for Retrieval Augmented Generation
|
Dec 12, 2023 |
Context Tuning for Retrieval Augmented Generation
|
Dec 12, 2023 |
[short] Using Captum to Explain Generative Language Models
|
Dec 12, 2023 |
Using Captum to Explain Generative Language Models
|
Dec 12, 2023 |
[short] HALO: An Ontology for Representing Hallucinations in Generative Models
|
Dec 11, 2023 |
HALO: An Ontology for Representing Hallucinations in Generative Models
|
Dec 11, 2023 |
[short] How to Guess a Gradient
|
Dec 11, 2023 |
How to Guess a Gradient
|
Dec 11, 2023 |
[short] Using Large Language Models for Hyperparameter Optimization
|
Dec 09, 2023 |
Using Large Language Models for Hyperparameter Optimization
|
Dec 09, 2023 |
[short] Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
|
Dec 08, 2023 |
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
|
Dec 08, 2023 |
[short] Efficient Monotonic Multihead Attention
|
Dec 08, 2023 |
Efficient Monotonic Multihead Attention
|
Dec 08, 2023 |
[short] Self-conditioned Image Generation via Generating Representations
|
Dec 07, 2023 |
Self-conditioned Image Generation via Generating Representations
|
Dec 07, 2023 |
[short] Relightable Gaussian Codec Avatars
|
Dec 07, 2023 |
Relightable Gaussian Codec Avatars
|
Dec 07, 2023 |
[short] Training Chain-of-Thought via Latent-Variable Inference
|
Dec 06, 2023 |
Training Chain-of-Thought via Latent-Variable Inference
|
Dec 06, 2023 |
[short] Eliciting Latent Knowledge from Quirky Language Models
|
Dec 05, 2023 |
Eliciting Latent Knowledge from Quirky Language Models
|
Dec 05, 2023 |
[short] The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
|
Dec 05, 2023 |
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
|
Dec 05, 2023 |
[short] GIVT: Generative Infinite-Vocabulary Transformers
|
Dec 05, 2023 |
GIVT: Generative Infinite-Vocabulary Transformers
|
Dec 05, 2023 |
[short] Object Recognition as Next Token Prediction
|
Dec 05, 2023 |
Object Recognition as Next Token Prediction
|
Dec 05, 2023 |
[short] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
|
Dec 04, 2023 |
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
|
Dec 04, 2023 |
[short] Instruction-tuning Aligns LLMs to the Human Brain
|
Dec 04, 2023 |
Instruction-tuning Aligns LLMs to the Human Brain
|
Dec 04, 2023 |
[short] Dataset Distillation in Large Data Era
|
Dec 03, 2023 |
Dataset Distillation in Large Data Era
|
Dec 03, 2023 |
[short] One-step Diffusion with Distribution Matching Distillation
|
Dec 01, 2023 |
One-step Diffusion with Distribution Matching Distillation
|
Dec 01, 2023 |
[short] Initializing Models with Larger Ones
|
Dec 01, 2023 |
Initializing Models with Larger Ones
|
Dec 01, 2023 |
[short] SODA: Bottleneck Diffusion Models for Representation Learning
|
Nov 30, 2023 |
SODA: Bottleneck Diffusion Models for Representation Learning
|
Nov 30, 2023 |
[short] No Representation Rules Them All in Category Discovery
|
Nov 29, 2023 |
No Representation Rules Them All in Category Discovery
|
Nov 29, 2023 |
[short] Manifold Preserving Guided Diffusion
|
Nov 29, 2023 |
Manifold Preserving Guided Diffusion
|
Nov 29, 2023 |
[short] On the Long Range Abilities of Transformers
|
Nov 29, 2023 |
On the Long Range Abilities of Transformers
|
Nov 29, 2023 |
[short] Scalable Extraction of Training Data from (Production) Language Models
|
Nov 29, 2023 |
Scalable Extraction of Training Data from (Production) Language Models
|
Nov 29, 2023 |
[short] Scalable AI Safety via Doubly-Efficient Debate
|
Nov 27, 2023 |
Scalable AI Safety via Doubly-Efficient Debate
|
Nov 27, 2023 |
[short] Let's Verify Step by Step
|
Nov 24, 2023 |
Let's Verify Step by Step
|
Nov 24, 2023 |
PaSS: Parallel Speculative Sampling
|
Nov 24, 2023 |
[short] In-Context Learning Functions with Varying Number of Minima
|
Nov 22, 2023 |
In-Context Learning Functions with Varying Number of Minima
|
Nov 22, 2023 |
[short] Orca 2: Teaching Small Language Models How to Reason
|
Nov 21, 2023 |
Orca 2: Teaching Small Language Models How to Reason
|
Nov 21, 2023 |
[short] Exponentially Faster Language Modeling
|
Nov 21, 2023 |
Exponentially Faster Language Modeling
|
Nov 21, 2023 |
[short] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
|
Nov 20, 2023 |
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
|
Nov 20, 2023 |
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
|
Nov 20, 2023 |
[short] The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
|
Nov 19, 2023 |
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
|
Nov 19, 2023 |
[short] Striped Attention: Faster Ring Attention for Causal Transformers
|
Nov 17, 2023 |
Striped Attention: Faster Ring Attention for Causal Transformers
|
Nov 17, 2023 |
[short] SiRA: Sparse Mixture of Low Rank Adaptation
|
Nov 16, 2023 |
SiRA: Sparse Mixture of Low Rank Adaptation
|
Nov 16, 2023 |
[short] Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
|
Nov 15, 2023 |
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
|
Nov 15, 2023 |
[short] Frontier Language Models are not Robust to Adversarial Arithmetic, or “What do I need to say so you agree 2+2=5?”
|
Nov 15, 2023 |
Frontier Language Models are not Robust to Adversarial Arithmetic, or “What do I need to say so you agree 2+2=5?”
|
Nov 15, 2023 |
[short] The ART of LLM Refinement: Ask, Refine, and Trust
|
Nov 15, 2023 |
The ART of LLM Refinement: Ask, Refine, and Trust
|
Nov 15, 2023 |
[short] LUMOS: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
|
Nov 13, 2023 |
LUMOS: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
|
Nov 13, 2023 |
[short] The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models
|
Nov 13, 2023 |
The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models
|
Nov 13, 2023 |
[short] Prompt Cache: Modular Attention Reuse for Low-Latency Inference
|
Nov 10, 2023 |
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
|
Nov 10, 2023 |
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
|
Nov 10, 2023 |
[short] LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
|
Nov 10, 2023 |
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
|
Nov 10, 2023 |
[short] Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
|
Nov 09, 2023 |
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
|
Nov 09, 2023 |
[short] Can LLMs Follow Simple Rules?
|
Nov 09, 2023 |
Can LLMs Follow Simple Rules?
|
Nov 09, 2023 |
[short] I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
|
Nov 08, 2023 |
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
|
Nov 08, 2023 |
[short] S-LoRA: Serving Thousands of Concurrent LoRA Adapters
|
Nov 07, 2023 |
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
|
Nov 07, 2023 |
[short] Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
|
Nov 06, 2023 |
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
|
Nov 06, 2023 |
[short] Global Optimization: A Machine Learning Approach
|
Nov 06, 2023 |
Global Optimization: A Machine Learning Approach
|
Nov 06, 2023 |
[short] Simplifying Transformer Blocks
|
Nov 06, 2023 |
Simplifying Transformer Blocks
|
Nov 06, 2023 |
[short] Managing AI Risks in an Era of Rapid Progress
|
Nov 06, 2023 |
Managing AI Risks in an Era of Rapid Progress
|
Nov 06, 2023 |
FlashDecoding++: Faster Large Language Model Inference on GPUs
|
Nov 03, 2023 |
[short] Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
|
Nov 03, 2023 |
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
|
Nov 03, 2023 |
[short] Text Rendering Strategies for Pixel Language Models
|
Nov 02, 2023 |
Text Rendering Strategies for Pixel Language Models
|
Nov 02, 2023 |
[short] The Generative AI Paradox: “What It Can Create, It May Not Understand”
|
Nov 02, 2023 |
The Generative AI Paradox: “What It Can Create, It May Not Understand”
|
Nov 02, 2023 |
[short] The Impact of Depth and Width on Transformer Language Model Generalization
|
Nov 01, 2023 |
The Impact of Depth and Width on Transformer Language Model Generalization
|
Nov 01, 2023 |
[short] Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
|
Nov 01, 2023 |
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
|
Nov 01, 2023 |
[short] A Survey on Knowledge Editing of Neural Networks
|
Oct 31, 2023 |
A Survey on Knowledge Editing of Neural Networks
|
Oct 31, 2023 |
[short] MM-VID: Advancing Video Understanding with GPT-4V(ision)
|
Oct 31, 2023 |
MM-VID: Advancing Video Understanding with GPT-4V(ision)
|
Oct 31, 2023 |
CodeFusion: A Pre-trained Diffusion Model for Code Generation
|
Oct 30, 2023 |
[short] Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation
|
Oct 30, 2023 |
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation
|
Oct 30, 2023 |
[short] Zephyr: Direct Distillation of LM Alignment
|
Oct 29, 2023 |
Zephyr: Direct Distillation of LM Alignment
|
Oct 29, 2023 |
[short] JudgeLM: Fine-tuned Large Language Models are Scalable Judges
|
Oct 28, 2023 |
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
|
Oct 28, 2023 |
How do Language Models Bind Entities in Context?
|
Oct 27, 2023 |
[short] Controlled Decoding from Language Models
|
Oct 27, 2023 |
Controlled Decoding from Language Models
|
Oct 27, 2023 |
[short] Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
|
Oct 27, 2023 |
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
|
Oct 27, 2023 |
[short] A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
|
Oct 26, 2023 |
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
|
Oct 26, 2023 |
[short] Detecting Pretraining Data from Large Language Models
|
Oct 26, 2023 |
Detecting Pretraining Data from Large Language Models
|
Oct 26, 2023 |
[short] QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
|
Oct 26, 2023 |
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
|
Oct 26, 2023 |
[short] Function Vectors in Large Language Models
|
Oct 25, 2023 |
Function Vectors in Large Language Models
|
Oct 25, 2023 |
[short] In-Context Learning Creates Task Vectors
|
Oct 25, 2023 |
In-Context Learning Creates Task Vectors
|
Oct 25, 2023 |
[short] Woodpecker: Hallucination Correction for Multimodal Large Language Models
|
Oct 25, 2023 |
Woodpecker: Hallucination Correction for Multimodal Large Language Models
|
Oct 25, 2023 |
[short] Matryoshka Diffusion Models
|
Oct 24, 2023 |
Matryoshka Diffusion Models
|
Oct 24, 2023 |
[short] SpecTr: Fast Speculative Decoding via Optimal Transport
|
Oct 24, 2023 |
SpecTr: Fast Speculative Decoding via Optimal Transport
|
Oct 24, 2023 |
[short] Linear Representations of Sentiment in Large Language Models
|
Oct 24, 2023 |
Linear Representations of Sentiment in Large Language Models
|
Oct 24, 2023 |
Towards Understanding Sycophancy in Language Models
|
Oct 23, 2023 |
[short] Contrastive Preference Learning: Learning from Human Feedback without RL
|
Oct 23, 2023 |
Contrastive Preference Learning: Learning from Human Feedback without RL
|
Oct 23, 2023 |
[short] Demystifying the Myths and Legends of Nonconvex Convergence of SGD
|
Oct 22, 2023 |
Demystifying the Myths and Legends of Nonconvex Convergence of SGD
|
Oct 22, 2023 |
3D-GPT: Procedural 3D Modeling with Large Language Models
|
Oct 20, 2023 |
[short] An Emulator for Fine-Tuning Large Language Models using Small Language Models
|
Oct 20, 2023 |
An Emulator for Fine-Tuning Large Language Models using Small Language Models
|
Oct 20, 2023 |
[short] Frozen Transformers in Language Models Are Effective Visual Encoder Layers
|
Oct 20, 2023 |
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
|
Oct 20, 2023 |
[short] AgentTuning: Enabling Generalized Agent Abilities for LLMs
|
Oct 20, 2023 |
[short] Training Dynamics of Deep Network Linear Regions
|
Oct 20, 2023 |
Training Dynamics of Deep Network Linear Regions
|
Oct 20, 2023 |
AgentTuning: Enabling Generalized Agent Abilities for LLMs
|
Oct 20, 2023 |
[short] Eliciting Human Preferences with Language Models
|
Oct 19, 2023 |
Eliciting Human Preferences with Language Models
|
Oct 19, 2023 |
[short] Emptying the Ocean with a Spoon: Should We Edit Models?
|
Oct 19, 2023 |
Emptying the Ocean with a Spoon: Should We Edit Models?
|
Oct 19, 2023 |
[short] SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
|
Oct 19, 2023 |
SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
|
Oct 19, 2023 |
[short] Approximating Two-Layer Feedforward Networks for Efficient Transformers
|
Oct 18, 2023 |
Approximating Two-Layer Feedforward Networks for Efficient Transformers
|
Oct 18, 2023 |
[short] Context-Aware Meta-Learning
|
Oct 18, 2023 |
Context-Aware Meta-Learning
|
Oct 18, 2023 |
[short] In-Context Pretraining: Language Modeling Beyond Document Boundaries
|
Oct 17, 2023 |
In-Context Pretraining: Language Modeling Beyond Document Boundaries
|
Oct 17, 2023 |
[short] Improving Large Language Model Fine-tuning for Solving Math Problems
|
Oct 17, 2023 |
Improving Large Language Model Fine-tuning for Solving Math Problems
|
Oct 17, 2023 |
[short] Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
|
Oct 17, 2023 |
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
|
Oct 17, 2023 |
[short] MemGPT: Towards LLMs as Operating Systems
|
Oct 16, 2023 |
MemGPT: Towards LLMs as Operating Systems
|
Oct 16, 2023 |
[short] Neural Diffusion Models
|
Oct 13, 2023 |
Neural Diffusion Models
|
Oct 13, 2023 |
[short] Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
|
Oct 13, 2023 |
Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
|
Oct 13, 2023 |
[short] Online Speculative Decoding
|
Oct 12, 2023 |
Online Speculative Decoding
|
Oct 12, 2023 |
[short] MatFormer: Nested Transformer for Elastic Inference
|
Oct 12, 2023 |
MatFormer: Nested Transformer for Elastic Inference
|
Oct 12, 2023 |
[short] Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
|
Oct 11, 2023 |
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
|
Oct 11, 2023 |
Mistral 7B
|
Oct 11, 2023 |
[short] Teaching Language Models to Hallucinate Less with Synthetic Tasks
|
Oct 11, 2023 |
Teaching Language Models to Hallucinate Less with Synthetic Tasks
|
Oct 11, 2023 |
[short] Transformer Fusion with Optimal Transport
|
Oct 10, 2023 |
Transformer Fusion with Optimal Transport
|
Oct 10, 2023 |
[short] Grokking as Compression: A Nonlinear Complexity Perspective
|
Oct 10, 2023 |
Grokking as Compression: A Nonlinear Complexity Perspective
|
Oct 10, 2023 |
[short] Leveraging unpaired data for vision-language generative models via Cycle Consistency
|
Oct 06, 2023 |
Leveraging unpaired data for vision-language generative models via Cycle Consistency
|
Oct 06, 2023 |
[short] Language Models Represent Space and Time
|
Oct 05, 2023 |
Language Models Represent Space and Time
|
Oct 05, 2023 |
[short] Retrieval meets Long Context Large Language Models
|
Oct 05, 2023 |
Retrieval meets Long Context Large Language Models
|
Oct 05, 2023 |
[short] Large Language Models Cannot Self-Correct Reasoning Yet
|
Oct 04, 2023 |
Large Language Models Cannot Self-Correct Reasoning Yet
|
Oct 04, 2023 |
[short] Think before you speak: Training Language Models With Pause Tokens
|
Oct 04, 2023 |
Think before you speak: Training Language Models With Pause Tokens
|
Oct 04, 2023 |
[short] Enable Language Models to Implicitly Learn Self-Improvement From Data
|
Oct 03, 2023 |
Enable Language Models to Implicitly Learn Self-Improvement From Data
|
Oct 03, 2023 |
[short] AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
|
Oct 03, 2023 |
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
|
Oct 03, 2023 |
Representation Engineering: A Top-Down Approach to AI Transparency
|
Oct 03, 2023 |
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
|
Oct 02, 2023 |
[short] Efficient Streaming Language Models with Attention Sinks
|
Oct 02, 2023 |
Efficient Streaming Language Models with Attention Sinks
|
Oct 02, 2023 |
[short] Tree Cross Attention
|
Oct 02, 2023 |
Tree Cross Attention
|
Oct 02, 2023 |
[short] Demystifying CLIP Data
|
Sep 30, 2023 |
Demystifying CLIP Data
|
Sep 30, 2023 |
[short] Effective Long-Context Scaling of Foundation Models
|
Sep 30, 2023 |
Effective Long-Context Scaling of Foundation Models
|
Sep 30, 2023 |
[short] GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
|
Sep 29, 2023 |
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
|
Sep 29, 2023 |
[short] MotionLM: Multi-Agent Motion Forecasting as Language Modeling
|
Sep 29, 2023 |
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
|
Sep 29, 2023 |
[short] Jointly Training Large Autoregressive Multimodal Models
|
Sep 28, 2023 |
Jointly Training Large Autoregressive Multimodal Models
|
Sep 28, 2023 |
Deep Model Fusion: A Survey
|
Sep 28, 2023 |
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
|
Sep 27, 2023 |
Large Language Model Alignment: A Survey
|
Sep 27, 2023 |
[short] Aligning Large Multimodal Models with Factually Augmented RLHF
|
Sep 27, 2023 |
Aligning Large Multimodal Models with Factually Augmented RLHF
|
Sep 27, 2023 |
[short] SCREWS: A Modular Framework for Reasoning with Revisions
|
Sep 26, 2023 |
SCREWS: A Modular Framework for Reasoning with Revisions
|
Sep 26, 2023 |
[short] Small-scale proxies for large-scale Transformer training instabilities
|
Sep 26, 2023 |
Small-scale proxies for large-scale Transformer training instabilities
|
Sep 26, 2023 |
[short] DEPT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
|
Sep 25, 2023 |
DEPT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
|
Sep 25, 2023 |
[short] Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning
|
Sep 25, 2023 |
The Rise and Potential of Large Language Model Based Agents: A Survey
|
Sep 24, 2023 |
[short] The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
|
Sep 23, 2023 |
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
|
Sep 23, 2023 |
[short] LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
|
Sep 23, 2023 |
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
|
Sep 22, 2023 |
[short] BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
|
Sep 22, 2023 |
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
|
Sep 22, 2023 |
[short] FreeU: Free Lunch in Diffusion U-Net
|
Sep 21, 2023 |
FreeU: Free Lunch in Diffusion U-Net
|
Sep 21, 2023 |
[short] Chain-of-Verification Reduces Hallucination in Large Language Models
|
Sep 21, 2023 |
Chain-of-Verification Reduces Hallucination in Large Language Models
|
Sep 21, 2023 |
[short] Language Modeling Is Compression
|
Sep 20, 2023 |
[full] Language Modeling Is Compression
|
Sep 20, 2023 |
[short] PDFTriage: Question Answering over Long, Structured Documents
|
Sep 19, 2023 |
[full] PDFTriage: Question Answering over Long, Structured Documents
|
Sep 19, 2023 |
[short] Contrastive Decoding Improves Reasoning in Large Language Models
|
Sep 19, 2023 |
[full] Contrastive Decoding Improves Reasoning in Large Language Models
|
Sep 19, 2023 |
Sparse Autoencoders Find Highly Interpretable Features in Language Models
|
Sep 18, 2023 |
Scaling Laws for Sparsely-Connected Foundation Models
|
Sep 18, 2023 |
Agents: An Open-source Framework for Autonomous Language Agents
|
Sep 16, 2023 |
Statistical rejection sampling improves preference optimization
|
Sep 14, 2023 |
Efficient Memory Management for Large Language Model Serving with PagedAttention
|
Sep 13, 2023 |
Uncovering mesa-optimization algorithms in Transformers
|
Sep 13, 2023 |
Fast Inference from Transformers via Speculative Decoding
|
Sep 12, 2023 |
Neurons in Large Language Models: Dead, N-gram, Positional
|
Sep 12, 2023 |
Fiat: Fusing Learning Paradigms with Instruction-Accelerated Tuning
|
Sep 12, 2023 |
ImageBind-LLM: Multi-modality Instruction Tuning
|
Sep 08, 2023 |
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
|
Sep 08, 2023 |
SLiMe: Segment Like Me
|
Sep 07, 2023 |
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
|
Sep 07, 2023 |
YaRN: Efficient Context Window Extension of Large Language Models
|
Sep 06, 2023 |
MVDream: Multi-view Diffusion for 3D Generation
|
Sep 01, 2023 |
LLaSM: Large Language and Speech Model
|
Aug 31, 2023 |
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
|
Aug 31, 2023 |
Reprogramming under constraints: efficient and reliable transferability of lottery tickets
|
Aug 30, 2023 |
Fast Feedforward Networks
|
Aug 29, 2023 |
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
|
Aug 28, 2023 |
Code Llama: Open Foundation Models for Code
|
Aug 28, 2023 |
Critical Learning Periods Emerge Even in Deep Linear Networks
|
Aug 25, 2023 |
Giraffe: Adventures in Expanding Context Lengths in LLMs
|
Aug 23, 2023 |
Reinforced Self-Training (ReST) for Language Modeling
|
Aug 23, 2023 |
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
|
Aug 22, 2023 |
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
|
Aug 21, 2023 |
A Theory for Emergence of Complex Skills in Language Models
|
Aug 18, 2023 |
Bayesian Flow Networks
|
Aug 17, 2023 |
Solving Challenging Math Word Problems Using GPT-4 with Code-based Self-Verification
|
Aug 16, 2023 |
The Devil is in the Errors: Leveraging LLMs for Fine-grained Machine Translation Evaluation
|
Aug 15, 2023 |
Shepherd: A Critic for Language Model Generation
|
Aug 10, 2023 |
Simple synthetic data reduces sycophancy in large language models
|
Aug 09, 2023 |
Evaluating Data Attribution for Text-to-Image Models
|
Aug 09, 2023 |
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
|
Aug 08, 2023 |
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
|
Jul 20, 2023 |
Challenges and Applications of Large Language Models
|
Jul 20, 2023 |
Towards A Unified Agent with Foundation Models
|
Jul 20, 2023 |
Llama 2: Open Foundation and Fine-Tuned Chat Models
|
Jul 19, 2023 |
How Is ChatGPT’s Behavior Changing over Time?
|
Jul 19, 2023 |
Retentive Network: A Successor to Transformer for Large Language Models
|
Jul 18, 2023 |
Alpagasus: Training A Better Alpaca with Fewer Data
|
Jul 18, 2023 |
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
|
Jul 17, 2023 |
Learning to Retrieve In-Context Examples for Large Language Models
|
Jul 17, 2023 |
Copy is All You Need
|
Jul 17, 2023 |
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
|
Jul 15, 2023 |
Self-consistency for open-ended generations
|
Jul 15, 2023 |
In-context Autoencoder for Context Compression in a Large Language Model
|
Jul 14, 2023 |
Towards Robust and Efficient Continual Language Learning
|
Jul 14, 2023 |
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
|
Jul 14, 2023 |
Secrets of RLHF in Large Language Models Part I: PPO
|
Jul 12, 2023 |
Generative Pretraining in Multimodality
|
Jul 12, 2023 |
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
|
Jul 12, 2023 |
VampNet: Music Generation via Masked Acoustic Token Modeling
|
Jul 11, 2023 |
Large Language Models as General Pattern Machines
|
Jul 11, 2023 |
Teaching Arithmetic to Small Transformers
|
Jul 11, 2023 |
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
|
Jul 11, 2023 |
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
|
Jul 09, 2023 |
LLM Calibration and Automatic Hallucination Detection via Pareto Optimal Self-supervision
|
Jul 09, 2023 |
Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts
|
Jul 09, 2023 |
Focused Transformer: Contrastive Training for Context Scaling
|
Jul 07, 2023 |
Lost in the Middle: How Language Models Use Long Contexts
|
Jul 07, 2023 |
Flacuna: Unleashing the Problem Solving Power of Vicuna using Flan Fine-Tuning
|
Jul 06, 2023 |
LongNet: Scaling Transformers to 1,000,000,000 Tokens
|
Jul 06, 2023 |
Stay on topic with Classifier-Free Guidance
|
Jul 05, 2023 |
Segment Anything Meets Point Tracking
|
Jul 05, 2023 |
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization
|
Jun 30, 2023 |
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
|
Jun 30, 2023 |
Are aligned neural networks adversarially aligned?
|
Jun 29, 2023 |
Extending Context Window of Large Language Models via Position Interpolation
|
Jun 28, 2023 |
Restart Sampling for Improving Generative Processes
|
Jun 28, 2023 |
SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling with Backtracking
|
Jun 28, 2023 |
MotionGPT: Human Motion as a Foreign Language
|
Jun 28, 2023 |
Language models are weak learners
|
Jun 27, 2023 |
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
|
Jun 27, 2023 |
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
|
Jun 27, 2023 |
A Simple and Effective Pruning Approach for Large Language Models
|
Jun 25, 2023 |
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
|
Jun 24, 2023 |
AudioPaLM: A Large Language Model That Can Speak and Listen
|
Jun 24, 2023 |
Training Transformers with 4-bit Integers
|
Jun 23, 2023 |
Fast Segment Anything
|
Jun 22, 2023 |
Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision
|
Jun 22, 2023 |
Textbooks Are All You Need
|
Jun 22, 2023 |
Glimmer: generalized late-interaction memory reranker
|
Jun 21, 2023 |
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
|
Jun 20, 2023 |
Demystifying GPT Self-Repair for Code Generation
|
Jun 20, 2023 |
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
|
Jun 20, 2023 |
Full Parameter Fine-tuning for Large Language Models with Limited Resources
|
Jun 19, 2023 |
TryOnDiffusion: A Tale of Two UNets
|
Jun 17, 2023 |
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
|
Jun 15, 2023 |
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
|
Jun 14, 2023 |
Augmenting Language Models with Long-Term Memory
|
Jun 14, 2023 |
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
|
Jun 14, 2023 |
FasterViT: Fast Vision Transformers with Hierarchical Attention
|
Jun 13, 2023 |