Arxiv Papers

By Igor Melnyk



Category: Science

Subscribers: 5
Reviews: 0
Episodes: 2169

Description

Running out of time to catch up with new arXiv papers? We select the most impactful papers and present them as convenient podcast episodes. If you're a visual learner, we also offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads, giving you academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers

Episodes
[QA] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Apr 30, 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Apr 30, 2025
[QA] ReasonIR: Training Retrievers for Reasoning Tasks
Apr 30, 2025
ReasonIR: Training Retrievers for Reasoning Tasks
Apr 30, 2025
[QA] Scaling Laws For Scalable Oversight
Apr 28, 2025
Scaling Laws For Scalable Oversight
Apr 28, 2025
[QA] Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models
Apr 28, 2025
Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models
Apr 28, 2025
[QA] Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity
Apr 27, 2025
Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity
Apr 27, 2025
[QA] Learning Adaptive Parallel Reasoning with Language Models
Apr 26, 2025
Learning Adaptive Parallel Reasoning with Language Models
Apr 26, 2025
[QA] Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Apr 26, 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Apr 26, 2025
[QA] Step1X-Edit: A Practical Framework for General Image Editing
Apr 25, 2025
Step1X-Edit: A Practical Framework for General Image Editing
Apr 25, 2025
[QA] Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Apr 25, 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Apr 25, 2025
[QA] Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Apr 24, 2025
Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Apr 24, 2025
[QA] I-Con: A Unifying Framework for Representation Learning
Apr 24, 2025
I-Con: A Unifying Framework for Representation Learning
Apr 24, 2025
[QA] Tina: Tiny Reasoning Models via LoRA
Apr 23, 2025
Tina: Tiny Reasoning Models via LoRA
Apr 23, 2025
[QA] LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Apr 23, 2025
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Apr 23, 2025
[QA] UFO2: The Desktop AgentOS
Apr 22, 2025
UFO2: The Desktop AgentOS
Apr 22, 2025
[QA] NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning
Apr 22, 2025
NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning
Apr 22, 2025
[QA] Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Apr 21, 2025
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Apr 21, 2025
[QA] Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Apr 21, 2025
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Apr 21, 2025
[QA] Reasoning Models Can Be Effective Without Thinking
Apr 20, 2025
Reasoning Models Can Be Effective Without Thinking
Apr 20, 2025
[QA] A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Apr 20, 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Apr 20, 2025
[QA] CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Apr 19, 2025
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Apr 19, 2025
[QA] Antidistillation Sampling
Apr 18, 2025
Antidistillation Sampling
Apr 18, 2025
[QA] Position: The Most Expensive Part of an LLM should be its Training Data
Apr 18, 2025
Position: The Most Expensive Part of an LLM should be its Training Data
Apr 18, 2025
[QA] Activated LoRA: Fine-tuned LLMs for Intrinsics
Apr 18, 2025
Activated LoRA: Fine-tuned LLMs for Intrinsics
Apr 18, 2025
[QA] COLORBENCH: Can VLMs See and Understand the Colorful World?
Apr 17, 2025
COLORBENCH: Can VLMs See and Understand the Colorful World?
Apr 17, 2025
[QA] ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Apr 17, 2025
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Apr 17, 2025
[QA] Looking beyond the next token
Apr 16, 2025
Looking beyond the next token
Apr 16, 2025
[QA] How to Predict Best Pretraining Data with Small Experiments
Apr 16, 2025
How to Predict Best Pretraining Data with Small Experiments
Apr 16, 2025
[QA] Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Apr 15, 2025
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Apr 15, 2025
[QA] DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
Apr 15, 2025
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
Apr 15, 2025
[QA] Steering CLIP's vision transformer with sparse autoencoders
Apr 14, 2025
Steering CLIP's vision transformer with sparse autoencoders
Apr 14, 2025
[QA] Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Apr 14, 2025
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Apr 14, 2025
[QA] Rethinking Reflection in Pre-Training
Apr 13, 2025
Rethinking Reflection in Pre-Training
Apr 13, 2025
[QA] Self-Steering Language Models
Apr 13, 2025
Self-Steering Language Models
Apr 13, 2025
[QA] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Apr 12, 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Apr 12, 2025
DDT: Decoupled Diffusion Transformer
Apr 12, 2025
DDT: Decoupled Diffusion Transformer
Apr 12, 2025
[QA] Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
Apr 11, 2025
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
Apr 11, 2025
[QA] Scaling Laws for Native Multimodal Models
Apr 11, 2025
Scaling Laws for Native Multimodal Models
Apr 11, 2025
[QA] OLMOTRACE: Tracing Language Model Outputs Back to Trillions of Training Tokens
Apr 10, 2025
OLMOTRACE: Tracing Language Model Outputs Back to Trillions of Training Tokens
Apr 10, 2025
[QA] Wanting to be Understood
Apr 10, 2025
Wanting to be Understood
Apr 10, 2025
[QA] A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Apr 10, 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Apr 10, 2025
[QA] From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
Apr 09, 2025
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
Apr 09, 2025
[QA] Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Apr 09, 2025
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Apr 09, 2025
[QA] Can ChatGPT Learn My Life From a Week of First-Person Video?
Apr 08, 2025
Can ChatGPT Learn My Life From a Week of First-Person Video?
Apr 08, 2025
[QA] Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
Apr 08, 2025
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
Apr 08, 2025
[QA] Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Apr 07, 2025
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Apr 07, 2025
[QA] Agentic Knowledgeable Self-awareness
Apr 07, 2025
Agentic Knowledgeable Self-awareness
Apr 07, 2025
[QA] Inference-Time Scaling for Generalist Reward Modeling
Apr 05, 2025
Inference-Time Scaling for Generalist Reward Modeling
Apr 05, 2025
[QA] Multi-Token Attention
Apr 05, 2025
Multi-Token Attention
Apr 05, 2025
[QA] Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Mar 30, 2025
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Mar 30, 2025
[QA] Wan: Open and Advanced Large-Scale Video Generative Models
Mar 29, 2025
Wan: Open and Advanced Large-Scale Video Generative Models
Mar 29, 2025
[QA] UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Mar 29, 2025
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Mar 29, 2025
[QA] SWI: Speaking with Intent in Large Language Models
Mar 28, 2025
SWI: Speaking with Intent in Large Language Models
Mar 28, 2025
[QA] Unified Multimodal Discrete Diffusion
Mar 28, 2025
Unified Multimodal Discrete Diffusion
Mar 28, 2025
[QA] Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Mar 27, 2025
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Mar 27, 2025
[QA] Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Mar 27, 2025
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Mar 27, 2025
[QA] LookAhead Tuning: Safer Language Models via Partial Answer Previews
Mar 26, 2025
LookAhead Tuning: Safer Language Models via Partial Answer Previews
Mar 26, 2025
[QA] ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mar 26, 2025
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mar 26, 2025
[QA] FFN Fusion: Rethinking Sequential Computation in Large Language Models
Mar 25, 2025
FFN Fusion: Rethinking Sequential Computation in Large Language Models
Mar 25, 2025
[QA] Modifying Large Language Model Post-Training for Diverse Creative Writing
Mar 24, 2025
Modifying Large Language Model Post-Training for Diverse Creative Writing
Mar 24, 2025
[QA] Users Favor LLM-Generated Content—Until They Know It's AI
Mar 24, 2025
Users Favor LLM-Generated Content—Until They Know It's AI
Mar 24, 2025
[QA] Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Mar 24, 2025
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Mar 24, 2025
[QA] DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Mar 23, 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Mar 23, 2025
[QA] SynCity: Training-Free Generation of 3D Worlds
Mar 23, 2025
SynCity: Training-Free Generation of 3D Worlds
Mar 23, 2025
[QA] TULIP: Towards Unified Language-Image Pretraining
Mar 22, 2025
TULIP: Towards Unified Language-Image Pretraining
Mar 22, 2025
[QA] Causal Emergence 2.0: Quantifying emergent complexity
Mar 22, 2025
Causal Emergence 2.0: Quantifying emergent complexity
Mar 22, 2025
[QA] Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them
Mar 21, 2025
Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them
Mar 21, 2025
[QA] Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
Mar 21, 2025
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
Mar 21, 2025
[QA] Cube: A Roblox View of 3D Intelligence
Mar 20, 2025
Cube: A Roblox View of 3D Intelligence
Mar 20, 2025
[QA] SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Mar 20, 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Mar 20, 2025
[QA] Measuring AI Ability to Complete Long Tasks
Mar 19, 2025
Measuring AI Ability to Complete Long Tasks
Mar 19, 2025
[QA] Impossible Videos
Mar 19, 2025
Impossible Videos
Mar 19, 2025
[QA] SuperBPE: Space Travel for Language Models
Mar 18, 2025
SuperBPE: Space Travel for Language Models
Mar 18, 2025
[QA] xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
Mar 18, 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
Mar 18, 2025
[QA] PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Mar 17, 2025
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Mar 17, 2025
[QA] Auditing language models for hidden objectives
Mar 17, 2025
Auditing language models for hidden objectives
Mar 17, 2025
[QA] Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Mar 16, 2025
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Mar 16, 2025
[QA] New Trends for Modern Machine Translation with Large Reasoning Models
Mar 16, 2025
New Trends for Modern Machine Translation with Large Reasoning Models
Mar 16, 2025
[QA] Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Mar 15, 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Mar 15, 2025
[QA] Long Context Tuning for Video Generation
Mar 15, 2025
Long Context Tuning for Video Generation
Mar 15, 2025
[QA] Transformers without Normalization
Mar 14, 2025
Transformers without Normalization
Mar 14, 2025
[QA] Charting and Navigating Hugging Face's Model Atlas
Mar 14, 2025
Charting and Navigating Hugging Face's Model Atlas
Mar 14, 2025
[QA] I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Mar 13, 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Mar 13, 2025
[QA] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Mar 13, 2025
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Mar 13, 2025
[QA] Gemini Embedding: Generalizable Embeddings from Gemini
Mar 12, 2025
Gemini Embedding: Generalizable Embeddings from Gemini
Mar 12, 2025
[QA] Inductive Moment Matching
Mar 11, 2025
Inductive Moment Matching
Mar 11, 2025
[QA] Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Mar 11, 2025
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Mar 11, 2025
[QA] Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Mar 10, 2025
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Mar 10, 2025
[QA] Continual Pre-training of MoEs: How robust is your router?
Mar 10, 2025
Continual Pre-training of MoEs: How robust is your router?
Mar 10, 2025
[QA] HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
Mar 09, 2025
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
Mar 09, 2025
[QA] Boosting Blockchain Throughput: Parallel EVM Execution with Asynchronous Storage for Reddio
Mar 09, 2025
Boosting Blockchain Throughput: Parallel EVM Execution with Asynchronous Storage for Reddio
Mar 09, 2025
[QA] Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
Mar 08, 2025
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
Mar 08, 2025
[QA] L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Mar 08, 2025
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Mar 08, 2025
[QA] PokéChamp: an Expert-level Minimax Language Agent
Mar 07, 2025
PokéChamp: an Expert-level Minimax Language Agent
Mar 07, 2025
[QA] Token-Efficient Long Video Understanding for Multimodal LLMs
Mar 07, 2025
Token-Efficient Long Video Understanding for Multimodal LLMs
Mar 07, 2025
[QA] Position: Model Collapse Does Not Mean What You Think
Mar 06, 2025
Position: Model Collapse Does Not Mean What You Think
Mar 06, 2025
[QA] Towards Understanding Distilled Reasoning Models: A Representational Approach
Mar 06, 2025
Towards Understanding Distilled Reasoning Models: A Representational Approach
Mar 06, 2025
[QA] Weak-to-Strong Generalization Even in Random Feature Networks, Provably
Mar 05, 2025
Weak-to-Strong Generalization Even in Random Feature Networks, Provably
Mar 05, 2025
[QA] Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Mar 05, 2025
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Mar 05, 2025
[QA] Chain of Draft: Thinking Faster by Writing Less
Mar 03, 2025
Chain of Draft: Thinking Faster by Writing Less
Mar 03, 2025
[QA] Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Mar 03, 2025
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Mar 03, 2025
[QA] Implicit Search via Discrete Diffusion: A Study on Chess
Mar 02, 2025
Implicit Search via Discrete Diffusion: A Study on Chess
Mar 02, 2025
[QA] Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars
Mar 02, 2025
Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars
Mar 02, 2025
[QA] LightThinker: Thinking Step-by-Step Compression
Mar 01, 2025
LightThinker: Thinking Step-by-Step Compression
Mar 01, 2025
[QA] LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Mar 01, 2025
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Mar 01, 2025
[QA] Self-rewarding correction for mathematical reasoning
Feb 28, 2025
Self-rewarding correction for mathematical reasoning
Feb 28, 2025
[QA] I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning
Feb 27, 2025
I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning
Feb 27, 2025
[QA] Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Feb 27, 2025
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Feb 27, 2025
[QA] DeepSeek vs. ChatGPT: A Comparative Study for Scientific Computing and Scientific Machine Learning Tasks
Feb 26, 2025
DeepSeek vs. ChatGPT: A Comparative Study for Scientific Computing and Scientific Machine Learning Tasks
Feb 26, 2025
[QA] Yes, Q-learning Helps Offline In-Context RL
Feb 26, 2025
Yes, Q-learning Helps Offline In-Context RL
Feb 26, 2025
[QA] Fractal Generative Models
Feb 25, 2025
Fractal Generative Models
Feb 25, 2025
[QA] Improving the Scaling Laws of Synthetic Data with Deliberate Practice
Feb 24, 2025
Improving the Scaling Laws of Synthetic Data with Deliberate Practice
Feb 24, 2025
[QA] Idiosyncrasies in Large Language Models
Feb 23, 2025
Idiosyncrasies in Large Language Models
Feb 23, 2025
[QA] SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Feb 22, 2025
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Feb 22, 2025
[QA] Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
Feb 21, 2025
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
Feb 21, 2025
[QA] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
Feb 21, 2025
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
Feb 21, 2025
[QA] Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Feb 20, 2025
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Feb 20, 2025
[QA] Small Models Struggle to Learn from Strong Reasoners
Feb 20, 2025
Small Models Struggle to Learn from Strong Reasoners
Feb 20, 2025
[QA] TokenSkip: Controllable Chain-of-Thought Compression in LLMs
Feb 19, 2025
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
Feb 19, 2025
[QA] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Feb 19, 2025
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Feb 19, 2025
[QA] (How) Can Transformers Predict Pseudo-Random Numbers?
Feb 17, 2025
(How) Can Transformers Predict Pseudo-Random Numbers?
Feb 17, 2025
[QA] Do Large Language Models Reason Causally Like Us? Even Better?
Feb 17, 2025
Do Large Language Models Reason Causally Like Us? Even Better?
Feb 17, 2025
[QA] Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting
Feb 16, 2025
Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting
Feb 16, 2025
[QA] Fino1: On the Transferability of Reasoning-Enhanced LLMs to Finance
Feb 16, 2025
Fino1: On the Transferability of Reasoning-Enhanced LLMs to Finance
Feb 16, 2025
[QA] The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models
Feb 15, 2025
The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models
Feb 15, 2025
[QA] SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
Feb 14, 2025
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
Feb 14, 2025
[QA] LLM Pretraining with Continuous Concepts
Feb 13, 2025
LLM Pretraining with Continuous Concepts
Feb 13, 2025
[QA] Distillation Scaling Laws
Feb 13, 2025
Distillation Scaling Laws
Feb 13, 2025
[QA] Competitive Programming with Large Reasoning Models
Feb 12, 2025
Competitive Programming with Large Reasoning Models
Feb 12, 2025
[QA] Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Feb 12, 2025
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Feb 12, 2025
[QA] DeepCrossAttention: Supercharging Transformer Residual Connections
Feb 11, 2025
DeepCrossAttention: Supercharging Transformer Residual Connections
Feb 11, 2025
[QA] Matryoshka Quantization
Feb 11, 2025
Matryoshka Quantization
Feb 11, 2025
[QA] When One LLM Drools, Multi-LLM Collaboration Rules
Feb 10, 2025
When One LLM Drools, Multi-LLM Collaboration Rules
Feb 10, 2025
[QA] Self-Regulation and Requesting Interventions
Feb 10, 2025
Self-Regulation and Requesting Interventions
Feb 10, 2025
[QA] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Feb 10, 2025
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Feb 10, 2025
[QA] Value-Based Deep RL Scales Predictably
Feb 09, 2025
Value-Based Deep RL Scales Predictably
Feb 09, 2025
[QA] Demystifying Long Chain-of-Thought Reasoning in LLMs
Feb 09, 2025
Demystifying Long Chain-of-Thought Reasoning in LLMs
Feb 09, 2025
[QA] ULTRAIF: Advancing Instruction Following from the Wild
Feb 08, 2025
ULTRAIF: Advancing Instruction Following from the Wild
Feb 08, 2025
[QA] Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
Feb 08, 2025
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
Feb 08, 2025
[QA] Examining Two Hop Reasoning Through Information Content Scaling
Feb 07, 2025
Examining Two Hop Reasoning Through Information Content Scaling
Feb 07, 2025
[QA] Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
Feb 07, 2025
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
Feb 07, 2025
[QA] Do Large Language Model Benchmarks Test Reliability?
Feb 06, 2025
Do Large Language Model Benchmarks Test Reliability?
Feb 06, 2025
Detecting Strategic Deception Using Linear Probes
Feb 06, 2025
[QA] Evaluation of Large Language Models via Coupled Token Generation
Feb 05, 2025
Evaluation of Large Language Models via Coupled Token Generation
Feb 05, 2025
[QA] Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Feb 05, 2025
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Feb 05, 2025
[QA] Should You Use Your Large Language Model to Explore or Exploit?
Feb 04, 2025
Should You Use Your Large Language Model to Explore or Exploit?
Feb 04, 2025
[QA] Harmonic Loss Trains Interpretable AI Models
Feb 04, 2025
Harmonic Loss Trains Interpretable AI Models
Feb 04, 2025
[QA] Trading inference-time compute for adversarial robustness.
Feb 03, 2025
Trading inference-time compute for adversarial robustness.
Feb 03, 2025
[QA] LLMs can see and hear without any training
Feb 02, 2025
LLMs can see and hear without any training
Feb 02, 2025
[QA] o3-mini vs DeepSeek-R1: Which One is Safer?
Feb 02, 2025
o3-mini vs DeepSeek-R1: Which One is Safer?
Feb 02, 2025
[QA] Optimizing Large Language Model Training Using FP4 Quantization
Feb 01, 2025
Optimizing Large Language Model Training Using FP4 Quantization
Feb 01, 2025
[QA] People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
Feb 01, 2025
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
Feb 01, 2025
[QA] Large Language Models Think Too Fast To Explore Effectively
Jan 31, 2025
Large Language Models Think Too Fast To Explore Effectively
Jan 31, 2025
[QA] Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Jan 31, 2025
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Jan 31, 2025
[QA] Early External Safety Testing of OpenAI’s o3-mini: Insights from Pre-Deployment Evaluation
Jan 30, 2025
Early External Safety Testing of OpenAI’s o3-mini: Insights from Pre-Deployment Evaluation
Jan 30, 2025
[QA] Dynamics of Transient Structure in In-Context Linear Regression Transformers
Jan 30, 2025
Dynamics of Transient Structure in In-Context Linear Regression Transformers
Jan 30, 2025
[QA] Context is Key for Agent Security
Jan 29, 2025
Context is Key for Agent Security
Jan 29, 2025
[QA] AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
Jan 29, 2025
AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
Jan 29, 2025
[QA] Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
Jan 28, 2025
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
Jan 28, 2025
[QA] Feasible Learning
Jan 28, 2025
Feasible Learning
Jan 28, 2025
[QA] Humanity's Last Exam
Jan 27, 2025
Humanity's Last Exam
Jan 27, 2025
[QA] GaussMark: A Practical Approach for Structural Watermarking of Language Models
Jan 27, 2025
GaussMark: A Practical Approach for Structural Watermarking of Language Models
Jan 27, 2025
[QA] Kimi k1.5: Scaling Reinforcement Learning with LLMs
Jan 26, 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Jan 26, 2025
[QA] Can We Generate Images with CoT?
Jan 26, 2025
Can We Generate Images with CoT?
Jan 26, 2025
[QA] Physics of Skill Learning
Jan 25, 2025
Physics of Skill Learning
Jan 25, 2025
[QA] Hallucinations Can Improve Large Language Models in Drug Discovery
Jan 25, 2025
Hallucinations Can Improve Large Language Models in Drug Discovery
Jan 25, 2025
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Jan 24, 2025
[QA] Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
Jan 24, 2025
Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
Jan 24, 2025
[QA] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
Jan 23, 2025
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
Jan 23, 2025
[QA] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Jan 23, 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Jan 23, 2025
[QA] Reasoning Language Models: A Blueprint
Jan 22, 2025
Reasoning Language Models: A Blueprint
Jan 22, 2025
[QA] Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Jan 22, 2025
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Jan 22, 2025
[QA] Evolving Deeper LLM Thinking
Jan 20, 2025
Evolving Deeper LLM Thinking
Jan 20, 2025
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Jan 20, 2025
[QA] Enhancing Generalization in Chain of Thought Reasoning for Smaller Models
Jan 20, 2025
[QA] How GPT Learns Layer by Layer
Jan 19, 2025
How GPT Learns Layer by Layer
Jan 19, 2025
[QA] FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
Jan 19, 2025
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
Jan 19, 2025
[QA] Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
Jan 18, 2025
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
Jan 18, 2025
[QA] LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Jan 18, 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Jan 18, 2025
[QA] Towards Understanding Extrapolation: a Causal Lens
Jan 17, 2025
Towards Understanding Extrapolation: a Causal Lens
Jan 17, 2025
[QA] Do generative video models learn physical principles from watching videos?
Jan 17, 2025
Do generative video models learn physical principles from watching videos?
Jan 17, 2025
[QA] Joint Learning of Depth and Appearance for Portrait Image Animation
Jan 16, 2025
Joint Learning of Depth and Appearance for Portrait Image Animation
Jan 16, 2025
[QA] Dissecting a Small Artificial Neural Network
Jan 16, 2025
Dissecting a Small Artificial Neural Network
Jan 16, 2025
[QA] Diffusion Adversarial Post-Training for One-Step Video Generation
Jan 15, 2025
Diffusion Adversarial Post-Training for One-Step Video Generation
Jan 15, 2025
[QA] Inference-Time-Compute: More Faithful? A Research Note
Jan 15, 2025
Inference-Time-Compute: More Faithful? A Research Note
Jan 15, 2025
[QA] Transformer²: Self-adaptive LLMs
Jan 14, 2025
Transformer²: Self-adaptive LLMs
Jan 14, 2025
[QA] Soup to go: mitigating forgetting during continual learning with model averaging
Jan 13, 2025
Soup to go: mitigating forgetting during continual learning with model averaging
Jan 13, 2025
[QA] Emergent Symbol-like Number Variables in Artificial Neural Networks
Jan 13, 2025
Emergent Symbol-like Number Variables in Artificial Neural Networks
Jan 13, 2025
[QA] Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Jan 11, 2025
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Jan 11, 2025
[QA] Representing Long Volumetric Video with Temporal Gaussian Hierarchy
Jan 11, 2025
Representing Long Volumetric Video with Temporal Gaussian Hierarchy
Jan 11, 2025
[QA] Uncertainty-aware Knowledge Tracing
Jan 10, 2025
Uncertainty-aware Knowledge Tracing
Jan 10, 2025
[QA] The GAN is dead; long live the GAN! A Modern Baseline GAN
Jan 10, 2025
The GAN is dead; long live the GAN! A Modern Baseline GAN
Jan 10, 2025
[QA] Supervision-free Vision-Language Alignment
Jan 09, 2025
Supervision-free Vision-Language Alignment
Jan 09, 2025
[QA] Grokking at the Edge of Numerical Stability
Jan 09, 2025
Grokking at the Edge of Numerical Stability
Jan 09, 2025
[QA] ComMer: a Framework for Compressing and Merging User Data for Personalization
Jan 08, 2025
ComMer: a Framework for Compressing and Merging User Data for Personalization
Jan 08, 2025
[QA] Entropy-Guided Attention for Private LLMs
Jan 08, 2025
Entropy-Guided Attention for Private LLMs
Jan 08, 2025
[QA] Easing Optimization Paths: a Circuit Perspective
Jan 07, 2025
Easing Optimization Paths: a Circuit Perspective
Jan 07, 2025
[QA] Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs
Jan 07, 2025
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs
Jan 07, 2025
[QA] Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
Jan 06, 2025
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
Jan 06, 2025
[QA] Predicting the Performance of Black-box LLMs through Self-Queries
Jan 06, 2025
Predicting the Performance of Black-box LLMs through Self-Queries
Jan 06, 2025
[QA] On Unifying Video Generation and Camera Pose Estimation
Jan 04, 2025
On Unifying Video Generation and Camera Pose Estimation
Jan 04, 2025
[QA] An analytic theory of creativity in convolutional diffusion models
Jan 04, 2025
An analytic theory of creativity in convolutional diffusion models
Jan 04, 2025
[QA] Finding Missed Code Size Optimizations in Compilers using LLMs
Jan 03, 2025
Finding Missed Code Size Optimizations in Compilers using LLMs
Jan 03, 2025
[QA] Titans: Learning to Memorize at Test Time
Jan 03, 2025
Titans: Learning to Memorize at Test Time
Jan 03, 2025
[QA] Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Dec 31, 2024
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Dec 31, 2024
[QA] Functional Risk Minimization
Dec 31, 2024
Functional Risk Minimization
Dec 31, 2024
[QA] HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Dec 30, 2024
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Dec 30, 2024
InfAlign: Inference-aware language model alignment
Dec 30, 2024
[QA] Consistency Checks for Language Model Forecasters
Dec 25, 2024
Consistency Checks for Language Model Forecasters
Dec 25, 2024
[QA] Deliberation in Latent Space via Differentiable Cache Augmentation
Dec 24, 2024
Deliberation in Latent Space via Differentiable Cache Augmentation
Dec 24, 2024
[QA] Automating the Search for Artificial Life with Foundation Models
Dec 24, 2024
Automating the Search for Artificial Life with Foundation Models
Dec 24, 2024
[QA] LLMs for Literature Review: Are we there yet?
Dec 23, 2024
LLMs for Literature Review: Are we there yet?
Dec 23, 2024
[QA] WebLLM: A High-Performance In-Browser LLM Inference Engine
Dec 23, 2024
WebLLM: A High-Performance In-Browser LLM Inference Engine
Dec 23, 2024
[QA] A Survey on LLM Inference-Time Self-Improvement
Dec 22, 2024
A Survey on LLM Inference-Time Self-Improvement
Dec 22, 2024
[QA] Tokenisation is NP-Complete
Dec 22, 2024
Tokenisation is NP-Complete
Dec 22, 2024
[QA] The Open-Source Advantage in Large Language Models (LLMs)
Dec 21, 2024
The Open-Source Advantage in Large Language Models (LLMs)
Dec 21, 2024
[QA] Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Dec 21, 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Dec 21, 2024
[QA] DriveGPT: Scaling Autoregressive Behavior Models for Driving
Dec 20, 2024
DriveGPT: Scaling Autoregressive Behavior Models for Driving
Dec 20, 2024
[QA] MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Dec 20, 2024
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Dec 20, 2024
[QA] Byte Latent Transformer: Patches Scale Better Than Tokens
Dec 18, 2024
Byte Latent Transformer: Patches Scale Better Than Tokens
Dec 18, 2024
[QA] Transformers Struggle to Learn to Search
Dec 09, 2024
Transformers Struggle to Learn to Search
Dec 09, 2024
[QA] Navigation World Models
Dec 07, 2024
Navigation World Models
Dec 07, 2024
[QA] Motion Prompting: Controlling Video Generation with Motion Trajectories
Dec 07, 2024
Motion Prompting: Controlling Video Generation with Motion Trajectories
Dec 07, 2024
[QA] Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Dec 06, 2024
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Dec 06, 2024
[QA] NVILA: Efficient Frontier Visual Language Models
Dec 06, 2024
NVILA: Efficient Frontier Visual Language Models
Dec 06, 2024
[QA] o1-Coder: an o1 Replication for Coding
Dec 03, 2024
o1-Coder: an o1 Replication for Coding
Dec 03, 2024
[QA] Efficient Track Anything
Dec 03, 2024
Efficient Track Anything
Dec 03, 2024
[QA] Reverse Thinking Makes LLMs Stronger Reasoners
Dec 02, 2024
Reverse Thinking Makes LLMs Stronger Reasoners
Dec 02, 2024
[QA] Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability
Dec 02, 2024
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability
Dec 02, 2024
[QA] JetFormer: an autoregressive generative model of raw images and text
Dec 02, 2024
JetFormer: an autoregressive generative model of raw images and text
Dec 02, 2024
[QA] CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Nov 29, 2024
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Nov 29, 2024
Attamba: Attending To Multi-Token States
Nov 28, 2024
[QA] Star Attention: Efficient LLM Inference over Long Sequences
Nov 28, 2024
Star Attention: Efficient LLM Inference over Long Sequences
Nov 28, 2024
[QA] ROICtrl: Boosting Instance Control for Visual Generation
Nov 28, 2024
ROICtrl: Boosting Instance Control for Visual Generation
Nov 28, 2024
[QA] Solaris: A Foundation Model of the Sun
Nov 26, 2024
Solaris: A Foundation Model of the Sun
Nov 26, 2024
[QA] Even Sparser Graph Transformers
Nov 26, 2024
Even Sparser Graph Transformers
Nov 26, 2024
[QA] Understanding LLM Embeddings for Regression
Nov 25, 2024
Understanding LLM Embeddings for Regression
Nov 25, 2024
[QA] Loss-to-Loss Prediction: Scaling Laws for All Datasets
Nov 24, 2024
Loss-to-Loss Prediction: Scaling Laws for All Datasets
Nov 24, 2024
[QA] When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Nov 23, 2024
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Nov 23, 2024
[QA] SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Nov 23, 2024
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Nov 23, 2024
[QA] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Nov 22, 2024
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Nov 22, 2024
[QA] Hymba: A Hybrid-head Architecture for Small Language Models
Nov 22, 2024
Hymba: A Hybrid-head Architecture for Small Language Models
Nov 22, 2024
[QA] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Nov 21, 2024
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Nov 21, 2024
[QA] Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Nov 20, 2024
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Nov 20, 2024
[QA] Introducing Milabench: Benchmarking Accelerators for AI
Nov 20, 2024
Introducing Milabench: Benchmarking Accelerators for AI
Nov 20, 2024
[QA] Does Prompt Formatting Have Any Impact on LLM Performance?
Nov 19, 2024
Does Prompt Formatting Have Any Impact on LLM Performance?
Nov 19, 2024
[QA] Steering Language Model Refusal with Sparse Autoencoders
Nov 19, 2024
Steering Language Model Refusal with Sparse Autoencoders
Nov 19, 2024
[QA] LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Nov 18, 2024
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Nov 18, 2024
[QA] Refusal in LLMs is an Affine Function
Nov 15, 2024
Refusal in LLMs is an Affine Function
Nov 15, 2024
[QA] Cut Your Losses in Large-Vocabulary Language Models
Nov 15, 2024
Cut Your Losses in Large-Vocabulary Language Models
Nov 15, 2024
[QA] Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
Nov 12, 2024
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
Nov 12, 2024
[QA] Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
Nov 12, 2024
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
Nov 12, 2024
[QA] Aioli: A unified optimization framework for language model data mixing
Nov 11, 2024
Aioli: A unified optimization framework for language model data mixing
Nov 11, 2024
[QA] Balancing Pipeline Parallelism with Vocabulary Parallelism
Nov 11, 2024
Balancing Pipeline Parallelism with Vocabulary Parallelism
Nov 11, 2024
[QA] Can Transformers Smell Like Humans?
Nov 10, 2024
Can Transformers Smell Like Humans?
Nov 10, 2024
[QA] Mixtures of In-Context Learners
Nov 10, 2024
Mixtures of In-Context Learners
Nov 10, 2024
[QA] How Far Is Video Generation from World Model: A Physical Law Perspective
Nov 09, 2024
How Far Is Video Generation from World Model: A Physical Law Perspective
Nov 09, 2024
[QA] ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate
Nov 09, 2024
ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate
Nov 09, 2024
[QA] Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
Nov 08, 2024
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
Nov 08, 2024
[QA] Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Nov 08, 2024
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Nov 08, 2024
[QA] Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
Nov 07, 2024
Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
Nov 07, 2024
[QA] How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
Nov 07, 2024
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
Nov 07, 2024
[QA] Discovering Data Structures: Nearest Neighbor Search and Beyond
Nov 06, 2024
Discovering Data Structures: Nearest Neighbor Search and Beyond
Nov 06, 2024
[QA] BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
Nov 06, 2024
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
Nov 06, 2024
[QA] Adapting Language Models via Token Translation
Nov 04, 2024
Adapting Language Models via Token Translation
Nov 04, 2024
[QA] Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
Nov 04, 2024
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
Nov 04, 2024
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
Nov 02, 2024
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
Nov 02, 2024
[QA] $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Nov 02, 2024
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Nov 02, 2024
[QA] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Nov 01, 2024
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Nov 01, 2024
[QA] Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
Oct 31, 2024
Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters
Oct 31, 2024
[QA] Where Do Large Learning Rates Lead Us?
Oct 30, 2024
Where Do Large Learning Rates Lead Us?
Oct 30, 2024
[QA] Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
Oct 30, 2024
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
Oct 30, 2024
[QA] LoRA vs Full Fine-tuning: An Illusion of Equivalence
Oct 29, 2024
LoRA vs Full Fine-tuning: An Illusion of Equivalence
Oct 29, 2024
[QA] Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Oct 28, 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Oct 28, 2024
[QA] Computational Bottlenecks of Training Small-scale Large Language Models
Oct 28, 2024
Computational Bottlenecks of Training Small-scale Large Language Models
Oct 28, 2024
[QA] Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees
Oct 26, 2024
Physics-informed Neural Networks for Functional Differential Equations: Cylindrical Approximation and Its Convergence Guarantees
Oct 26, 2024
[QA] A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Oct 26, 2024
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Oct 26, 2024
[QA] LEGO: Language Model Building Blocks
Oct 25, 2024
LEGO: Language Model Building Blocks
Oct 25, 2024
[QA] Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data
Oct 25, 2024
Knowledge Distillation Using Frontier Open-Source LLMs: Generalizability and the Role of Synthetic Data
Oct 25, 2024
[QA] Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers
Oct 24, 2024
Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers
Oct 24, 2024
[QA] ALTA: Compiler-Based Analysis of Transformers
Oct 24, 2024
ALTA: Compiler-Based Analysis of Transformers
Oct 24, 2024
[QA] UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
Oct 23, 2024
UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
Oct 23, 2024
[QA] Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Oct 23, 2024
Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Oct 23, 2024
[QA] Generative Reward Models
Oct 22, 2024
Generative Reward Models
Oct 22, 2024
[QA] Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks
Oct 21, 2024
Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks
Oct 21, 2024
[QA] Decomposing The Dark Matter of Sparse Autoencoders
Oct 21, 2024
Decomposing The Dark Matter of Sparse Autoencoders
Oct 21, 2024
[QA] A Hitchhiker's Guide to Scaling Law Estimation
Oct 20, 2024
A Hitchhiker's Guide to Scaling Law Estimation
Oct 20, 2024
[QA] Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Oct 20, 2024
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Oct 20, 2024
[QA] Looking Inward: Language Models Can Learn About Themselves by Introspection
Oct 19, 2024
Looking Inward: Language Models Can Learn About Themselves by Introspection
Oct 19, 2024
[QA] Thinking LLMs: General Instruction Following with Thought Generation
Oct 19, 2024
Thinking LLMs: General Instruction Following with Thought Generation
Oct 19, 2024
[QA] Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Oct 18, 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Oct 18, 2024
[QA] MOVIE GEN: A Cast of Media Foundation Models
Oct 18, 2024
MOVIE GEN: A Cast of Media Foundation Models
Oct 18, 2024
[QA] One Step Diffusion via Shortcut Models
Oct 17, 2024
One Step Diffusion via Shortcut Models
Oct 17, 2024
[QA] Inference Scaling for Long-Context Retrieval Augmented Generation
Oct 17, 2024
Inference Scaling for Long-Context Retrieval Augmented Generation
Oct 17, 2024
[QA] What Matters in Transformers? Not All Attention is Needed
Oct 16, 2024
What Matters in Transformers? Not All Attention is Needed
Oct 16, 2024
[QA] Language Models Encode Numbers Using Digit Representations in Base 10
Oct 16, 2024
Language Models Encode Numbers Using Digit Representations in Base 10
Oct 16, 2024
[QA] Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs
Oct 14, 2024
Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs
Oct 14, 2024
[QA] Do Unlearning Methods Remove Information from Language Model Weights?
Oct 14, 2024
Do Unlearning Methods Remove Information from Language Model Weights?
Oct 14, 2024
[QA] MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Oct 13, 2024
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Oct 13, 2024
[QA] Pixtral 12B
Oct 13, 2024
Pixtral 12B
Oct 13, 2024
[QA] Differential Transformer
Oct 12, 2024
Differential Transformer
Oct 12, 2024
[QA] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Oct 12, 2024
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Oct 12, 2024
[QA] Efficient Dictionary Learning with Switch Sparse Autoencoders
Oct 11, 2024
Efficient Dictionary Learning with Switch Sparse Autoencoders
Oct 11, 2024
[QA] Visual Scratchpads: Enabling Global Reasoning in Vision
Oct 11, 2024
Visual Scratchpads: Enabling Global Reasoning in Vision
Oct 11, 2024
[QA] RL, but don't do anything I wouldn't do
Oct 10, 2024
RL, but don't do anything I wouldn't do
Oct 10, 2024
[QA] Restructuring Vector Quantization with the Rotation Trick
Oct 10, 2024
Restructuring Vector Quantization with the Rotation Trick
Oct 10, 2024
[QA] EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM?
Oct 08, 2024
EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM?
Oct 08, 2024
[QA] Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Oct 08, 2024
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Oct 08, 2024
[QA] Teaching Transformers Modular Arithmetic at Scale
Oct 07, 2024
Teaching Transformers Modular Arithmetic at Scale
Oct 07, 2024
[QA] What Matters for Model Merging at Scale?
Oct 07, 2024
What Matters for Model Merging at Scale?
Oct 07, 2024
[QA] Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Oct 05, 2024
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Oct 05, 2024
[QA] Were RNNs All We Needed?
Oct 05, 2024
Were RNNs All We Needed?
Oct 05, 2024
[QA] OOD-CHAMELEON: Is Algorithm Selection for OOD Generalization Learnable?
Oct 04, 2024
OOD-CHAMELEON: Is Algorithm Selection for OOD Generalization Learnable?
Oct 04, 2024
[QA] Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Oct 04, 2024
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Oct 04, 2024
[QA] Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Oct 03, 2024
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Oct 03, 2024
[QA] Not All LLM Reasoners Are Created Equal
Oct 03, 2024
Not All LLM Reasoners Are Created Equal
Oct 03, 2024
[QA] Law of the Weakest Link: Cross Capabilities of Large Language Models
Oct 02, 2024
Law of the Weakest Link: Cross Capabilities of Large Language Models
Oct 02, 2024
[QA] Realistic Evaluation of Model Merging for Compositional Generalization
Oct 01, 2024
Realistic Evaluation of Model Merging for Compositional Generalization
Oct 01, 2024
[QA] Emu3: Next-Token Prediction is All You Need
Sep 30, 2024
Emu3: Next-Token Prediction is All You Need
Sep 30, 2024
[QA] MIO: A Foundation Model on Multimodal Tokens
Sep 30, 2024
MIO: A Foundation Model on Multimodal Tokens
Sep 30, 2024
[QA] A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Sep 29, 2024
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Sep 29, 2024
[QA] Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Sep 29, 2024
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Sep 29, 2024
[QA] Making Text Embedders Few-Shot Learners
Sep 28, 2024
Making Text Embedders Few-Shot Learners
Sep 28, 2024
[QA] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Sep 28, 2024
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Sep 28, 2024
[QA] Infer Human's Intentions Before Following Natural Language Instruction
Sep 27, 2024
Infer Human's Intentions Before Following Natural Language Instruction
Sep 27, 2024
[QA] MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Sep 27, 2024
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Sep 27, 2024
[QA] Counterfactual Token Generation in Large Language Models
Sep 26, 2024
Counterfactual Token Generation in Large Language Models
Sep 26, 2024
[QA] Characterizing stable regions in the residual stream of LLMs
Sep 26, 2024
Characterizing stable regions in the residual stream of LLMs
Sep 26, 2024
[QA] Watch Your Steps: Observable and Modular Chains of Thought
Sep 25, 2024
Watch Your Steps: Observable and Modular Chains of Thought
Sep 25, 2024
[QA] Seeing Faces in Things: A Model and Dataset for Pareidolia
Sep 25, 2024
Seeing Faces in Things: A Model and Dataset for Pareidolia
Sep 25, 2024
[QA] Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
Sep 24, 2024
Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
Sep 24, 2024
[QA] Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Sep 24, 2024
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Sep 24, 2024
[QA] LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
Sep 23, 2024
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
Sep 23, 2024
[QA] Embedding Geometries of Contrastive Language-Image Pre-Training
Sep 23, 2024
Embedding Geometries of Contrastive Language-Image Pre-Training
Sep 23, 2024
[QA] Kolmogorov–Arnold Transformer
Sep 21, 2024
Kolmogorov–Arnold Transformer
Sep 21, 2024
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Sep 21, 2024
[QA] Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Sep 21, 2024
[QA] Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm
Sep 20, 2024
Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm
Sep 20, 2024
[QA] Is Tokenization Needed for Masked Particle Modelling?
Sep 20, 2024
Is Tokenization Needed for Masked Particle Modelling?
Sep 20, 2024
[QA] Finetuning Language Models to Emit Linguistic Expressions of Uncertainty
Sep 19, 2024
Finetuning Language Models to Emit Linguistic Expressions of Uncertainty
Sep 19, 2024
[QA] To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Sep 19, 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Sep 19, 2024
[QA] On the limits of agency in agent-based models
Sep 18, 2024
On the limits of agency in agent-based models
Sep 18, 2024
[QA] Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
Sep 18, 2024
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
Sep 18, 2024
[QA] Finetuning CLIP to Reason about Pairwise Differences
Sep 17, 2024
Finetuning CLIP to Reason about Pairwise Differences
Sep 17, 2024
[QA] Think Twice Before You Act: Improving Inverse Problem Solving With MCMC
Sep 16, 2024
Think Twice Before You Act: Improving Inverse Problem Solving With MCMC
Sep 16, 2024
[QA] Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Sep 16, 2024
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Sep 16, 2024
LLMs Will Always Hallucinate, and We Need to Live With This
Sep 15, 2024
[QA] PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Sep 15, 2024
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Sep 15, 2024
[QA] LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Sep 14, 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Sep 14, 2024
[QA] WINDOWS AGENT ARENA: Evaluating Multi-Modal OS Agents at Scale
Sep 14, 2024
WINDOWS AGENT ARENA: Evaluating Multi-Modal OS Agents at Scale
Sep 14, 2024
[QA] What Makes a Maze Look Like a Maze?
Sep 13, 2024
What Makes a Maze Look Like a Maze?
Sep 13, 2024
[QA] Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
Sep 13, 2024
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
Sep 13, 2024
[QA] Synthetic continued pretraining
Sep 12, 2024
Synthetic continued pretraining
Sep 12, 2024
[QA] Agent Workflow Memory
Sep 12, 2024
Agent Workflow Memory
Sep 12, 2024
[QA] Programming Refusal with Conditional Activation Steering
Sep 11, 2024
Programming Refusal with Conditional Activation Steering
Sep 11, 2024
[QA] Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Sep 11, 2024
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Sep 11, 2024
[QA] LoCa: Logit Calibration for Knowledge Distillation
Sep 10, 2024
LoCa: Logit Calibration for Knowledge Distillation
Sep 10, 2024
[QA] Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Sep 09, 2024
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Sep 09, 2024
[QA] Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Sep 09, 2024
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Sep 09, 2024
[QA] xLAM: A Family of Large Action Models to Empower AI Agent Systems
Sep 07, 2024
xLAM: A Family of Large Action Models to Empower AI Agent Systems
Sep 07, 2024
[QA] In Defense of RAG in the Era of Long-Context Language Models
Sep 07, 2024
In Defense of RAG in the Era of Long-Context Language Models
Sep 07, 2024
[QA] Building Math Agents with Multi-Turn Iterative Preference Learning
Sep 07, 2024
Building Math Agents with Multi-Turn Iterative Preference Learning
Sep 07, 2024
[QA] Attention Heads of Large Language Models: A Survey
Sep 07, 2024
Attention Heads of Large Language Models: A Survey
Sep 07, 2024
[QA] The AdEMAMix Optimizer: Better, Faster, Older
Sep 06, 2024
The AdEMAMix Optimizer: Better, Faster, Older
Sep 06, 2024
[QA] Planning In Natural Language Improves LLM Search For Code Generation
Sep 06, 2024
Planning In Natural Language Improves LLM Search For Code Generation
Sep 06, 2024
[QA] MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Sep 05, 2024
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Sep 05, 2024
[QA] Sample what you can't compress
Sep 05, 2024
Sample what you can't compress
Sep 05, 2024
[QA] CONTEXTCITE: Attributing Model Generation to Context
Sep 04, 2024
CONTEXTCITE: Attributing Model Generation to Context
Sep 04, 2024
[QA] FLUX that Plays Music
Sep 04, 2024
FLUX that Plays Music
Sep 04, 2024
[QA] Modularity in Transformers: Investigating Neuron Separability & Specialization
Sep 03, 2024
Modularity in Transformers: Investigating Neuron Separability & Specialization
Sep 03, 2024
[QA] Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Sep 02, 2024
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Sep 02, 2024
[QA] Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Aug 29, 2024
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Aug 29, 2024
[QA] CycleGAN with Better Cycles
Aug 29, 2024
CycleGAN with Better Cycles
Aug 29, 2024
[QA] The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Aug 28, 2024
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Aug 28, 2024
[QA] Generative Verifiers: Reward Modeling as Next-Token Prediction
Aug 28, 2024
Generative Verifiers: Reward Modeling as Next-Token Prediction
Aug 28, 2024
[QA] Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
Aug 27, 2024
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
Aug 27, 2024
[QA] A Law of Next-Token Prediction in Large Language Models
Aug 27, 2024
A Law of Next-Token Prediction in Large Language Models
Aug 27, 2024
[QA] SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
Aug 26, 2024
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
Aug 26, 2024
[QA] How Diffusion Models Learn to Factorize and Compose
Aug 26, 2024
How Diffusion Models Learn to Factorize and Compose
Aug 26, 2024
[QA] FERRET: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
Aug 25, 2024
FERRET: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
Aug 25, 2024
[QA] Scalable Autoregressive Image Generation with Mamba
Aug 25, 2024
Scalable Autoregressive Image Generation with Mamba
Aug 25, 2024
[QA] TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Aug 25, 2024
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Aug 25, 2024
[QA] FocusLLM: Scaling LLM's Context by Parallel Decoding
Aug 25, 2024
FocusLLM: Scaling LLM's Context by Parallel Decoding
Aug 25, 2024
[QA] Sapiens: Foundation for Human Vision Models
Aug 24, 2024
Sapiens: Foundation for Human Vision Models
Aug 24, 2024
[QA] Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Aug 24, 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Aug 24, 2024
[QA] Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Aug 23, 2024
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Aug 23, 2024
[QA] Hermes 3 Technical Report
Aug 23, 2024
Hermes 3 Technical Report
Aug 23, 2024
[QA] LLM Pruning and Distillation in Practice: The Minitron Approach
Aug 22, 2024
LLM Pruning and Distillation in Practice: The Minitron Approach
Aug 22, 2024
[QA] Approaching Deep Learning through the Spectral Dynamics of Weights
Aug 22, 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
Aug 22, 2024
[QA] Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Aug 21, 2024
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Aug 21, 2024
[QA] Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Aug 21, 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Aug 21, 2024
[QA] Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aug 20, 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aug 20, 2024
[QA] JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
Aug 19, 2024
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
Aug 19, 2024
[QA] TextCAVs: Debugging vision models using text
Aug 19, 2024
TextCAVs: Debugging vision models using text
Aug 19, 2024
[QA] Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Aug 18, 2024
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Aug 18, 2024
[QA] Towards flexible perception with visual memory
Aug 17, 2024
Towards flexible perception with visual memory
Aug 17, 2024
[QA] I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm  
Aug 17, 2024
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm  
Aug 17, 2024
[QA] BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
Aug 16, 2024
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
Aug 16, 2024
[QA] Can Large Language Models Understand Symbolic Graphics Programs?
Aug 16, 2024
Can Large Language Models Understand Symbolic Graphics Programs?
Aug 16, 2024
[QA] Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Aug 15, 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Aug 15, 2024
[QA] Generative Photomontage
Aug 15, 2024
Generative Photomontage
Aug 15, 2024
[QA] Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Aug 14, 2024
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Aug 14, 2024
[QA] Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction
Aug 14, 2024
Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction
Aug 14, 2024
[QA] The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Aug 13, 2024
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Aug 13, 2024
[QA] Body Transformer: Leveraging Robot Embodiment for Policy Learning
Aug 13, 2024
Body Transformer: Leveraging Robot Embodiment for Policy Learning
Aug 13, 2024
[QA] VITA: Towards Open-Source Interactive Omni Multimodal LLM
Aug 12, 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Aug 12, 2024
[QA] Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Aug 12, 2024
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Aug 12, 2024
[QA] Diffusion Guided Language Modeling
Aug 11, 2024
Diffusion Guided Language Modeling
Aug 11, 2024
[QA] Conversational Prompt Engineering
Aug 11, 2024
Conversational Prompt Engineering
Aug 11, 2024
[QA] Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Aug 10, 2024
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Aug 10, 2024
[QA] Better Alignment with Instruction Back-and-Forth Translation
Aug 10, 2024
Better Alignment with Instruction Back-and-Forth Translation
Aug 10, 2024
[QA] The Ungrounded Alignment Problem
Aug 09, 2024
The Ungrounded Alignment Problem
Aug 09, 2024
[QA] Prioritize Alignment in Dataset Distillation
Aug 08, 2024
Prioritize Alignment in Dataset Distillation
Aug 08, 2024
[QA] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Aug 08, 2024
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Aug 08, 2024
[QA] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Aug 07, 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Aug 07, 2024
[QA] Language Model Can Listen While Speaking
Aug 06, 2024
Language Model Can Listen While Speaking
Aug 06, 2024
[QA] Self-Taught Evaluators
Aug 06, 2024
Self-Taught Evaluators
Aug 06, 2024
[QA] Conditional LoRA Parameter Generation
Aug 05, 2024
Conditional LoRA Parameter Generation
Aug 05, 2024
[QA] Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Aug 05, 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Aug 05, 2024
[QA] Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Aug 04, 2024
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Aug 04, 2024
[QA] MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Aug 04, 2024
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Aug 04, 2024
[QA] Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Aug 03, 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Aug 03, 2024
[QA] Gemma 2: Improving Open Language Models at a Practical Size
Aug 03, 2024
Gemma 2: Improving Open Language Models at a Practical Size
Aug 03, 2024
[QA] AI-Assisted Generation of Difficult Math Questions
Aug 02, 2024
AI-Assisted Generation of Difficult Math Questions
Aug 02, 2024
[QA] SAM 2: Segment Anything in Images and Videos
Aug 02, 2024
SAM 2: Segment Anything in Images and Videos
Aug 02, 2024
[QA] MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Aug 01, 2024
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Aug 01, 2024
[QA] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Aug 01, 2024
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Aug 01, 2024
[QA] Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Jul 31, 2024
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Jul 31, 2024
[QA] How to Measure the Intelligence of Large Language Models?
Jul 31, 2024
How to Measure the Intelligence of Large Language Models?
Jul 31, 2024
[QA] CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Jul 28, 2024
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Jul 28, 2024
[QA] Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Jul 28, 2024
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Jul 28, 2024
[QA] Course-Correction: Safety Alignment Using Synthetic Preferences
Jul 27, 2024
Course-Correction: Safety Alignment Using Synthetic Preferences
Jul 27, 2024
[QA] Solving The Travelling Salesman Problem Using A Single Qubit
Jul 27, 2024
Solving The Travelling Salesman Problem Using A Single Qubit
Jul 27, 2024
[QA] Exploring Scaling Trends in LLM Robustness
Jul 26, 2024
Exploring Scaling Trends in LLM Robustness
Jul 26, 2024
[QA] VILA2: VILA Augmented VILA
Jul 25, 2024
VILA2: VILA Augmented VILA
Jul 25, 2024
[QA] OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
Jul 25, 2024
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
Jul 25, 2024
[QA] MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
Jul 24, 2024
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
Jul 24, 2024
[QA] KAN or MLP: A Fairer Comparison
Jul 24, 2024
KAN or MLP: A Fairer Comparison
Jul 24, 2024
[QA] Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Jul 23, 2024
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Jul 23, 2024
[QA] BOND: Aligning LLMs with Best-of-N Distillation
Jul 23, 2024
BOND: Aligning LLMs with Best-of-N Distillation
Jul 23, 2024
[QA] Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Jul 23, 2024
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Jul 23, 2024
[QA] LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Jul 23, 2024
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Jul 23, 2024
[QA] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Jul 21, 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Jul 21, 2024
[QA] NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
Jul 21, 2024
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
Jul 21, 2024
[QA] Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
Jul 20, 2024
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
Jul 20, 2024
[QA] Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
Jul 20, 2024
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
Jul 20, 2024
[QA] Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
Jul 19, 2024
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
Jul 19, 2024
[QA] Beyond KV Caching: Shared Attention for Efficient LLMs
Jul 19, 2024
Beyond KV Caching: Shared Attention for Efficient LLMs
Jul 19, 2024
[QA] Private prediction for large-scale synthetic text generation
Jul 18, 2024
Private prediction for large-scale synthetic text generation
Jul 18, 2024
[QA] Chip Placement with Diffusion
Jul 18, 2024
Chip Placement with Diffusion
Jul 18, 2024
[QA] Unraveling the Truth: Do LLMs really Understand Charts? A Deep Dive into Consistency and Robustness
Jul 17, 2024
Unraveling the Truth: Do LLMs really Understand Charts? A Deep Dive into Consistency and Robustness
Jul 17, 2024
[QA] Does Refusal Training in LLMs Generalize to the Past Tense?
Jul 17, 2024
Does Refusal Training in LLMs Generalize to the Past Tense?
Jul 17, 2024
[QA] No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
Jul 16, 2024
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
Jul 16, 2024
[QA] LLM Circuit Analyses Are Consistent Across Training and Scale
Jul 16, 2024
LLM Circuit Analyses Are Consistent Across Training and Scale
Jul 16, 2024
[QA] SPREADSHEETLLM: Encoding Spreadsheets for Large Language Models
Jul 15, 2024
SPREADSHEETLLM: Encoding Spreadsheets for Large Language Models
Jul 15, 2024
[QA] Transformer Layers as Painters
Jul 15, 2024
Transformer Layers as Painters
Jul 15, 2024
[QA] Lynx: An Open Source Hallucination Evaluation Model
Jul 14, 2024
Lynx: An Open Source Hallucination Evaluation Model
Jul 14, 2024
[QA] Deconstructing What Makes a Good Optimizer for Language Models
Jul 14, 2024
Deconstructing What Makes a Good Optimizer for Language Models
Jul 14, 2024
[QA] Distilling System 2 into System 1
Jul 13, 2024
Distilling System 2 into System 1
Jul 13, 2024
[QA] Video Diffusion Alignment via Reward Gradients
Jul 13, 2024
Video Diffusion Alignment via Reward Gradients
Jul 13, 2024
[QA] Gradient Boosting Reinforcement Learning
Jul 12, 2024
Gradient Boosting Reinforcement Learning
Jul 12, 2024
[QA] PaliGemma: A versatile 3B VLM for transfer
Jul 12, 2024
PaliGemma: A versatile 3B VLM for transfer
Jul 12, 2024
[QA] Transformer Alignment in Large Language Models
Jul 11, 2024
Transformer Alignment in Large Language Models
Jul 11, 2024
[QA] Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers
Jul 11, 2024
Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers
Jul 11, 2024
[QA] Composable Interventions for Language Models
Jul 10, 2024
Composable Interventions for Language Models
Jul 10, 2024
[QA] Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
Jul 10, 2024
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
Jul 10, 2024
[QA] Stepping on the Edge: Curvature Aware Learning Rate Tuners
Jul 09, 2024
Stepping on the Edge: Curvature Aware Learning Rate Tuners
Jul 09, 2024
[QA] Unveiling Encoder-Free Vision-Language Models
Jul 09, 2024
Unveiling Encoder-Free Vision-Language Models
Jul 09, 2024
[QA] Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Jul 08, 2024
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Jul 08, 2024
[QA] On scalable oversight with weak LLMs judging strong LLMs
Jul 08, 2024
On scalable oversight with weak LLMs judging strong LLMs
Jul 08, 2024
[QA] Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Jul 07, 2024
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Jul 07, 2024
[QA] Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
Jul 07, 2024
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
Jul 07, 2024
[QA] InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Jul 05, 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Jul 05, 2024
[QA] Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Jul 05, 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Jul 05, 2024
[QA] LLM-Select: Feature Selection with Large Language Models
Jul 04, 2024
LLM-Select: Feature Selection with Large Language Models
Jul 04, 2024
[QA] Universal Length Generalization with Turing Programs
Jul 04, 2024
Universal Length Generalization with Turing Programs
Jul 04, 2024
[QA] Searching for Best Practices in Retrieval-Augmented Generation
Jul 03, 2024
Searching for Best Practices in Retrieval-Augmented Generation
Jul 03, 2024
[QA] AI Agents That Matter
Jul 02, 2024
AI Agents That Matter
Jul 02, 2024
[QA] Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
Jul 01, 2024
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
Jul 01, 2024
[QA] Scaling Synthetic Data Creation with 1,000,000,000 Personas
Jul 01, 2024
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Jul 01, 2024
[QA] Is Programming by Example solved by LLMs?
Jun 30, 2024
Is Programming by Example solved by LLMs?
Jun 30, 2024
[QA] Can LLMs Learn by Teaching? A Preliminary Study
Jun 30, 2024
Can LLMs Learn by Teaching? A Preliminary Study
Jun 30, 2024
[QA] Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Jun 29, 2024
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Jun 29, 2024
[QA] Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction
Jun 29, 2024
Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction
Jun 29, 2024
[QA] REVISION MATTERS: Generative Design Guided by Revision Edits
Jun 28, 2024
REVISION MATTERS: Generative Design Guided by Revision Edits
Jun 28, 2024
[QA] The Remarkable Robustness of LLMs: Stages of Inference?
Jun 28, 2024
The Remarkable Robustness of LLMs: Stages of Inference?
Jun 28, 2024
[QA] Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Jun 27, 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Jun 27, 2024
[QA] Data curation via joint example selection further accelerates multimodal learning
Jun 27, 2024
Data curation via joint example selection further accelerates multimodal learning
Jun 27, 2024
[QA] Large Language Models are Interpretable Learners
Jun 26, 2024
Large Language Models are Interpretable Learners
Jun 26, 2024
[QA] Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
Jun 26, 2024
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
Jun 26, 2024
[QA] Adam-mini: Use Fewer Learning Rates To Gain More
Jun 25, 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Jun 25, 2024
[QA] Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Jun 25, 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Jun 25, 2024
[QA] Evaluating Numerical Reasoning in Text-to-Image Models
Jun 24, 2024
Evaluating Numerical Reasoning in Text-to-Image Models
Jun 24, 2024
[QA] Advantage Alignment Algorithms
Jun 24, 2024
Advantage Alignment Algorithms
Jun 24, 2024
[QA] Transcendence: Generative Models Can Outperform The Experts That Train Them
Jun 23, 2024
Transcendence: Generative Models Can Outperform The Experts That Train Them
Jun 23, 2024
[QA] Refusal in Language Models Is Mediated by a Single Direction
Jun 23, 2024
Refusal in Language Models Is Mediated by a Single Direction
Jun 23, 2024
[QA] Instruction Pre-Training: Language Models are Supervised Multitask Learners
Jun 22, 2024
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Jun 22, 2024
[QA] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jun 22, 2024
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jun 22, 2024
[QA] RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Jun 21, 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Jun 21, 2024
[QA] Consistency Models Made Easy
Jun 21, 2024
Consistency Models Made Easy
Jun 21, 2024
[QA] What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
Jun 19, 2024
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
Jun 19, 2024
[QA] Adversarial Attacks on Multimodal Agents
Jun 19, 2024
Adversarial Attacks on Multimodal Agents
Jun 19, 2024
[QA] Can Go AIs be adversarially robust?
Jun 19, 2024
Can Go AIs be adversarially robust?
Jun 19, 2024
[QA] Autoregressive Image Generation without Vector Quantization
Jun 18, 2024
Autoregressive Image Generation without Vector Quantization
Jun 18, 2024
[QA] Measuring memorization in RLHF for code completion
Jun 18, 2024
Measuring memorization in RLHF for code completion
Jun 18, 2024
[QA] Bootstrapping Language Models with DPO Implicit Rewards
Jun 17, 2024
Bootstrapping Language Models with DPO Implicit Rewards
Jun 17, 2024
[QA] Ad Auctions for LLMs via Retrieval Augmented Generation
Jun 17, 2024
Ad Auctions for LLMs via Retrieval Augmented Generation
Jun 17, 2024
[QA] An Empirical Study of Mamba-based Language Models
Jun 16, 2024
An Empirical Study of Mamba-based Language Models
Jun 16, 2024
[QA] Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Jun 16, 2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Jun 16, 2024
[QA] What If We Recaption Billions of Web Images with LLaMA-3?
Jun 15, 2024
What If We Recaption Billions of Web Images with LLaMA-3?
Jun 15, 2024
[QA] SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Jun 15, 2024
SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Jun 15, 2024
[QA] Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
Jun 14, 2024
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
Jun 14, 2024
[QA] An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Jun 14, 2024
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Jun 14, 2024
[QA] Large Language Models Must Be Taught to Know What They Don't Know
Jun 13, 2024
Large Language Models Must Be Taught to Know What They Don't Know
Jun 13, 2024
[QA] State Soup: In-Context Skill Learning, Retrieval and Mixing
Jun 13, 2024
State Soup: In-Context Skill Learning, Retrieval and Mixing
Jun 13, 2024
[QA] Estimating the Hallucination Rate of Generative AI
Jun 12, 2024
Estimating the Hallucination Rate of Generative AI
Jun 12, 2024
[QA] Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
Jun 12, 2024
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
Jun 12, 2024
Attention as a Hypernetwork
Jun 11, 2024
[QA] Distributional Preference Alignment of LLMs via Optimal Transport
Jun 11, 2024
Distributional Preference Alignment of LLMs via Optimal Transport
Jun 11, 2024
[QA] How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
Jun 11, 2024
How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
Jun 11, 2024
[QA] Mixture-of-Agents Enhances Large Language Model Capabilities
Jun 10, 2024
Mixture-of-Agents Enhances Large Language Model Capabilities
Jun 10, 2024
[QA] Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Jun 10, 2024
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Jun 10, 2024
[QA] Improving Alignment and Robustness with Short Circuiting
Jun 09, 2024
Improving Alignment and Robustness with Short Circuiting
Jun 09, 2024
[QA] Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Jun 09, 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Jun 09, 2024
[QA] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Jun 08, 2024
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Jun 08, 2024
[QA] Block Transformer: Global-to-Local Language Modeling for Fast Inference
Jun 08, 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Jun 08, 2024
[CHAT] The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
Jun 07, 2024
[QA] The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
Jun 07, 2024
The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
Jun 07, 2024
[CHAT] Verbalized Machine Learning: Revisiting Machine Learning with Language Models
Jun 07, 2024
[QA] Verbalized Machine Learning: Revisiting Machine Learning with Language Models
Jun 07, 2024
Verbalized Machine Learning: Revisiting Machine Learning with Language Models
Jun 07, 2024
[QA] How Truncating Weights Improves Reasoning in Language Models
Jun 06, 2024
How Truncating Weights Improves Reasoning in Language Models
Jun 06, 2024
[QA] Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
Jun 06, 2024
Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need
Jun 06, 2024
[QA] Does your data spark joy? Performance gains from domain upsampling at the end of training
Jun 06, 2024
Does your data spark joy? Performance gains from domain upsampling at the end of training
Jun 06, 2024
[QA] Guiding a Diffusion Model with a Bad Version of Itself
Jun 05, 2024
Guiding a Diffusion Model with a Bad Version of Itself
Jun 05, 2024
[QA] To Believe or Not to Believe Your LLM
Jun 05, 2024
To Believe or Not to Believe Your LLM
Jun 05, 2024
[QA] Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
Jun 05, 2024
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
Jun 05, 2024
[QA] Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Jun 04, 2024
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Jun 04, 2024
[QA] What makes unlearning hard and what to do about it
Jun 04, 2024
What makes unlearning hard and what to do about it
Jun 04, 2024
[QA] Learning to Play Atari in a World of Tokens
Jun 04, 2024
Learning to Play Atari in a World of Tokens
Jun 04, 2024
[QA] Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Jun 03, 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Jun 03, 2024
[QA] Contextual Position Encoding: Learning to Count What's Important
Jun 02, 2024
Contextual Position Encoding: Learning to Count What's Important
Jun 02, 2024
[QA] JINA CLIP: Your CLIP Model Is Also Your Text Retriever
Jun 02, 2024
JINA CLIP: Your CLIP Model Is Also Your Text Retriever
Jun 02, 2024
[QA] Large Language Models Can Self-Improve At Web Agent Tasks
Jun 01, 2024
Large Language Models Can Self-Improve At Web Agent Tasks
Jun 01, 2024
[QA] Is In-Context Learning Sufficient for Instruction Following in LLMs?
Jun 01, 2024
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Jun 01, 2024
[QA] Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
May 31, 2024
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
May 31, 2024
[QA] COSY: Evaluating Textual Explanations of Neurons
May 31, 2024
COSY: Evaluating Textual Explanations of Neurons
May 31, 2024
[QA] Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
May 30, 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
May 30, 2024
[QA] Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
May 30, 2024
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
May 30, 2024
[QA] Phased Consistency Model
May 29, 2024
Phased Consistency Model
May 29, 2024
[QA] Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
May 29, 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
May 29, 2024
[QA] On the Origin of Llamas: Model Tree Heritage Recovery
May 29, 2024
On the Origin of Llamas: Model Tree Heritage Recovery
May 29, 2024
[QA] Transformers Can Do Arithmetic with the Right Embeddings
May 28, 2024
Transformers Can Do Arithmetic with the Right Embeddings
May 28, 2024
[QA] EM Distillation for One-step Diffusion Models
May 28, 2024
EM Distillation for One-step Diffusion Models
May 28, 2024
[QA] Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
May 27, 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
May 27, 2024
[QA] Are Long-LLMs A Necessity For Long-Context Tasks?
May 27, 2024
Are Long-LLMs A Necessity For Long-Context Tasks?
May 27, 2024
[QA] AGILE: A Novel Framework of LLM Agents
May 26, 2024
AGILE: A Novel Framework of LLM Agents
May 26, 2024
[QA] Thermodynamic Natural Gradient Descent
May 26, 2024
Thermodynamic Natural Gradient Descent
May 26, 2024
[QA] DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
May 26, 2024
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
May 26, 2024
[QA] Lessons from the Trenches on Reproducible Evaluation of Language Models
May 25, 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
May 25, 2024
[QA] Not All Language Model Features Are Linear
May 24, 2024
Not All Language Model Features Are Linear
May 24, 2024
[QA] Implicit In-context Learning
May 24, 2024
Implicit In-context Learning
May 24, 2024
[QA] (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
May 23, 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
May 23, 2024
[QA] MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
May 23, 2024
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
May 23, 2024
[QA] Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
May 22, 2024
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
May 22, 2024
[QA] Your Transformer is Secretly Linear
May 22, 2024
Your Transformer is Secretly Linear
May 22, 2024
[QA] Training Data Attribution via Approximate Unrolled Differentiation
May 21, 2024
Training Data Attribution via Approximate Unrolled Differentiation
May 21, 2024
[QA] Information Leakage from Embedding in Large Language Models
May 21, 2024
Information Leakage from Embedding in Large Language Models
May 21, 2024
[QA] Layer-Condensed KV Cache for Efficient Inference of Large Language Models
May 20, 2024
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
May 20, 2024
[QA] Observational Scaling Laws and the Predictability of Language Model Performance
May 20, 2024
Observational Scaling Laws and the Predictability of Language Model Performance
May 20, 2024
[QA] Zero-Shot Tokenizer Transfer
May 18, 2024
Zero-Shot Tokenizer Transfer
May 18, 2024
[QA] Many-Shot In-Context Learning in Multimodal Foundation Models
May 18, 2024
Many-Shot In-Context Learning in Multimodal Foundation Models
May 18, 2024
[QA] Chameleon: Mixed-Modal Early-Fusion Foundation Models
May 17, 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
May 17, 2024
[QA] LoRA Learns Less and Forgets Less
May 17, 2024
LoRA Learns Less and Forgets Less
May 17, 2024
[QA] The Platonic Representation Hypothesis
May 16, 2024
The Platonic Representation Hypothesis
May 16, 2024
[QA] Improving Transformers using Faithful Positional Encoding
May 16, 2024
Improving Transformers using Faithful Positional Encoding
May 16, 2024
[QA] Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
May 15, 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
May 15, 2024
[QA] Energy-based Hopfield Boosting for Out-of-Distribution Detection
May 15, 2024
Energy-based Hopfield Boosting for Out-of-Distribution Detection
May 15, 2024
[QA] RLHF Workflow: From Reward Modeling to Online RLHF
May 14, 2024
RLHF Workflow: From Reward Modeling to Online RLHF
May 14, 2024
[QA] SUTRA: Scalable Multilingual Language Model Architecture
May 14, 2024
SUTRA: Scalable Multilingual Language Model Architecture
May 14, 2024
[QA] Memory Mosaics
May 13, 2024
Memory Mosaics
May 13, 2024
[QA] Linearizing Large Language Models
May 13, 2024
Linearizing Large Language Models
May 13, 2024
[QA] From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control
May 12, 2024
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control
May 12, 2024
[QA] Distilling Diffusion Models into Conditional GANs
May 12, 2024
Distilling Diffusion Models into Conditional GANs
May 12, 2024
[QA] AlphaMath Almost Zero: Process Supervision without Process
May 11, 2024
AlphaMath Almost Zero: Process Supervision without Process
May 11, 2024
[QA] Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models
May 11, 2024
Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models
May 11, 2024
[QA] Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
May 10, 2024
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
May 10, 2024
[QA] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
May 10, 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
May 10, 2024
[QA] Towards a Theoretical Understanding of the `Reversal Curse' via Training Dynamics
May 09, 2024
Towards a Theoretical Understanding of the `Reversal Curse' via Training Dynamics
May 09, 2024
[QA] Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
May 09, 2024
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
May 09, 2024
[QA] Custom Gradient Estimators are Straight-Through Estimators in Disguise
May 09, 2024
Custom Gradient Estimators are Straight-Through Estimators in Disguise
May 09, 2024
[QA] The Curse of Diversity in Ensemble-Based Exploration
May 08, 2024
The Curse of Diversity in Ensemble-Based Exploration
May 08, 2024
[QA] ImageInWords: Unlocking Hyper-Detailed Image Descriptions
May 07, 2024
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
May 07, 2024
[QA] Why is SAM Robust to Label Noise?
May 07, 2024
Why is SAM Robust to Label Noise?
May 07, 2024
[QA] Is Flash Attention Stable?
May 07, 2024
Is Flash Attention Stable?
May 07, 2024
[QA] Understanding LLMs Requires More Than Statistical Generalization
May 06, 2024
Understanding LLMs Requires More Than Statistical Generalization
May 06, 2024
[QA] Mitigating LLM Hallucinations via Conformal Abstention
May 06, 2024
Mitigating LLM Hallucinations via Conformal Abstention
May 06, 2024
[QA] Structural Pruning of Pre-trained Language Models via Neural Architecture Search
May 06, 2024
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
May 06, 2024
[QA] Capabilities of Gemini Models in Medicine
May 05, 2024
Capabilities of Gemini Models in Medicine
May 05, 2024
[QA] StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
May 04, 2024
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
May 04, 2024
[QA] In-Context Learning with Long-Context Models: An In-Depth Exploration
May 04, 2024
In-Context Learning with Long-Context Models: An In-Depth Exploration
May 04, 2024
[QA] WILDCHAT: 1M ChatGPT Interaction Logs in the Wild
May 03, 2024
WILDCHAT: 1M ChatGPT Interaction Logs in the Wild
May 03, 2024
[QA] NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
May 03, 2024
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
May 03, 2024
[QA] PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models
May 03, 2024
PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models
May 03, 2024
[QA] A Careful Examination of Large Language Model Performance on Grade School Arithmetic
May 02, 2024
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
May 02, 2024
[QA] Self-Play Preference Optimization for Language Model Alignment
May 02, 2024
Self-Play Preference Optimization for Language Model Alignment
May 02, 2024
[QA] Is Bigger Edit Batch Size Always Better? - An Empirical Study on Model Editing with Llama-3
May 02, 2024
Is Bigger Edit Batch Size Always Better? - An Empirical Study on Model Editing with Llama-3
May 02, 2024
[QA] Iterative Reasoning Preference Optimization
May 01, 2024
Iterative Reasoning Preference Optimization
May 01, 2024
[QA] Harmonic LLMs are Trustworthy
May 01, 2024
Harmonic LLMs are Trustworthy
May 01, 2024
[QA] Better & Faster Large Language Models via Multi-token Prediction
May 01, 2024
Better & Faster Large Language Models via Multi-token Prediction
May 01, 2024
[QA] Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Apr 30, 2024
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Apr 30, 2024
[QA] Stylus: Automatic Adapter Selection for Diffusion Models
Apr 30, 2024
Stylus: Automatic Adapter Selection for Diffusion Models
Apr 30, 2024
[QA] DPO Meets PPO: Reinforced Token Optimization for RLHF
Apr 30, 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Apr 30, 2024
[QA] Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Apr 30, 2024
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Apr 30, 2024
[QA] AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Apr 29, 2024
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Apr 29, 2024
[QA] Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Apr 29, 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Apr 29, 2024
[QA] Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models
Apr 28, 2024
Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models
Apr 28, 2024
[QA] Retrieval Head Mechanistically Explains Long-Context Factuality
Apr 28, 2024
Retrieval Head Mechanistically Explains Long-Context Factuality
Apr 28, 2024
[QA] AUTOCRAWLER: A Progressive Understanding Web Agent for Web Crawler Generation
Apr 28, 2024
AUTOCRAWLER: A Progressive Understanding Web Agent for Web Crawler Generation
Apr 28, 2024
[QA] Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Apr 27, 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Apr 27, 2024
[QA] LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Apr 27, 2024
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Apr 27, 2024
[QA] Make Your LLM Fully Utilize the Context
Apr 27, 2024
Make Your LLM Fully Utilize the Context
Apr 27, 2024
[QA] Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
Apr 26, 2024
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
Apr 26, 2024
[QA] AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models
Apr 26, 2024
AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models
Apr 26, 2024
[QA] Weak-to-Strong Extrapolation Expedites Alignment
Apr 26, 2024
Weak-to-Strong Extrapolation Expedites Alignment
Apr 26, 2024
[QA] Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts
Apr 25, 2024
Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts
Apr 25, 2024
[QA] Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
Apr 25, 2024
Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
Apr 25, 2024
[QA] OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Apr 24, 2024
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Apr 24, 2024
[QA] Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners
Apr 24, 2024
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners
Apr 24, 2024
[QA] SnapKV: LLM Knows What You are Looking for Before Generation
Apr 24, 2024
SnapKV: LLM Knows What You are Looking for Before Generation
Apr 24, 2024
[QA] Multi-Head Mixture-of-Experts
Apr 24, 2024
Multi-Head Mixture-of-Experts
Apr 24, 2024
[QA] The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Apr 23, 2024
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Apr 23, 2024
[QA] Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Apr 23, 2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Apr 23, 2024
[QA] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Apr 23, 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Apr 23, 2024
[QA] Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction
Apr 22, 2024
Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction
Apr 22, 2024
[QA] HalluciBot: Is There No Such Thing as a Bad Question?
Apr 22, 2024
HalluciBot: Is There No Such Thing as a Bad Question?
Apr 22, 2024
[QA] Stronger Random Baselines for In-Context Learning
Apr 22, 2024
Stronger Random Baselines for In-Context Learning
Apr 22, 2024
[QA] Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Apr 21, 2024
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Apr 21, 2024
[QA] The Illusion of State in State-Space Models
Apr 21, 2024
The Illusion of State in State-Space Models
Apr 21, 2024
[QA] Chinchilla Scaling: A replication attempt
Apr 21, 2024
Chinchilla Scaling: A replication attempt
Apr 21, 2024
[QA] From r to Q*: Your Language Model is Secretly a Q-Function
Apr 20, 2024
From r to Q*: Your Language Model is Secretly a Q-Function
Apr 20, 2024
[QA] Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Apr 20, 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Apr 20, 2024
[QA] Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Apr 20, 2024
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Apr 20, 2024
[QA] TRIFORCE: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Apr 19, 2024
TRIFORCE: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Apr 19, 2024
[QA] Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Apr 19, 2024
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Apr 19, 2024
[QA] BLINK: Multimodal Large Language Models Can See but Not Perceive
Apr 19, 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive
Apr 19, 2024
[QA] Fewer Truncations Improve Language Modeling
Apr 18, 2024
Fewer Truncations Improve Language Modeling
Apr 18, 2024
[QA] Many-Shot In-Context Learning
Apr 18, 2024
Many-Shot In-Context Learning
Apr 18, 2024
[QA] Can Language Models Solve Olympiad Programming?
Apr 18, 2024
Can Language Models Solve Olympiad Programming?
Apr 18, 2024
[QA] Social Choice for AI Alignment: Dealing with Diverse Human Feedback
Apr 17, 2024
[short] Social Choice for AI Alignment: Dealing with Diverse Human Feedback
Apr 17, 2024
Social Choice for AI Alignment: Dealing with Diverse Human Feedback
Apr 17, 2024
[QA] Self-playing Adversarial Language Game Enhances LLM Reasoning
Apr 17, 2024
[short] Self-playing Adversarial Language Game Enhances LLM Reasoning
Apr 17, 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning
Apr 17, 2024
[QA] Transformer FAM: Feedback attention is working memory
Apr 16, 2024
Transformer FAM: Feedback attention is working memory
Apr 16, 2024
[QA] Learn Your Reference Model for Real Good Alignment
Apr 16, 2024
Learn Your Reference Model for Real Good Alignment
Apr 16, 2024
[QA] Compression Represents Intelligence Linearly
Apr 16, 2024
Compression Represents Intelligence Linearly
Apr 16, 2024
[QA] Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Apr 16, 2024
[short] Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Apr 16, 2024
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Apr 16, 2024
[QA] Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Apr 16, 2024
[short] Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Apr 16, 2024
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Apr 16, 2024
[QA] Pre-training Small Base LMs with Fewer Tokens
Apr 15, 2024
Pre-training Small Base LMs with Fewer Tokens
Apr 15, 2024
[QA] Probing the 3D Awareness of Visual Foundation Models
Apr 15, 2024
Probing the 3D Awareness of Visual Foundation Models
Apr 15, 2024
[QA] Is ChatGPT Transforming Academics' Writing Style?
Apr 15, 2024
[short] Is ChatGPT Transforming Academics' Writing Style?
Apr 15, 2024
Is ChatGPT Transforming Academics' Writing Style?
Apr 15, 2024
[QA] JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Apr 14, 2024
[short] JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Apr 14, 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Apr 14, 2024
[QA] Explaining Explainability: Understanding Concept Activation Vectors
Apr 14, 2024
[short] Explaining Explainability: Understanding Concept Activation Vectors
Apr 14, 2024
Explaining Explainability: Understanding Concept Activation Vectors
Apr 14, 2024
[QA] Manipulating Large Language Models to Increase Product Visibility
Apr 14, 2024
[short] Manipulating Large Language Models to Increase Product Visibility
Apr 14, 2024
Manipulating Large Language Models to Increase Product Visibility
Apr 14, 2024
[QA] Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Apr 13, 2024
[short] Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Apr 13, 2024
Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Apr 13, 2024
[QA] Adapting LLaMA Decoder to Vision Transformer
Apr 13, 2024
[short] Adapting LLaMA Decoder to Vision Transformer
Apr 13, 2024
Adapting LLaMA Decoder to Vision Transformer
Apr 13, 2024
[QA] RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Apr 13, 2024
[short] RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Apr 13, 2024
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Apr 13, 2024
[QA] Best Practices and Lessons Learned on Synthetic Data for Language Models
Apr 12, 2024
[QA] ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Apr 12, 2024
[short] ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Apr 12, 2024
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Apr 12, 2024
[QA] RHO-1: Not All Tokens Are What You Need
Apr 12, 2024
[short] RHO-1: Not All Tokens Are What You Need
Apr 12, 2024
RHO-1: Not All Tokens Are What You Need
Apr 12, 2024
[QA] LLoCO: Learning Long Contexts Offline
Apr 12, 2024
[short] LLoCO: Learning Long Contexts Offline
Apr 12, 2024
LLoCO: Learning Long Contexts Offline
Apr 12, 2024
[QA] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Apr 11, 2024
[short] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Apr 11, 2024
Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Apr 11, 2024
[QA] Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Apr 11, 2024
[short] Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Apr 11, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Apr 11, 2024
[QA] LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Apr 10, 2024
[short] LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Apr 10, 2024
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Apr 10, 2024
[QA] Simultaneous linear connectivity of neural networks modulo permutation
Apr 10, 2024
[short] Simultaneous linear connectivity of neural networks modulo permutation
Apr 10, 2024
Simultaneous linear connectivity of neural networks modulo permutation
Apr 10, 2024
[QA] Does Transformer Interpretability Transfer to RNNs?
Apr 10, 2024
[short] Does Transformer Interpretability Transfer to RNNs?
Apr 10, 2024
Does Transformer Interpretability Transfer to RNNs?
Apr 10, 2024
[QA] Finding Visual Task Vectors
Apr 09, 2024
[short] Finding Visual Task Vectors
Apr 09, 2024
Finding Visual Task Vectors
Apr 09, 2024
[QA] PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Apr 09, 2024
[short] PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Apr 09, 2024
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Apr 09, 2024
[QA] Stream of Search (SoS): Learning to Search in Language
Apr 08, 2024
[short] Stream of Search (SoS): Learning to Search in Language
Apr 08, 2024
Stream of Search (SoS): Learning to Search in Language
Apr 08, 2024
[QA] No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Apr 08, 2024
[short] No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Apr 08, 2024
No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Apr 08, 2024
[QA] AIOS: LLM Agent Operating System
Apr 07, 2024
[short] AIOS: LLM Agent Operating System
Apr 07, 2024
AIOS: LLM Agent Operating System
Apr 07, 2024
[QA] Evaluating LLMs at Detecting Errors in LLM Responses
Apr 07, 2024
[short] Evaluating LLMs at Detecting Errors in LLM Responses
Apr 07, 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Apr 07, 2024
[QA] AI and the Problem of Knowledge Collapse
Apr 06, 2024
[short] AI and the Problem of Knowledge Collapse
Apr 06, 2024
AI and the Problem of Knowledge Collapse
Apr 06, 2024
[QA] Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Apr 06, 2024
[short] Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Apr 06, 2024
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Apr 06, 2024
[QA] Do Language Models Plan for Future Tokens?
Apr 06, 2024
[short] Do Language Models Plan for Future Tokens?
Apr 06, 2024
Do Language Models Plan for Future Tokens?
Apr 06, 2024
[QA] AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Apr 05, 2024
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Apr 05, 2024
[QA] LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
Apr 05, 2024
[short] LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
Apr 05, 2024
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
Apr 05, 2024
[QA] ReFT: Representation Finetuning for Language Models
Apr 05, 2024
[short] ReFT: Representation Finetuning for Language Models
Apr 05, 2024
ReFT: Representation Finetuning for Language Models
Apr 05, 2024
[QA] Linear Attention Sequence Parallelism
Apr 04, 2024
[short] Linear Attention Sequence Parallelism
Apr 04, 2024
Linear Attention Sequence Parallelism
Apr 04, 2024
[QA] Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Apr 04, 2024
[short] Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Apr 04, 2024
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Apr 04, 2024
[QA] ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Apr 04, 2024
[short] ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Apr 04, 2024
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Apr 04, 2024
[QA] Advancing LLM Reasoning Generalists with Preference Trees
Apr 03, 2024
[short] Advancing LLM Reasoning Generalists with Preference Trees
Apr 03, 2024
Advancing LLM Reasoning Generalists with Preference Trees
Apr 03, 2024
[QA] Long-context LLMs Struggle with Long In-context Learning
Apr 03, 2024
[short] Long-context LLMs Struggle with Long In-context Learning
Apr 03, 2024
Long-context LLMs Struggle with Long In-context Learning
Apr 03, 2024
[QA] Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Apr 03, 2024
[short] Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Apr 03, 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Apr 03, 2024
[QA] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
Apr 02, 2024
[short] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
Apr 02, 2024
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
Apr 02, 2024
[QA] Localizing Paragraph Memorization in Language Models
Apr 02, 2024
[short] Localizing Paragraph Memorization in Language Models
Apr 02, 2024
Localizing Paragraph Memorization in Language Models
Apr 02, 2024
[QA] ReALM: Reference Resolution As Language Modeling
Apr 01, 2024
[short] ReALM: Reference Resolution As Language Modeling
Apr 01, 2024
ReALM: Reference Resolution As Language Modeling
Apr 01, 2024
[QA] Jamba: A Hybrid Transformer-Mamba Language Model
Apr 01, 2024
[short] Jamba: A Hybrid Transformer-Mamba Language Model
Apr 01, 2024
Jamba: A Hybrid Transformer-Mamba Language Model
Apr 01, 2024
[QA] Gecko: Versatile Text Embeddings Distilled from Large Language Models
Apr 01, 2024
[short] Gecko: Versatile Text Embeddings Distilled from Large Language Models
Apr 01, 2024
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Apr 01, 2024
[QA] LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Apr 01, 2024
[short] LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Apr 01, 2024
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Apr 01, 2024
[short] LLAMAFACTORY: Unified Efficient Fine-Tuning of 100+ Language Models
Mar 31, 2024
LLAMAFACTORY: Unified Efficient Fine-Tuning of 100+ Language Models
Mar 31, 2024
[short] MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Mar 31, 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Mar 31, 2024
[short] A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
Mar 30, 2024
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
Mar 30, 2024
[short] ViTAR: Vision Transformer with Any Resolution
Mar 30, 2024
ViTAR: Vision Transformer with Any Resolution
Mar 30, 2024
sDPO: Don't Use Your Data All at Once
Mar 30, 2024
[short] LITA: Language Instructed Temporal-Localization Assistant
Mar 29, 2024
LITA: Language Instructed Temporal-Localization Assistant
Mar 29, 2024
[short] Model Stock: All we need is just a few fine-tuned models
Mar 29, 2024
Model Stock: All we need is just a few fine-tuned models
Mar 29, 2024
[short] BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Mar 28, 2024
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Mar 28, 2024
[short] Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Mar 28, 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Mar 28, 2024
[short] Long-form factuality in large language models
Mar 28, 2024
Long-form factuality in large language models
Mar 28, 2024
[short] Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs
Mar 27, 2024
Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs
Mar 27, 2024
[short] The Unreasonable Ineffectiveness of the Deeper Layers
Mar 27, 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Mar 27, 2024
[short] Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Mar 26, 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Mar 26, 2024
[short] LLM Agent Operating System
Mar 26, 2024
LLM Agent Operating System
Mar 26, 2024
[short] FOLLOWIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Mar 25, 2024
FOLLOWIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Mar 25, 2024
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Mar 25, 2024
[short] Can large language models explore in-context?
Mar 25, 2024
Can large language models explore in-context?
Mar 25, 2024
[short] Detoxifying Large Language Models via Knowledge Editing
Mar 24, 2024
Detoxifying Large Language Models via Knowledge Editing
Mar 24, 2024
[short] Evolutionary Optimization of Model Merging Recipes
Mar 23, 2024
Evolutionary Optimization of Model Merging Recipes
Mar 23, 2024
[short] Recourse for Reclamation: Chatting with Generative Language Models
Mar 23, 2024
Recourse for Reclamation: Chatting with Generative Language Models
Mar 23, 2024
[short] A Unified Framework for Model Editing
Mar 22, 2024
A Unified Framework for Model Editing
Mar 22, 2024
[short] Language Models Can Reduce Asymmetry in Information Markets
Mar 22, 2024
Language Models Can Reduce Asymmetry in Information Markets
Mar 22, 2024
[short] AI and Memory Wall
Mar 22, 2024
AI and Memory Wall
Mar 22, 2024
RewardBench: Evaluating Reward Models for Language Modeling
Mar 21, 2024
Evaluating Frontier Models for Dangerous Capabilities
Mar 21, 2024
Reverse Training to Nurse the Reversal Curse
Mar 21, 2024
[short] Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Mar 20, 2024
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Mar 20, 2024
[short] Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Mar 20, 2024
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Mar 20, 2024
[short] Yell At Your Robot: Improving On-the-Fly from Language Corrections
Mar 20, 2024
Yell At Your Robot: Improving On-the-Fly from Language Corrections
Mar 20, 2024
[short] Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Mar 19, 2024
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Mar 19, 2024
[short] Larimar: Large Language Models with Episodic Memory Control
Mar 19, 2024
Larimar: Large Language Models with Episodic Memory Control
Mar 19, 2024
[short] MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
Mar 19, 2024
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
Mar 19, 2024
[short] Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
Mar 18, 2024
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
Mar 18, 2024
[short] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Mar 18, 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Mar 18, 2024
Monitoring AI-Modified Content at Scale
Mar 18, 2024
AutoDev: Automated AI-Driven Development
Mar 17, 2024
Is Cosine-Similarity of Embeddings Really About Similarity?
Mar 17, 2024
[short] AutoEval Done Right: Using Synthetic Data for Model Evaluation
Mar 17, 2024
AutoEval Done Right: Using Synthetic Data for Model Evaluation
Mar 17, 2024
Language Model Inversion
Mar 16, 2024
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Mar 16, 2024
[short] Logits of API-Protected LLMs Leak Proprietary Information
Mar 16, 2024
Logits of API-Protected LLMs Leak Proprietary Information
Mar 16, 2024
[short] Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
Mar 15, 2024
Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
Mar 15, 2024
[short] MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Mar 15, 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Mar 15, 2024
Gemma: Open Models Based on Gemini Research and Technology
Mar 14, 2024
[short] Language models scale reliably with over-training and on downstream tasks
Mar 14, 2024
Language models scale reliably with over-training and on downstream tasks
Mar 14, 2024
[short] Human Alignment of Large Language Models through Online Preference Optimisation
Mar 14, 2024
Human Alignment of Large Language Models through Online Preference Optimisation
Mar 14, 2024
[short] Synth2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Mar 13, 2024
Synth2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Mar 13, 2024
[short] Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Mar 13, 2024
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Mar 13, 2024
Multistep Consistency Models
Mar 12, 2024
[short] VideoMamba: State Space Model for Efficient Video Understanding
Mar 12, 2024
VideoMamba: State Space Model for Efficient Video Understanding
Mar 12, 2024
[short] Stealing Part of a Production Language Model
Mar 12, 2024
Stealing Part of a Production Language Model
Mar 12, 2024
[short] Stacking as Accelerated Gradient Descent
Mar 11, 2024
Stacking as Accelerated Gradient Descent
Mar 11, 2024
[short] ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Mar 11, 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Mar 11, 2024
[short] Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Mar 10, 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Mar 10, 2024
[short] GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Mar 09, 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Mar 09, 2024
[short] Yi: Open Foundation Models by 01.AI
Mar 08, 2024
Yi: Open Foundation Models by 01.AI
Mar 08, 2024
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Mar 08, 2024
[short] Teaching Large Language Models to Reason with Reinforcement Learning
Mar 08, 2024
Teaching Large Language Models to Reason with Reinforcement Learning
Mar 08, 2024
[short] Backtracing: Retrieving the Cause of the Query
Mar 07, 2024
Backtracing: Retrieving the Cause of the Query
Mar 07, 2024
[short] Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Mar 07, 2024
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Mar 07, 2024
[short] Design2Code: How Far Are We From Automating Front-End Engineering?
Mar 06, 2024
Design2Code: How Far Are We From Automating Front-End Engineering?
Mar 06, 2024
[short] Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Mar 06, 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Mar 06, 2024
[short] Learning and Leveraging World Models in Visual Representation Learning
Mar 04, 2024
Learning and Leveraging World Models in Visual Representation Learning
Mar 04, 2024
[short] Simple linear attention language models balance the recall-throughput tradeoff
Mar 03, 2024
Simple linear attention language models balance the recall-throughput tradeoff
Mar 03, 2024
[short] Beyond Language Models: Byte Models are Digital World Simulators
Mar 03, 2024
Beyond Language Models: Byte Models are Digital World Simulators
Mar 03, 2024
[short] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Mar 02, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Mar 02, 2024
[short] Do Transformer World Models Give Better Policy Gradients?
Mar 02, 2024
Do Transformer World Models Give Better Policy Gradients?
Mar 02, 2024
[short] Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
Mar 01, 2024
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
Mar 01, 2024
[short] Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Feb 29, 2024
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Feb 29, 2024
[short] Approaching Human-Level Forecasting with Language Models
Feb 29, 2024
Approaching Human-Level Forecasting with Language Models
Feb 29, 2024
[short] Massive Activations in Large Language Models
Feb 28, 2024
Massive Activations in Large Language Models
Feb 28, 2024
[short] Video as the New Language for Real-World Decision Making
Feb 28, 2024
Video as the New Language for Real-World Decision Making
Feb 28, 2024
[short] ChatMusician: Understanding and Generating Music Intrinsically with LLM
Feb 27, 2024
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Feb 27, 2024
[short] Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding
Feb 27, 2024
Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding
Feb 27, 2024
[short] Watermarking Makes Language Models Radioactive
Feb 26, 2024
Watermarking Makes Language Models Radioactive
Feb 26, 2024
[short] Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Feb 26, 2024
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Feb 26, 2024
[short] Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Feb 26, 2024
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Feb 26, 2024
[short] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Feb 25, 2024
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Feb 25, 2024
[short] Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Feb 24, 2024
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Feb 24, 2024
[short] OmniPred: Language Models as Universal Regressors
Feb 23, 2024
OmniPred: Language Models as Universal Regressors
Feb 23, 2024
[short] T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Feb 23, 2024
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Feb 23, 2024
[short] Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Feb 23, 2024
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Feb 23, 2024
[short] In deep reinforcement learning, a pruned network is a good network
Feb 22, 2024
In deep reinforcement learning, a pruned network is a good network
Feb 22, 2024
[short] Coercing LLMs to do and reveal (almost) anything
Feb 22, 2024
Coercing LLMs to do and reveal (almost) anything
Feb 22, 2024
[short] Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Feb 21, 2024
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Feb 21, 2024
[short] Neural Network Diffusion
Feb 21, 2024
Neural Network Diffusion
Feb 21, 2024
[short] SEQUOIA: Scalable, Robust, and Hardware-aware Speculative Decoding
Feb 20, 2024
SEQUOIA: Scalable, Robust, and Hardware-aware Speculative Decoding
Feb 20, 2024
[short] Reformatted Alignment
Feb 20, 2024
Reformatted Alignment
Feb 20, 2024
[short] PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
Feb 19, 2024
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
Feb 19, 2024
[short] In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Feb 19, 2024
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Feb 19, 2024
[short] DoRA: Weight-Decomposed Low-Rank Adaptation
Feb 18, 2024
DoRA: Weight-Decomposed Low-Rank Adaptation
Feb 18, 2024
[short] A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Feb 17, 2024
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Feb 17, 2024
[short] On Limitations of the Transformer Architecture
Feb 17, 2024
On Limitations of the Transformer Architecture
Feb 17, 2024
[short] Chain-of-Thought Reasoning Without Prompting
Feb 16, 2024
Chain-of-Thought Reasoning Without Prompting
Feb 16, 2024
[short] How to Train Data-Efficient LLMs
Feb 16, 2024
How to Train Data-Efficient LLMs
Feb 16, 2024
[short] Data Engineering for Scaling Language Models to 128K Context
Feb 16, 2024
Data Engineering for Scaling Language Models to 128K Context
Feb 16, 2024
[short] Premise Order Matters in Reasoning with Large Language Models
Feb 15, 2024
Premise Order Matters in Reasoning with Large Language Models
Feb 15, 2024
[short] HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
Feb 15, 2024
HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
Feb 15, 2024
[short] Transformers Can Achieve Length Generalization But Not Robustly
Feb 15, 2024
Transformers Can Achieve Length Generalization But Not Robustly
Feb 15, 2024
Tandem Transformers for Inference Efficient LLMs
Feb 14, 2024
Tandem Transformers for Inference Efficient LLMs
Feb 14, 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Feb 14, 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Feb 14, 2024
[short] A Tale of Tails: Model Collapse as a Change of Scaling Laws
Feb 13, 2024
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Feb 13, 2024
[short] Scaling Laws for Fine-Grained Mixture of Experts
Feb 13, 2024
Scaling Laws for Fine-Grained Mixture of Experts
Feb 13, 2024
[short] Suppressing Pink Elephants with Direct Principle Feedback
Feb 13, 2024
Suppressing Pink Elephants with Direct Principle Feedback
Feb 13, 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
Feb 12, 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
Feb 12, 2024
[short] Feedback Loops With Language Models Drive In-Context Reward Hacking
Feb 12, 2024
Feedback Loops With Language Models Drive In-Context Reward Hacking
Feb 12, 2024
The boundary of neural network trainability is fractal
Feb 12, 2024
[short] More Agents Is All You Need
Feb 11, 2024
More Agents Is All You Need
Feb 11, 2024
[short] Buffer Overflow in Mixture of Experts
Feb 11, 2024
Buffer Overflow in Mixture of Experts
Feb 11, 2024
[short] Grandmaster-Level Chess Without Search
Feb 10, 2024
Grandmaster-Level Chess Without Search
Feb 10, 2024
[short] Driving Everywhere with Large Language Model Policy Adaptation
Feb 10, 2024
Driving Everywhere with Large Language Model Policy Adaptation
Feb 10, 2024
[short] An Interactive Agent Foundation Model
Feb 09, 2024
An Interactive Agent Foundation Model
Feb 09, 2024
[short] Learning to Route Among Specialized Experts for Zero-Shot Generalization
Feb 09, 2024
Learning to Route Among Specialized Experts for Zero-Shot Generalization
Feb 09, 2024
[short] Hydragen: High-Throughput LLM Inference with Shared Prefixes
Feb 08, 2024
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Feb 08, 2024
[short] Direct Language Model Alignment from Online AI Feedback
Feb 08, 2024
Direct Language Model Alignment from Online AI Feedback
Feb 08, 2024
[short] Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
Feb 08, 2024
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
Feb 08, 2024
[short] SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
Feb 07, 2024
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
Feb 07, 2024
[short] Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
Feb 06, 2024
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
Feb 06, 2024
[short] Decoding-time Realignment of Language Models
Feb 06, 2024
Decoding-time Realignment of Language Models
Feb 06, 2024
[short] Specialized Language Models with Cheap Inference from Limited Domain Data
Feb 05, 2024
Specialized Language Models with Cheap Inference from Limited Domain Data
Feb 05, 2024
[short] Repeat After Me: Transformers are Better than State Space Models at Copying
Feb 05, 2024
Repeat After Me: Transformers are Better than State Space Models at Copying
Feb 05, 2024
[short] Unlearnable Algorithms for In-context Learning
Feb 04, 2024
Unlearnable Algorithms for In-context Learning
Feb 04, 2024
[short] Can Large Language Models Understand Context?
Feb 03, 2024
Can Large Language Models Understand Context?
Feb 03, 2024
[short] Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI
Feb 03, 2024
Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI
Feb 03, 2024
[short] Transforming and Combining Rewards for Aligning Large Language Models
Feb 02, 2024
Transforming and Combining Rewards for Aligning Large Language Models
Feb 02, 2024
[short] Efficient Exploration for LLMs
Feb 02, 2024
Efficient Exploration for LLMs
Feb 02, 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Feb 02, 2024
[short] Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Feb 01, 2024
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Feb 01, 2024
[short] Arrows of Time for Large Language Models
Feb 01, 2024
Arrows of Time for Large Language Models
Feb 01, 2024
[short] WEAVER: Foundation Models for Creative Writing
Jan 31, 2024
WEAVER: Foundation Models for Creative Writing
Jan 31, 2024
[short] H2O-Danube-1.8B Technical Report
Jan 31, 2024
H2O-Danube-1.8B Technical Report
Jan 31, 2024
[short] Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Jan 30, 2024
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Jan 30, 2024
[short] Synchformer: Efficient Synchronization from Sparse Cues
Jan 30, 2024
Synchformer: Efficient Synchronization from Sparse Cues
Jan 30, 2024
[short] MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Jan 30, 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Jan 30, 2024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Jan 29, 2024
[short] From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Jan 29, 2024
[short] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Jan 29, 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Jan 29, 2024
[short] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Jan 28, 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Jan 28, 2024
Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research Visibility
Jan 26, 2024
[short] Rethinking Patch Dependence for Masked Autoencoders
Jan 26, 2024
Rethinking Patch Dependence for Masked Autoencoders
Jan 26, 2024
[short] Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Jan 26, 2024
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Jan 26, 2024
[short] MambaByte: Token-free Selective State Space Model
Jan 25, 2024
MambaByte: Token-free Selective State Space Model
Jan 25, 2024
[short] Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Jan 24, 2024
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Jan 24, 2024
[short] Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Jan 24, 2024
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Jan 24, 2024
[short] WARM: On the Benefits of Weight Averaged Reward Models
Jan 23, 2024
WARM: On the Benefits of Weight Averaged Reward Models
Jan 23, 2024
[short] Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Jan 23, 2024
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Jan 23, 2024
Zero Bubble Pipeline Parallelism
Jan 22, 2024
[short] MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Jan 22, 2024
MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Jan 22, 2024
[short] Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
Jan 22, 2024
Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
Jan 22, 2024
Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability
Jan 21, 2024
VMamba: Visual State Space Model
Jan 21, 2024
[short] Training Neural Networks is NP-Hard in Fixed Dimension
Jan 20, 2024
Training Neural Networks is NP-Hard in Fixed Dimension
Jan 20, 2024
[short] Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Jan 20, 2024
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Jan 20, 2024
[short] ChatQA: Building GPT-4 Level Conversational QA Models
Jan 19, 2024
ChatQA: Building GPT-4 Level Conversational QA Models
Jan 19, 2024
[short] Improving fine-grained understanding in image-text pre-training
Jan 19, 2024
Improving fine-grained understanding in image-text pre-training
Jan 19, 2024
[short] Self-Rewarding Language Models
Jan 19, 2024
Self-Rewarding Language Models
Jan 19, 2024
[short] REFT: Reasoning with REinforced Fine-Tuning
Jan 18, 2024
REFT: Reasoning with REinforced Fine-Tuning
Jan 18, 2024
[short] Asynchronous Local-SGD Training for Language Modeling
Jan 18, 2024
Asynchronous Local-SGD Training for Language Modeling
Jan 18, 2024
[short] DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Jan 18, 2024
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Jan 18, 2024
[short] Tuning Language Models by Proxy
Jan 17, 2024
Tuning Language Models by Proxy
Jan 17, 2024
[short] Scalable Pre-training of Large Autoregressive Image Models
Jan 17, 2024
Scalable Pre-training of Large Autoregressive Image Models
Jan 17, 2024
[short] A Closer Look at AUROC and AUPRC under Class Imbalance
Jan 16, 2024
A Closer Look at AUROC and AUPRC under Class Imbalance
Jan 16, 2024
[short] Nash Learning from Human Feedback
Jan 16, 2024
Nash Learning from Human Feedback
Jan 16, 2024
[short] The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
Jan 15, 2024
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
Jan 15, 2024
[short] AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
Jan 15, 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
Jan 15, 2024
PALP: Prompt Aligned Personalization of Text-to-Image Models
Jan 14, 2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Jan 14, 2024
[short] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Jan 12, 2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Jan 12, 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Jan 12, 2024
[short] Towards Conversational Diagnostic AI
Jan 12, 2024
Towards Conversational Diagnostic AI
Jan 12, 2024
[short] Transformers are Multi-State RNNs
Jan 12, 2024
Transformers are Multi-State RNNs
Jan 12, 2024
[short] PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
Jan 11, 2024
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
Jan 11, 2024
[short] The Impact of Reasoning Step Length on Large Language Models
Jan 11, 2024
The Impact of Reasoning Step Length on Large Language Models
Jan 11, 2024
[short] Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Jan 10, 2024
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Jan 10, 2024
[short] Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Jan 09, 2024
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Jan 09, 2024
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Jan 09, 2024
Mixtral of Experts
Jan 09, 2024
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Jan 09, 2024
[short] Progressive Knowledge Distillation of Stable Diffusion XL using Layer Level Loss
Jan 08, 2024
Progressive Knowledge Distillation of Stable Diffusion XL using Layer Level Loss
Jan 08, 2024
[short] Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Jan 08, 2024
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Jan 08, 2024
[short] Can AI Be as Creative as Humans?
Jan 06, 2024
Can AI Be as Creative as Humans?
Jan 06, 2024
[short] Improving Text Embeddings with Large Language Models
Jan 06, 2024
Improving Text Embeddings with Large Language Models
Jan 06, 2024
[short] LLAMA PRO: Progressive LLaMA with Block Expansion
Jan 06, 2024
LLAMA PRO: Progressive LLaMA with Block Expansion
Jan 06, 2024
[short] Instruct-Imagen: Image Generation with Multi-modal Instruction
Jan 05, 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
Jan 05, 2024
[short] LLM Augmented LLMs: Expanding Capabilities through Composition
Jan 05, 2024
LLM Augmented LLMs: Expanding Capabilities through Composition
Jan 05, 2024
[short] TinyLlama: An Open-Source Small Language Model
Jan 05, 2024
TinyLlama: An Open-Source Small Language Model
Jan 05, 2024
[short] GPT-4V(ision) is a Generalist Web Agent, if Grounded
Jan 04, 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Jan 04, 2024
[short] aMUSEd: An open MUSE reproduction
Jan 04, 2024
aMUSEd: An open MUSE reproduction
Jan 04, 2024
[short] Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Jan 03, 2024
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Jan 03, 2024
[short] Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Jan 02, 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Jan 02, 2024
[short] MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Jan 02, 2024
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Jan 02, 2024
[short] Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Jan 01, 2024
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Jan 01, 2024
[short] Fast Inference of Mixture-of-Experts Language Models with Offloading
Dec 30, 2023
Fast Inference of Mixture-of-Experts Language Models with Offloading
Dec 30, 2023
[short] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Dec 29, 2023
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Dec 29, 2023
[short] From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Dec 28, 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Dec 28, 2023
[short] Supervised Knowledge Makes Large Language Models Better In-context Learners
Dec 27, 2023
Supervised Knowledge Makes Large Language Models Better In-context Learners
Dec 27, 2023
[short] Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Dec 26, 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Dec 26, 2023
[short] The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Dec 25, 2023
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Dec 25, 2023
[short] LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
Dec 24, 2023
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
Dec 24, 2023
[short] Time is Encoded in the Weights of Finetuned Language Models
Dec 23, 2023
Time is Encoded in the Weights of Finetuned Language Models
Dec 23, 2023
[short] DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
Dec 22, 2023
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
Dec 22, 2023
[short] Mini-GPTs: Efficient Large Language Models through Contextual Pruning
Dec 21, 2023
Mini-GPTs: Efficient Large Language Models through Contextual Pruning
Dec 21, 2023
[short] LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Dec 20, 2023
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Dec 20, 2023
[short] G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Dec 19, 2023
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Dec 19, 2023
[short] Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Dec 18, 2023
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Dec 18, 2023
[short] Self-Evaluation Improves Selective Generation in Large Language Models
Dec 18, 2023
Self-Evaluation Improves Selective Generation in Large Language Models
Dec 18, 2023
[short] Weight Subcloning: Direct Initialization of Transformers Using Larger Pretrained Ones
Dec 18, 2023
Weight Subcloning: Direct Initialization of Transformers Using Larger Pretrained Ones
Dec 18, 2023
[short] TinyGSM: achieving >80% on GSM8k with small language models
Dec 16, 2023
TinyGSM: achieving >80% on GSM8k with small language models
Dec 16, 2023
[short] Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
Dec 15, 2023
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
Dec 15, 2023
[short] Vision-Language Models as a Source of Rewards
Dec 15, 2023
Vision-Language Models as a Source of Rewards
Dec 15, 2023
[short] SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Dec 14, 2023
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Dec 14, 2023
[short] Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Dec 13, 2023
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Dec 13, 2023
[short] Steering Llama 2 via Contrastive Activation Addition
Dec 13, 2023
Steering Llama 2 via Contrastive Activation Addition
Dec 13, 2023
[short] LLM360: Towards Fully Transparent Open-Source LLMs
Dec 12, 2023
LLM360: Towards Fully Transparent Open-Source LLMs
Dec 12, 2023
[short] Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models
Dec 12, 2023
Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models
Dec 12, 2023
[short] Context Tuning for Retrieval Augmented Generation
Dec 12, 2023
Context Tuning for Retrieval Augmented Generation
Dec 12, 2023
[short] Using Captum to Explain Generative Language Models
Dec 12, 2023
Using Captum to Explain Generative Language Models
Dec 12, 2023
[short] HALO: An Ontology for Representing Hallucinations in Generative Models
Dec 11, 2023
HALO: An Ontology for Representing Hallucinations in Generative Models
Dec 11, 2023
[short] How to Guess a Gradient
Dec 11, 2023
How to Guess a Gradient
Dec 11, 2023
[short] Using Large Language Models for Hyperparameter Optimization
Dec 09, 2023
Using Large Language Models for Hyperparameter Optimization
Dec 09, 2023
[short] Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Dec 08, 2023
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Dec 08, 2023
[short] Efficient Monotonic Multihead Attention
Dec 08, 2023
Efficient Monotonic Multihead Attention
Dec 08, 2023
[short] Self-conditioned Image Generation via Generating Representations
Dec 07, 2023
Self-conditioned Image Generation via Generating Representations
Dec 07, 2023
[short] Relightable Gaussian Codec Avatars
Dec 07, 2023
Relightable Gaussian Codec Avatars
Dec 07, 2023
[short] Training Chain-of-Thought via Latent-Variable Inference
Dec 06, 2023
Training Chain-of-Thought via Latent-Variable Inference
Dec 06, 2023
[short] Eliciting Latent Knowledge from Quirky Language Models
Dec 05, 2023
Eliciting Latent Knowledge from Quirky Language Models
Dec 05, 2023
[short] The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Dec 05, 2023
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Dec 05, 2023
[short] GIVT: Generative Infinite-Vocabulary Transformers
Dec 05, 2023
GIVT: Generative Infinite-Vocabulary Transformers
Dec 05, 2023
[short] Object Recognition as Next Token Prediction
Dec 05, 2023
Object Recognition as Next Token Prediction
Dec 05, 2023
[short] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Dec 04, 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Dec 04, 2023
[short] Instruction-tuning Aligns LLMs to the Human Brain
Dec 04, 2023
Instruction-tuning Aligns LLMs to the Human Brain
Dec 04, 2023
[short] Dataset Distillation in Large Data Era
Dec 03, 2023
Dataset Distillation in Large Data Era
Dec 03, 2023
[short] One-step Diffusion with Distribution Matching Distillation
Dec 01, 2023
One-step Diffusion with Distribution Matching Distillation
Dec 01, 2023
[short] Initializing Models with Larger Ones
Dec 01, 2023
Initializing Models with Larger Ones
Dec 01, 2023
[short] SODA: Bottleneck Diffusion Models for Representation Learning
Nov 30, 2023
SODA: Bottleneck Diffusion Models for Representation Learning
Nov 30, 2023
[short] No Representation Rules Them All in Category Discovery
Nov 29, 2023
No Representation Rules Them All in Category Discovery
Nov 29, 2023
[short] Manifold Preserving Guided Diffusion
Nov 29, 2023
Manifold Preserving Guided Diffusion
Nov 29, 2023
[short] On the Long Range Abilities of Transformers
Nov 29, 2023
On the Long Range Abilities of Transformers
Nov 29, 2023
[short] Scalable Extraction of Training Data from (Production) Language Models
Nov 29, 2023
Scalable Extraction of Training Data from (Production) Language Models
Nov 29, 2023
[short] Scalable AI Safety via Doubly-Efficient Debate
Nov 27, 2023
Scalable AI Safety via Doubly-Efficient Debate
Nov 27, 2023
[short] Let's Verify Step by Step
Nov 24, 2023
Let's Verify Step by Step
Nov 24, 2023
PaSS: Parallel Speculative Sampling
Nov 24, 2023
[short] In-Context Learning Functions with Varying Number of Minima
Nov 22, 2023
In-Context Learning Functions with Varying Number of Minima
Nov 22, 2023
[short] Orca 2: Teaching Small Language Models How to Reason
Nov 21, 2023
Orca 2: Teaching Small Language Models How to Reason
Nov 21, 2023
[short] Exponentially Faster Language Modeling
Nov 21, 2023
Exponentially Faster Language Modeling
Nov 21, 2023
[short] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Nov 20, 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Nov 20, 2023
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Nov 20, 2023
[short] The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Nov 19, 2023
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Nov 19, 2023
[short] Striped Attention: Faster Ring Attention for Causal Transformers
Nov 17, 2023
Striped Attention: Faster Ring Attention for Causal Transformers
Nov 17, 2023
[short] SiRA: Sparse Mixture of Low Rank Adaptation
Nov 16, 2023
SiRA: Sparse Mixture of Low Rank Adaptation
Nov 16, 2023
[short] Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Nov 15, 2023
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Nov 15, 2023
[short] Frontier Language Models are not Robust to Adversarial Arithmetic, or “What do I need to say so you agree 2+2=5?”
Nov 15, 2023
Frontier Language Models are not Robust to Adversarial Arithmetic, or “What do I need to say so you agree 2+2=5?”
Nov 15, 2023
[short] The ART of LLM Refinement: Ask, Refine, and Trust
Nov 15, 2023
The ART of LLM Refinement: Ask, Refine, and Trust
Nov 15, 2023
[short] LUMOS: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
Nov 13, 2023
LUMOS: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
Nov 13, 2023
[short] The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models
Nov 13, 2023
The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models
Nov 13, 2023
[short] Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Nov 10, 2023
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Nov 10, 2023
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
Nov 10, 2023
[short] LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Nov 10, 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Nov 10, 2023
[short] Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
Nov 09, 2023
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
Nov 09, 2023
[short] Can LLMs Follow Simple Rules?
Nov 09, 2023
Can LLMs Follow Simple Rules?
Nov 09, 2023
[short] I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
Nov 08, 2023
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
Nov 08, 2023
[short] S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Nov 07, 2023
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Nov 07, 2023
[short] Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
Nov 06, 2023
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
Nov 06, 2023
[short] Global Optimization: A Machine Learning Approach
Nov 06, 2023
Global Optimization: A Machine Learning Approach
Nov 06, 2023
[short] Simplifying Transformer Blocks
Nov 06, 2023
Simplifying Transformer Blocks
Nov 06, 2023
[short] Managing AI Risks in an Era of Rapid Progress
Nov 06, 2023
Managing AI Risks in an Era of Rapid Progress
Nov 06, 2023
FlashDecoding++: Faster Large Language Model Inference on GPUs
Nov 03, 2023
[short] Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Nov 03, 2023
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Nov 03, 2023
[short] Text Rendering Strategies for Pixel Language Models
Nov 02, 2023
Text Rendering Strategies for Pixel Language Models
Nov 02, 2023
[short] The Generative AI Paradox: “What It Can Create, It May Not Understand”
Nov 02, 2023
The Generative AI Paradox: “What It Can Create, It May Not Understand”
Nov 02, 2023
[short] The Impact of Depth and Width on Transformer Language Model Generalization
Nov 01, 2023
The Impact of Depth and Width on Transformer Language Model Generalization
Nov 01, 2023
[short] Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
Nov 01, 2023
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
Nov 01, 2023
[short] A Survey on Knowledge Editing of Neural Networks
Oct 31, 2023
A Survey on Knowledge Editing of Neural Networks
Oct 31, 2023
[short] MM-VID: Advancing Video Understanding with GPT-4V(ision)
Oct 31, 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Oct 31, 2023
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Oct 30, 2023
[short] Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation
Oct 30, 2023
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation
Oct 30, 2023
[short] Zephyr: Direct Distillation of LM Alignment
Oct 29, 2023
Zephyr: Direct Distillation of LM Alignment
Oct 29, 2023
[short] JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Oct 28, 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Oct 28, 2023
How do Language Models Bind Entities in Context?
Oct 27, 2023
[short] Controlled Decoding from Language Models
Oct 27, 2023
Controlled Decoding from Language Models
Oct 27, 2023
[short] Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Oct 27, 2023
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Oct 27, 2023
[short] A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Oct 26, 2023
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Oct 26, 2023
[short] Detecting Pretraining Data from Large Language Models
Oct 26, 2023
Detecting Pretraining Data from Large Language Models
Oct 26, 2023
[short] QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Oct 26, 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Oct 26, 2023
[short] Function Vectors in Large Language Models
Oct 25, 2023
Function Vectors in Large Language Models
Oct 25, 2023
[short] In-Context Learning Creates Task Vectors
Oct 25, 2023
In-Context Learning Creates Task Vectors
Oct 25, 2023
[short] Woodpecker: Hallucination Correction for Multimodal Large Language Models
Oct 25, 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Oct 25, 2023
[short] Matryoshka Diffusion Models
Oct 24, 2023
Matryoshka Diffusion Models
Oct 24, 2023
[short] SpecTr: Fast Speculative Decoding via Optimal Transport
Oct 24, 2023
SpecTr: Fast Speculative Decoding via Optimal Transport
Oct 24, 2023
[short] Linear Representations of Sentiment in Large Language Models
Oct 24, 2023
Linear Representations of Sentiment in Large Language Models
Oct 24, 2023
Towards Understanding Sycophancy in Language Models
Oct 23, 2023
[short] Contrastive Preference Learning: Learning from Human Feedback without RL
Oct 23, 2023
Contrastive Preference Learning: Learning from Human Feedback without RL
Oct 23, 2023
[short] Demystifying the Myths and Legends of Nonconvex Convergence of SGD
Oct 22, 2023
Demystifying the Myths and Legends of Nonconvex Convergence of SGD
Oct 22, 2023
3D-GPT: Procedural 3D Modeling with Large Language Models
Oct 20, 2023
[short] An Emulator for Fine-Tuning Large Language Models using Small Language Models
Oct 20, 2023
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Oct 20, 2023
[short] Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Oct 20, 2023
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Oct 20, 2023
[short] AgentTuning: Enabling Generalized Agent Abilities for LLMs
Oct 20, 2023
[short] Training Dynamics of Deep Network Linear Regions
Oct 20, 2023
Training Dynamics of Deep Network Linear Regions
Oct 20, 2023
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Oct 20, 2023
[short] Eliciting Human Preferences with Language Models
Oct 19, 2023
Eliciting Human Preferences with Language Models
Oct 19, 2023
[short] Emptying the Ocean with a Spoon: Should We Edit Models?
Oct 19, 2023
Emptying the Ocean with a Spoon: Should We Edit Models?
Oct 19, 2023
[short] SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Oct 19, 2023
SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Oct 19, 2023
[short] Approximating Two-Layer Feedforward Networks for Efficient Transformers
Oct 18, 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Oct 18, 2023
[short] Context-Aware Meta-Learning
Oct 18, 2023
Context-Aware Meta-Learning
Oct 18, 2023
[short] In-Context Pretraining: Language Modeling Beyond Document Boundaries
Oct 17, 2023
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Oct 17, 2023
[short] Improving Large Language Model Fine-tuning for Solving Math Problems
Oct 17, 2023
Improving Large Language Model Fine-tuning for Solving Math Problems
Oct 17, 2023
[short] Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Oct 17, 2023
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Oct 17, 2023
[short] MemGPT: Towards LLMs as Operating Systems
Oct 16, 2023
MemGPT: Towards LLMs as Operating Systems
Oct 16, 2023
[short] Neural Diffusion Models
Oct 13, 2023
Neural Diffusion Models
Oct 13, 2023
[short] Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
Oct 13, 2023
Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
Oct 13, 2023
[short] Online Speculative Decoding
Oct 12, 2023
Online Speculative Decoding
Oct 12, 2023
[short] MatFormer: Nested Transformer for Elastic Inference
Oct 12, 2023
MatFormer: Nested Transformer for Elastic Inference
Oct 12, 2023
[short] Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
Oct 11, 2023
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
Oct 11, 2023
Mistral 7B
Oct 11, 2023
[short] Teaching Language Models to Hallucinate Less with Synthetic Tasks
Oct 11, 2023
Teaching Language Models to Hallucinate Less with Synthetic Tasks
Oct 11, 2023
[short] Transformer Fusion with Optimal Transport
Oct 10, 2023
Transformer Fusion with Optimal Transport
Oct 10, 2023
[short] Grokking as Compression: A Nonlinear Complexity Perspective
Oct 10, 2023
Grokking as Compression: A Nonlinear Complexity Perspective
Oct 10, 2023
[short] Leveraging unpaired data for vision-language generative models via Cycle Consistency
Oct 06, 2023
Leveraging unpaired data for vision-language generative models via Cycle Consistency
Oct 06, 2023
[short] Language Models Represent Space and Time
Oct 05, 2023
Language Models Represent Space and Time
Oct 05, 2023
[short] Retrieval meets Long Context Large Language Models
Oct 05, 2023
Retrieval meets Long Context Large Language Models
Oct 05, 2023
[short] Large Language Models Cannot Self-Correct Reasoning Yet
Oct 04, 2023
Large Language Models Cannot Self-Correct Reasoning Yet
Oct 04, 2023
[short] Think before you speak: Training Language Models With Pause Tokens
Oct 04, 2023
Think before you speak: Training Language Models With Pause Tokens
Oct 04, 2023
[short] Enable Language Models to Implicitly Learn Self-Improvement From Data
Oct 03, 2023
Enable Language Models to Implicitly Learn Self-Improvement From Data
Oct 03, 2023
[short] AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
Oct 03, 2023
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
Oct 03, 2023
Representation Engineering: A Top-Down Approach to AI Transparency
Oct 03, 2023
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Oct 02, 2023
[short] Efficient Streaming Language Models with Attention Sinks
Oct 02, 2023
Efficient Streaming Language Models with Attention Sinks
Oct 02, 2023
[short] Tree Cross Attention
Oct 02, 2023
Tree Cross Attention
Oct 02, 2023
[short] Demystifying CLIP Data
Sep 30, 2023
Demystifying CLIP Data
Sep 30, 2023
[short] Effective Long-Context Scaling of Foundation Models
Sep 30, 2023
Effective Long-Context Scaling of Foundation Models
Sep 30, 2023
[short] GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Sep 29, 2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Sep 29, 2023
[short] MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Sep 29, 2023
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Sep 29, 2023
[short] Jointly Training Large Autoregressive Multimodal Models
Sep 28, 2023
Jointly Training Large Autoregressive Multimodal Models
Sep 28, 2023
Deep Model Fusion: A Survey
Sep 28, 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Sep 27, 2023
Large Language Model Alignment: A Survey
Sep 27, 2023
[short] Aligning Large Multimodal Models with Factually Augmented RLHF
Sep 27, 2023
Aligning Large Multimodal Models with Factually Augmented RLHF
Sep 27, 2023
[short] SCREWS: A Modular Framework for Reasoning with Revisions
Sep 26, 2023
SCREWS: A Modular Framework for Reasoning with Revisions
Sep 26, 2023
[short] Small-scale proxies for large-scale Transformer training instabilities
Sep 26, 2023
Small-scale proxies for large-scale Transformer training instabilities
Sep 26, 2023
[short] DEPT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Sep 25, 2023
DEPT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Sep 25, 2023
[short] Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning
Sep 25, 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
Sep 24, 2023
[short] The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
Sep 23, 2023
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
Sep 23, 2023
[short] LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Sep 23, 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Sep 22, 2023
[short] BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Sep 22, 2023
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Sep 22, 2023
[short] FreeU: Free Lunch in Diffusion U-Net
Sep 21, 2023
FreeU: Free Lunch in Diffusion U-Net
Sep 21, 2023
[short] Chain-of-Verification Reduces Hallucination in Large Language Models
Sep 21, 2023
Chain-of-Verification Reduces Hallucination in Large Language Models
Sep 21, 2023
[short] Language Modeling Is Compression
Sep 20, 2023
[full] Language Modeling Is Compression
Sep 20, 2023
[short] PDFTriage: Question Answering over Long, Structured Documents
Sep 19, 2023
[full] PDFTriage: Question Answering over Long, Structured Documents
Sep 19, 2023
[short] Contrastive Decoding Improves Reasoning in Large Language Models
Sep 19, 2023
[full] Contrastive Decoding Improves Reasoning in Large Language Models
Sep 19, 2023
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Sep 18, 2023
Scaling Laws for Sparsely-Connected Foundation Models
Sep 18, 2023
Agents: An Open-source Framework for Autonomous Language Agents
Sep 16, 2023
Statistical rejection sampling improves preference optimization
Sep 14, 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Sep 13, 2023
Uncovering mesa-optimization algorithms in Transformers
Sep 13, 2023
Fast Inference from Transformers via Speculative Decoding
Sep 12, 2023
Neurons in Large Language Models: Dead, N-gram, Positional
Sep 12, 2023
Fiat: Fusing Learning Paradigms with Instruction-Accelerated Tuning
Sep 12, 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Sep 08, 2023
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Sep 08, 2023
SLiMe: Segment Like Me
Sep 07, 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Sep 07, 2023
YaRN: Efficient Context Window Extension of Large Language Models
Sep 06, 2023
MVDream: Multi-view Diffusion for 3D Generation
Sep 01, 2023
LLaSM: Large Language and Speech Model
Aug 31, 2023
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Aug 31, 2023
Reprogramming under constraints: efficient and reliable transferability of lottery tickets
Aug 30, 2023
Fast Feedforward Networks
Aug 29, 2023
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
Aug 28, 2023
Code Llama: Open Foundation Models for Code
Aug 28, 2023
Critical Learning Periods Emerge Even in Deep Linear Networks
Aug 25, 2023
Giraffe: Adventures in Expanding Context Lengths in LLMs
Aug 23, 2023
Reinforced Self-Training (ReST) for Language Modeling
Aug 23, 2023
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
Aug 22, 2023
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Aug 21, 2023
A Theory for Emergence of Complex Skills in Language Models
Aug 18, 2023
Bayesian Flow Networks
Aug 17, 2023
Solving Challenging Math Word Problems Using GPT-4 with Code-based Self-Verification
Aug 16, 2023
The Devil is in the Errors: Leveraging LLMs for Fine-grained Machine Translation Evaluation
Aug 15, 2023
Shepherd: A Critic for Language Model Generation
Aug 10, 2023
Simple synthetic data reduces sycophancy in large language models
Aug 09, 2023
Evaluating Data Attribution for Text-to-Image Models
Aug 09, 2023
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
Aug 08, 2023
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Jul 20, 2023
Challenges and Applications of Large Language Models
Jul 20, 2023
Towards A Unified Agent with Foundation Models
Jul 20, 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Jul 19, 2023
How Is ChatGPT’s Behavior Changing over Time?
Jul 19, 2023
Retentive Network: A Successor to Transformer for Large Language Models
Jul 18, 2023
Alpagasus: Training A Better Alpaca with Fewer Data
Jul 18, 2023
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
Jul 17, 2023
Learning to Retrieve In-Context Examples for Large Language Models
Jul 17, 2023
Copy is All You Need
Jul 17, 2023
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Jul 15, 2023
Self-consistency for open-ended generations
Jul 15, 2023
In-context Autoencoder for Context Compression in a Large Language Model
Jul 14, 2023
Towards Robust and Efficient Continual Language Learning
Jul 14, 2023
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
Jul 14, 2023
Secrets of RLHF in Large Language Models Part I: PPO
Jul 12, 2023
Generative Pretraining in Multimodality
Jul 12, 2023
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Jul 12, 2023
VampNet: Music Generation via Masked Acoustic Token Modeling
Jul 11, 2023
Large Language Models as General Pattern Machines
Jul 11, 2023
Teaching Arithmetic to Small Transformers
Jul 11, 2023
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
Jul 11, 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Jul 09, 2023
LLM Calibration and Automatic Hallucination Detection via Pareto Optimal Self-supervision
Jul 09, 2023
Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts
Jul 09, 2023
Focused Transformer: Contrastive Training for Context Scaling
Jul 07, 2023
Lost in the Middle: How Language Models Use Long Contexts
Jul 07, 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using Flan Fine-Tuning
Jul 06, 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jul 06, 2023
Stay on topic with Classifier-Free Guidance
Jul 05, 2023
Segment Anything Meets Point Tracking
Jul 05, 2023
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization
Jun 30, 2023
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
Jun 30, 2023
Are aligned neural networks adversarially aligned?
Jun 29, 2023
Extending Context Window of Large Language Models via Position Interpolation
Jun 28, 2023
Restart Sampling for Improving Generative Processes
Jun 28, 2023
SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling with Backtracking
Jun 28, 2023
MotionGPT: Human Motion as a Foreign Language
Jun 28, 2023
Language models are weak learners
Jun 27, 2023
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
Jun 27, 2023
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Jun 27, 2023
A Simple and Effective Pruning Approach for Large Language Models
Jun 25, 2023
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
Jun 24, 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Jun 24, 2023
Training Transformers with 4-bit Integers
Jun 23, 2023
Fast Segment Anything
Jun 22, 2023
Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision
Jun 22, 2023
Textbooks Are All You Need
Jun 22, 2023
Glimmer: generalized late-interaction memory reranker
Jun 21, 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Jun 20, 2023
Demystifying GPT Self-Repair for Code Generation
Jun 20, 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Jun 20, 2023
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Jun 19, 2023
TryOnDiffusion: A Tale of Two UNets
Jun 17, 2023
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Jun 15, 2023
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Jun 14, 2023
Augmenting Language Models with Long-Term Memory
Jun 14, 2023
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Jun 14, 2023
FasterViT: Fast Vision Transformers with Hierarchical Attention
Jun 13, 2023