To listen to this podcast, open the Podcast Republic app, available on the Google Play Store and the Apple App Store.
| Episode | Date |
|---|---|
| Make Your LLM Fully Utilize the Context | May 03, 2024 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | May 02, 2024 |
| Dynamic Generation of Personalities with Large Language Models | Apr 30, 2024 |
| Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | Apr 25, 2024 |
| Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | Apr 24, 2024 |
| Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations | Apr 23, 2024 |
| Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Apr 22, 2024 |
| Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Apr 19, 2024 |
| InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Apr 18, 2024 |
| From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | Apr 16, 2024 |
| AutoCodeRover: Autonomous Program Improvement | Apr 15, 2024 |
| TrustLLM: Trustworthiness in Large Language Models | Apr 15, 2024 |
| AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Apr 12, 2024 |
| Fast Timing-Conditioned Latent Audio Diffusion | Apr 11, 2024 |
| Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians | Apr 10, 2024 |
| ReFT: Representation Finetuning for Language Models | Apr 09, 2024 |
| Long-form factuality in large language models | Apr 08, 2024 |
| Jamba: A Hybrid Transformer-Mamba Language Model | Apr 06, 2024 |
| QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models | Apr 05, 2024 |
| MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Apr 04, 2024 |
| VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Apr 03, 2024 |
| LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Apr 02, 2024 |
| Evolutionary Optimization of Model Merging Recipes | Mar 27, 2024 |
| EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models | Mar 26, 2024 |
| BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation | Mar 25, 2024 |
| Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts | Mar 22, 2024 |
| A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models | Mar 21, 2024 |
| Chronos: Learning the Language of Time Series | Mar 19, 2024 |
| Linear Transformers with Learnable Kernel Functions are Better In-Context Models | Mar 18, 2024 |
| SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting | Mar 15, 2024 |
| Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents | Mar 14, 2024 |
| GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | Mar 13, 2024 |
| TripoSR: Fast 3D Object Reconstruction from a Single Image | Mar 12, 2024 |
| Diffusion Model-Based Image Editing: A Survey | Mar 08, 2024 |
| The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Mar 07, 2024 |
| Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation | Mar 06, 2024 |
| Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases | Mar 05, 2024 |
| Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Mar 04, 2024 |
| BitDelta: Your Fine-Tune May Only Be Worth One Bit | Feb 27, 2024 |
| Ring Attention with Blockwise Transformers for Near-Infinite Context | Feb 26, 2024 |
| Premise Order Matters in Reasoning with Large Language Models | Feb 23, 2024 |
| Generative Representational Instruction Tuning | Feb 20, 2024 |
| DoRA: Weight-Decomposed Low-Rank Adaptation | Feb 19, 2024 |
| Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | Feb 18, 2024 |
| World Model on Million-Length Video And Language With RingAttention | Feb 17, 2024 |
| Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | Feb 16, 2024 |
| Fractal Patterns May Unravel the Intelligence in Next-Token Prediction | Feb 15, 2024 |
| Precise Zero-Shot Dense Retrieval without Relevance Labels | Feb 13, 2024 |
| Relevance-guided Supervision for OpenQA with ColBERT | Feb 11, 2024 |
| ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction | Feb 11, 2024 |
| PLAID: An Efficient Engine for Late Interaction Retrieval | Feb 10, 2024 |
| RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | Feb 09, 2024 |
| Corrective Retrieval Augmented Generation | Feb 08, 2024 |
| DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | Feb 07, 2024 |
| A Comprehensive Survey on 3D Content Generation | Feb 07, 2024 |
| OLMo: Accelerating the Science of Language Models | Feb 06, 2024 |
| Who’s Harry Potter? Approximate Unlearning in LLMs | Feb 04, 2024 |
| Parameter-Efficient Transfer Learning for NLP | Feb 03, 2024 |
| A Survey on Transformers in Reinforcement Learning | Feb 02, 2024 |
| Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models | Feb 01, 2024 |
| DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Jan 31, 2024 |
| Matryoshka Representation Learning | Jan 30, 2024 |
| How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers | Jan 27, 2024 |
| Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Jan 26, 2024 |
| Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Jan 25, 2024 |
| Self-Rewarding Language Models | Jan 24, 2024 |
| Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Jan 23, 2024 |
| Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering | Jan 22, 2024 |
| Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting | Jan 19, 2024 |
| LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | Jan 18, 2024 |
| Large Language Models for Generative Information Extraction: A Survey | Jan 17, 2024 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Jan 16, 2024 |
| Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon | Jan 15, 2024 |
| Parameter-Efficient Transfer Learning for NLP | Jan 13, 2024 |
| Mixtral of Experts | Jan 12, 2024 |
| MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | Jan 11, 2024 |
| WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia | Jan 11, 2024 |
| Video Understanding with Large Language Models: A Survey | Jan 10, 2024 |
| GPT-4V(ision) is a Generalist Web Agent, if Grounded | Jan 09, 2024 |
| TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Jan 08, 2024 |
| AnyText: Multilingual Visual Text Generation And Editing | Jan 05, 2024 |
| KwaiAgents: Generalized Information-seeking Agent System with Large Language Models | Jan 04, 2024 |
| Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 | Jan 03, 2024 |
| Fast Inference of Mixture-of-Experts Language Models with Offloading | Jan 02, 2024 |
| Retrieval-Augmented Generation for Large Language Models: A Survey | Dec 29, 2023 |
| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 28, 2023 |
| Pearl: A Production-ready Reinforcement Learning Agent | Dec 19, 2023 |
| Are Emergent Abilities in Large Language Models just In-Context Learning? | Dec 17, 2023 |
| Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | Dec 16, 2023 |
| Instruction Tuning for Large Language Models: A Survey | Dec 15, 2023 |
| MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Dec 14, 2023 |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Dec 13, 2023 |
| Sequential Modeling Enables Scalable Learning for Large Vision Models | Dec 12, 2023 |
| Magicoder: Source Code Is All You Need | Dec 11, 2023 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Dec 10, 2023 |
| Adversarial Diffusion Distillation | Dec 09, 2023 |
| Instruction Tuning with Human Curriculum | Dec 08, 2023 |
| Initializing Models with Larger Ones | Dec 07, 2023 |
| Improving Sample Quality of Diffusion Models Using Self-Attention Guidance | Dec 06, 2023 |
| GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? | Dec 05, 2023 |
| TaskWeaver: A Code-First Agent Framework | Dec 04, 2023 |
| Efficient LLM Inference on CPUs | Dec 02, 2023 |
| Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents | Dec 01, 2023 |
| STaR: Bootstrapping Reasoning With Reasoning | Nov 30, 2023 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | Nov 29, 2023 |
| Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion | Nov 28, 2023 |
| Exponentially Faster Language Modelling | Nov 27, 2023 |
| Orca 2: Teaching Small Language Models How to Reason | Nov 24, 2023 |
| Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Nov 23, 2023 |
| A Survey on Language Models for Code | Nov 22, 2023 |
| StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Nov 21, 2023 |
| Learning to Filter Context for Retrieval-Augmented Generation | Nov 20, 2023 |
| GraphCast: Learning skillful medium-range global weather forecasting | Nov 19, 2023 |
| LCM-LoRA: A Universal Stable-Diffusion Acceleration Module | Nov 17, 2023 |
| CogVLM: Visual Expert for Pretrained Language Models | Nov 16, 2023 |
| How Can Recommender Systems Benefit from Large Language Models: A Survey | Nov 15, 2023 |
| Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs | Nov 14, 2023 |
| Generative Pretraining in Multimodality | Nov 10, 2023 |
| Evaluating Large Language Models: A Comprehensive Survey | Nov 08, 2023 |
| LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression | Nov 03, 2023 |
| iTransformer: Inverted Transformers Are Effective for Time Series Forecasting | Nov 02, 2023 |
| Zephyr: Direct Distillation of LM Alignment | Nov 01, 2023 |
| Large Language Models for Software Engineering: Survey and Open Problems | Oct 31, 2023 |
| Eureka: Human-Level Reward Design via Coding Large Language Models | Oct 30, 2023 |
| Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model | Oct 29, 2023 |
| Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference | Oct 27, 2023 |
| AgentTuning: Enabling Generalized Agent Abilities for LLMs | Oct 26, 2023 |
| MemGPT: Towards LLMs as Operating Systems | Oct 25, 2023 |
| AgentTuning: Enabling Generalized Agent Abilities for LLMs | Oct 24, 2023 |
| Character-LLM: A Trainable Agent for Role-Playing | Oct 23, 2023 |
| OpenAgents: An Open Platform for Language Agents in the Wild | Oct 20, 2023 |
| Text Embeddings Reveal (Almost) As Much As Text | Oct 18, 2023 |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | Oct 17, 2023 |
| Mistral 7B | Oct 16, 2023 |
| A Definition of Continual Reinforcement Learning | Oct 15, 2023 |
| A Survey of Techniques for Optimizing Transformer Inference | Oct 14, 2023 |
| Improved Baselines with Visual Instruction Tuning | Oct 13, 2023 |
| Can large language models provide useful feedback on research papers? A large-scale empirical analysis | Oct 12, 2023 |
| The Rise and Potential of Large Language Model Based Agents: A Survey | Oct 11, 2023 |
| Demystifying CLIP Data | Oct 10, 2023 |
| MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models | Oct 09, 2023 |
| Efficient Streaming Language Models with Attention Sinks | Oct 06, 2023 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Oct 05, 2023 |
| AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework | Oct 04, 2023 |
| What Makes Good In-Context Examples for GPT-3? | Oct 03, 2023 |
| Thought Cloning: Learning to Think while Acting by Imitating Human Thinking | Oct 02, 2023 |
| Qwen Technical Report | Oct 02, 2023 |
| Extending Context Window of Large Language Models via Positional Interpolation | Sep 28, 2023 |
| DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Sep 27, 2023 |
| The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants | Sep 26, 2023 |
| NExT-GPT: Any-to-Any Multimodal LLM | Sep 25, 2023 |
| GPT Can Solve Mathematical Problems Without a Calculator | Sep 22, 2023 |
| Tracking Anything with Decoupled Video Segmentation | Sep 21, 2023 |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Sep 20, 2023 |
| Agents: An Open-source Framework for Autonomous Language Agents | Sep 18, 2023 |
| Cognitive Architectures for Language Agents | Sep 15, 2023 |
| PyGraft: Configurable Generation of Schemas and Knowledge Graphs at Your Fingertips | Sep 14, 2023 |
| AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents | Sep 13, 2023 |
| Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP | Sep 11, 2023 |
| LLaSM: Large Language and Speech Model | Sep 07, 2023 |
| Nougat: Neural Optical Understanding for Academic Documents | Sep 06, 2023 |
| Communicative Agents for Software Development | Sep 04, 2023 |
| Prompt2Model: Generating Deployable Models from Natural Language Instructions | Sep 02, 2023 |
| Code Llama: Open Foundation Models for Code | Sep 01, 2023 |
| A Survey on Large Language Model based Autonomous Agents | Aug 31, 2023 |
| SoTaNa: The Open-Source Software Development Assistant | Aug 31, 2023 |
| Efficient Guided Generation for Large Language Models | Aug 27, 2023 |
| Graph of Thoughts: Solving Elaborate Problems with Large Language Models | Aug 25, 2023 |
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Aug 25, 2023 |
| Platypus: Quick, Cheap, and Powerful Refinement of LLMs | Aug 24, 2023 |
| FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | Aug 23, 2023 |
| Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | Aug 20, 2023 |
| Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies | Aug 19, 2023 |
| Simple synthetic data reduces sycophancy in large language models | Aug 18, 2023 |
| Shepherd: A Critic for Language Model Generation | Aug 17, 2023 |
| AgentBench: Evaluating LLMs as Agents | Aug 16, 2023 |
| MetaGPT: Meta Programming for Multi-Agent Collaborative Framework | Aug 14, 2023 |
| Revisiting the Minimalist Approach to Offline Reinforcement Learning | Aug 13, 2023 |
| Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Aug 12, 2023 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Aug 08, 2023 |
| LISA: Reasoning Segmentation via Large Language Model | Aug 08, 2023 |
| The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World | Aug 07, 2023 |
| ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | Aug 03, 2023 |
| Universal and Transferable Adversarial Attacks on Aligned Language Models | Jul 31, 2023 |
| FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios | Jul 30, 2023 |
| Aligning Large Language Models with Human: A Survey | Jul 29, 2023 |
| FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets | Jul 28, 2023 |
| FABRIC: Personalizing Diffusion Models with Iterative Feedback | Jul 27, 2023 |
| How is ChatGPT’s behavior changing over time? | Jul 26, 2023 |
| Meta-Transformer: A Unified Framework for Multimodal Learning | Jul 25, 2023 |
| LIMA: Less Is More for Alignment | Jul 25, 2023 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 23, 2023 |
| Self-regulating Prompts: Foundational Model Adaptation without Forgetting | Jul 23, 2023 |
| Deep Unrestricted Document Image Rectification | Jul 22, 2023 |
| MMBench: Is Your Multi-modal Model an All-around Player? | Jul 21, 2023 |
| Editing Large Language Models: Problems, Methods, and Opportunities | Jul 20, 2023 |
| Stack More Layers Differently: High-Rank Training Through Low-Rank Updates | Jul 19, 2023 |
| AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | Jul 18, 2023 |
| Secrets of RLHF in Large Language Models Part I: PPO | Jul 17, 2023 |
| Liquid Time-constant Networks | Jul 16, 2023 |