Episode | Date |
---|---|
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases | Nov 01, 2024 |
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities | Oct 31, 2024 |
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Oct 30, 2024 |
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Oct 18, 2024 |
LightRAG: Simple and Fast Retrieval-Augmented Generation | Oct 17, 2024 |
Aria: An Open Multimodal Native Mixture-of-Experts Model | Oct 16, 2024 |
AgentKit: Structured LLM Reasoning with Dynamic Graphs | Oct 15, 2024 |
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Oct 14, 2024 |
Diffusion Models are Evolutionary Algorithms | Oct 10, 2024 |
Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering | Oct 09, 2024 |
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | Oct 08, 2024 |
Internal Consistency and Self-Feedback in Large Language Models: A Survey | Oct 07, 2024 |
On the Diagram of Thought | Oct 02, 2024 |
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion | Oct 01, 2024 |
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation | Sep 30, 2024 |
On the limits of agency in agent-based models | Sep 24, 2024 |
Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization | Sep 23, 2024 |
PuLID: Pure and Lightning ID Customization via Contrastive Alignment | Sep 23, 2024 |
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Sep 21, 2024 |
PuLID: Pure and Lightning ID Customization via Contrastive Alignment | Sep 20, 2024 |
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Sep 19, 2024 |
LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Sep 18, 2024 |
GeoCalib: Learning Single-image Calibration with Geometric Optimization | Sep 17, 2024 |
Artificial Immune System of Secure Face Recognition Against Adversarial Attacks | Sep 13, 2024 |
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Sep 12, 2024 |
rerankers: A Lightweight Python Library to Unify Ranking Methods | Sep 11, 2024 |
Automated Design of Agentic Systems | Sep 10, 2024 |
Text2SQL is Not Enough: Unifying AI and Databases with TAG | Sep 09, 2024 |
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Sep 05, 2024 |
Sapiens: Foundation for Human Vision Models | Sep 04, 2024 |
OctFusion: Octree-based Diffusion Models for 3D Shape Generation | Sep 03, 2024 |
Writing in the Margins: Better Inference Pattern for Long Context Retrieval | Sep 02, 2024 |
Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs | Aug 30, 2024 |
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation | Aug 29, 2024 |
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation | Aug 28, 2024 |
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | Aug 23, 2024 |
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Aug 21, 2024 |
ControlNeXt: Powerful and Efficient Control for Image and Video Generation | Aug 20, 2024 |
OpenResearcher: Unleashing AI for Accelerated Scientific Research | Aug 19, 2024 |
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | Aug 14, 2024 |
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | Aug 13, 2024 |
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | Aug 09, 2024 |
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | Aug 08, 2024 |
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers | Aug 07, 2024 |
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | Aug 05, 2024 |
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models | Jul 31, 2024 |
FinanceBench: A New Benchmark for Financial Question Answering | Jul 30, 2024 |
Stable-Hair: Real-World Hair Transfer via Diffusion Model | Jul 29, 2024 |
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | Jul 26, 2024 |
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Jul 25, 2024 |
Patch-Level Training for Large Language Models | Jul 24, 2024 |
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Jul 23, 2024 |
IMAGDressing-v1: Customizable Virtual Dressing | Jul 22, 2024 |
A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights | Jul 19, 2024 |
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | Jul 18, 2024 |
SEED-Story: Multimodal Long Story Generation with Large Language Model | Jul 16, 2024 |
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Jul 15, 2024 |
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jul 12, 2024 |
Agentless: Demystifying LLM-based Software Engineering Agents | Jul 11, 2024 |
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? | Jul 09, 2024 |
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code | Jul 08, 2024 |
Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image | Jul 05, 2024 |
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jul 04, 2024 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | Jun 28, 2024 |
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture | Jun 27, 2024 |
Seven Failure Points When Engineering a Retrieval Augmented Generation System | Jun 26, 2024 |
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning | Jun 25, 2024 |
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM | Jun 24, 2024 |
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Jun 21, 2024 |
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 20, 2024 |
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time | Jun 19, 2024 |
“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models | Jun 18, 2024 |
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | Jun 17, 2024 |
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | Jun 13, 2024 |
GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning | Jun 12, 2024 |
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | Jun 11, 2024 |
From Sora What We Can See: A Survey of Text-to-Video Generation | Jun 04, 2024 |
The Future of Large Language Model Pre-training is Federated | Jun 03, 2024 |
Long-form factuality in large language models | Jun 01, 2024 |
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | May 31, 2024 |
Retrieval-Augmented Generation for AI-Generated Content: A Survey | May 29, 2024 |
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | May 28, 2024 |
LightAutoML: AutoML Solution for a Large Financial Services Ecosystem | May 27, 2024 |
Efficient Multimodal Large Language Models: A Survey | May 24, 2024 |
The Platonic Representation Hypothesis | May 23, 2024 |
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment | May 22, 2024 |
LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks | May 21, 2024 |
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | May 16, 2024 |
A decoder-only foundation model for time-series forecasting | May 14, 2024 |
Autonomous LLM-driven research from data to human-verifiable research papers | May 13, 2024 |
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | May 12, 2024 |
Granite Code Models: A Family of Open Foundation Models for Code Intelligence | May 11, 2024 |
Improving Diffusion Models for Virtual Try-on | May 10, 2024 |
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | May 08, 2024 |
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | May 07, 2024 |
KAN: Kolmogorov-Arnold Networks | May 06, 2024 |
Make Your LLM Fully Utilize the Context | May 03, 2024 |
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | May 02, 2024 |
Dynamic Generation of Personalities with Large Language Models | Apr 30, 2024 |
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | Apr 25, 2024 |
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | Apr 24, 2024 |
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations | Apr 23, 2024 |
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Apr 22, 2024 |
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Apr 19, 2024 |
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Apr 18, 2024 |
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | Apr 16, 2024 |
AutoCodeRover: Autonomous Program Improvement | Apr 15, 2024 |
TrustLLM: Trustworthiness in Large Language Models | Apr 15, 2024 |
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Apr 12, 2024 |
Fast Timing-Conditioned Latent Audio Diffusion | Apr 11, 2024 |
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians | Apr 10, 2024 |
ReFT: Representation Finetuning for Language Models | Apr 09, 2024 |
Long-form factuality in large language models | Apr 08, 2024 |
Jamba: A Hybrid Transformer-Mamba Language Model | Apr 06, 2024 |
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models | Apr 05, 2024 |
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Apr 04, 2024 |
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Apr 03, 2024 |
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Apr 02, 2024 |
Evolutionary Optimization of Model Merging Recipes | Mar 27, 2024 |
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models | Mar 26, 2024 |
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation | Mar 25, 2024 |
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts | Mar 22, 2024 |
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models | Mar 21, 2024 |
Chronos: Learning the Language of Time Series | Mar 19, 2024 |
Linear Transformers with Learnable Kernel Functions are Better In-Context Models | Mar 18, 2024 |
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting | Mar 15, 2024 |
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents | Mar 14, 2024 |
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | Mar 13, 2024 |
TripoSR: Fast 3D Object Reconstruction from a Single Image | Mar 12, 2024 |
Diffusion Model-Based Image Editing: A Survey | Mar 08, 2024 |
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Mar 07, 2024 |
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation | Mar 06, 2024 |
Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases | Mar 05, 2024 |
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Mar 04, 2024 |
BitDelta: Your Fine-Tune May Only Be Worth One Bit | Feb 27, 2024 |
Ring Attention with Blockwise Transformers for Near-Infinite Context | Feb 26, 2024 |
Premise Order Matters in Reasoning with Large Language Models | Feb 23, 2024 |
Generative Representational Instruction Tuning | Feb 20, 2024 |
DoRA: Weight-Decomposed Low-Rank Adaptation | Feb 19, 2024 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | Feb 18, 2024 |
World Model on Million-Length Video And Language With RingAttention | Feb 17, 2024 |
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | Feb 16, 2024 |
Fractal Patterns May Unravel the Intelligence in Next-Token Prediction | Feb 15, 2024 |
Precise Zero-Shot Dense Retrieval without Relevance Labels | Feb 13, 2024 |
Relevance-guided Supervision for OpenQA with ColBERT | Feb 11, 2024 |
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction | Feb 11, 2024 |
PLAID: An Efficient Engine for Late Interaction Retrieval | Feb 10, 2024 |
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | Feb 09, 2024 |
Corrective Retrieval Augmented Generation | Feb 08, 2024 |
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | Feb 07, 2024 |
A Comprehensive Survey on 3D Content Generation | Feb 07, 2024 |
OLMo: Accelerating the Science of Language Models | Feb 06, 2024 |
Who’s Harry Potter? Approximate Unlearning in LLMs | Feb 04, 2024 |
Parameter-Efficient Transfer Learning for NLP | Feb 03, 2024 |
A Survey on Transformers in Reinforcement Learning | Feb 02, 2024 |
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models | Feb 01, 2024 |
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Jan 31, 2024 |
Matryoshka Representation Learning | Jan 30, 2024 |
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers | Jan 27, 2024 |
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Jan 26, 2024 |
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Jan 25, 2024 |
Self-Rewarding Language Models | Jan 24, 2024 |
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Jan 23, 2024 |
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering | Jan 22, 2024 |
Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting | Jan 19, 2024 |
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | Jan 18, 2024 |
Large Language Models for Generative Information Extraction: A Survey | Jan 17, 2024 |
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Jan 16, 2024 |
Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon | Jan 15, 2024 |
Parameter-Efficient Transfer Learning for NLP | Jan 13, 2024 |
Mixtral of Experts | Jan 12, 2024 |
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | Jan 11, 2024 |
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia | Jan 11, 2024 |
Video Understanding with Large Language Models: A Survey | Jan 10, 2024 |
GPT-4V(ision) is a Generalist Web Agent, if Grounded | Jan 09, 2024 |
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Jan 08, 2024 |
AnyText: Multilingual Visual Text Generation And Editing | Jan 05, 2024 |
KwaiAgents: Generalized Information-seeking Agent System with Large Language Models | Jan 04, 2024 |
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 | Jan 03, 2024 |
Fast Inference of Mixture-of-Experts Language Models with Offloading | Jan 02, 2024 |
Retrieval-Augmented Generation for Large Language Models: A Survey | Dec 29, 2023 |
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 28, 2023 |
Pearl: A Production-ready Reinforcement Learning Agent | Dec 19, 2023 |
Are Emergent Abilities in Large Language Models just In-Context Learning? | Dec 17, 2023 |
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | Dec 16, 2023 |
Instruction Tuning for Large Language Models: A Survey | Dec 15, 2023 |
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Dec 14, 2023 |
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Dec 13, 2023 |
Sequential Modeling Enables Scalable Learning for Large Vision Models | Dec 12, 2023 |
Magicoder: Source Code Is All You Need | Dec 11, 2023 |
Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Dec 10, 2023 |
Adversarial Diffusion Distillation | Dec 09, 2023 |
Instruction Tuning with Human Curriculum | Dec 08, 2023 |
Initializing Models with Larger Ones | Dec 07, 2023 |
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance | Dec 06, 2023 |
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? | Dec 05, 2023 |
TaskWeaver: A Code-First Agent Framework | Dec 04, 2023 |
Efficient LLM Inference on CPUs | Dec 02, 2023 |
Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents | Dec 01, 2023 |
STaR: Bootstrapping Reasoning With Reasoning | Nov 30, 2023 |