Papers Read on AI

By Rob

Category: Tech News

Subscribers: 7
Reviews: 0
Episodes: 200

Description

We keep you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. We select papers by comparative results, citations, and influence, and educate you on the latest research. Consider supporting us at Patreon.com/PapersRead, where you can also share feedback and ideas.

Episode Date
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
Nov 01, 2024
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
Oct 31, 2024
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Oct 30, 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Oct 18, 2024
LightRAG: Simple and Fast Retrieval-Augmented Generation
Oct 17, 2024
Aria: An Open Multimodal Native Mixture-of-Experts Model
Oct 16, 2024
AgentKit: Structured LLM Reasoning with Dynamic Graphs
Oct 15, 2024
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
Oct 14, 2024
Diffusion Models are Evolutionary Algorithms
Oct 10, 2024
Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering
Oct 09, 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Oct 08, 2024
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Oct 07, 2024
On the Diagram of Thought
Oct 02, 2024
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Oct 01, 2024
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
Sep 30, 2024
On the limits of agency in agent-based models
Sep 24, 2024
Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization
Sep 23, 2024
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Sep 23, 2024
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery
Sep 21, 2024
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Sep 20, 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Sep 19, 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Sep 18, 2024
GeoCalib: Learning Single-image Calibration with Geometric Optimization
Sep 17, 2024
Artificial Immune System of Secure Face Recognition Against Adversarial Attacks
Sep 13, 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Sep 12, 2024
rerankers: A Lightweight Python Library to Unify Ranking Methods
Sep 11, 2024
Automated Design of Agentic Systems
Sep 10, 2024
Text2SQL is Not Enough: Unifying AI and Databases with TAG
Sep 09, 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Sep 05, 2024
Sapiens: Foundation for Human Vision Models
Sep 04, 2024
OctFusion: Octree-based Diffusion Models for 3D Shape Generation
Sep 03, 2024
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Sep 02, 2024
Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs
Aug 30, 2024
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation
Aug 29, 2024
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
Aug 28, 2024
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Aug 23, 2024
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Aug 21, 2024
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
Aug 20, 2024
OpenResearcher: Unleashing AI for Accelerated Scientific Research
Aug 19, 2024
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Aug 14, 2024
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Aug 13, 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Aug 09, 2024
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Aug 08, 2024
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Aug 07, 2024
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Aug 05, 2024
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
Jul 31, 2024
FinanceBench: A New Benchmark for Financial Question Answering
Jul 30, 2024
Stable-Hair: Real-World Hair Transfer via Diffusion Model
Jul 29, 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Jul 26, 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Jul 25, 2024
Patch-Level Training for Large Language Models
Jul 24, 2024
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Jul 23, 2024
IMAGDressing-v1: Customizable Virtual Dressing
Jul 22, 2024
A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
Jul 19, 2024
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
Jul 18, 2024
SEED-Story: Multimodal Long Story Generation with Large Language Model
Jul 16, 2024
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Jul 15, 2024
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
Jul 12, 2024
Agentless: Demystifying LLM-based Software Engineering Agents
Jul 11, 2024
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jul 09, 2024
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
Jul 08, 2024
Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
Jul 05, 2024
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Jul 04, 2024
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Jun 28, 2024
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Jun 27, 2024
Seven Failure Points When Engineering a Retrieval Augmented Generation System
Jun 26, 2024
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Jun 25, 2024
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
Jun 24, 2024
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
Jun 21, 2024
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Jun 20, 2024
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Jun 19, 2024
“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Jun 18, 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Jun 17, 2024
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Jun 13, 2024
GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning
Jun 12, 2024
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
Jun 11, 2024
From Sora What We Can See: A Survey of Text-to-Video Generation
Jun 04, 2024
The Future of Large Language Model Pre-training is Federated
Jun 03, 2024
Long-form factuality in large language models
Jun 01, 2024
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
May 31, 2024
Retrieval-Augmented Generation for AI-Generated Content: A Survey
May 29, 2024
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
May 28, 2024
LightAutoML: AutoML Solution for a Large Financial Services Ecosystem
May 27, 2024
Efficient Multimodal Large Language Models: A Survey
May 24, 2024
The Platonic Representation Hypothesis
May 23, 2024
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
May 22, 2024
LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
May 21, 2024
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
May 16, 2024
A decoder-only foundation model for time-series forecasting
May 14, 2024
Autonomous LLM-driven research from data to human-verifiable research papers
May 13, 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
May 12, 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
May 11, 2024
Improving Diffusion Models for Virtual Try-on
May 10, 2024
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
May 08, 2024
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
May 07, 2024
KAN: Kolmogorov-Arnold Networks
May 06, 2024
Make Your LLM Fully Utilize the Context
May 03, 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
May 02, 2024
Dynamic Generation of Personalities with Large Language Models
Apr 30, 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Apr 25, 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Apr 24, 2024
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Apr 23, 2024
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Apr 22, 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Apr 19, 2024
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Apr 18, 2024
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Apr 16, 2024
AutoCodeRover: Autonomous Program Improvement
Apr 15, 2024
TrustLLM: Trustworthiness in Large Language Models
Apr 15, 2024
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Apr 12, 2024
Fast Timing-Conditioned Latent Audio Diffusion
Apr 11, 2024
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Apr 10, 2024
ReFT: Representation Finetuning for Language Models
Apr 09, 2024
Long-form factuality in large language models
Apr 08, 2024
Jamba: A Hybrid Transformer-Mamba Language Model
Apr 06, 2024
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Apr 05, 2024
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Apr 04, 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Apr 03, 2024
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Apr 02, 2024
Evolutionary Optimization of Model Merging Recipes
Mar 27, 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Mar 26, 2024
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
Mar 25, 2024
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
Mar 22, 2024
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
Mar 21, 2024
Chronos: Learning the Language of Time Series
Mar 19, 2024
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Mar 18, 2024
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Mar 15, 2024
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
Mar 14, 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Mar 13, 2024
TripoSR: Fast 3D Object Reconstruction from a Single Image
Mar 12, 2024
Diffusion Model-Based Image Editing: A Survey
Mar 08, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Mar 07, 2024
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
Mar 06, 2024
Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases
Mar 05, 2024
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Mar 04, 2024
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Feb 27, 2024
Ring Attention with Blockwise Transformers for Near-Infinite Context
Feb 26, 2024
Premise Order Matters in Reasoning with Large Language Models
Feb 23, 2024
Generative Representational Instruction Tuning
Feb 20, 2024
DoRA: Weight-Decomposed Low-Rank Adaptation
Feb 19, 2024
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Feb 18, 2024
World Model on Million-Length Video And Language With RingAttention
Feb 17, 2024
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Feb 16, 2024
Fractal Patterns May Unravel the Intelligence in Next-Token Prediction
Feb 15, 2024
Precise Zero-Shot Dense Retrieval without Relevance Labels
Feb 13, 2024
Relevance-guided Supervision for OpenQA with ColBERT
Feb 11, 2024
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Feb 11, 2024
PLAID: An Efficient Engine for Late Interaction Retrieval
Feb 10, 2024
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Feb 09, 2024
Corrective Retrieval Augmented Generation
Feb 08, 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Feb 07, 2024
A Comprehensive Survey on 3D Content Generation
Feb 07, 2024
OLMo: Accelerating the Science of Language Models
Feb 06, 2024
Who’s Harry Potter? Approximate Unlearning in LLMs
Feb 04, 2024
Parameter-Efficient Transfer Learning for NLP
Feb 03, 2024
A Survey on Transformers in Reinforcement Learning
Feb 02, 2024
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
Feb 01, 2024
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Jan 31, 2024
Matryoshka Representation Learning
Jan 30, 2024
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Jan 27, 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Jan 26, 2024
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
Jan 25, 2024
Self-Rewarding Language Models
Jan 24, 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Jan 23, 2024
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
Jan 22, 2024
Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Jan 19, 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Jan 18, 2024
Large Language Models for Generative Information Extraction: A Survey
Jan 17, 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Jan 16, 2024
Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon
Jan 15, 2024
Parameter-Efficient Transfer Learning for NLP
Jan 13, 2024
Mixtral of Experts
Jan 12, 2024
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Jan 11, 2024
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Jan 11, 2024
Video Understanding with Large Language Models: A Survey
Jan 10, 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Jan 09, 2024
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Jan 08, 2024
AnyText: Multilingual Visual Text Generation And Editing
Jan 05, 2024
KwaiAgents: Generalized Information-seeking Agent System with Large Language Models
Jan 04, 2024
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Jan 03, 2024
Fast Inference of Mixture-of-Experts Language Models with Offloading
Jan 02, 2024
Retrieval-Augmented Generation for Large Language Models: A Survey
Dec 29, 2023
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Dec 28, 2023
Pearl: A Production-ready Reinforcement Learning Agent
Dec 19, 2023
Are Emergent Abilities in Large Language Models just In-Context Learning?
Dec 17, 2023
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
Dec 16, 2023
Instruction Tuning for Large Language Models: A Survey
Dec 15, 2023
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Dec 14, 2023
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Dec 13, 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
Dec 12, 2023
Magicoder: Source Code Is All You Need
Dec 11, 2023
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Dec 10, 2023
Adversarial Diffusion Distillation
Dec 09, 2023
Instruction Tuning with Human Curriculum
Dec 08, 2023
Initializing Models with Larger Ones
Dec 07, 2023
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
Dec 06, 2023
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
Dec 05, 2023
TaskWeaver: A Code-First Agent Framework
Dec 04, 2023
Efficient LLM Inference on CPUs
Dec 02, 2023
Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents
Dec 01, 2023
STaR: Bootstrapping Reasoning With Reasoning
Nov 30, 2023