AI Breakdown

By agibreakdown


Category: Education


Subscribers: 8
Reviews: 0
Episodes: 400

Description

The podcast where we use AI to break down recent AI papers and provide simplified explanations of intricate AI topics for educational purposes. The content presented here is generated automatically using LLM and text-to-speech technologies. While every effort is made to ensure accuracy, any misrepresentations or inaccuracies are unintentional and stem from the evolving technology. We value your feedback, which helps us improve the podcast and provide you with the best possible learning experience.

Episode Date
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Mar 06, 2026
Mode Seeking meets Mean Seeking for Fast Long Video Generation
Mar 04, 2026
Recursive Language Models
Mar 04, 2026
PaperBanana: Automating Academic Illustration for AI Scientists
Feb 10, 2026
World-Gymnast: Training Robots with Reinforcement Learning in a World Model
Feb 10, 2026
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory
Jan 29, 2026
Self-Rewarding Language Models
Jan 08, 2026
On the generalization of language models from in-context learning and finetuning: a controlled study
Jan 05, 2026
OpenThoughts: Data Recipes for Reasoning Models
Dec 16, 2025
Nested Learning: The Illusion of Deep Learning Architecture
Dec 13, 2025
ARC Is a Vision Problem!
Dec 09, 2025
Solving a Million-Step LLM Task with Zero Errors
Dec 09, 2025
DataRater: Meta-Learned Dataset Curation
Dec 05, 2025
Mathematical exploration and discovery at scale
Nov 15, 2025
Kosmos: An AI Scientist for Autonomous Discovery
Nov 12, 2025
World Simulation with Video Foundation Models for Physical AI
Nov 08, 2025
Towards Robust Mathematical Reasoning
Nov 06, 2025
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Nov 04, 2025
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Oct 28, 2025
ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases
Oct 27, 2025
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Oct 27, 2025
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Oct 23, 2025
DeepSeek-OCR: Contexts Optical Compression
Oct 21, 2025
The Markovian Thinker
Oct 16, 2025
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Oct 08, 2025
Towards a Physics Foundation Model
Oct 03, 2025
Scalable Option Learning in High-Throughput Environments
Sep 30, 2025
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Sep 24, 2025
Reverse-Engineered Reasoning for Open-Ended Generation
Sep 19, 2025
Scaling Performance of Large Language Model Pretraining
Sep 16, 2025
General Social Agents
Sep 15, 2025
We need a new ethics for a world of AI agents
Sep 12, 2025
Hierarchical Reasoning Model
Sep 11, 2025
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
Sep 10, 2025
Small Language Models are the Future of Agentic AI
Sep 09, 2025
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
Sep 08, 2025
Why Language Models Hallucinate
Sep 07, 2025
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Aug 19, 2025
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Aug 15, 2025
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Aug 13, 2025
Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning
Aug 01, 2025
Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
Jul 31, 2025
Working with AI: Measuring the Occupational Implications of Generative AI
Jul 31, 2025
Towards physician-centered oversight of conversational diagnostic AI
Jul 30, 2025
Learning without training: The implicit dynamics of in-context learning
Jul 28, 2025
Aime: Towards Fully-Autonomous Multi-Agent Framework
Jul 25, 2025
ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation
Jul 23, 2025
4KAgent: Agentic Any Image to 4K Super-Resolution
Jul 18, 2025
Critiques of World Models
Jul 16, 2025
Arxiv paper - Expert-level validation of AI-generated medical text with scalable language models
Jul 15, 2025
Arxiv paper - ImplicitQA: Going beyond frames towards Implicit Video Reasoning
Jul 11, 2025
Arxiv paper - BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
Jul 08, 2025
Arxiv paper - Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory
Jul 08, 2025
Blogpost paper - Project Vend: Can Claude run a small shop? (And why does that matter?)
Jul 02, 2025
Arxiv paper - Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
Jul 02, 2025
Arxiv paper - SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Jun 30, 2025
Arxiv paper - OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
Jun 27, 2025
Arxiv paper - Long-Context State-Space Video World Models
Jun 25, 2025
Arxiv paper - From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
Jun 24, 2025
Arxiv paper - Reinforcement Pre-Training
Jun 20, 2025
Arxiv paper - Token-Efficient Long Video Understanding for Multimodal LLMs
Jun 18, 2025
Arxiv paper - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Jun 11, 2025
Arxiv paper - Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
Jun 09, 2025
Arxiv paper - How much do language models memorize?
Jun 06, 2025
Arxiv paper - MMaDA: Multimodal Large Diffusion Language Models
Jun 03, 2025
Arxiv paper - Superhuman performance of a large language model on the reasoning tasks of a physician
Jun 03, 2025
Arxiv paper - The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
May 29, 2025
Arxiv paper - DanceGRPO: Unleashing GRPO on Visual Generation
May 28, 2025
Arxiv paper - Visual Planning: Let’s Think Only with Images
May 21, 2025
Arxiv paper - A Preliminary Study for GPT-4o on Image Restoration
May 14, 2025
Arxiv paper - DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
May 12, 2025
Arxiv paper - RayZer: A Self-supervised Large View Synthesis Model
May 09, 2025
Arxiv paper - Reinforcement Learning for Reasoning in Large Language Models with One Training Example
May 08, 2025
Arxiv paper - MINERVA: Evaluating Complex Video Reasoning
May 06, 2025
Arxiv paper - The Leaderboard Illusion
May 06, 2025
Arxiv paper - Towards Understanding Camera Motions in Any Video
May 05, 2025
Arxiv paper - Describe Anything: Detailed Localized Image and Video Captioning
Apr 29, 2025
Arxiv paper - MCNC: Manifold-Constrained Reparameterization for Neural Compression
Apr 28, 2025
Arxiv paper - Self-Improving Robust Preference Optimization
Apr 23, 2025
Arxiv paper - LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Apr 22, 2025
Arxiv paper - Welcome to the Era of Experience
Apr 21, 2025
Arxiv paper - MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Apr 19, 2025
Arxiv paper - InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Apr 17, 2025
Arxiv paper - EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise
Apr 16, 2025
Arxiv paper - TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Apr 16, 2025
Arxiv paper - Reasoning Models Don’t Always Say What They Think
Apr 09, 2025
Arxiv paper - Slow-Fast Architecture for Video Multi-Modal Large Language Models
Apr 07, 2025
Arxiv paper - TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
Apr 04, 2025
Arxiv paper - VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Apr 01, 2025
Arxiv paper - SynCity: Training-Free Generation of 3D Worlds
Mar 28, 2025
Arxiv paper - HD-EPIC: A Highly-Detailed Egocentric Video Dataset
Mar 26, 2025
Arxiv paper - Video-T1: Test-Time Scaling for Video Generation
Mar 25, 2025
Arxiv paper - Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Mar 24, 2025
Arxiv paper - Personalize Anything for Free with Diffusion Transformer
Mar 21, 2025
Arxiv paper - Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
Mar 20, 2025
Arxiv paper - ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Mar 18, 2025
Arxiv paper - Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Mar 17, 2025
Arxiv paper - MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
Mar 13, 2025
Arxiv paper - TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Mar 12, 2025
Arxiv paper - PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
Mar 11, 2025
Arxiv paper - VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
Mar 08, 2025
Arxiv paper - ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
Mar 04, 2025
Arxiv paper - Teaching Language Models to Critique via Reinforcement Learning
Mar 03, 2025
Arxiv paper - PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling
Feb 27, 2025
Arxiv paper - VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
Feb 24, 2025
Arxiv paper - Heuristically Adaptive Diffusion-Model Evolutionary Strategy
Feb 22, 2025
Arxiv paper - Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Feb 20, 2025
Arxiv paper - EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
Feb 19, 2025
Arxiv paper - VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Feb 14, 2025
Arxiv paper - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
Feb 13, 2025
Arxiv paper - HunyuanVideo: A Systematic Framework For Large Video Generative Models
Feb 12, 2025
Arxiv paper - s1: Simple test-time scaling
Feb 10, 2025
Arxiv paper - Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Feb 07, 2025
Arxiv paper - MatAnyone: Stable Video Matting with Consistent Memory Propagation
Feb 07, 2025
Arxiv paper - Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
Feb 03, 2025
Arxiv paper - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Jan 31, 2025
Arxiv paper - MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Jan 30, 2025
Arxiv paper - Improving Video Generation with Human Feedback
Jan 29, 2025
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Jan 28, 2025
Arxiv paper - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Jan 27, 2025
Arxiv paper - Can We Generate Images with CoT? Let’s Verify and Reinforce Image Generation Step by Step
Jan 24, 2025
Arxiv paper - Improving Factuality with Explicit Working Memory
Jan 23, 2025
Arxiv paper - Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Jan 17, 2025
Arxiv paper - FaceLift: Single Image to 3D Head with View Generation and GS-LRM
Jan 13, 2025
Arxiv paper - GenHMR: Generative Human Mesh Recovery
Jan 08, 2025
Arxiv paper - Video Creation by Demonstration
Jan 06, 2025
Arxiv paper - Byte Latent Transformer: Patches Scale Better Than Tokens
Jan 02, 2025
Arxiv paper - Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
Dec 17, 2024
Arxiv paper - FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Dec 17, 2024
Arxiv paper - ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
Dec 11, 2024
Arxiv paper - o1-Coder: an o1 Replication for Coding
Dec 10, 2024
Arxiv paper - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
Dec 06, 2024
ICLR 2025 submission - Cycle-Consistent Learning for Joint Layout-to-Image Generation and Object Detection
Dec 03, 2024
Arxiv Paper - WonderWorld: Interactive 3D Scene Generation from a Single Image
Nov 26, 2024
Arxiv Paper - Hymba: A Hybrid-head Architecture for Small Language Models
Nov 22, 2024
Arxiv Paper - Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Nov 21, 2024
Arxiv Paper - Video Instruction Tuning With Synthetic Data
Nov 20, 2024
Arxiv Paper - Generative Agent Simulations of 1,000 People
Nov 19, 2024
NeurIPS 2024 - Moving Off-the-Grid: Scene-Grounded Video Representations
Nov 15, 2024
Arxiv Paper - Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
Nov 14, 2024
Arxiv Paper - FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Nov 13, 2024
Arxiv Paper - Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Nov 11, 2024
Arxiv Paper - Long Context RAG Performance of Large Language Models
Nov 08, 2024
Arxiv Paper - NVLM: Open Frontier-Class Multimodal LLMs
Nov 05, 2024
Arxiv Paper - ColPali: Efficient Document Retrieval with Vision Language Models
Nov 01, 2024
Arxiv Paper - Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Oct 31, 2024
Arxiv Paper - Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
Oct 31, 2024
Arxiv Paper - Unbounded: A Generative Infinite Game of Character Life Simulation
Oct 29, 2024
Arxiv Paper - Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
Oct 28, 2024
Arxiv Paper - LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
Oct 25, 2024
Arxiv Paper - When Does Perceptual Alignment Benefit Vision Representations?
Oct 23, 2024
Arxiv paper - SceneCraft: Layout-Guided 3D Scene Generation
Oct 22, 2024
arxiv preprint - A Tale of Tails: Model Collapse as a Change of Scaling Laws
Oct 18, 2024
arxiv preprint - Thinking LLMs: General Instruction Following with Thought Generation
Oct 17, 2024
arxiv preprint - Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Oct 16, 2024
arxiv preprint - F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Oct 14, 2024
arxiv preprint - One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
Oct 11, 2024
arxiv preprint - Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Oct 10, 2024
arxiv preprint - Neptune: The Long Orbit to Benchmarking Long Video Understanding
Oct 07, 2024
arxiv preprint - SHIC: Shape-Image Correspondences with no Keypoint Supervision
Oct 04, 2024
arxiv preprint - E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Oct 03, 2024
arxiv preprint - LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Oct 01, 2024
arxiv preprint - DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Sep 28, 2024
arxiv preprint - Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Sep 27, 2024
arxiv preprint - Phantom of Latent for Large Language and Vision Models
Sep 24, 2024
arxiv preprint - Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Sep 20, 2024
arxiv preprint - On the Diagram of Thought
Sep 19, 2024
arxiv preprint - Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
Sep 17, 2024
arxiv preprint - SongCreator: Lyrics-based Universal Song Generation
Sep 12, 2024
arxiv preprint - Achieving Human Level Competitive Robot Table Tennis
Sep 11, 2024
arxiv preprint - Sapiens: Foundation for Human Vision Models
Sep 09, 2024
arxiv preprint - Re-Reading Improves Reasoning in Large Language Models
Sep 06, 2024
arxiv preprint - SPIRE: Semantic Prompt-Driven Image Restoration
Sep 03, 2024
arxiv preprint - Automated Design of Agentic Systems
Aug 31, 2024
arxiv preprint - Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Aug 28, 2024
arxiv preprint - To Code, or Not To Code? Exploring Impact of Code in Pre-training
Aug 26, 2024
arxiv preprint - Segment Anything with Multiple Modalities
Aug 23, 2024
arxiv preprint - JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
Aug 20, 2024
arxiv preprint - Mission: Impossible Language Models
Aug 19, 2024
arxiv preprint - Learning Task Decomposition to Assist Humans in Competitive Programming
Aug 16, 2024
arxiv preprint - IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
Aug 13, 2024
arxiv preprint - Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Aug 10, 2024
arxiv preprint - Language Model Can Listen While Speaking
Aug 09, 2024
arxiv preprint - Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
Aug 07, 2024
arxiv preprint - Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Aug 06, 2024
arxiv preprint - Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
Aug 06, 2024
arxiv preprint - Graph-enhanced Large Language Models in Asynchronous Plan Reasoning
Jul 31, 2024
arxiv preprint - LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Jul 30, 2024
arxiv preprint - OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
Jul 29, 2024
arxiv preprint - DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Jul 27, 2024
arxiv preprint - Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Jul 23, 2024
arxiv preprint - Chameleon: Mixed-Modal Early-Fusion Foundation Models
Jul 22, 2024
arxiv preprint - Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Jul 18, 2024
arxiv preprint - Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Jul 17, 2024
arxiv preprint - Human-like Episodic Memory for Infinite Context LLMs
Jul 15, 2024
arxiv preprint - Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Jul 12, 2024
arxiv preprint - Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Jul 11, 2024
arxiv preprint - Evaluating Human Alignment and Model Faithfulness of LLM Rationale
Jul 09, 2024
arxiv preprint - Detection and Measurement of Syntactic Templates in Generated Text
Jul 08, 2024
arxiv preprint - From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Jul 01, 2024
arxiv preprint - MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Jun 27, 2024
arxiv preprint - 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Jun 26, 2024
arxiv preprint - VideoLLM-online: Online Video Large Language Model for Streaming Video
Jun 25, 2024
arxiv preprint - EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Jun 24, 2024
arxiv preprint - MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Jun 22, 2024
arxiv preprint - An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Jun 20, 2024
arxiv preprint - Graphic Design with Large Multimodal Model
Jun 20, 2024
arxiv preprint - LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
Jun 18, 2024
arxiv preprint - Transformers need glasses! Information over-squashing in language tasks
Jun 17, 2024
arxiv preprint - Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback
Jun 14, 2024
arxiv preprint - TextGrad: Automatic “Differentiation” via Text
Jun 13, 2024
arxiv preprint - SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Jun 12, 2024
arxiv preprint - Open-Endedness is Essential for Artificial Superhuman Intelligence
Jun 11, 2024
arxiv preprint - To Believe or Not to Believe Your LLM
Jun 08, 2024
arxiv preprint - Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Jun 06, 2024
arxiv preprint - Contextual Position Encoding: Learning to Count What’s Important
Jun 04, 2024
arxiv preprint - Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Jun 03, 2024
arxiv preprint - VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
May 31, 2024
arxiv preprint - CinePile: A Long Video Question Answering Dataset and Benchmark
May 30, 2024
arxiv preprint - Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
May 29, 2024
arxiv preprint - SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
May 28, 2024
arxiv preprint - Octo: An Open-Source Generalist Robot Policy
May 24, 2024
arxiv preprint - Layer-Condensed KV Cache for Efficient Inference of Large Language Models
May 23, 2024
arxiv preprint - Observational Scaling Laws and the Predictability of Language Model Performance
May 22, 2024
arxiv preprint - Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization
May 21, 2024
arxiv preprint - The Platonic Representation Hypothesis
May 20, 2024
arxiv preprint - Many-Shot In-Context Learning in Multimodal Foundation Models
May 18, 2024
arxiv preprint - Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
May 16, 2024
arxiv preprint - The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
May 15, 2024
arxiv preprint - Memory Mosaics
May 14, 2024
arxiv preprint - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
May 13, 2024
arxiv preprint - LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
May 10, 2024
arxiv preprint - WildChat: 1M ChatGPT Interaction Logs in the Wild
May 09, 2024
arxiv preprint - Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
May 08, 2024
arxiv preprint - NOLA: Compressing LoRA using Linear Combination of Random Basis
May 07, 2024
arxiv preprint - StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
May 06, 2024
arxiv preprint - Iterative Reasoning Preference Optimization
May 03, 2024
arxiv preprint - Better & Faster Large Language Models via Multi-token Prediction
May 02, 2024
arxiv preprint - Make Your LLM Fully Utilize the Context
May 01, 2024
arxiv preprint - Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Apr 30, 2024
arxiv preprint - PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Apr 29, 2024
arxiv preprint - Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare
Apr 26, 2024
arxiv preprint - SnapKV: LLM Knows What You are Looking for Before Generation
Apr 25, 2024
arxiv preprint - CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Apr 24, 2024
arxiv preprint - SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Apr 23, 2024
arxiv preprint - TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Apr 22, 2024
arxiv preprint - EdgeFusion: On-Device Text-to-Image Generation
Apr 19, 2024
arxiv preprint - VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Apr 18, 2024
arxiv preprint - Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Apr 17, 2024
arxiv preprint - High-Dimension Human Value Representation in Large Language Models
Apr 16, 2024
arxiv preprint - Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
Apr 15, 2024
arxiv preprint - Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Apr 12, 2024
arxiv preprint - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Apr 11, 2024
arxiv preprint - Evaluating Text-to-Visual Generation with Image-to-Text Generation
Apr 10, 2024
arxiv preprint - Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
Apr 09, 2024
arxiv preprint - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
Apr 08, 2024
arxiv preprint - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Apr 05, 2024
arxiv preprint - WavLLM: Towards Robust and Adaptive Speech Large Language Model
Apr 04, 2024
arxiv preprint - Gecko: Versatile Text Embeddings Distilled from Large Language Models
Apr 03, 2024
arxiv preprint - ReALM: Reference Resolution As Language Modeling
Apr 02, 2024
arxiv preprint - sDPO: Don’t Use Your Data All at Once
Apr 01, 2024
arxiv preprint - LITA: Language Instructed Temporal-Localization Assistant
Mar 29, 2024
arxiv preprint - AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
Mar 28, 2024
arxiv preprint - InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Mar 27, 2024
arxiv preprint - Giraffe: Adventures in Expanding Context Lengths in LLMs
Mar 26, 2024
arxiv preprint - Explorative Inbetweening of Time and Space
Mar 25, 2024
arxiv preprint - Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Mar 22, 2024
arxiv preprint - Evaluating Large Language Models at Evaluating Instruction Following
Mar 21, 2024
arxiv preprint - Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation
Mar 20, 2024
arxiv preprint - Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Mar 19, 2024
arxiv preprint - MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Mar 18, 2024
arxiv preprint - Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Mar 15, 2024
arxiv preprint - WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
Mar 14, 2024
arxiv preprint - Synth²: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Mar 13, 2024
arxiv preprint - Is Cosine-Similarity of Embeddings Really About Similarity?
Mar 12, 2024
arxiv preprint - A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mar 11, 2024
arxiv preprint - Self-correcting LLM-controlled Diffusion Models
Mar 08, 2024
arxiv preprint - tinyBenchmarks: evaluating LLMs with fewer examples
Mar 08, 2024
arxiv preprint - Asymmetry in Low-Rank Adapters of Foundation Models
Mar 06, 2024
arxiv preprint - When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Mar 05, 2024
arxiv preprint - EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Mar 04, 2024
arxiv preprint - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Mar 01, 2024
arxiv preprint - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Feb 29, 2024
arxiv preprint - LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Feb 28, 2024
arxiv preprint - Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Feb 27, 2024
arxiv preprint - SciMON: Scientific Inspiration Machines Optimized for Novelty
Feb 26, 2024
arxiv preprint - Speculative Streaming: Fast LLM Inference without Auxiliary Models
Feb 23, 2024
arxiv preprint - LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Feb 22, 2024
arxiv preprint - UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities
Feb 21, 2024
arxiv preprint - Guiding Instruction-based Image Editing via Multimodal Large Language Models
Feb 20, 2024
arxiv preprint - Spectral State Space Models
Feb 16, 2024
arxiv preprint - More Agents Is All You Need
Feb 15, 2024
arxiv preprint - World Model on Million-Length Video And Language With RingAttention
Feb 14, 2024
arxiv preprint - Learning Video Representations from Large Language Models
Feb 13, 2024
arxiv preprint - Can Large Language Models Understand Context?
Feb 12, 2024
arxiv preprint - Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Feb 09, 2024
arxiv preprint - System 2 Attention (is something you might need too)
Feb 08, 2024
arxiv preprint - DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Feb 07, 2024
arxiv preprint - KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Feb 06, 2024
arxiv preprint - Language Model Inversion
Feb 05, 2024
arxiv preprint - Tree Prompting: Efficient Task Adaptation without Fine-Tuning
Feb 02, 2024
arxiv preprint - Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Feb 01, 2024
arxiv preprint - Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Jan 31, 2024
arxiv preprint - RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Jan 30, 2024
arxiv preprint - SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Jan 29, 2024
arxiv preprint - Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
Jan 26, 2024
arxiv preprint - MambaByte: Token-free Selective State Space Model
Jan 25, 2024
arxiv preprint - Lumiere: A Space-Time Diffusion Model for Video Generation
Jan 24, 2024
arxiv preprint - Self-Rewarding Language Models
Jan 23, 2024
arxiv preprint - Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Jan 22, 2024
arxiv preprint - MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding
Jan 19, 2024
arxiv preprint - Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Jan 18, 2024
arxiv preprint - Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Jan 17, 2024
arxiv preprint - Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Jan 16, 2024
arxiv preprint - InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes
Jan 12, 2024
arxiv preprint - A Simple LLM Framework for Long-Range Video Question-Answering
Jan 11, 2024
arxiv preprint - Mixtral of Experts
Jan 09, 2024
arxiv preprint - Weight subcloning: direct initialization of transformers using larger pretrained ones
Jan 08, 2024
arxiv preprint - Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Jan 05, 2024
arxiv preprint - LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Jan 05, 2024
arxiv preprint - The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Jan 02, 2024
arxiv preprint - DreaMoving: A Human Video Generation Framework based on Diffusion Models
Dec 29, 2023
arxiv preprint - Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Dec 28, 2023
arxiv preprint - UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Dec 28, 2023
arxiv preprint - LongNet: Scaling Transformers to 1,000,000,000 Tokens
Dec 27, 2023
arxiv preprint - MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
Dec 27, 2023
arxiv preprint - Model-tuning Via Prompts Makes NLP Models Adversarially Robust
Dec 26, 2023
arxiv preprint - Training Chain-of-Thought via Latent-Variable Inference
Dec 22, 2023
arxiv preprint - Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Dec 21, 2023
arxiv preprint - Instruction-tuning Aligns LLMs to the Human Brain
Dec 20, 2023
arxiv preprint - WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Dec 19, 2023
arxiv preprint - DemoFusion: Democratising High-Resolution Image Generation With No $$$
Dec 18, 2023
arxiv preprint - Recommender Systems with Generative Retrieval
Dec 15, 2023
arxiv preprint - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Dec 14, 2023
arxiv preprint - Block-State Transformers
Dec 13, 2023
arxiv preprint - Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
Dec 12, 2023
arxiv preprint - LooseControl: Lifting ControlNet for Generalized Depth Conditioning
Dec 11, 2023
Announcement: AI Breakdown YouTube Channel
Dec 08, 2023
arxiv preprint - OneLLM: One Framework to Align All Modalities with Language
Dec 08, 2023
arxiv preprint - The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Dec 08, 2023
arxiv - MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Dec 07, 2023
arxiv preprint - MLP-Mixer: An all-MLP Architecture for Vision
Dec 07, 2023
arxiv preprint - Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
Dec 06, 2023
arxiv preprint - Nash Learning from Human Feedback
Dec 05, 2023
arxiv preprint - Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Dec 04, 2023
arxiv preprint - Knowledge is a Region in Weight Space for Fine-tuned Language Models
Dec 03, 2023
arxiv preprint - MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Dec 02, 2023
arxiv preprint - Simplifying Transformer Blocks
Dec 01, 2023
arxiv - Visual In-Context Prompting
Nov 30, 2023
Arxiv Preprint - GAIA: a benchmark for General AI Assistants
Nov 29, 2023
Arxiv Preprint - DisCo: Disentangled Control for Realistic Human Dance Generation
Nov 28, 2023
Arxiv Preprint - Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
Nov 27, 2023
Arxiv Preprint - A General Theoretical Paradigm to Understand Learning from Human Preferences
Nov 25, 2023
Arxiv Preprint - ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Nov 22, 2023
ArXiv Preprint - S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Nov 21, 2023
ArXiv Preprint - Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
Nov 20, 2023
Arxiv Preprint - LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Nov 17, 2023
ArXiv Preprint - Fine-tuning Language Models for Factuality
Nov 16, 2023
arxiv preprint - Language Models can be Logical Solvers
Nov 15, 2023
ArXiv Preprint - Prompt Engineering a Prompt Engineer
Nov 14, 2023
arxiv preprint - CogVLM: Visual Expert for Pretrained Language Models
Nov 13, 2023
ArXiv Preprint - De-Diffusion Makes Text a Strong Cross-Modal Interface
Nov 10, 2023
ArXiv Preprint - E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Nov 09, 2023
ArXiv Preprint - Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges
Nov 08, 2023
ArXiv Preprint - Learning From Mistakes Makes LLM Better Reasoner
Nov 07, 2023
ArXiv Preprint - The Generative AI Paradox: “What It Can Create, It May Not Understand”
Nov 06, 2023
ArXiv Preprint - TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
Nov 03, 2023
ArXiv Preprint - MM-VID: Advancing Video Understanding with GPT-4V(ision)
Nov 02, 2023
ArXiv Preprint - Zephyr: Direct Distillation of LM Alignment
Nov 01, 2023
ArXiv Preprint - ControlLLM: Augment Language Models with Tools by Searching on Graphs
Oct 31, 2023
ArXiv Preprint - Talk like a Graph: Encoding Graphs for Large Language Models
Oct 30, 2023
arxiv Preprint - AgentTuning: Enabling Generalized Agent Abilities for LLMs
Oct 29, 2023
ArXiv Preprint - Jailbreaking Black Box Large Language Models in Twenty Queries
Oct 28, 2023
ArXiv Preprint - Matryoshka Diffusion Models
Oct 27, 2023
arxiv Preprint - An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
Oct 26, 2023
arxiv Preprint - Retrieval meets Long Context Large Language Models
Oct 25, 2023
arxiv Preprint - Contrastive Preference Learning: Learning from Human Feedback without RL
Oct 24, 2023
arxiv Preprint - BitNet: Scaling 1-bit Transformers for Large Language Models
Oct 23, 2023
arxiv Preprint - Automatic Prompt Optimization with “Gradient Descent” and Beam Search
Oct 22, 2023
arxiv Preprint - Understanding Retrieval Augmentation for Long-Form Question Answering
Oct 21, 2023
arxiv Preprint - On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
Oct 20, 2023
arxiv Preprint - Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
Oct 19, 2023
arxiv Preprint - In-Context Pretraining: Language Modeling Beyond Document Boundaries
Oct 18, 2023
ICCV 2023 - Sigmoid Loss for Language Image Pre-Training
Oct 17, 2023
arxiv Preprint - Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
Oct 16, 2023
arxiv Preprint - HyperAttention: Long-context Attention in Near-Linear Time
Oct 15, 2023
arxiv Preprint - InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
Oct 13, 2023
arxiv Preprint - Large Language Models Cannot Self-Correct Reasoning Yet
Oct 12, 2023
arxiv Preprint - Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Oct 11, 2023
arxiv Preprint - Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
Oct 10, 2023
arxiv Preprint - Improved Baselines with Visual Instruction Tuning
Oct 09, 2023
arxiv Preprint - Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Oct 08, 2023
NeurIPS 2023 - Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Oct 07, 2023
ICCV 2023 - Diffusion Models as Masked Autoencoders
Oct 06, 2023
arxiv Preprint - Conditional Diffusion Distillation
Oct 05, 2023
arxiv Preprint - Enable Language Models to Implicitly Learn Self-Improvement From Data
Oct 04, 2023
arxiv Preprint - Efficient Streaming Language Models with Attention Sinks
Oct 03, 2023
NeurIPS 2023 - PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving
Oct 02, 2023
arxiv Preprint - Vision Transformers Need Registers
Oct 01, 2023
arxiv Preprint - VPA: Fully Test-Time Visual Prompt Adaptation
Sep 30, 2023