| Episode | Date |
|---|---|
| Beyond Language Modeling: An Exploration of Multimodal Pretraining | Mar 06, 2026 |
| Mode Seeking meets Mean Seeking for Fast Long Video Generation | Mar 04, 2026 |
| Recursive Language Models | Mar 04, 2026 |
| PaperBanana: Automating Academic Illustration for AI Scientists | Feb 10, 2026 |
| World-Gymnast: Training Robots with Reinforcement Learning in a World Model | Feb 10, 2026 |
| Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory | Jan 29, 2026 |
| Self-Rewarding Language Models | Jan 08, 2026 |
| On the generalization of language models from in-context learning and finetuning: a controlled study | Jan 05, 2026 |
| OpenThoughts: Data Recipes for Reasoning Models | Dec 16, 2025 |
| Nested Learning: The Illusion of Deep Learning Architecture | Dec 13, 2025 |
| ARC Is a Vision Problem! | Dec 09, 2025 |
| Solving a Million-Step LLM Task with Zero Errors | Dec 09, 2025 |
| DataRater: Meta-Learned Dataset Curation | Dec 05, 2025 |
| Mathematical exploration and discovery at scale | Nov 15, 2025 |
| Kosmos: An AI Scientist for Autonomous Discovery | Nov 12, 2025 |
| World Simulation with Video Foundation Models for Physical AI | Nov 08, 2025 |
| Towards Robust Mathematical Reasoning | Nov 06, 2025 |
| ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Nov 04, 2025 |
| Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models | Oct 28, 2025 |
| ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases | Oct 27, 2025 |
| Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset | Oct 27, 2025 |
| Reasoning with Sampling: Your Base Model is Smarter Than You Think | Oct 23, 2025 |
| DeepSeek-OCR: Contexts Optical Compression | Oct 21, 2025 |
| The Markovian Thinker | Oct 16, 2025 |
| DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL | Oct 08, 2025 |
| Towards a Physics Foundation Model | Oct 03, 2025 |
| Scalable Option Learning in High-Throughput Environments | Sep 30, 2025 |
| Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning | Sep 24, 2025 |
| Reverse-Engineered Reasoning for Open-Ended Generation | Sep 19, 2025 |
| Scaling Performance of Large Language Model Pretraining | Sep 16, 2025 |
| General Social Agents | Sep 15, 2025 |
| We need a new ethics for a world of AI agents | Sep 12, 2025 |
| Hierarchical Reasoning Model | Sep 11, 2025 |
| ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts | Sep 10, 2025 |
| Small Language Models are the Future of Agentic AI | Sep 09, 2025 |
| Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents | Sep 08, 2025 |
| Why Language Models Hallucinate | Sep 07, 2025 |
| Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens | Aug 19, 2025 |
| Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | Aug 15, 2025 |
| Persona Vectors: Monitoring and Controlling Character Traits in Language Models | Aug 13, 2025 |
| Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning | Aug 01, 2025 |
| Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards | Jul 31, 2025 |
| Working with AI: Measuring the Occupational Implications of Generative AI | Jul 31, 2025 |
| Towards physician-centered oversight of conversational diagnostic AI | Jul 30, 2025 |
| Learning without training: The implicit dynamics of in-context learning | Jul 28, 2025 |
| Aime: Towards Fully-Autonomous Multi-Agent Framework | Jul 25, 2025 |
| ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation | Jul 23, 2025 |
| 4KAgent: Agentic Any Image to 4K Super-Resolution | Jul 18, 2025 |
| Critiques of World Models | Jul 16, 2025 |
| Arxiv paper - Expert-level validation of AI-generated medical text with scalable language models | Jul 15, 2025 |
| Arxiv paper - ImplicitQA: Going beyond frames towards Implicit Video Reasoning | Jul 11, 2025 |
| Arxiv paper - BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing | Jul 08, 2025 |
| Arxiv paper - Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory | Jul 08, 2025 |
| Blogpost paper - Project Vend: Can Claude run a small shop? (And why does that matter?) | Jul 02, 2025 |
| Arxiv paper - Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Jul 02, 2025 |
| Arxiv paper - SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing | Jun 30, 2025 |
| Arxiv paper - OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization | Jun 27, 2025 |
| Arxiv paper - Long-Context State-Space Video World Models | Jun 25, 2025 |
| Arxiv paper - From Bytes to Ideas: Language Modeling with Autoregressive U-Nets | Jun 24, 2025 |
| Arxiv paper - Reinforcement Pre-Training | Jun 20, 2025 |
| Arxiv paper - Token-Efficient Long Video Understanding for Multimodal LLMs | Jun 18, 2025 |
| Arxiv paper - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Jun 11, 2025 |
| Arxiv paper - Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models | Jun 09, 2025 |
| Arxiv paper - How much do language models memorize? | Jun 06, 2025 |
| Arxiv paper - MMaDA: Multimodal Large Diffusion Language Models | Jun 03, 2025 |
| Arxiv paper - Superhuman performance of a large language model on the reasoning tasks of a physician | Jun 03, 2025 |
| Arxiv paper - The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | May 29, 2025 |
| Arxiv paper - DanceGRPO: Unleashing GRPO on Visual Generation | May 28, 2025 |
| Arxiv paper - Visual Planning: Let’s Think Only with Images | May 21, 2025 |
| Arxiv paper - A Preliminary Study for GPT-4o on Image Restoration | May 14, 2025 |
| Arxiv paper - DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion | May 12, 2025 |
| Arxiv paper - RayZer: A Self-supervised Large View Synthesis Model | May 09, 2025 |
| Arxiv paper - Reinforcement Learning for Reasoning in Large Language Models with One Training Example | May 08, 2025 |
| Arxiv paper - MINERVA: Evaluating Complex Video Reasoning | May 06, 2025 |
| Arxiv paper - The Leaderboard Illusion | May 06, 2025 |
| Arxiv paper - Towards Understanding Camera Motions in Any Video | May 05, 2025 |
| Arxiv paper - Describe Anything: Detailed Localized Image and Video Captioning | Apr 29, 2025 |
| Arxiv paper - MCNC: MANIFOLD-CONSTRAINED REPARAMETERIZATION FOR NEURAL COMPRESSION | Apr 28, 2025 |
| Arxiv paper - Self-Improving Robust Preference Optimization | Apr 23, 2025 |
| Arxiv paper - LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Apr 22, 2025 |
| Arxiv paper - Welcome to the Era of Experience | Apr 21, 2025 |
| Arxiv paper - MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation | Apr 19, 2025 |
| Arxiv paper - InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | Apr 17, 2025 |
| Arxiv paper - EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise | Apr 16, 2025 |
| Arxiv paper - TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | Apr 16, 2025 |
| Arxiv paper - Reasoning Models Don’t Always Say What They Think | Apr 09, 2025 |
| Arxiv paper - Slow-Fast Architecture for Video Multi-Modal Large Language Models | Apr 07, 2025 |
| Arxiv paper - TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes | Apr 04, 2025 |
| Arxiv paper - VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Apr 01, 2025 |
| Arxiv paper - SynCity: Training-Free Generation of 3D Worlds | Mar 28, 2025 |
| Arxiv paper - HD-EPIC: A Highly-Detailed Egocentric Video Dataset | Mar 26, 2025 |
| Arxiv paper - Video-T1: Test-Time Scaling for Video Generation | Mar 25, 2025 |
| Arxiv paper - Calibrated Multi-Preference Optimization for Aligning Diffusion Models | Mar 24, 2025 |
| Arxiv paper - Personalize Anything for Free with Diffusion Transformer | Mar 21, 2025 |
| Arxiv paper - Story-Adapter: A Training-free Iterative Framework for Long Story Visualization | Mar 20, 2025 |
| Arxiv paper - ReCamMaster: Camera-Controlled Generative Rendering from A Single Video | Mar 18, 2025 |
| Arxiv paper - Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Mar 17, 2025 |
| Arxiv paper - MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks | Mar 13, 2025 |
| Arxiv paper - TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models | Mar 12, 2025 |
| Arxiv paper - PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving | Mar 11, 2025 |
| Arxiv paper - VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Mar 08, 2025 |
| Arxiv paper - ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models | Mar 04, 2025 |
| Arxiv paper - Teaching Language Models to Critique via Reinforcement Learning | Mar 03, 2025 |
| Arxiv paper - PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling | Feb 27, 2025 |
| Arxiv paper - VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation | Feb 24, 2025 |
| Arxiv paper - Heuristically Adaptive Diffusion-Model Evolutionary Strategy | Feb 22, 2025 |
| Arxiv paper - Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | Feb 20, 2025 |
| Arxiv paper - EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | Feb 19, 2025 |
| Arxiv paper - VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection | Feb 14, 2025 |
| Arxiv paper - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models | Feb 13, 2025 |
| Arxiv paper - HunyuanVideo: A Systematic Framework For Large Video Generative Models | Feb 12, 2025 |
| Arxiv paper - s1: Simple test-time scaling | Feb 10, 2025 |
| Arxiv paper - Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation | Feb 07, 2025 |
| Arxiv paper - MatAnyone: Stable Video Matting with Consistent Memory Propagation | Feb 07, 2025 |
| Arxiv paper - Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Feb 03, 2025 |
| Arxiv paper - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | Jan 31, 2025 |
| Arxiv paper - MetaMorph: Multimodal Understanding and Generation via Instruction Tuning | Jan 30, 2025 |
| Arxiv paper - Improving Video Generation with Human Feedback | Jan 29, 2025 |
| Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | Jan 28, 2025 |
| Arxiv paper - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Jan 27, 2025 |
| Arxiv paper - Can We Generate Images with CoT? Let’s Verify and Reinforce Image Generation Step by Step | Jan 24, 2025 |
| Arxiv paper - Improving Factuality with Explicit Working Memory | Jan 23, 2025 |
| Arxiv paper - Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control | Jan 17, 2025 |
| Arxiv paper - FaceLift: Single Image to 3D Head with View Generation and GS-LRM | Jan 13, 2025 |
| Arxiv paper - GenHMR: Generative Human Mesh Recovery | Jan 08, 2025 |
| Arxiv paper - Video Creation by Demonstration | Jan 06, 2025 |
| Arxiv paper - Byte Latent Transformer: Patches Scale Better Than Tokens | Jan 02, 2025 |
| Arxiv paper - Align3R: Aligned Monocular Depth Estimation for Dynamic Videos | Dec 17, 2024 |
| Arxiv paper - FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion | Dec 17, 2024 |
| Arxiv paper - ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis | Dec 11, 2024 |
| Arxiv paper - o1-Coder: an o1 Replication for Coding | Dec 10, 2024 |
| Arxiv paper - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | Dec 06, 2024 |
| ICLR 2025 submission - CYCLE-CONSISTENT LEARNING FOR JOINT LAYOUT-TO-IMAGE GENERATION AND OBJECT DETECTION | Dec 03, 2024 |
| Arxiv Paper - WonderWorld: Interactive 3D Scene Generation from a Single Image | Nov 26, 2024 |
| Arxiv Paper - Hymba: A Hybrid-head Architecture for Small Language Models | Nov 22, 2024 |
| Arxiv Paper - Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation | Nov 21, 2024 |
| Arxiv Paper - Video Instruction Tuning With Synthetic Data | Nov 20, 2024 |
| Arxiv Paper - Generative Agent Simulations of 1,000 People | Nov 19, 2024 |
| NeurIPS 2024 - Moving Off-the-Grid: Scene-Grounded Video Representations | Nov 15, 2024 |
| Arxiv Paper - Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution | Nov 14, 2024 |
| Arxiv Paper - FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality | Nov 13, 2024 |
| Arxiv Paper - Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA | Nov 11, 2024 |
| Arxiv Paper - Long Context RAG Performance of Large Language Models | Nov 08, 2024 |
| Arxiv Paper - NVLM: Open Frontier-Class Multimodal LLMs | Nov 05, 2024 |
| Arxiv Paper - ColPali: Efficient Document Retrieval with Vision Language Models | Nov 01, 2024 |
| Arxiv Paper - Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models | Oct 31, 2024 |
| Arxiv Paper - Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization | Oct 31, 2024 |
| Arxiv Paper - Unbounded: A Generative Infinite Game of Character Life Simulation | Oct 29, 2024 |
| Arxiv Paper - Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? | Oct 28, 2024 |
| Arxiv Paper - LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Oct 25, 2024 |
| Arxiv Paper - When Does Perceptual Alignment Benefit Vision Representations? | Oct 23, 2024 |
| Arxiv paper - SceneCraft: Layout-Guided 3D Scene Generation | Oct 22, 2024 |
| arxiv preprint - A Tale of Tails: Model Collapse as a Change of Scaling Laws | Oct 18, 2024 |
| arxiv preprint - Thinking LLMs: General Instruction Following with Thought Generation | Oct 17, 2024 |
| arxiv preprint - Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Oct 16, 2024 |
| arxiv preprint - F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Oct 14, 2024 |
| arxiv preprint - One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Oct 11, 2024 |
| arxiv preprint - Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models | Oct 10, 2024 |
| arxiv preprint - NEPTUNE: THE LONG ORBIT TO BENCHMARKING LONG VIDEO UNDERSTANDING | Oct 07, 2024 |
| arxiv preprint - SHIC: Shape-Image Correspondences with no Keypoint Supervision | Oct 04, 2024 |
| arxiv preprint - E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | Oct 03, 2024 |
| arxiv preprint - LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | Oct 01, 2024 |
| arxiv preprint - DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Sep 28, 2024 |
| arxiv preprint - Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Sep 27, 2024 |
| arxiv preprint - Phantom of Latent for Large Language and Vision Models | Sep 24, 2024 |
| arxiv preprint - Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Sep 20, 2024 |
| arxiv preprint - On the Diagram of Thought | Sep 19, 2024 |
| arxiv preprint - Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Sep 17, 2024 |
| arxiv preprint - SongCreator: Lyrics-based Universal Song Generation | Sep 12, 2024 |
| arxiv preprint - Achieving Human Level Competitive Robot Table Tennis | Sep 11, 2024 |
| arxiv preprint - Sapiens: Foundation for Human Vision Models | Sep 09, 2024 |
| arxiv preprint - Re-Reading Improves Reasoning in Large Language Models | Sep 06, 2024 |
| arxiv preprint - SPIRE: Semantic Prompt-Driven Image Restoration | Sep 03, 2024 |
| arxiv preprint - Automated Design of Agentic Systems | Aug 31, 2024 |
| arxiv preprint - Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model | Aug 28, 2024 |
| arxiv preprint - To Code, or Not To Code? Exploring Impact of Code in Pre-training | Aug 26, 2024 |
| arxiv preprint - Segment Anything with Multiple Modalities | Aug 23, 2024 |
| arxiv preprint - JPEG-LM: LLMs as Image Generators with Canonical Codec Representations | Aug 20, 2024 |
| arxiv preprint - Mission: Impossible Language Models | Aug 19, 2024 |
| arxiv preprint - Learning Task Decomposition to Assist Humans in Competitive Programming | Aug 16, 2024 |
| arxiv preprint - IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts | Aug 13, 2024 |
| arxiv preprint - Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | Aug 10, 2024 |
| arxiv preprint - Language Model Can Listen While Speaking | Aug 09, 2024 |
| arxiv preprint - Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | Aug 07, 2024 |
| arxiv preprint - Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle | Aug 06, 2024 |
| arxiv preprint - Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent | Aug 06, 2024 |
| arxiv preprint - Graph-enhanced Large Language Models in Asynchronous Plan Reasoning | Jul 31, 2024 |
| arxiv preprint - LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | Jul 30, 2024 |
| arxiv preprint - OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person | Jul 29, 2024 |
| arxiv preprint - DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Jul 27, 2024 |
| arxiv preprint - Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning | Jul 23, 2024 |
| arxiv preprint - Chameleon: Mixed-Modal Early-Fusion Foundation Models | Jul 22, 2024 |
| arxiv preprint - Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | Jul 18, 2024 |
| arxiv preprint - Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity | Jul 17, 2024 |
| arxiv preprint - Human-like Episodic Memory for Infinite Context LLMs | Jul 15, 2024 |
| arxiv preprint - Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Jul 12, 2024 |
| arxiv preprint - Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions | Jul 11, 2024 |
| arxiv preprint - Evaluating Human Alignment and Model Faithfulness of LLM Rationale | Jul 09, 2024 |
| arxiv preprint - Detection and Measurement of Syntactic Templates in Generated Text | Jul 08, 2024 |
| arxiv preprint - From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data | Jul 01, 2024 |
| arxiv preprint - MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Jun 27, 2024 |
| arxiv preprint - 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Jun 26, 2024 |
| arxiv preprint - VideoLLM-online: Online Video Large Language Model for Streaming Video | Jun 25, 2024 |
| arxiv preprint - EvTexture: Event-driven Texture Enhancement for Video Super-Resolution | Jun 24, 2024 |
| arxiv preprint - MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | Jun 22, 2024 |
| arxiv preprint - An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels | Jun 20, 2024 |
| arxiv preprint - Graphic Design with Large Multimodal Model | Jun 20, 2024 |
| arxiv preprint - LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Jun 18, 2024 |
| arxiv preprint - Transformers need glasses! Information over-squashing in language tasks | Jun 17, 2024 |
| arxiv preprint - Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback | Jun 14, 2024 |
| arxiv preprint - TextGrad: Automatic “Differentiation” via Text | Jun 13, 2024 |
| arxiv preprint - SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Jun 12, 2024 |
| arxiv preprint - Open-Endedness is Essential for Artificial Superhuman Intelligence | Jun 11, 2024 |
| arxiv preprint - To Believe or Not to Believe Your LLM | Jun 08, 2024 |
| arxiv preprint - Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts | Jun 06, 2024 |
| arxiv preprint - Contextual Position Encoding: Learning to Count What’s Important | Jun 04, 2024 |
| arxiv preprint - Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Jun 03, 2024 |
| arxiv preprint - VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | May 31, 2024 |
| arxiv preprint - CinePile: A Long Video Question Answering Dataset and Benchmark | May 30, 2024 |
| arxiv preprint - Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum | May 29, 2024 |
| arxiv preprint - SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | May 28, 2024 |
| arxiv preprint - Octo: An Open-Source Generalist Robot Policy | May 24, 2024 |
| arxiv preprint - Layer-Condensed KV Cache for Efficient Inference of Large Language Models | May 23, 2024 |
| arxiv preprint - Observational Scaling Laws and the Predictability of Language Model Performance | May 22, 2024 |
| arxiv preprint - Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization | May 21, 2024 |
| arxiv preprint - The Platonic Representation Hypothesis | May 20, 2024 |
| arxiv preprint - Many-Shot In-Context Learning in Multimodal Foundation Models | May 18, 2024 |
| arxiv preprint - Naturalistic Music Decoding from EEG Data via Latent Diffusion Models | May 16, 2024 |
| arxiv preprint - The Chosen One: Consistent Characters in Text-to-Image Diffusion Models | May 15, 2024 |
| arxiv preprint - Memory Mosaics | May 14, 2024 |
| arxiv preprint - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | May 13, 2024 |
| arxiv preprint - LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | May 10, 2024 |
| arxiv preprint - WildChat: 1M ChatGPT Interaction Logs in the Wild | May 09, 2024 |
| arxiv preprint - Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models | May 08, 2024 |
| arxiv preprint - NOLA: Compressing LoRA using Linear Combination of Random Basis | May 07, 2024 |
| arxiv preprint - StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | May 06, 2024 |
| arxiv preprint - Iterative Reasoning Preference Optimization | May 03, 2024 |
| arxiv preprint - Better & Faster Large Language Models via Multi-token Prediction | May 02, 2024 |
| arxiv preprint - Make Your LLM Fully Utilize the Context | May 01, 2024 |
| arxiv preprint - Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation | Apr 30, 2024 |
| arxiv preprint - PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Apr 29, 2024 |
| arxiv preprint - Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare | Apr 26, 2024 |
| arxiv preprint - SnapKV: LLM Knows What You are Looking for Before Generation | Apr 25, 2024 |
| arxiv preprint - CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models | Apr 24, 2024 |
| arxiv preprint - SpaceByte: Towards Deleting Tokenization from Large Language Modeling | Apr 23, 2024 |
| arxiv preprint - TextSquare: Scaling up Text-Centric Visual Instruction Tuning | Apr 22, 2024 |
| arxiv preprint - EdgeFusion: On-Device Text-to-Image Generation | Apr 19, 2024 |
| arxiv preprint - VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time | Apr 18, 2024 |
| arxiv preprint - Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Apr 17, 2024 |
| arxiv preprint - High-Dimension Human Value Representation in Large Language Models | Apr 16, 2024 |
| arxiv preprint - Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck | Apr 15, 2024 |
| arxiv preprint - Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Apr 12, 2024 |
| arxiv preprint - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | Apr 11, 2024 |
| arxiv preprint - Evaluating Text-to-Visual Generation with Image-to-Text Generation | Apr 10, 2024 |
| arxiv preprint - Future Lens: Anticipating Subsequent Tokens from a Single Hidden State | Apr 09, 2024 |
| arxiv preprint - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity | Apr 08, 2024 |
| arxiv preprint - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models | Apr 05, 2024 |
| arxiv preprint - WavLLM: Towards Robust and Adaptive Speech Large Language Model | Apr 04, 2024 |
| arxiv preprint - Gecko: Versatile Text Embeddings Distilled from Large Language Models | Apr 03, 2024 |
| arxiv preprint - ReALM: Reference Resolution As Language Modeling | Apr 02, 2024 |
| arxiv preprint - sDPO: Don’t Use Your Data All at Once | Apr 01, 2024 |
| arxiv preprint - LITA: Language Instructed Temporal-Localization Assistant | Mar 29, 2024 |
| arxiv preprint - AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks | Mar 28, 2024 |
| arxiv preprint - InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Mar 27, 2024 |
| arxiv preprint - Giraffe: Adventures in Expanding Context Lengths in LLMs | Mar 26, 2024 |
| arxiv preprint - Explorative Inbetweening of Time and Space | Mar 25, 2024 |
| arxiv preprint - Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Mar 22, 2024 |
| arxiv preprint - Evaluating Large Language Models at Evaluating Instruction Following | Mar 21, 2024 |
| arxiv preprint - Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation | Mar 20, 2024 |
| arxiv preprint - Branch-Solve-Merge Improves Large Language Model Evaluation and Generation | Mar 19, 2024 |
| arxiv preprint - MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Mar 18, 2024 |
| arxiv preprint - Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Mar 15, 2024 |
| arxiv preprint - WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? | Mar 14, 2024 |
| arxiv preprint - Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | Mar 13, 2024 |
| arxiv preprint - Is Cosine-Similarity of Embeddings Really About Similarity? | Mar 12, 2024 |
| arxiv preprint - A Generative Approach for Wikipedia-Scale Visual Entity Recognition | Mar 11, 2024 |
| arxiv preprint - Self-correcting LLM-controlled Diffusion Models | Mar 08, 2024 |
| arxiv preprint - tinyBenchmarks: evaluating LLMs with fewer examples | Mar 08, 2024 |
| arxiv preprint - Asymmetry in Low-Rank Adapters of Foundation Models | Mar 06, 2024 |
| arxiv preprint - When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method | Mar 05, 2024 |
| arxiv preprint - EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Mar 04, 2024 |
| arxiv preprint - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Mar 01, 2024 |
| arxiv preprint - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Feb 29, 2024 |
| arxiv preprint - LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | Feb 28, 2024 |
| arxiv preprint - Branch-Solve-Merge Improves Large Language Model Evaluation and Generation | Feb 27, 2024 |
| arxiv preprint - SciMON: Scientific Inspiration Machines Optimized for Novelty | Feb 26, 2024 |
| arxiv preprint - Speculative Streaming: Fast LLM Inference without Auxiliary Models | Feb 23, 2024 |
| arxiv preprint - LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | Feb 22, 2024 |
| arxiv preprint - UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities | Feb 21, 2024 |
| arxiv preprint - Guiding Instruction-based Image Editing via Multimodal Large Language Models | Feb 20, 2024 |
| arxiv preprint - Spectral State Space Models | Feb 16, 2024 |
| arxiv preprint - More Agents Is All You Need | Feb 15, 2024 |
| arxiv preprint - World Model on Million-Length Video And Language With RingAttention | Feb 14, 2024 |
| arxiv preprint - Learning Video Representations from Large Language Models | Feb 13, 2024 |
| arxiv preprint - Can Large Language Models Understand Context? | Feb 12, 2024 |
| arxiv preprint - Long Story Short: a Summarize-then-Search Method for Long Video Question Answering | Feb 09, 2024 |
| arxiv preprint - System 2 Attention (is something you might need too) | Feb 08, 2024 |
| arxiv preprint - DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 07, 2024 |
| arxiv preprint - KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Feb 06, 2024 |
| arxiv preprint - Language Model Inversion | Feb 05, 2024 |
| arxiv preprint - Tree Prompting: Efficient Task Adaptation without Fine-Tuning | Feb 02, 2024 |
| arxiv preprint - Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens | Feb 01, 2024 |
| arxiv preprint - Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | Jan 31, 2024 |
| arxiv preprint - RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture | Jan 30, 2024 |
| arxiv preprint - SliceGPT: Compress Large Language Models by Deleting Rows and Columns | Jan 29, 2024 |
| arxiv preprint - Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video | Jan 26, 2024 |
| arxiv preprint - MambaByte: Token-free Selective State Space Model | Jan 25, 2024 |
| arxiv preprint - Lumiere: A Space-Time Diffusion Model for Video Generation | Jan 24, 2024 |
| arxiv preprint - Self-Rewarding Language Models | Jan 23, 2024 |
| arxiv preprint - Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | Jan 22, 2024 |
| arxiv preprint - MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding | Jan 19, 2024 |
| arxiv preprint - Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Jan 18, 2024 |
| arxiv preprint - Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models | Jan 17, 2024 |
| arxiv preprint - Time Travel in LLMs: Tracing Data Contamination in Large Language Models | Jan 16, 2024 |
| arxiv preprint - InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes | Jan 12, 2024 |
| arxiv preprint - A Simple LLM Framework for Long-Range Video Question-Answering | Jan 11, 2024 |
| arxiv preprint - Mixtral of Experts | Jan 09, 2024 |
| arxiv preprint - Weight subcloning: direct initialization of transformers using larger pretrained ones | Jan 08, 2024 |
| arxiv preprint - Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task | Jan 05, 2024 |
| arxiv preprint - LLM in a flash: Efficient Large Language Model Inference with Limited Memory | Jan 05, 2024 |
| arxiv preprint - The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | Jan 02, 2024 |
|
arxiv preprint - DreaMoving: A Human Video Generation Framework based on Diffusion Models
|
Dec 29, 2023 |
|
arxiv preprint - Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
|
Dec 28, 2023 |
|
arxiv preprint - UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
|
Dec 28, 2023 |
|
arxiv preprint - LongNet: Scaling Transformers to 1,000,000,000 Tokens
|
Dec 27, 2023 |
|
arxiv preprint - MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
|
Dec 27, 2023 |
|
arxiv preprint - Model-tuning Via Prompts Makes NLP Models Adversarially Robust
|
Dec 26, 2023 |
|
arxiv preprint - Training Chain-of-Thought via Latent-Variable Inference
|
Dec 22, 2023 |
|
arxiv preprint - Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
|
Dec 21, 2023 |
|
arxiv preprint - Instruction-tuning Aligns LLMs to the Human Brain
|
Dec 20, 2023 |
|
arxiv preprint - WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
|
Dec 19, 2023 |
|
arxiv preprint - DemoFusion: Democratising High-Resolution Image Generation With No $$$
|
Dec 18, 2023 |
|
arxiv preprint - Recommender Systems with Generative Retrieval
|
Dec 15, 2023 |
|
arxiv preprint - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
|
Dec 14, 2023 |
|
arxiv preprint - Block-State Transformers
|
Dec 13, 2023 |
|
arxiv preprint - Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
|
Dec 12, 2023 |
|
arxiv preprint - LooseControl: Lifting ControlNet for Generalized Depth Conditioning
|
Dec 11, 2023 |
|
Announcement: AI Breakdown Youtube Channel
|
Dec 08, 2023 |
|
arxiv preprint - OneLLM: One Framework to Align All Modalities with Language
|
Dec 08, 2023 |
|
arxiv preprint - The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
|
Dec 08, 2023 |
|
arxiv - MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
|
Dec 07, 2023 |
|
arxiv preprint - MLP-Mixer: An all-MLP Architecture for Vision
|
Dec 07, 2023 |
|
arxiv preprint - Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
|
Dec 06, 2023 |
|
arxiv preprint - Nash Learning from Human Feedback
|
Dec 05, 2023 |
|
arxiv preprint - Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
|
Dec 04, 2023 |
|
arxiv preprint - Knowledge is a Region in Weight Space for Fine-tuned Language Models
|
Dec 03, 2023 |
|
arxiv preprint - MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
|
Dec 02, 2023 |
|
arxiv preprint - Simplifying Transformer Blocks
|
Dec 01, 2023 |
|
arxiv - Visual In-Context Prompting
|
Nov 30, 2023 |
|
Arxiv Preprint - GAIA: a benchmark for General AI Assistants
|
Nov 29, 2023 |
|
Arxiv Preprint - DisCo: Disentangled Control for Realistic Human Dance Generation
|
Nov 28, 2023 |
|
Arxiv Preprint - Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
|
Nov 27, 2023 |
|
Arxiv Preprint - A General Theoretical Paradigm to Understand Learning from Human Preferences
|
Nov 25, 2023 |
|
Arxiv Preprint - ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
|
Nov 22, 2023 |
|
ArXiv Preprint - S-LoRA: Serving Thousands of Concurrent LoRA Adapters
|
Nov 21, 2023 |
|
ArXiv Preprint - Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
|
Nov 20, 2023 |
|
Arxiv Preprint - LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
|
Nov 17, 2023 |
|
ArXiv Preprint - Fine-tuning Language Models for Factuality
|
Nov 16, 2023 |
|
arxiv preprint - Language Models can be Logical Solvers
|
Nov 15, 2023 |
|
ArXiv Preprint - Prompt Engineering a Prompt Engineer
|
Nov 14, 2023 |
|
arxiv preprint - CogVLM: Visual Expert for Pretrained Language Models
|
Nov 13, 2023 |
|
ArXiv Preprint - De-Diffusion Makes Text a Strong Cross-Modal Interface
|
Nov 10, 2023 |
|
ArXiv Preprint - E3 TTS: Easy End-to-End Diffusion-based Text to Speech
|
Nov 09, 2023 |
|
ArXiv Preprint - Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges
|
Nov 08, 2023 |
|
ArXiv Preprint - Learning From Mistakes Makes LLM Better Reasoner
|
Nov 07, 2023 |
|
ArXiv Preprint - The Generative AI Paradox: “What It Can Create, It May Not Understand”
|
Nov 06, 2023 |
|
ArXiv Preprint - TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
|
Nov 03, 2023 |
|
ArXiv Preprint - MM-VID: Advancing Video Understanding with GPT-4V(ision)
|
Nov 02, 2023 |
|
ArXiv Preprint - Zephyr: Direct Distillation of LM Alignment
|
Nov 01, 2023 |
|
ArXiv Preprint - ControlLLM: Augment Language Models with Tools by Searching on Graphs
|
Oct 31, 2023 |
|
ArXiv Preprint - Talk like a Graph: Encoding Graphs for Large Language Models
|
Oct 30, 2023 |
|
arxiv Preprint - AgentTuning: Enabling Generalized Agent Abilities for LLMs
|
Oct 29, 2023 |
|
ArXiv Preprint - Jailbreaking Black Box Large Language Models in Twenty Queries
|
Oct 28, 2023 |
|
ArXiv Preprint - Matryoshka Diffusion Models
|
Oct 27, 2023 |
|
arxiv Preprint - An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
|
Oct 26, 2023 |
|
arxiv Preprint - Retrieval meets Long Context Large Language Models
|
Oct 25, 2023 |
|
arxiv Preprint - Contrastive Preference Learning: Learning from Human Feedback without RL
|
Oct 24, 2023 |
|
arxiv Preprint - BitNet: Scaling 1-bit Transformers for Large Language Models
|
Oct 23, 2023 |
|
arxiv Preprint - Automatic Prompt Optimization with “Gradient Descent” and Beam Search
|
Oct 22, 2023 |
|
arxiv Preprint - Understanding Retrieval Augmentation for Long-Form Question Answering
|
Oct 21, 2023 |
|
arxiv Preprint - On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
|
Oct 20, 2023 |
|
arxiv Preprint - Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
|
Oct 19, 2023 |
|
arxiv Preprint - In-Context Pretraining: Language Modeling Beyond Document Boundaries
|
Oct 18, 2023 |
|
ICCV 2023 - Sigmoid Loss for Language Image Pre-Training
|
Oct 17, 2023 |
|
arxiv Preprint - Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
|
Oct 16, 2023 |
|
arxiv Preprint - HyperAttention: Long-context Attention in Near-Linear Time
|
Oct 15, 2023 |
|
arxiv Preprint - InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
|
Oct 13, 2023 |
|
arxiv Preprint - Large Language Models Cannot Self-Correct Reasoning Yet
|
Oct 12, 2023 |
|
arxiv Preprint - Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
|
Oct 11, 2023 |
|
arxiv Preprint - Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
|
Oct 10, 2023 |
|
arxiv Preprint - Improved Baselines with Visual Instruction Tuning
|
Oct 09, 2023 |
|
arxiv Preprint - Tree of Thoughts: Deliberate Problem Solving with Large Language Models
|
Oct 08, 2023 |
|
NeurIPS 2023 - Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
|
Oct 07, 2023 |
|
ICCV 2023 - Diffusion Models as Masked Autoencoders
|
Oct 06, 2023 |
|
arxiv Preprint - Conditional Diffusion Distillation
|
Oct 05, 2023 |
|
arxiv Preprint - Enable Language Models to Implicitly Learn Self-Improvement From Data
|
Oct 04, 2023 |
|
arxiv Preprint - Efficient Streaming Language Models with Attention Sinks
|
Oct 03, 2023 |
|
NeurIPS 2023 - PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving
|
Oct 02, 2023 |
|
arxiv Preprint - Vision Transformers Need Registers
|
Oct 01, 2023 |
|
arxiv Preprint - VPA: Fully Test-Time Visual Prompt Adaptation
|
Sep 30, 2023 |