| Episode | Date |
|---|---|
| Computer Vision - Thinking with Video Video Generation as a Promising Multimodal Reasoning Paradigm | Nov 08, 2025 |
| Speech & Sound - PromptSep Generative Audio Separation via Multimodal Prompting | Nov 08, 2025 |
| Machine Learning - Optimal Inference Schedules for Masked Diffusion Models | Nov 08, 2025 |
| Computation and Language - Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning | Nov 08, 2025 |
| Computer Vision - InfinityStar Unified Spacetime AutoRegressive Modeling for Visual Generation | Nov 08, 2025 |
| Computer Vision - Landslide Hazard Mapping with Geospatial Foundation Models Geographical Generalizability, Data Scarcity, and Band Adaptability | Nov 07, 2025 |
| Artificial Intelligence - Beyond Shortest Path Agentic Vehicular Routing with Semantic Context | Nov 07, 2025 |
| Artificial Intelligence - Promoting Sustainable Web Agents Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis | Nov 07, 2025 |
| Software Engineering - EDIT-Bench Evaluating LLM Abilities to Perform Real-World Instructed Code Edits | Nov 07, 2025 |
| Artificial Intelligence - GUI-360 A Comprehensive Dataset and Benchmark for Computer-Using Agents | Nov 07, 2025 |
| Computer Vision - Tracking and Understanding Object Transformations | Nov 07, 2025 |
| Computation and Language - Efficient Reasoning via Thought-Training and Thought-Free Inference | Nov 06, 2025 |
| Software Engineering - RefAgent A Multi-agent LLM-based Framework for Automatic Software Refactoring | Nov 06, 2025 |
| Computation and Language - IndicSuperTokenizer An Optimized Tokenizer for Indic Multilingual LLMs | Nov 06, 2025 |
| Machine Learning - GMoPE A Prompt-Expert Mixture Framework for Graph Foundation Models | Nov 06, 2025 |
| Software Engineering - The OpenHands Software Agent SDK A Composable and Extensible Foundation for Production Agents | Nov 06, 2025 |
| Computation and Language - A systematic review of relation extraction task since the emergence of Transformers | Nov 06, 2025 |
| Machine Learning - AnaFlow Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing | Nov 06, 2025 |
| Emerging Technologies - LLM-enhanced Air Quality Monitoring Interface via Model Context Protocol | Nov 06, 2025 |
| Software Engineering - Stitch Step-by-step LLM Guided Tutoring for Scratch | Nov 01, 2025 |
| Computer Vision - All You Need for Object Detection From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles | Nov 01, 2025 |
| Artificial Intelligence - The Era of Agentic Organization Learning to Organize with Language Models | Nov 01, 2025 |
| Computer Vision - Process Integrated Computer Vision for Real-Time Failure Prediction in Steel Rolling Mill | Nov 01, 2025 |
| Artificial Intelligence - Unveiling Intrinsic Text Bias in Multimodal Large Language Models through Attention Key-Space Analysis | Nov 01, 2025 |
| Artificial Intelligence - Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models | Nov 01, 2025 |
| Computation and Language - Kimi Linear An Expressive, Efficient Attention Architecture | Nov 01, 2025 |
| Software Engineering - Using Copilot Agent Mode to Automate Library Migration A Quantitative Assessment | Nov 01, 2025 |
| Artificial Intelligence - Delegated Authorization for Agents Constrained to Semantic Task-to-Scope Matching | Nov 01, 2025 |
| Image and Video Processing - ProstNFound+ A Prospective Study using Medical Foundation Models for Prostate Cancer Detection | Nov 01, 2025 |
| Computation and Language - Value Drifts Tracing Value Alignment During LLM Post-Training | Nov 01, 2025 |
| Machine Learning - LSM-MS2 A Foundation Model Bridging Spectral Identification and Biological Interpretation | Oct 31, 2025 |
| Artificial Intelligence - LLMs Process Lists With General Filter Heads | Oct 31, 2025 |
| Machine Learning - Learning Pseudorandom Numbers with Transformers Permuted Congruential Generators, Curricula, and Interpretability | Oct 31, 2025 |
| Computer Vision - ChartAB A Benchmark for Chart Grounding & Dense Alignment | Oct 31, 2025 |
| Computer Vision - Masked Diffusion Captioning for Visual Feature Learning | Oct 31, 2025 |
| Computer Vision - Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark | Oct 31, 2025 |
| Multiagent Systems - A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation | Oct 31, 2025 |
| Artificial Intelligence - The Oversight Game Learning to Cooperatively Balance an AI Agent’s Safety and Autonomy | Oct 31, 2025 |
| Computer Vision - SteerVLM Robust Model Control through Lightweight Activation Steering for Vision Language Models | Oct 31, 2025 |
| Software Engineering - Investigating Software Aging in LLM-Generated Software Systems | Oct 30, 2025 |
| Networking - Enabling Near-realtime Remote Sensing via Satellite-Ground Collaboration of Large Vision-Language Models | Oct 30, 2025 |
| Computation and Language - Evaluating LLMs on Generating Age-Appropriate Child-Like Conversations | Oct 30, 2025 |
| Computation and Language - Can LLMs Translate Human Instructions into a Reinforcement Learning Agent’s Internal Emergent Symbolic Representation? | Oct 30, 2025 |
| Computer Vision - ViPER Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model | Oct 30, 2025 |
| Artificial Intelligence - Discovering Heuristics with Large Language Models (LLMs) for Mixed-Integer Programs Single-Machine Scheduling | Oct 30, 2025 |
| Information Retrieval - Retrieval-Augmented Search for Large-Scale Map Collections with ColPali | Oct 30, 2025 |
| Computer Vision - MIC-BEV Multi-Infrastructure Camera Bird’s-Eye-View Transformer with Relation-Aware Fusion for 3D Object Detection | Oct 29, 2025 |
| Speech & Sound - STAR-Bench Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence | Oct 29, 2025 |
| Computation and Language - Repurposing Synthetic Data for Fine-grained Search Agent Supervision | Oct 29, 2025 |
| Computation and Language - AgentFrontier Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis | Oct 29, 2025 |
| Computation and Language - AgentFold Long-Horizon Web Agents with Proactive Context Management | Oct 29, 2025 |
| Machine Learning - Greedy Sampling Is Provably Efficient for RLHF | Oct 29, 2025 |
| Computation and Language - ParallelMuse Agentic Parallel Thinking for Deep Information Seeking | Oct 29, 2025 |
| Computation and Language - Agent Data Protocol Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents | Oct 29, 2025 |
| Computation and Language - ComboBench Can LLMs Manipulate Physical Devices to Play Virtual Reality Games? | Oct 29, 2025 |
| Computer Vision - Routing Matters in MoE Scaling Diffusion Transformers with Explicit Routing Guidance | Oct 29, 2025 |
| Computer Vision - Uniform Discrete Diffusion with Metric Path for Video Generation | Oct 29, 2025 |
| Machine Learning - SEMPO Lightweight Foundation Models for Time Series Forecasting | Oct 23, 2025 |
| Artificial Intelligence - Memo Training Memory-Efficient Embodied Agents with Reinforcement Learning | Oct 23, 2025 |
| Computation and Language - Zhyper Factorized Hypernetworks for Conditioned LLM Fine-Tuning | Oct 23, 2025 |
| Artificial Intelligence - Misalignment Bounty Crowdsourcing AI Agent Misbehavior | Oct 23, 2025 |
| Machine Learning - When Do Transformers Learn Heuristics for Graph Connectivity? | Oct 23, 2025 |
| Software Engineering - Review of Tools for Zero-Code LLM Based Application Development | Oct 23, 2025 |
| Machine Learning - A Survey on Cache Methods in Diffusion Models Toward Efficient Multi-Modal Generation | Oct 23, 2025 |
| Computation and Language - SmartSwitch Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration | Oct 23, 2025 |
| Artificial Intelligence - Beyond Reactivity Measuring Proactive Problem Solving in LLM Agents | Oct 23, 2025 |
| Computer Vision - OmniMotion-X Versatile Multimodal Whole-Body Motion Generation | Oct 23, 2025 |
| Computation and Language - ToolDreamer Instilling LLM Reasoning Into Tool Retrievers | Oct 23, 2025 |
| Computers and Society - Integrating Transparent Models, LLMs, and Practitioner-in-the-Loop A Case of Nonprofit Program Evaluation | Oct 23, 2025 |
| Computer Vision - MedReason-R1 Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom | Oct 23, 2025 |
| Computer Vision - Class-Aware Prototype Learning with Negative Contrast for Test-Time Adaptation of Vision-Language Models | Oct 23, 2025 |
| Computation and Language - Scaf-GRPO Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning | Oct 23, 2025 |
| Computation and Language - Hubble a Model Suite to Advance the Study of LLM Memorization | Oct 23, 2025 |
| Machine Learning - Transformers are almost optimal metalearners for linear classification | Oct 23, 2025 |
| Robotics - Learning Affordances at Inference-Time for Vision-Language-Action Models | Oct 23, 2025 |
| Artificial Intelligence - VAR Visual Attention Reasoning via Structured Search and Backtracking | Oct 22, 2025 |
| Machine Learning - When LRP Diverges from Leave-One-Out in Transformers | Oct 22, 2025 |
| Machine Learning - Online SFT for LLM Reasoning Surprising Effectiveness of Self-Tuning without Rewards | Oct 22, 2025 |
| Computation and Language - Fine-Tuned Thoughts Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring | Oct 22, 2025 |
| Computer Vision - Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework | Oct 22, 2025 |
| Computation and Language - MTraining Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training | Oct 22, 2025 |
| Software Engineering - EffiReasonTrans RL-Optimized Reasoning for Code Translation | Oct 22, 2025 |
| Computation and Language - How Do LLMs Use Their Depth? | Oct 22, 2025 |
| Computer Vision - Grasp Any Region Towards Precise, Contextual Pixel Understanding for Multimodal LLMs | Oct 22, 2025 |
| Speech & Sound - WaveNet A Generative Model for Raw Audio | Oct 21, 2025 |
| Speech & Sound - A Generative Model for Raw Audio Using Transformer Architectures | Oct 21, 2025 |
| Galactic Astrophysics - Effective cosmic ray diffusion in multiphase galactic environments | Oct 21, 2025 |
| Computation and Language - EgMM-Corpus A Multimodal Vision-Language Dataset for Egyptian Culture | Oct 21, 2025 |
| Computer Vision - Embody 3D A Large-scale Multimodal Motion and Behavior Dataset | Oct 21, 2025 |
| Computation and Language - MoReBench Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes | Oct 21, 2025 |
| Software Engineering - SemOpt LLM-Driven Code Optimization via Rule-Based Analysis | Oct 21, 2025 |
| Computer Vision - SSL4RL Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning | Oct 21, 2025 |
| Computation and Language - Multimodal Latent Language Modeling with Next-Token Diffusion | Oct 21, 2025 |
| Computation and Language - EmbeddingGemma Powerful and Lightweight Text Representations | Oct 21, 2025 |
| Computation and Language - AgenTracer Who Is Inducing Failure in the LLM Agentic Systems? | Oct 21, 2025 |
| Cryptography and Security - VERA-V Variational Inference Framework for Jailbreaking Vision-Language Models | Oct 21, 2025 |
| Computation and Language - REFRAG Rethinking RAG based Decoding | Oct 21, 2025 |
| Computation and Language - Evaluating Medical LLMs by Levels of Autonomy A Survey Moving from Benchmarks to Applications | Oct 21, 2025 |
| Artificial Intelligence - Seeing but Not Believing Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs | Oct 21, 2025 |
| Computer Vision - Towards Explainable Skin Cancer Classification A Dual-Network Attention Model with Lesion Segmentation and Clinical Metadata Fusion | Oct 21, 2025 |