Trending Research

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters

tencent-hunyuan/hunyuanvideo-avatar • 26 May 2025

This ensures dynamic motion and strong character consistency; (ii) an Audio Emotion Module (AEM) is introduced to extract and transfer emotional cues from an emotion reference image to the target generated video, enabling fine-grained and accurate emotion style control; (iii) a Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with a latent-level face mask, enabling independent audio injection via cross-attention in multi-character scenarios.

Human Animation

945

3.02 stars / hour

Emerging Properties in Unified Multimodal Pretraining

ByteDance-Seed/Bagel • 20 May 2025

Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems.

Image Manipulation, Multimodal Generation, +1

3,812

2.62 stars / hour

AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment

codelion/openevolve • 30 Mar 2021

In this paper, we introduce a new class of alphas to model scalar, vector, and matrix features which possess the strengths of these two existing classes.

AutoML, Stock Prediction

2,233

2.39 stars / hour

RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination

microsoft/renderformer • 28 May 2025

We present RenderFormer, a neural rendering pipeline that directly renders an image from a triangle-based representation of a scene with full global illumination effects and that does not require per-scene training or fine-tuning.

Neural Rendering

412

2.18 stars / hour

WebDancer: Towards Autonomous Information Seeking Agency

alibaba-nlp/webwalker • 28 May 2025

We instantiate this framework in WebDancer, a web agent based on ReAct.

895

1.99 stars / hour

syftr: Pareto-Optimal Generative AI

datarobot/syftr • 26 May 2025

Retrieval-Augmented Generation (RAG) pipelines are central to applying large language models (LLMs) to proprietary or dynamic data.

Bayesian Optimization, RAG, +1

251

1.72 stars / hour

Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

bytedance/dolphin • 20 May 2025

Document image parsing is challenging due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables.

1,236

1.63 stars / hour

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

charlesq9/alita • 26 May 2025

For Maximal self-evolution, we enable the creativity of Alita by providing a suite of general-purpose components to autonomously construct, refine, and reuse external capabilities by generating task-related model context protocols (MCPs) from open source, which contributes to scalable agentic reasoning.

381

1.46 stars / hour

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

index-tts/index-tts • 8 Feb 2025

Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models.

Decoder, Language Modeling, +6

2,420

1.34 stars / hour

ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation

chartgalaxy/chartgalaxy • 24 May 2025

We showcase the utility of this dataset through: 1) improving infographic chart understanding via fine-tuning, 2) benchmarking code generation for infographic charts, and 3) enabling example-based infographic chart generation.

Benchmarking, Chart Understanding, +2

154

1.26 stars / hour
