HunyuanVideo-Avatar

Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model

This is an exact mirror of the HunyuanVideo-Avatar project, hosted at https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar. SourceForge is not affiliated with HunyuanVideo-Avatar.

Last Update: 2025-12-16

Windows Mac Linux BSD ChromeOS

HunyuanVideo-Avatar is a multimodal diffusion transformer (MM-DiT) model by Tencent Hunyuan that animates static avatar images into dynamic, emotion-controllable, multi-character dialogue videos conditioned on audio. It addresses three challenges: motion realism, identity consistency, and emotional alignment. Its main components are a character image injection module, which improves consistency between training and inference conditioning; an Audio Emotion Module, which extracts cues from an emotion reference image and transfers that emotional style into the video sequence; and a Face-Aware Audio Adapter, which isolates audio effects on faces so that multiple characters can be animated in one scene.

Features

Animates avatars (photorealistic, cartoon, rendered, anthropomorphic) with dynamic motion and backgrounds, driven by audio cues
Emotion control: extracts emotional cues from an emotion reference image and transfers that style into the video sequence
Multi-character capability: supports more than one avatar in dialogue scenarios
Character image injection module for better consistency between training and inference conditioning
Face-Aware Audio Adapter (FAA) isolates audio effects through a latent face mask, enabling cross-attention control of multiple characters
High, scalable resource requirements: the project specifies minimum and recommended GPU memory, and supports variable resolutions and frame lengths
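
The Face-Aware Audio Adapter idea in the list above (audio injected through cross-attention, gated by a per-character latent face mask) can be illustrated with a small sketch. This is not the project's implementation: it uses plain Python lists, a single attention head, and no learned projections, and the names cross_attend and face_masked_audio_update are hypothetical, chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, audio_tokens):
    """One latent token attends over audio tokens (keys == values here,
    an assumption made to keep the sketch free of learned weights)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, tok)) for tok in audio_tokens]
    weights = softmax(scores)
    dim = len(audio_tokens[0])
    return [sum(w * tok[j] for w, tok in zip(weights, audio_tokens)) for j in range(dim)]


def face_masked_audio_update(latents, characters):
    """Add audio information only where a character's face mask is non-zero.

    latents:    list of latent tokens (each a list of floats)
    characters: list of (face_mask, audio_tokens) pairs, one per character;
                face_mask holds one value in [0, 1] per latent token
    """
    updated = [list(tok) for tok in latents]
    for face_mask, audio_tokens in characters:
        for i, (tok, m) in enumerate(zip(latents, face_mask)):
            if m == 0.0:
                continue  # token lies outside this character's face region
            attended = cross_attend(tok, audio_tokens)
            updated[i] = [u + m * a for u, a in zip(updated[i], attended)]
    return updated


# Toy scene: four latent tokens, two characters with disjoint face regions.
latents = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.1]]
mask_a = [1.0, 0.0, 0.0, 0.0]   # character A's face covers token 0
mask_b = [0.0, 1.0, 0.0, 0.0]   # character B's face covers token 1
audio_a = [[1.0, 1.0], [2.0, 0.0]]
audio_b = [[-1.0, 0.0]]

out = face_masked_audio_update(latents, [(mask_a, audio_a), (mask_b, audio_b)])
```

In this toy setup, tokens 2 and 3 (background) are left untouched, while each face token is updated only from its own character's audio track, which is the property that makes multi-character dialogue scenes controllable.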

Categories

AI Video Generators, AI Models

Additional Project Details

Programming Language

Related Categories

Python AI Video Generators, Python AI Models

Registered

2025-09-23
