Audience
Designers, marketers, and content creators who need precise, high-quality AI image generation with strong control over layout, text, and visual composition.
About ERNIE-Image
ERNIE-Image is an open text-to-image generation model developed by Baidu, designed to produce high-quality images while following instructions accurately and giving users fine-grained control. It is built on a single-stream Diffusion Transformer (DiT) architecture with about 8 billion parameters, a size at which it achieves state-of-the-art performance among open-weight image models while remaining relatively efficient. A built-in prompt enhancement system expands simple user inputs into richer, structured descriptions, improving the quality and consistency of the generated images. The model is optimized for complex instruction following: it renders text within images accurately and handles structured layouts and multi-element compositions, which makes it particularly suitable for posters, comics, and multi-panel designs. It also supports multilingual prompts, including English, Chinese, and Japanese, so it can be used across regions.
