Qwen3.5-Plus
Qwen3.5-Plus is a high-performance native vision-language model designed for efficient text generation, deep reasoning, and multimodal understanding. Built on a hybrid architecture that combines linear attention with a sparse mixture-of-experts design, it delivers strong performance while optimizing inference efficiency. The model supports text, image, and video inputs and produces text outputs, making it suitable for complex multimodal workflows. With a massive 1 million token context window and up to 64K output tokens, Qwen3.5-Plus enables long-form reasoning and large-scale document analysis. It includes advanced capabilities such as structured outputs, function calling, web search, and tool integration via the Responses API. The model supports prefix continuation, caching, batch processing, and fine-tuning for flexible deployment. Designed for developers and enterprises, Qwen3.5-Plus provides scalable, high-throughput AI performance with OpenAI-compatible API access.
Learn more
Qwen2.5-VL-32B
Qwen2.5-VL-32B is a state-of-the-art AI model designed for multimodal tasks, offering advanced capabilities in both text and image reasoning. It builds upon the earlier Qwen2.5-VL series, improving response quality with more human-like, formatted answers. The model excels in mathematical reasoning, fine-grained image understanding, and complex, multi-step reasoning tasks, such as those found in MathVista and MMMU benchmarks. Its superior performance has been demonstrated in comparison to other models, outperforming the larger Qwen2-VL-72B in certain areas. With improved image parsing and visual logic deduction, Qwen2.5-VL-32B provides a detailed, accurate analysis of images and can generate responses based on complex visual inputs. It has been optimized for both text and image tasks, making it ideal for applications requiring sophisticated reasoning and understanding across different media.
Learn more
Qwen3.7-Plus
Qwen3.7-Plus is a multimodal agent model that unifies vision and language into a single, versatile agent foundation. Building on Qwen3.7’s agentic intelligence, it extends Qwen’s capabilities into visual understanding, visual reasoning, grounded interaction, and multimodal tool use, enabling agents to perceive, analyze, and act across text, images, documents, screens, and complex real-world contexts. It is designed for tasks that require more than static question answering, including visual search, document comprehension, chart and table analysis, screen understanding, GUI interaction, image-grounded reasoning, and agent workflows that combine perception with planning and execution. Qwen3.7-Plus strengthens the connection between language reasoning and visual evidence, allowing users to ask questions about images, interpret dense multimodal inputs, extract structured information, and generate responses that reflect both context and visual details.
Learn more
Aya Vision
Aya Vision is a research model advancing in multilingual multimodal AI through innovative synthetic data generation, cross-modal model merging, and a comprehensive benchmark suite. It achieves state-of-the-art performance across 23 languages, surpassing larger models while efficiently addressing data scarcity and catastrophic forgetting by reducing computational overhead up to 40% via optimized training techniques.
Learn more