Text Generation Inference

Large Language Model Text Generation Inference

This is an exact mirror of the Text Generation Inference project, hosted at https://github.com/huggingface/text-generation-inference. SourceForge is not affiliated with Text Generation Inference.

Downloads: 8 This Week

Last Update: 2025-12-18

Text Generation Inference (TGI) is a high-performance inference server for text generation models, optimized for models from the Hugging Face Transformers ecosystem. It is designed to serve large language models (LLMs) efficiently and at scale.

Features

Optimized for serving large language models (LLMs)
Supports batching and parallelism for high throughput
Quantization support to reduce memory footprint and improve performance
API-based deployment for easy integration
GPU acceleration and multi-node scaling
Built-in token streaming for real-time responses
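
As a rough illustration of the API-based deployment and token streaming features listed above, the following sketch builds a request for the server's `/generate_stream` endpoint and reads tokens from the server-sent events it returns. The base URL, the helper names (`build_request`, `iter_token_texts`, `stream`), and the default of 20 new tokens are assumptions made for this example, not part of the project page; check the payload and event fields against the documentation of the version you deploy.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: a TGI server is reachable at this address; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def build_request(prompt: str, max_new_tokens: int = 20) -> urllib.request.Request:
    """Build a POST request for the streaming generation endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{BASE_URL}/generate_stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_token_texts(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text of each non-special token from server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        event = json.loads(line[len("data:"):])
        token = event.get("token", {})
        if not token.get("special", False):
            yield token.get("text", "")


def stream(prompt: str) -> None:
    """Print tokens as they arrive (hypothetical helper; needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        for text in iter_token_texts(response):
            print(text, end="", flush=True)
    print()
```

With a server running, calling `stream("What is deep learning?")` would print the completion piece by piece. The parsing is kept separate from the network call so it can be exercised on recorded event lines without a GPU or a live server.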

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Natural Language Processing (NLP) Tool, Python LLM Inference Tool

Registered

2025-01-21

Recommended Projects

SageMaker Hugging Face Inference Toolkit
Library for serving Transformers models on Amazon SageMaker
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs
Text Embeddings Inference
High-performance inference server for text embeddings models
NNCF
Neural Network Compression Framework for enhanced OpenVINO
vLLM
A high-throughput and memory-efficient inference and serving engine