Bharat MiniGPT 350M (3.5B tokens Experiment)

Bharat MiniGPT 350M is a custom GPT-style causal language model trained from scratch by Harshvardhan Mishra using modern LLM architecture components such as RoPE, RMSNorm, SwiGLU, and SDPA Attention.

This is not a fine-tuned GPT-2 or LLaMA variant. The architecture and training pipeline were implemented manually in PyTorch and later integrated into the HuggingFace ecosystem.

An improved version trained on more tokens and a fine-tuned version are planned.

Best suited for:

Knowledge completion
Educational prompts
Article continuation
English prose generation

This is a pretrained foundation model and is not instruction tuned yet.

Explore more: https://iotbyhvm.ooo/bharat-minigpt-350m-a-custom-gpt-style-llm-built-from-scratch-in-india/

Model Details

Model Name: Bharat MiniGPT 350M
Parameters: ~350 Million
Architecture: Decoder-only Transformer
Training Tokens: 3.5 Billion
Framework: PyTorch + Custom Hugging Face Transformers integration
Developer: Harshvardhan Mishra
Organization: HVM Smart Solutions

Architecture

Component	Details
Layers	24 Transformer Blocks
Heads	16 Attention Heads
Embedding Size	1024
Context Length	768 Tokens
Vocabulary Size	50,257
Position Encoding	RoPE (Rotary Position Embedding)
Normalization	RMSNorm
Feed Forward	SwiGLU
Attention	SDPA / Flash Attention Compatible
Weight Tying	Yes
Precision	FP16 Training
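As a rough cross-check of the table above, the parameter count can be estimated from the listed dimensions. This is only a sketch: the SwiGLU hidden width is not stated in this card, so `ffn_hidden` is a hypothetical parameter, and the two values tried below (about 8/3 of the embedding size, and 4 times the embedding size) are common conventions rather than confirmed settings. The sketch also assumes bias-free linear layers and two RMSNorm gains per block.

```python
def count_params(ffn_hidden, n_layers=24, d_model=1024, vocab_size=50_257):
    """Estimate parameters for a decoder-only model with tied embeddings.

    ffn_hidden is NOT given in the model card; it is an assumption.
    """
    embedding = vocab_size * d_model      # shared with the output head (weight tying)
    attention = 4 * d_model * d_model     # Q, K, V and output projections, no biases
    swiglu = 3 * d_model * ffn_hidden     # gate, up and down projections
    norms = 2 * d_model                   # two RMSNorm gain vectors per block
    final_norm = d_model                  # one RMSNorm before the output head
    # RoPE adds no learned parameters.
    return embedding + n_layers * (attention + swiglu + norms) + final_norm


for hidden in (2730, 4096):  # hypothetical SwiGLU widths
    print(f"ffn_hidden={hidden}: {count_params(hidden) / 1e6:.1f}M parameters")
```

Under these assumptions, the narrower width gives a total close to the ~350 Million figure quoted in Model Details, while the wider one gives a noticeably larger model.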

Training Data

The model was trained using a weighted mixture of:

Dataset	Weight
HuggingFaceFW/fineweb (sample-10BT)	40%
HuggingFaceFW/fineweb-edu (sample-10BT)	30%
Wikimedia Wikipedia (20231101.en)	30%
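One way to picture a weighted mixture like this is to draw the source of each training document at random with the listed probabilities. The sketch below is illustrative only: the card does not say whether the weights apply per document, per token, or per batch, and `sample_sources` is a hypothetical helper, not part of the training pipeline.

```python
import random
from collections import Counter

# Weights copied from the table above.
MIXTURE = {
    "HuggingFaceFW/fineweb (sample-10BT)": 0.40,
    "HuggingFaceFW/fineweb-edu (sample-10BT)": 0.30,
    "Wikimedia Wikipedia (20231101.en)": 0.30,
}


def sample_sources(n, seed=0):
    """Pick a source dataset for each of n draws, proportional to its weight."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


draws = 100_000
counts = Counter(sample_sources(draws))
for name in MIXTURE:
    print(f"{name}: {counts[name] / draws:.3f}")
```

With enough draws, the observed shares settle close to 40%, 30%, and 30%.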

Training Setup

Setting	Value
Optimizer	AdamW
Learning Rate	3e-4
Min LR	3e-5
Warmup Steps	51,200
LR Scheduler	Cosine Decay
Gradient Accumulation	128
Mixed Precision	FP16
Gradient Clipping	1.0
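The learning-rate settings above describe a linear warmup followed by cosine decay. A minimal sketch of such a schedule follows. The total number of steps is not given in this card, so `total_steps` is a hypothetical argument, and the card does not say whether "steps" counts optimizer updates or micro-batches.

```python
import math


def lr_at(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup_steps=51_200):
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from near zero up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # progress runs from 0 at the end of warmup to 1 at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + cosine * (max_lr - min_lr)


total = 200_000  # hypothetical training length, for illustration only
for step in (0, 25_600, 51_200, 125_600, 200_000):
    print(step, f"{lr_at(step, total):.2e}")
```

The schedule peaks at 3e-4 when warmup ends and reaches the 3e-5 floor at the final step.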

Features

Custom GPT architecture
RoPE positional embeddings
RMSNorm normalization
SwiGLU feed-forward layers
Flash Attention compatible SDPA
HuggingFace generate() support
KV-cache compatible
Weight tying support
Gradient checkpointing during training
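To illustrate the RoPE feature listed above, here is a small pure-Python sketch of rotary position embedding. It rotates consecutive pairs of a vector by position-dependent angles, using the conventional base of 10000. Both the pairing layout and the base are assumptions; this model's actual implementation may differ (for example, by pairing the first and second halves of the vector instead).

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate pairs (v0, v1), (v2, v3), ... by angles that grow with position."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# The attention score depends only on the offset between positions (here 2).
print(dot(rope(q, 5), rope(k, 3)))
print(dot(rope(q, 12), rope(k, 10)))
```

Both printed scores agree up to floating-point error, which is the property that lets attention see relative rather than absolute position.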

Benchmark Results

Evaluated using: EleutherAI LM Evaluation Harness

Tasks	Version	Filter	Metric		Value		Stderr
arc_easy	1	none	acc	↑	0.3312	±	0.0097
		none	acc_norm	↑	0.3413	±	0.0097
hellaswag	1	none	acc	↑	0.2650	±	0.0044
		none	acc_norm	↑	0.2636	±	0.0044
piqa	1	none	acc	↑	0.5631	±	0.0116
		none	acc_norm	↑	0.5533	±	0.0116

Notes:

Results are from the current 3B tokens pretrained base checkpoint.
This model is not instruction-tuned yet.
Further tokenizer and training improvements are planned.
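To put the accuracy numbers in context, the sketch below compares each `acc` value with a random-guess baseline and expresses the gap in units of the reported standard error. The baselines are assumptions: HellaSwag has four options per item, PIQA has two, and ARC-Easy is treated here as four-way although a few of its questions have a different number of options.

```python
# (accuracy, stderr, assumed chance level) taken from the table above.
RESULTS = {
    "arc_easy": (0.3312, 0.0097, 0.25),
    "hellaswag": (0.2650, 0.0044, 0.25),
    "piqa": (0.5631, 0.0116, 0.50),
}


def margin_over_chance(acc, stderr, chance):
    """Return (absolute gap over chance, gap measured in standard errors)."""
    gap = acc - chance
    return gap, gap / stderr


for task, (acc, stderr, chance) in RESULTS.items():
    gap, z = margin_over_chance(acc, stderr, chance)
    print(f"{task}: {gap:+.4f} over chance ({z:.1f} standard errors)")
```

All three tasks come out above their assumed baselines, with HellaSwag showing the smallest margin.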

Installation

pip install transformers torch

Usage

Load Model

from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is required because the architecture is custom
model = AutoModelForCausalLM.from_pretrained(
    "Harshhvm/bharat-minigpt-350m-pretrain-3b-tokens",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    "Harshhvm/bharat-minigpt-350m-pretrain-3b-tokens",
    trust_remote_code=True,
)

Generate Text

import torch

prompt = "India is a land of"

inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation without tracking gradients
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=80,
        temperature=0.45,
        top_p=0.82,
        top_k=40,
        repetition_penalty=1.35,
        no_repeat_ngram_size=4,
        do_sample=True,
        use_cache=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
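The `temperature`, `top_k`, and `top_p` settings used above shape the distribution the next token is drawn from. The sketch below mimics that filtering on a toy list of logits in pure Python. It is a simplified illustration, not the Transformers implementation, and `filter_next_token_probs` is a hypothetical helper name.

```python
import math


def filter_next_token_probs(logits, temperature=0.45, top_k=40, top_p=0.82):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature below 1 sharpens the distribution.
    scaled = [value / temperature for value in logits]
    # Top-k: keep only the k highest-scoring tokens.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    peak = scaled[order[0]]
    exps = {i: math.exp(scaled[i] - peak) for i in order}
    norm = sum(exps.values())
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i in order:
        prob = exps[i] / norm
        kept[i] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {i: prob / total for i, prob in kept.items()}


toy_logits = [2.0, 1.0, 0.0, -1.0]
print(filter_next_token_probs(toy_logits, temperature=1.0, top_k=3, top_p=0.8))
print(filter_next_token_probs(toy_logits))  # defaults match the settings above
```

With the lower temperature of 0.45, probability mass concentrates on the top token, so fewer candidates survive the top-p cut.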
