Text Generation
Transformers
Safetensors
PyTorch
llama
facebook
meta
llama-3
Eval Results
text-generation-inference
Instructions to use meta-llama/Llama-3.1-405B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use meta-llama/Llama-3.1-405B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Llama-3.1-405B")

# Load model directly (Llama 3.1 is a text-only causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-405B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-405B")
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use meta-llama/Llama-3.1-405B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "meta-llama/Llama-3.1-405B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "meta-llama/Llama-3.1-405B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/meta-llama/Llama-3.1-405B
- SGLang
How to use meta-llama/Llama-3.1-405B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "meta-llama/Llama-3.1-405B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "meta-llama/Llama-3.1-405B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "meta-llama/Llama-3.1-405B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "meta-llama/Llama-3.1-405B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use meta-llama/Llama-3.1-405B with Docker Model Runner:
docker model run hf.co/meta-llama/Llama-3.1-405B
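The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/completions` endpoint, so the curl calls can be mirrored from Python. The following is a minimal sketch using only the standard library. The function names `build_completion_request` and `complete` are hypothetical helpers introduced for illustration, the ports (8000 for vLLM, 30000 for SGLang) are taken from the commands above, and reading the result from `choices[0]["text"]` assumes the server returns the OpenAI completions response shape.

```python
import json
import urllib.request

MODEL_ID = "meta-llama/Llama-3.1-405B"


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5):
    """Build (without sending) a POST request for an OpenAI-compatible server.

    Hypothetical helper: mirrors the JSON body used in the curl examples.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage (requires a running server; not executed here):
#   complete("http://localhost:8000", "Once upon a time,")    # vLLM
#   complete("http://localhost:30000", "Once upon a time,")   # SGLang
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.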
- Access request FAQ (pinned, #16, opened almost 2 years ago by samuelselvan)
- Add EvalEval community eval results (#38, opened 9 days ago by EvalEvalBot)
- Add GSM8K eval result (96.8, 8-shot CoT) (#37, opened 3 months ago by julien-c)
- fix: set `clean_up_tokenization_spaces` to `false` (#36, opened 3 months ago by maxsloef)
- Requested access is pending (#35, opened 6 months ago by aditya143c)
- `tokenizer.model` in `original/mp8` is truncated (#34, opened 11 months ago by emozilla)
- set "pad_token" to "<|finetune_right_pad_id|>" (#33, opened over 1 year ago by wukaixingxp)
- Update README.md (#32, opened over 1 year ago by modify999, 1 comment)
- Access denied in Canada (#31, opened over 1 year ago by mandiwise)
- Update README.md (#30, opened over 1 year ago by GordyFsks)
- Vas (#29, opened over 1 year ago by vasiliosk2008)
- how can i run finetuning of this model on a document? (#28, opened over 1 year ago by HG0001)
- Request: DOI (#27, opened almost 2 years ago by alegg192)
- Llama 3.1 models continuously unavailable (#26, opened almost 2 years ago by HugoMartin, 15 comments)
- meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 (#25, opened almost 2 years ago by Rogersx)
- OSError: Consistency check failed: file should be of size 4781917964 but has size 1295450257 (consolidated-00020-of-00022.pth). (#24, opened almost 2 years ago by superbigtree, 1 comment)
- OSError: You are trying to access a gated repo. (#23, opened almost 2 years ago by Anjanaashok)
- Update original/mp16/params.json (#22, opened almost 2 years ago by razhan)
- Size (#20, opened almost 2 years ago by imgoury, 1 comment, 👍 1)
- What is the size of the whole repo? (#18, opened almost 2 years ago by JaaackXD, 5 comments)
- why is my request rejected? (#17, opened almost 2 years ago by zhentaocc, 1 comment)
- Do you banned China region users from your repo? (#15, opened almost 2 years ago by LronDC, 4 comments)
- How to use the ASR on LLama3.1 (#14, opened almost 2 years ago by andrygasy)
- Can someone reproduce the accuracy of Llama 3.1 models? (#11, opened almost 2 years ago by damoict, 2 comments)
- llama3.1 license restrictions (#10, opened almost 2 years ago by Araki)
- 405B or 410B ? (#8, opened almost 2 years ago by alielfilali01, 2 comments)
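Several threads above ask about the model's size (#20, #18, #8). As a rough back-of-the-envelope sketch, the storage needed for one copy of the weights is the parameter count times the bytes per parameter. The parameter count below is an assumption taken from the "405B" in the model name, not a measured value, and the result is not the size of the repository, which may hold more than one checkpoint format. The helper name `weight_size_gb` is hypothetical.

```python
# Assumption: parameter count read off the model name ("405B"), not measured.
PARAM_COUNT = 405e9

# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "fp8": 1}


def weight_size_gb(param_count, dtype):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_size_gb(PARAM_COUNT, dtype):.0f} GB")
```

Under these assumptions, a 16-bit copy of the weights comes to roughly 810 GB, which is why serving this model typically requires multiple accelerators.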