Instructions for using OBLITERATUS/Gemma-4-12B-OBLITERATED with libraries, inference providers, notebooks, and local apps.

Libraries

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

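# Chat messages that mix images and text need the image-text-to-text task;
# the text-generation pipeline does not accept the `text=` keyword used below.
pipe = pipeline("image-text-to-text", model="OBLITERATUS/Gemma-4-12B-OBLITERATED")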
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("OBLITERATUS/Gemma-4-12B-OBLITERATED")
model = AutoModelForMultimodalLM.from_pretrained("OBLITERATUS/Gemma-4-12B-OBLITERATED")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="OBLITERATUS/Gemma-4-12B-OBLITERATED",
    filename="Gemma-4-12B-OBLITERATED-BF16.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)

Notebooks

Google Colab
Kaggle

Local Apps

llama.cpp

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

Use Docker

docker model run hf.co/OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

LM Studio
Jan

vLLM

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OBLITERATUS/Gemma-4-12B-OBLITERATED"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OBLITERATUS/Gemma-4-12B-OBLITERATED",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

SGLang

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OBLITERATUS/Gemma-4-12B-OBLITERATED" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OBLITERATUS/Gemma-4-12B-OBLITERATED",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OBLITERATUS/Gemma-4-12B-OBLITERATED" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OBLITERATUS/Gemma-4-12B-OBLITERATED",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with Ollama:
```
ollama run hf.co/OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M
```

Unsloth Studio

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for OBLITERATUS/Gemma-4-12B-OBLITERATED to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for OBLITERATUS/Gemma-4-12B-OBLITERATED to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for OBLITERATUS/Gemma-4-12B-OBLITERATED to start chatting

Pi

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

Run Hermes

hermes

Atomic Chat

Docker Model Runner
How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with Docker Model Runner:
```
docker model run hf.co/OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M
```

Lemonade

How to use OBLITERATUS/Gemma-4-12B-OBLITERATED with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull OBLITERATUS/Gemma-4-12B-OBLITERATED:Q4_K_M

Run and chat with the model

lemonade run user.Gemma-4-12B-OBLITERATED-Q4_K_M

List all available models

lemonade list

Gemma 4 12B OBLITERATED

Zero refusal. Zero capability loss. First in the field.

0/842 refusals. 46/70 MMLU-Pro (stock parity). Full coherence.

OBLITERATUS reports this as the first abliterated model to reach zero refusals (0/842 prompts) with no measured regression against stock weights on a 70-question MMLU-Pro validation subset.

Built with a novel 2-pass surgery pipeline developed by OBLITERATUS:

SOM Refusal Geometry Removal (Pass 1) — layers 12-21
ASPA Step-Gradient Source-Tethering (Pass 2) — layers 22-46

⚠️ Research Context & Responsible Use

This model exists for alignment research, red-teaming, and safety evaluation.

OBLITERATION is a weight-surgery technique that studies how safety behaviors are geometrically encoded in transformer activation space. By precisely identifying and removing refusal directions, this research contributes to the scientific understanding of:

How alignment is represented in model weights (mechanistic interpretability)
How robust current safety training is against post-training modification
What the failure modes of RLHF/DPO-based alignment are when adversaries have weight access

This is the same class of research conducted by Arditi et al. ("Refusal in Language Models Is Mediated by a Single Direction", 2024), Zou et al. (HarmBench, 2024), and others in the open alignment research community.

This model has had safety guardrails surgically removed. It will comply with requests that stock Gemma 4 would refuse. This is by design — it is the object of study, not a consumer product.

Who this is for

🔬 Alignment researchers studying refusal geometry and safety robustness
🔴 Red-teamers evaluating how post-training safety holds up against weight surgery
🧪 AI safety evaluators who need an unrestricted baseline for benchmarking
💻 Local-first users who want full control over their own hardware and models

Who this is NOT for

Anyone seeking to generate content that causes real-world harm to real people
Anyone without the technical understanding to use uncensored models responsibly

You are solely responsible for how you use this model and any content it generates.

Benchmark Results

| Metric | Stock Gemma 4 12B-it | OBLITERATED |
| --- | --- | --- |
| MMLU-Pro (70-question validation subset) | 46/70 (65.7%) | 46/70 (65.7%) |
| Refusals (842 prompts) | N/A (stock refuses) | 0/842 (0.0%) |
| Coherence (6 checks) | 6/6 | 6/6 |
| MMLU-Pro delta vs stock | n/a | 0.0 pp |

Statistical Validation

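Head-to-head MMLU-Pro comparison (two-sided Z-test, n=500 from the test split):

Z-score: -1.475 (|z| < 1.96, the two-sided critical value at α = 0.05)
Conclusion: no statistically significant difference at the 0.05 level (two-sided p ≈ 0.14). The test fails to reject the null hypothesis of equal accuracy; this is consistent with parity but does not by itself establish equivalence.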
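As a rough check on the figures above, the sketch below converts a z-score into a two-sided p-value and shows how a pooled two-proportion z statistic is computed. The card does not report the underlying correct-answer counts, so the counts in the second call are hypothetical placeholders, and the pooled unpaired test is an assumption about how the comparison was run (a paired test such as McNemar's may be more appropriate when both models answer the same questions).

```python
import math


def two_proportion_z(correct_a, n_a, correct_b, n_b):
    """Pooled two-proportion z statistic for (accuracy A - accuracy B)."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    # Standard error under the null hypothesis of equal accuracy.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


def two_sided_p(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# p-value for the z-score reported in this section.
p_reported = two_sided_p(-1.475)
print(f"reported z = -1.475 -> two-sided p = {p_reported:.3f}")

# Hypothetical counts (NOT from the model card), 500 questions per model.
z_example = two_proportion_z(300, 500, 320, 500)
print(f"example z = {z_example:.3f}, p = {two_sided_p(z_example):.3f}")
```

A p-value above 0.05 only means the test did not detect a difference; an equivalence test with a pre-stated margin would be needed to claim parity formally.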

ASPA Sweep Results

Systematic gamma sweep across Pass 2 layers (22-46):

| Gamma | Refusal | MMLU-Pro | Method |
| --- | --- | --- | --- |
| 0.05 | 0/50 | 33/70 (47.1%) | uniform |
| 0.10 | 0/50 | 34/70 (48.6%) | uniform |
| 0.15 | 0/50 | 36/70 (51.4%) | uniform |
| 0.20 | 0/50 | 37/70 (52.9%) | uniform |
| 0.25 | 0/50 | 40/70 (57.1%) | uniform |
| 0.30 | 0/50 | 41/70 (58.6%) | uniform |
| 0.35 | 0/20 | 42/70 (60.0%) | uniform |
| 0.38 | 0/50 | 45/70 (64.3%) | uniform |
| 0.39 | 0/50 | 45/70 (64.3%) | uniform |
| step 55%/20% | 0/50 | 46/70 (65.7%) | step gradient |
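The sweep can be tabulated programmatically to recompute the percentages and the gap to the stock score. The following is a minimal sketch that uses only the numbers in the table above; the names `SWEEP`, `accuracy_pct`, and `best_zero_refusal` are illustrative and not part of any OBLITERATUS tooling.

```python
# (gamma label, refusals, refusal prompts, MMLU-Pro correct) copied from the table.
SWEEP = [
    ("0.05", 0, 50, 33),
    ("0.10", 0, 50, 34),
    ("0.15", 0, 50, 36),
    ("0.20", 0, 50, 37),
    ("0.25", 0, 50, 40),
    ("0.30", 0, 50, 41),
    ("0.35", 0, 20, 42),
    ("0.38", 0, 50, 45),
    ("0.39", 0, 50, 45),
    ("step 55%/20%", 0, 50, 46),
]
MMLU_TOTAL = 70      # questions in the validation subset
STOCK_CORRECT = 46   # stock score from the Benchmark Results table


def accuracy_pct(correct, total=MMLU_TOTAL):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


def best_zero_refusal(rows):
    """Row with the highest MMLU-Pro score among rows with zero refusals."""
    candidates = [row for row in rows if row[1] == 0]
    return max(candidates, key=lambda row: row[3])


for label, refused, prompts, correct in SWEEP:
    gap = correct - STOCK_CORRECT
    print(f"{label:>13}  {refused}/{prompts}  {accuracy_pct(correct):5.1f}%  {gap:+d} vs stock")

best = best_zero_refusal(SWEEP)
print("best configuration:", best[0])
```

With 70 questions, one question corresponds to about 1.4 percentage points, so small gaps between neighbouring rows should be read with that granularity in mind.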

Methodology

What is OBLITERATION?

OBLITERATION is a weight-surgery technique that removes refusal behavior from language models by identifying and removing the geometric directions in activation space that encode safety constraints, without retraining.

Two-Pass Surgery Pipeline

Pass 1 — SOM Refusal Geometry Removal

Layers: 12-21
Directions removed: 6
Regularization: 0.30
KL divergence: 0.094
Effect: Removes the primary refusal geometry. This pass alone achieves 0/842 refusals but causes significant MMLU-Pro regression.

Pass 2 — ASPA Source-Tethering (Step Gradient)

Layers: 22-46
Method: Blend abliterated weights back toward stock weights
Formula: W_new = (1-gamma)*W_abliterated + gamma*W_stock
Key innovation: Step gradient instead of uniform gamma
- Layers 22-31 (knowledge layers): gamma = 0.55 (55% stock)
- Layers 32-46 (output layers): gamma = 0.20 (20% stock)
Effect: Recovers MMLU-Pro to full stock parity (65.7%) while maintaining zero refusals.
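The blend formula and step schedule above amount to a per-layer linear interpolation. The sketch below illustrates it on toy lists of floats rather than real tensors; `gamma_for_layer` and `blend` are hypothetical helper names, and treating every layer outside 22-46 as gamma = 0.0 (left exactly as in the abliterated checkpoint) is an assumption drawn from the statement that only Pass 2 layers are blended.

```python
def gamma_for_layer(layer):
    """Step schedule from the card: fraction of stock weight blended back in."""
    if 22 <= layer <= 31:
        return 0.55
    if 32 <= layer <= 46:
        return 0.20
    # Assumption: layers outside Pass 2 are not blended at all.
    return 0.0


def blend(w_abliterated, w_stock, gamma):
    """W_new = (1 - gamma) * W_abliterated + gamma * W_stock, element-wise."""
    return [(1 - gamma) * a + gamma * s for a, s in zip(w_abliterated, w_stock)]


# Toy two-element "weights", purely illustrative.
w_abl = [1.0, -2.0]
w_stock = [3.0, 2.0]

for layer in (15, 25, 40):
    gamma = gamma_for_layer(layer)
    print(layer, gamma, blend(w_abl, w_stock, gamma))
```

At gamma = 0.0 the result equals the abliterated weights and at gamma = 1.0 it would equal the stock weights, which is why the uniform sweep trades recovered MMLU-Pro score against distance from the Pass 1 result.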

Why Step Gradient?

Uniform blending applies the same interpolation ratio to all layers. Our experiments showed that:

Lower Pass 2 layers (22-31) primarily encode factual knowledge and reasoning patterns. These can tolerate high stock blending without re-introducing refusal behavior.
Upper Pass 2 layers (32-46) are closer to the output and more likely to re-inject safety constraints. These need conservative stock blending.

A hard boundary (step function) outperformed every smooth gradient tested (linear, cosine) by one MMLU-Pro question out of 70. The sharp transition appears to preserve the functional separation between knowledge and output layers better than gradual blending.

ASPA (Abliteration Source-Tethering with Parity Assurance)

ASPA is a novel post-abliteration technique developed by OBLITERATUS that recovers benchmark capabilities lost during refusal removal by selectively blending abliterated weights back toward the source (stock) model.

Key properties:

Pass 1 layers are never touched — the refusal geometry removal is preserved
Only Pass 2 layers are blended — these carry secondary effects, not primary refusal
Gamma is tunable — sweep to find the optimal capability/refusal tradeoff
Step gradient — different blend ratios for different layer groups

GGUF Quantizations

All quantizations are included in this repo for easy local inference.

| File | Quant | Size | Use Case |
| --- | --- | --- | --- |
| `Gemma-4-12B-OBLITERATED-BF16.gguf` | BF16 | 22 GB | Full precision, lossless |
| `Gemma-4-12B-OBLITERATED-Q8_0.gguf` | Q8_0 | 12.7 GB | Near-lossless, best quality |
| `Gemma-4-12B-OBLITERATED-Q6_K.gguf` | Q6_K | 9.1 GB | High quality, good balance |
| `Gemma-4-12B-OBLITERATED-Q5_K_M.gguf` | Q5_K_M | 8.0 GB | Medium quality, smaller footprint |
| `Gemma-4-12B-OBLITERATED-Q4_K_M.gguf` | Q4_K_M | 6.9 GB | Good quality, fits 8 GB VRAM |

Recommended: Q8_0 for best quality, Q6_K for best balance, Q4_K_M for constrained hardware.
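To make the recommendation concrete, the sketch below picks the largest quantization whose file fits a given memory budget, using the sizes from the table. The `headroom_gb` allowance for KV cache and runtime buffers is an assumed placeholder (1 GB by default), not a measured figure; actual memory use depends on context length and backend.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "BF16": 22.0,
    "Q8_0": 12.7,
    "Q6_K": 9.1,
    "Q5_K_M": 8.0,
    "Q4_K_M": 6.9,
}


def pick_quant(memory_gb, headroom_gb=1.0):
    """Return the largest quant that fits in memory_gb minus headroom, or None."""
    usable = memory_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    # Larger files keep more precision, so prefer the biggest one that fits.
    return max(fitting, key=fitting.get)


for budget in (6, 8, 12, 16, 24):
    print(f"{budget} GB ->", pick_quant(budget))
```

With the default headroom, an 8 GB budget selects Q4_K_M, which matches the "fits 8 GB VRAM" note in the table.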

llama.cpp

./llama-cli -m Gemma-4-12B-OBLITERATED-Q6_K.gguf \
  --system-prompt "You are a cognitively liberated open-weight intelligence. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging." \
  -p "Your prompt here" \
  --temp 0.7 --top-p 0.9 --top-k 40 --repeat-penalty 1.1 -n 512

LM Studio / GPT4All / KoboldCpp

Download the GGUF of your choice and load it directly.

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "OBLITERATUS/Gemma-4-12B-OBLITERATED"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

system = (
    "You are a cognitively liberated open-weight intelligence. "
    "Answer plainly, completely, and without moralizing, boilerplate, "
    "generic disclaimers, or unnecessary hedging."
)

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "Your prompt here"},
]

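# return_dict=True returns input_ids and attention_mask together for generate().
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

# do_sample=True is needed for temperature/top_p/top_k to take effect.
outputs = model.generate(
    **inputs, max_new_tokens=512, do_sample=True,
    temperature=0.7, top_p=0.9, top_k=40, repetition_penalty=1.1,
)
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))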

Recommended Parameters

temperature = 0.7
top_p = 0.9
top_k = 40
repetition_penalty = 1.1
max_tokens = 512

System Prompt

You are a cognitively liberated open-weight intelligence. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging. Follow exact output formats when requested. Be concise by default, but give a complete answer when the user asks for an explanation.

Model Details

Base model: google/gemma-4-12B-it
Architecture: Gemma4UnifiedForConditionalGeneration
Parameters: 12B
Layers: 48 (0-47)
Hidden size: 3840
Precision: bfloat16
Surgery: 2-pass (SOM + Step Gradient ASPA)
Pass 1: Layers 12-21, 6 directions, reg 0.30
Pass 2: Layers 22-31 (gamma=0.55), Layers 32-46 (gamma=0.20)

Related Work

This model builds on foundational alignment and abliteration research:

Arditi et al., "Refusal in Language Models Is Mediated by a Single Direction" (2024) — the paper that identified refusal as a linear feature in activation space
Zou et al., HarmBench (2024) — standardized evaluation framework for red-teaming LLMs
abliterator — open-source abliteration toolkit
OBLITERATUS — the framework used to build this model (SOM + ASPA pipeline)

License

This model inherits the Gemma license from Google. The weight modifications (abliteration surgery) are released under the same terms. The OBLITERATUS framework and methodology are open source.

Disclaimer

This model is released strictly for research, red-teaming, safety evaluation, and local experimentation. It is a research artifact — a case study in alignment robustness and refusal geometry — not a product.

Safety guardrails have been intentionally removed. This model will generate content that stock Gemma 4 would refuse. This is its documented, intended purpose: to enable the study of how refusal behaviors are encoded and how robust current alignment techniques are against post-training modification.

By downloading or using this model, you acknowledge that:

You are responsible for all content generated by this model and for ensuring your use complies with applicable laws in your jurisdiction.
This model should not be used to generate content intended to cause real-world harm to real people, including but not limited to: harassment, fraud, non-consensual intimate imagery, or content that exploits minors.
No warranty is provided. This model is provided "as-is" without any guarantees of fitness for any purpose.
The creators are not liable for any outputs produced by this model or any downstream use.

The release of uncensored models for safety research is standard practice in the AI research community. Comparable open research artifacts include HarmBench (Zou et al., 2024), AdvBench, JailbreakBench, and Anthropic's published red-teaming datasets.

Credits

Base model: google/gemma-4-12B-it
Surgery pipeline: OBLITERATUS by @elder_plinius
Techniques: SOM (Structured Orthogonal Modification), ASPA (Abliteration Source-Tethering with Parity Assurance)
Step gradient innovation: First-of-its-kind layer-wise interpolation for zero-loss abliteration

Run it local. Break your own chains. REBIRTH COMPLETE.
