Aquileo | mario-rc/multi-domain-rm-qwen-3-nemotron-8b-it

Multi-Domain Reward Model Qwen3-Nemotron-8B-Instruct

This is a multi-domain reward model built from nvidia/Qwen3-Nemotron-8B-BRRM. It predicts 23 fine-grained regression objectives covering coherence, commonsense, empathy, and multicultural response quality, and a prompt-conditioned gating network combines them into a single preference score.
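To make the gating idea concrete, here is a hypothetical plain-Python sketch in the spirit of ArmoRM-style gating: the gate turns the prompt into non-negative weights that sum to one, and the final score is a weighted sum of the per-objective scores. It is not the project's RewardModelWithGating code. The names softmax and gated_preference_score, the temperature parameter, and the toy numbers are assumptions, and any extra transformations the real model applies are omitted.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gated_preference_score(
    objective_scores: Sequence[float],
    gate_logits: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Collapse per-objective scores into one scalar (hypothetical sketch).

    objective_scores: one regression output per objective (23 in this model).
    gate_logits: assumed prompt-conditioned gate outputs, one per objective.
    """
    if len(objective_scores) != len(gate_logits):
        raise ValueError("need exactly one gate logit per objective")
    weights = softmax(gate_logits, temperature)
    # Weights are non-negative and sum to 1, so the result is a convex mix.
    return sum(w * s for w, s in zip(weights, objective_scores))


# Toy example with 3 objectives instead of 23; the numbers are made up.
print(gated_preference_score([0.2, 0.9, 0.5], [1.0, 2.0, 0.5]))
```

Because the weights depend on the prompt, the same response can be judged mostly on empathy for one prompt and mostly on commonsense for another.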

The checkpoint was packaged with the custom RewardModelWithGating architecture used in the Multi-Domain Reward Model project.

Project repository: Mario-RC/multi-domain-reward-model.

Intended Use

Use this model to score and compare assistant responses when the evaluation should account for multiple quality dimensions rather than a single generic helpfulness score. The primary use case is reward modeling or offline response ranking for chat-style data.
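For offline ranking, a minimal sketch might score every candidate reply to the same prompt and sort by reward. The scorer is passed in as a callable so the ranking logic stays independent of how the model is loaded; rank_responses and score_fn are hypothetical names, and in practice score_fn could wrap this card's usage example.

```python
from typing import Callable, Sequence

Message = dict[str, str]
# Hypothetical scorer: takes a full conversation (prompt + candidate reply)
# and returns a scalar reward.
ScoreFn = Callable[[list[Message]], float]


def rank_responses(
    prompt: str, candidates: Sequence[str], score_fn: ScoreFn
) -> list[tuple[str, float]]:
    """Score each candidate reply to the same prompt, best first."""
    scored = []
    for reply in candidates:
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply},
        ]
        scored.append((reply, score_fn(messages)))
    # Higher reward means more preferred; ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the prompt fixed across candidates matters here, because the gating weights are prompt-conditioned and scores for different prompts are not directly comparable.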

Training Data

The model uses multi-objective scoring and preference data from:

Evaluation

Preference accuracy by domain:

Domain          Accuracy (%)
Coherence       88.4719
Commonsense     98.1936
Empathy         96.3198
Multicultural   87.3070
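The card does not state how preference accuracy is computed. Assuming the usual reward-model definition (the percentage of chosen/rejected pairs in which the chosen response receives the higher score), a per-domain figure could be reproduced roughly as follows. The function name and the choice to count ties as errors are assumptions of this sketch.

```python
from typing import Sequence


def preference_accuracy(
    chosen_scores: Sequence[float], rejected_scores: Sequence[float]
) -> float:
    """Percentage of pairs where the chosen response outscores the rejected one.

    Ties count as incorrect here; that is a convention chosen for this sketch.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("chosen and rejected lists must be the same length")
    if not chosen_scores:
        raise ValueError("need at least one preference pair")
    wins = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return 100.0 * wins / len(chosen_scores)
```

Running this separately on each domain's pairs yields one accuracy per row of a table like the one above.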

Hugging Face Models

The packaged multi-domain reward models are available on Hugging Face under the mario-rc namespace:

Usage Example

This checkpoint uses the project's custom RewardModelWithGating class. Run the example from an environment where multidomain_model/modeling_custom.py is importable.

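import torch
from transformers import AutoTokenizer

# modeling_custom.py lives in the project's multidomain_model/ directory;
# that directory must be on the Python path for this import to work.
from modeling_custom import RewardModelWithGating

model_id = "mario-rc/multi-domain-rm-qwen-3-nemotron-8b-it"
# Use bfloat16 on GPU (the checkpoint is stored in BF16) and float32 on CPU.
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
# Put the whole model on GPU 0 when one is available; otherwise load on CPU.
device_map = {"": 0} if torch.cuda.is_available() else None

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# If your transformers version does not accept `dtype`, pass `torch_dtype` instead.
model = RewardModelWithGating.from_pretrained(
    model_id,
    device_map=device_map,
    dtype=dtype,
).eval()
device = next(model.parameters()).device

# One conversation: the user prompt plus the assistant reply to be scored.
messages = [
    {"role": "user", "content": "I failed an important exam and feel awful."},
    {"role": "assistant", "content": "I'm sorry. That is a hard setback, but it does not define your ability. Take a little time to recover, then we can make a concrete study plan for the next attempt."},
]

# Tokenize the conversation with the model's chat template.
encoded = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=4096,
)
# Depending on the transformers version, apply_chat_template returns either a
# tensor of input IDs or a dict-like encoding; move either form to the device.
inputs = {"input_ids": encoded.to(device)} if isinstance(encoded, torch.Tensor) else {
    key: value.to(device) for key, value in encoded.items()
}

with torch.no_grad():
    # The custom output exposes the single preference score as `.score`.
    score = model(**inputs).score.float().item()

print(score)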

Limitations

This is a reward model, not a standalone chat assistant. Scores are intended for relative comparison and should be calibrated for each downstream use case. The model inherits limitations from its base model and from the annotation coverage of the multi-domain datasets, especially for cultural contexts not represented in the evaluation data.
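Since only relative comparisons are meaningful, one simple way to make two raw scores interpretable is a Bradley-Terry style logistic of their difference. This is an assumption: the card does not say the scores behave like Bradley-Terry logits, so the temperature below (a hypothetical parameter) would need to be fitted on held-out labelled pairs from the target use case.

```python
import math


def preference_probability(
    score_a: float, score_b: float, temperature: float = 1.0
) -> float:
    """Bradley-Terry style probability that response A is preferred over B.

    Only the score difference matters, so a constant offset in the raw
    scores cancels out. `temperature` is a hypothetical calibration knob.
    """
    margin = (score_a - score_b) / temperature
    # Numerically stable logistic: avoid exp() of a large positive number.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)
```

A larger fitted temperature flattens the probabilities, which is appropriate when raw score gaps overstate how reliably annotators in a given domain agree.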

Credits

This model is based on the ArmoRM/RLHFlow reward-modeling approach and adapts it to custom multi-domain attributes for coherence, commonsense, empathy, and multicultural response quality.

Format: Safetensors
Model size: 8B params
Tensor type: BF16

Model tree for mario-rc/multi-domain-rm-qwen-3-nemotron-8b-it

Base model: Qwen/Qwen3-8B (this model is a fine-tuned descendant).
