Draft: PnC with LLM for audio pipeline #2006

Open
sushmitha-deva-09 wants to merge 123 commits into NVIDIA-NeMo:main from sushmitha-deva-09:audio_core_3

sushmitha-deva-09 (Contributor) commented May 21, 2026

Description

Adds LLM-based punctuation and capitalization (PnC) to the audio tagging pipeline. A vLLM-backed language model (e.g. Qwen/Qwen2.5-1.5B-Instruct) generates punctuated, capitalized text from ASR output. The result is validated against the original ASR text, and entries that fail validation are flagged for fallback.

Key components

  • PNCwithvLLMInferenceStage: Batched GPU inference stage that generates punctuated and capitalized text using vLLM. Supports both segment-level and top-level text processing.
  • CleanLLMOutputStage: Post-processing stage that validates LLM output against the original ASR text using character error rate (CER), flags entries whose CER exceeds the threshold with pnc_fallback=True, and optionally updates word-level alignment timestamps.
  • VLLMBase model interface: Shared vLLM engine wrapper that handles prompt templating, tokenizer management, and GPU lifecycle management.
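
The CER check described for CleanLLMOutputStage can be illustrated with a minimal, standard-library sketch. This is not the code from this PR: the helper names (normalize, edit_distance, cer_percent, clean_entry) are hypothetical, the normalization (lowercasing and stripping ASCII punctuation before comparing) is an assumption, and so is reading cer_threshold as a percentage, since the description does not state its unit.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed normalization)."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def cer_percent(ref: str, hyp: str) -> float:
    """CER in percent between normalized reference and hypothesis (unit is an assumption)."""
    ref_n, hyp_n = normalize(ref), normalize(hyp)
    if not ref_n:
        return 0.0 if not hyp_n else 100.0
    return 100.0 * edit_distance(ref_n, hyp_n) / len(ref_n)


def clean_entry(entry: dict, generation_field: str = "text_pnc",
                asr_pred_text_key: str = "text", cer_threshold: float = 2.0) -> dict:
    """Return a copy of the entry with pnc_fallback set when the LLM output drifts too far."""
    out = dict(entry)
    cer = cer_percent(entry[asr_pred_text_key], entry.get(generation_field, ""))
    out["pnc_fallback"] = cer > cer_threshold
    return out
```

Under these assumptions, an output that only adds punctuation and casing has a CER of zero after normalization and passes, while an output that rewrites or drops words is flagged so that downstream code can fall back to the original ASR text.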

Usage

stages:
  - _target_: nemo_curator.stages.audio.tagging.text.pnc.PNCwithvLLMInferenceStage
    name: "PNCwithvLLM"
    model_name: "Qwen/Qwen2.5-1.5B-Instruct"
    text_key: "text"
    generation_field: "text_pnc"
    batch_size: 64
    resources:
      gpus: 1.0

  - _target_: nemo_curator.stages.audio.tagging.text.pnc.CleanLLMOutputStage
    name: "CleanLLMOutput"
    generation_field: "text_pnc"
    asr_pred_text_key: "text"
    cer_threshold: 2.0
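
To show how text_key, generation_field, and batch_size from the configuration above might interact, here is a rough sketch of a batched inference loop. It is an illustration under stated assumptions, not the stage's actual implementation: run_pnc, batched, and PROMPT_TEMPLATE are hypothetical names, the prompt wording is invented, and the generate argument stands in for whatever callable wraps the vLLM engine.

```python
from typing import Callable, Iterator

# Hypothetical prompt; the PR's real template is not shown in the description.
PROMPT_TEMPLATE = (
    "Add punctuation and capitalization to the text without changing any words.\n"
    "Text: {text}\n"
    "Output:"
)


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_pnc(entries: list[dict], generate: Callable[[list[str]], list[str]],
            text_key: str = "text", generation_field: str = "text_pnc",
            batch_size: int = 64) -> list[dict]:
    """Read entry[text_key], call generate once per batch, store output in generation_field."""
    results = []
    for batch in batched(entries, batch_size):
        prompts = [PROMPT_TEMPLATE.format(text=entry[text_key]) for entry in batch]
        outputs = generate(prompts)  # assumed to return one string per prompt, in order
        if len(outputs) != len(batch):
            raise ValueError("generate() must return one output per prompt")
        for entry, output in zip(batch, outputs):
            # Copy the entry so the caller's input records are left untouched.
            results.append({**entry, generation_field: output.strip()})
    return results
```

Passing the generator in as a callable keeps the batching logic testable without a GPU; in a real pipeline it would presumably delegate to the shared vLLM engine wrapper.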

Checklist

  • I am familiar with the Contributing Guide.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
Comment thread nemo_curator/models/vllm_model.py Outdated
Comment thread nemo_curator/stages/audio/tagging/text/pnc.py Outdated
Comment thread nemo_curator/stages/audio/tagging/text/pnc.py
Signed-off-by: Sushmitha Deva <sdeva@nvidia.com>
sushmitha-deva-09 (Contributor, Author)

/ok to test e95efb7

2 participants