Dawarich is a command-line tool (likely Ruby-based) for transforming and analyzing Arabic text data. It provides normalization, diacritic handling, segmentation, and morphological tokenization, and is designed for text mining and NLP workflows in Arabic-language contexts.

Features

  • Normalizes Arabic script variants and punctuation
  • Removes or processes diacritics for text standardization
  • Tokenizes and segments text in a way suited to Arabic morphology
  • Supports stop word removal and light stemming
  • Provides a command-line interface for batch NLP preprocessing
  • Outputs plain text, CSV, or JSON
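
The normalization, diacritic removal, and tokenization steps listed above can be illustrated with a minimal Python sketch. This is not the tool's own code or interface: the function names (strip_diacritics, normalize, tokenize) are hypothetical, and the specific rules chosen here (unifying alef variants, mapping alef maqsura to ya, dropping tatweel) are common conventions in Arabic preprocessing, assumed for illustration only.

```python
import re

# Arabic diacritics (tashkeel): fathatan..sukun (U+064B-U+0652) and superscript alef (U+0670)
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Tatweel (kashida) is a purely typographic elongation character
TATWEEL = "\u0640"
# Alef with madda, hamza above, or hamza below
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
# Runs of Arabic letters (hamza U+0621 through ya U+064A)
ARABIC_WORD = re.compile("[\u0621-\u064A]+")


def strip_diacritics(text):
    """Remove short-vowel marks and related diacritics."""
    return DIACRITICS.sub("", text)


def normalize(text):
    """Apply an assumed, commonly used set of normalization rules."""
    text = strip_diacritics(text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)   # all alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")    # alef maqsura -> ya
    return text


def tokenize(text):
    """Split normalized text into runs of Arabic letters."""
    return ARABIC_WORD.findall(text)


if __name__ == "__main__":
    sample = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F \u0623\u062D\u0645\u062F"
    print(tokenize(normalize(sample)))
```

Normalizing before tokenizing keeps the token pattern simple, because diacritics and tatweel are already gone by the time the text is split. Real morphological segmentation (clitic splitting, stemming) needs considerably more than this.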
