Perstem is a Persian (Farsi) stemmer, morphological analyzer, transliterator, and partial part-of-speech tagger. Inflectional morphemes are either separated from their stems or removed entirely. Perstem can also tokenize text and transliterate between several character encodings and romanizations.
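
To make "separating inflectional morphemes from their stems" concrete, here is a toy suffix stripper. It is not Perstem's algorithm, which handles far more morphology. It is a minimal sketch assuming a hypothetical three-entry suffix list (superlative -tarin, comparative -tar, plural -hâ) and an optional zero-width non-joiner (U+200C) between stem and suffix. The function name strip_suffixes and the min_stem parameter are illustrative only.

```python
from __future__ import annotations

# Zero-width non-joiner (U+200C), often written between a Persian stem
# and a suffix such as the plural -hâ.
ZWNJ = "\u200c"

# Hypothetical, tiny suffix list. Longer suffixes come first so that
# "ترین" (-tarin, superlative) is tried before "تر" (-tar, comparative).
SUFFIXES = ["ترین", "تر", "ها"]


def strip_suffixes(word: str, min_stem: int = 2) -> tuple[str, list[str]]:
    """Repeatedly strip known suffixes; return (stem, suffixes in surface order)."""
    removed: list[str] = []
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix):
                # Drop the suffix and any ZWNJ that separated it from the stem.
                candidate = word[: -len(suffix)].rstrip(ZWNJ)
                # Refuse to strip if too little of the word would remain.
                if len(candidate) >= min_stem:
                    # Earlier-stripped suffixes are outermost, so prepend.
                    removed.insert(0, suffix)
                    word = candidate
                    changed = True
                    break
    return word, removed


print(strip_suffixes("کتاب\u200cها"))     # ketâb-hâ, "books"
print(strip_suffixes("بزرگ\u200cترین"))   # bozorg-tarin, "biggest"
```

The first call yields the stem کتاب with the removed suffix ها, and the second yields بزرگ with ترین. Such blind stripping also over-stems words that merely end in a suffix-like string, such as دختر (dokhtar, "girl"), which is why a real analyzer needs lexical knowledge rather than a suffix list alone.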

Features

  • Stems
  • Analyzes Morphology
  • Accepts & Transliterates between UTF-8, Windows-1256, ISIRI-3342, HTML-style Numeric Character References, ArabTeX romanization, and Dehdari transliteration
  • Displays Part-of-Speech Tags for Many Words
  • Tokenizes
  • Handles Irregular Verbs, Semi-Regular Verbs, and Many Broken Plurals
  • Very Fast
  • Small Single File Requiring No External Data
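
Two of the input formats listed above, Windows-1256 and HTML-style numeric character references, can be illustrated with Python's standard library alone. The sketch below is not Perstem's interface, and the helper names (to_windows_1256, to_ncr, from_ncr) are hypothetical. ISIRI-3342, ArabTeX romanization, and Dehdari transliteration have no standard-library codec and would need custom mapping tables.

```python
import html


def to_windows_1256(text: str) -> bytes:
    """Encode Unicode text as Windows-1256 (Python's 'cp1256' codec)."""
    return text.encode("cp1256")


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_ncr(text: str) -> str:
    """Turn numeric (and named) character references back into Unicode."""
    return html.unescape(text)


word = "کتاب"  # ketâb, "book"
raw = to_windows_1256(word)
print(len(raw), raw.decode("cp1256") == word)  # one byte per letter, round-trips
print(to_ncr(word))  # &#1705;&#1578;&#1575;&#1576;
print(from_ncr(to_ncr(word)) == word)
```

Characters with no slot in Windows-1256 make to_windows_1256 raise UnicodeEncodeError, whereas the numeric-reference form can represent any Unicode character.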