KALIMAT: a Multipurpose Arabic Corpus

We are pleased to announce the immediate availability of KALIMAT 1.0.

KALIMAT is an Arabic natural language resource that consists of:
1) 20,291 Arabic articles collected from the Omani newspaper Alwatan by Abbas et al. (2011).
2) 20,291 extractive single-document system summaries.
3) 2,057 extractive multi-document system summaries.
4) 20,291 named-entity-recognised articles.
5) 20,291 part-of-speech-tagged articles.
6) 20,291 morphologically analysed articles.

The articles in the data collection fall into six categories:
culture, economy, local-news, international-news, religion, and sports.

KALIMAT was created by processing the entire data collection (20,291 articles).

Features

  • corpus
  • natural language processing
  • resources
  • Arabic NLP
  • NLP
  • Arabic
  • Morphological Analyser
  • Named Entity Recognition
  • Part of Speech Tagger
  • Summarization
  • Morphosyntactic Analyser
