tidytext brings tidy data principles to text mining by converting text into a tidy data frame with one token per row. It provides tools for tokenization, sentiment analysis, n-gram creation, and document-term matrices, so text data can move directly through dplyr, ggplot2, and other tidyverse workflows.

Features

  • Tokenizes text into tidy format (unnest_tokens)
  • Supports sentiment lexicons (e.g. Bing, NRC) and TF-IDF computation
  • Converts tm and quanteda objects to and from tidy data frames (tidy, cast_dtm, cast_dfm)
  • Easy integration with dplyr/ggplot2 for analysis and visualization
  • Functions for n-grams and document-term matrices; word co-occurrence analysis is typically done with the companion widyr package
  • Compatible with existing tidy data pipelines in R
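
The features above can be sketched in a few lines of R. This is a minimal illustration, not an official example: the two-document corpus and the names docs, doc_id, tokens, word_tfidf, bigrams, and sentiment_counts are invented here, and it assumes the dplyr and tidytext packages are installed.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus, invented for illustration
docs <- tibble(
  doc_id = c("a", "b"),
  text   = c("Tidy text is good text", "Bad data makes analysis hard")
)

# Tokenize into one row per word (unnest_tokens lowercases by default)
tokens <- docs %>%
  unnest_tokens(word, text)

# Count words per document, then add tf, idf, and tf_idf columns
word_tfidf <- tokens %>%
  count(doc_id, word, sort = TRUE) %>%
  bind_tf_idf(word, doc_id, n)

# Tokenize into bigrams instead of single words
bigrams <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# Label words with the Bing lexicon and count labels per document
sentiment_counts <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(doc_id, sentiment)
```

Because every intermediate result is an ordinary data frame, each one can be passed straight to ggplot2 or further dplyr verbs. The Bing lexicon is used here because it ships with tidytext; other lexicons such as NRC may require a separate download.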
