Apache Hadoop: It is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
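
To make the MapReduce programming model mentioned above more concrete, here is a minimal in-memory sketch of a word count. It does not use the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, chosen only for illustration, and a real Hadoop job would run these phases across many machines, reading from and writing to HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in a real job)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a complete, static input."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Note that the whole input must be available before the reduce step can finish, which is one reason batch processing of this kind has high latency.
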
Apache Storm: It is a distributed stream processing computation framework written predominantly in the Clojure programming language. It was originally created by Nathan Marz and the team at BackType, and the project was open-sourced after BackType was acquired by Twitter.
Below is a table of differences between Apache Hadoop and Apache Storm:
| Features | Apache Hadoop | Apache Storm |
|---|---|---|
| Processing | Distributed batch processing using MapReduce | Distributed real-time stream processing using topologies structured as directed acyclic graphs (DAGs) |
| Latency | High latency, i.e., slow computation | Low latency, i.e., fast computation |
| Implementation language | Framework is written mainly in Java | Framework is written in Clojure and Java |
| State handling | Processing is stateful | Processing is stateless |
| Setup | Easy to set up, but operating the cluster is hard | Easy to use |
| Data streaming | Data is static and nonvolatile, i.e., it is persisted before processing | Data is dynamic and continuously streamed |
| Speed | Slow | Fast |
| Use cases | Used for workloads such as black box data and search engine data | Used by companies such as Twitter, NaviSite, and Wego |
| Architecture | Hadoop comprises HDFS (used for data storage) and MapReduce (used for computation) as its architectural units. | Storm comprises streams, spouts, and bolts as its architectural units. |
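
The streams, spouts, and bolts listed in the Architecture row can be sketched as a chain of Python generators. This is only an illustration of the idea, not the Storm API: the names `sentence_spout`, `split_bolt`, `filter_bolt`, `run_topology`, and `STOP_WORDS` are assumptions made for this example. Each bolt handles one tuple at a time and keeps no state between tuples, which mirrors the stateless, low-latency processing described in the table.

```python
STOP_WORDS = {"a", "an", "the", "of"}  # hypothetical filter list for this sketch


def sentence_spout(sentences):
    """Spout: the source of the stream; emits one tuple (a sentence) at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: consumes sentences and emits one lowercase word per tuple."""
    for sentence in stream:
        for word in sentence.split():
            yield word.lower()


def filter_bolt(stream):
    """Bolt: drops stop words; each tuple is handled independently (no state)."""
    for word in stream:
        if word not in STOP_WORDS:
            yield word


def run_topology(sentences):
    """Wire spout -> split bolt -> filter bolt into a simple linear topology."""
    return filter_bolt(split_bolt(sentence_spout(sentences)))


if __name__ == "__main__":
    # Results are produced as each tuple flows through, not after the input ends.
    for word in run_topology(["Storm processes streams", "a bolt transforms tuples"]):
        print(word)
```

Because generators are lazy, each word leaves the pipeline as soon as its sentence has been read, so the source could in principle be unbounded. A real Storm topology would additionally distribute spouts and bolts across worker nodes.
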