
What is Hadoop?


Apache™ Hadoop® is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer.

Video: Hadoop defined in 3 minutes, with Rafael Coss, manager of Big Data Enablement at IBM.


High-level architecture


Apache Hadoop has two pillars:

HDFS (the Hadoop Distributed File System): a file system that spans every node in the cluster, splitting files into large blocks and replicating each block across multiple machines for durability.

MapReduce: the programming model and runtime that divides a job into map and reduce tasks and schedules those tasks on the nodes where the data is stored (see the word-count sketch below).

Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive and ZooKeeper, that extend the value of Hadoop and improve its usability.
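To make the two pillars concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class name and the command-line input and output paths are illustrative; the point is the shape of the program: a Mapper that emits (word, 1) pairs and a Reducer that sums them.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, a job like this is typically launched with the hadoop jar command, reading its input from and writing its results back to directories in HDFS.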


So what’s the big deal?


Hadoop changes the economics and the dynamics of large-scale computing. Its impact can be boiled down to four salient characteristics.

Hadoop enables a computing solution that is:

Scalable: new nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.

Cost effective: Hadoop brings massively parallel computing to commodity servers, sharply reducing the cost per terabyte of storage and processing.

Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources.

Fault tolerant: when a node fails, the system redirects work to another copy of the data and processing continues (see the configuration sketch below).
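Much of that fault tolerance comes from block replication in HDFS. The fragment below is a minimal, illustrative hdfs-site.xml; dfs.replication is the standard HDFS property that controls how many copies of each block the cluster keeps, and the value of 3 shown here is the usual default rather than a product-specific recommendation.

<?xml version="1.0"?>
<!-- hdfs-site.xml (illustrative fragment only) -->
<configuration>
  <property>
    <!-- Number of copies HDFS keeps of every data block; with three
         replicas, losing a single node does not lose any data and
         work is simply redirected to another copy. -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>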


Featured resources


Big Data Hadoop Solutions, Q1 2014
Read the report to see why IBM InfoSphere BigInsights was named a leader and how it stands in relation to other big data Hadoop vendors.
Get the report

Hadoop in the cloud
Leverage big data analytics easily and cost-effectively with IBM InfoSphere BigInsights.
Get the eBook

SQL-on-Hadoop without compromise
How Big SQL 3.0 from IBM represents an important leap forward for speed, portability and robust functionality in SQL-on-Hadoop solutions.
Get the white paper

Understanding Big Data
Analytics for Enterprise Class Hadoop and Streaming Data.
Download the ebook

