IBM delivers Hadoop with its free IBM Open Platform

Get a comprehensive Hadoop distribution


IBM Open Platform is an industry-standard Hadoop distribution that is built to Open Data Platform (ODP) standards and is available for free. Begin your journey into the world of Big Data today.
 

 

Open Data Platform (ODP)

ODP is a shared industry effort focused on promoting and advancing the state of Apache Hadoop for the enterprise. ODP certifies key components, including:

 
  • Ambari
  • HDFS
  • MapReduce
  • YARN
 
 

The completely free IBM Open Platform (IOP) builds the platform for big data projects and provides the most current Apache Hadoop open source content, including:

  • Full support for the Open Data Platform
  • Native support for rolling upgrades for Hadoop services
  • Support for long-running applications within YARN for enhanced reliability & security
  • Heterogeneous storage in HDFS, supporting in-memory and SSD storage in addition to HDD
  • Spark in-memory distributed compute engine, which delivers dramatic performance increases over MapReduce and simplifies the developer experience, leveraging the Java, Python & Scala languages
  • Ambari operational framework for provisioning, managing & monitoring Apache Hadoop clusters
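
For readers new to the programming model that MapReduce and Spark both build on, the following is a minimal sketch in plain Python of the map, shuffle and reduce steps, using word counting as the task. It runs on a single machine and uses no Hadoop or Spark APIs; the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big SQL"]))
```

On a real cluster each phase is distributed across nodes. The sketch only shows the data flow, not scheduling, fault tolerance or in-memory caching.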
 
Apache component         Version
Ambari                   V1.7
Flume                    V1.5.2
Ganglia                  V3.1.7
Hadoop                   V2.6.0
HBase                    V0.98.8
Hive                     V0.14.0
Knox                     V0.5.0
Lucene                   V4.7.0
Nagios                   V3.5.1
Oozie                    V4.1.0
Parquet                  V4.0
Parquet (MR / format)    V1.5.0/2.1
Pig                      V0.14.0
Slider                   V0.60.0
Solr                     V4.10.0
Spark                    V1.2.1
Sqoop                    V1.4.5
Zookeeper                V3.4.6

Try it on Premises

IBM Open Platform with Apache Hadoop

Builds the platform for big data projects and provides the most current Apache Hadoop open source components. IBM offers this open source Apache distribution as a free download as well as a supported offering.


 

Try it on Cloud

IBM BigInsights on Cloud

A Hadoop-as-a-service offering delivered on IBM’s world-class, global SoftLayer cloud infrastructure. It provides the rich features of IBM BigInsights for Apache Hadoop without the cost, complexity and risk of managing the infrastructure.



Enterprise capabilities


IBM BigInsights for Apache Hadoop extends open-source components with value-added capabilities that customers can choose to take advantage of without compromising on openness or adherence to standards.

Experiment with large data sets and explore different use cases on your own timeframe with IBM BigInsights Quick Start Edition, a free, downloadable, non-production version.

Best-in-class SQL-on-Hadoop


Compatibility and performance with Big SQL
IBM Big SQL delivers unmatched simplicity, performance and standards compliance. Unlike other SQL-on-Hadoop implementations, Big SQL works with what you have. It runs against native Hadoop data sources and provides federated access to third-party databases, preserving your investments in tools, applications and expertise.

Easy-to-use tools for business users


Spreadsheet-style access with BigSheets
IBM BigSheets is a web-based analysis and visualization tool with a familiar, spreadsheet-like interface and rich graphing capabilities. Non-technical users can load, filter, analyze and visualize large datasets from both in and out of Hadoop, boosting productivity and avoiding the need for programming or scripting.


Data exploration with Watson Explorer
When open-source tooling is not enough, IBM BigInsights extends the capabilities of Hadoop with IBM Watson Explorer, combining content and data from many systems throughout the enterprise and presenting it to users via a single, intuitive interface.

Built-in advanced analytics: descriptive, predictive, prescriptive


Big R
Big R enables data scientists to run native R functions to explore, visualize, transform and model big data right from within the R environment. Data scientists can run scalable machine learning with a wide and growing class of algorithms, and can use an R-like syntax to write new algorithms and customize existing ones. Only IBM can use the entire cluster memory, spill to disk and run thousands of models in parallel.


Text analytics
A sophisticated text analytics capability unique to BigInsights allows developers to easily build high-quality applications able to process text in multiple written languages, and derive insights from large amounts of native textual data in various formats.

Social data analytics
BigInsights provides the capability to ingest and process large volumes of social media data from various sources. A ready-to-use Twitter data feed is included with select configurations of IBM BigInsights on Cloud to help organizations get productive with social data quickly.

Machine data analytics
BigInsights provides the capability to ingest and process large volumes of machine data from sources such as system log files, sensor data, GPS devices and more. Data scientists can easily apply advanced machine learning algorithms to collected data seamlessly from within the R language environment.

Accelerators that speed time to value


Application accelerators
Whether the data being analyzed includes text, machine data or social data, pre-written accelerators included in BigInsights help organizations realize value more quickly by leveraging pre-written application components for a variety of common big data use cases.


Rich development tools
Developers can quickly develop and deploy big data applications from within the familiar Eclipse interface. Pre-built wizards and numerous implementation examples help speed development and improve application quality, enabling applications to be deployed to the BigInsights console from within Eclipse.

Performance optimized


Adaptive MapReduce
Adaptive MapReduce is a drop-in replacement for Apache MapReduce that can be optionally enabled in BigInsights Enterprise Edition. It provides high-performance scheduling and flexible management of MapReduce workloads. In an independently audited report, BigInsights with Adaptive MapReduce was found to deliver on average four times the application performance of open-source MapReduce.


Faster processing of streaming data
For clients needing fast and reliable processing of real-time data feeds, a limited-use license of InfoSphere Streams included in BigInsights extends open-source Hadoop, delivering faster and more efficient processing of streaming data.

Enterprise-grade management


Management console
A comprehensive web-based interface included in BigInsights simplifies cluster management, service management, job management and file management. Administrators and users can share the same interface, launching applications and viewing a variety of configurable reports and dashboards.


Built-in security
BigInsights was designed with security in mind, supporting Kerberos authentication and providing data privacy, masking and granular access controls with auditing and monitoring functions to ensure that the environment stays secure.


Fault-tolerant POSIX file system
GPFS FPO provides an optionally deployable POSIX file system fully compatible with HDFS. GPFS allows both Hadoop and non-Hadoop applications to share the same file system, avoiding replicated data, reducing costs, and simplifying workflows that frequently copy data in and out of Hadoop. Users can take advantage of enterprise-grade features like snapshots, off-site block replication and hierarchical storage management, which enables infrequently accessed data to be transparently migrated to lower-cost storage tiers.

Seamless data integration


Code-less integration of data
IBM InfoSphere DataStage enables code-less creation of data integration logic and jobs that are reusable across the enterprise. It also enables data governance, including data lineage, business rule and policy management, and data quality.


A unified view of data
Watson Explorer powers a unified, comprehensive and contextually relevant view of all data-driven information, including data on Hadoop.

Entity matching with Big Match
When integrating data from multiple sources, matching like data quickly emerges as a major challenge. Available as an optional add-on to BigInsights, InfoSphere Big Match for Hadoop uses statistical learning algorithms and probabilistic matching to provide fast and efficient linking of data sources for more complete and accurate information.
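
To give a sense of what probabilistic matching means in general, here is a minimal sketch of the textbook Fellegi-Sunter scoring approach in plain Python. It is not Big Match's implementation, which this page does not describe. The field names, the m/u probabilities and the threshold are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative, not taken from Big Match):
#   m = P(field agrees | records refer to the same entity)
#   u = P(field agrees | records refer to different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "city": (0.90, 0.20),
}


def match_score(a, b, probs=FIELD_PROBS):
    """Sum log-likelihood-ratio weights over fields (Fellegi-Sunter style)."""
    score = 0.0
    for field, (m, u) in probs.items():
        if a.get(field) == b.get(field):
            score += math.log2(m / u)              # agreement adds weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return score


def is_match(a, b, threshold=3.0):
    """Link two records when their score reaches an assumed threshold."""
    return match_score(a, b) >= threshold


if __name__ == "__main__":
    rec1 = {"name": "ann lee", "birth_year": 1980, "city": "austin"}
    rec2 = {"name": "ann lee", "birth_year": 1980, "city": "dallas"}
    print(is_match(rec1, rec2))
```

In practice the probabilities are estimated from data rather than hand-set, and fields are usually compared with fuzzy similarity instead of exact equality.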



Complementary products and capabilities


IBM BigInsights is part of a rich portfolio of data management and analysis tools that can help organizations get the most out of their data, regardless of its form and where it resides. IBM BigInsights includes limited-use licenses for Watson Explorer, InfoSphere Streams and Cognos Business Intelligence. IBM also offers a wide range of analytics products, information integration and governance offerings, and other solutions that complement the IBM BigInsights capabilities.
