IBM delivers Hadoop with IBM BigInsights

Get a comprehensive Hadoop distribution

IBM® BigInsights™ for Apache™ Hadoop® is an industry-standard Hadoop offering that combines the best of open source software with enterprise-grade capabilities. It helps organizations cost-effectively manage and analyze big data: the volume and variety of data that customers and businesses create and collect every day.

What’s new

IBM® BigInsights™ for Apache™ Hadoop® v4 supports data science teams and business analysts with:


The free IBM Open Platform with Apache Hadoop forms the platform for big data projects and provides the most current Apache Hadoop open source content.

Deployment options

IBM BigInsights on Cloud

IBM BigInsights on Cloud is a Hadoop-as-a-service offering delivered on IBM’s world-class, global SoftLayer cloud infrastructure. It provides the rich features of IBM BigInsights for Apache Hadoop without the cost, complexity and risk of managing the infrastructure.

IBM BigInsights for Apache™ Hadoop

IBM BigInsights for Apache Hadoop offers enterprise management and analytic capabilities for big data deployments. Its innovative features can accelerate time to value for a wide variety of big data and analytic workloads. IBM Open Platform with Apache Hadoop forms the platform for big data projects and provides the most current Apache Hadoop open source components. IBM offers this open source Apache distribution both as a free download and as a supported offering.

Available Offerings

IBM BigInsights Analyst
Provides business user tools for data access and visualization for greater insight

IBM BigInsights Data Scientist
Offers in-Hadoop analytics and predictive modeling for data science teams

IBM BigInsights Enterprise Management
Ensures performance, security and scalability of Hadoop clusters

Enterprise-grade features

IBM BigInsights for Apache Hadoop extends open-source components with value-added capabilities that customers can adopt without compromising on openness or adherence to standards.

Best in class SQL-on-Hadoop

Compatibility and performance with Big SQL
IBM Big SQL delivers unmatched simplicity, performance and standards compliance. Unlike other SQL-on-Hadoop implementations, Big SQL works with what you have. It runs against native Hadoop data sources and provides federated access to third-party databases, preserving your investments in tools, applications and expertise.

Easy-to-use tools for business users

Spreadsheet-style access with BigSheets
IBM BigSheets is a web-based analysis and visualization tool with a familiar, spreadsheet-like interface and rich graphing capabilities. Non-technical users can load, filter, analyze and visualize large datasets from both inside and outside Hadoop, boosting productivity and avoiding the need for programming or scripting.

Data exploration with Watson Explorer
When open-source tooling is not enough, IBM BigInsights extends the capabilities of Hadoop with IBM Watson Explorer, combining content and data from many systems throughout the enterprise and presenting it to users via a single, intuitive interface.

Built-in advanced analytics: descriptive, predictive, prescriptive

Big R
Big R enables data scientists to run native R functions to explore, visualize, transform and model big data right from within the R environment. Data scientists can run scalable machine learning with a wide and growing class of algorithms, and can use R-like syntax to write new algorithms and customize existing ones. Only IBM can use the entire cluster memory, spill to disk and run thousands of models in parallel.

Text analytics
A sophisticated text analytics capability unique to BigInsights allows developers to easily build high-quality applications able to process text in multiple written languages, and derive insights from large amounts of native textual data in various formats.

Social data analytics
BigInsights provides the capability to ingest and process large volumes of social media data from various sources. A ready-to-use Twitter data feed is included with select configurations of IBM BigInsights on Cloud to help organizations get productive with social data quickly.

Machine data analytics
BigInsights provides the capability to ingest and process large volumes of machine data from sources such as system log files, sensor data, GPS devices and more. Data scientists can easily apply advanced machine learning algorithms to collected data seamlessly from within the R language environment.

Accelerators that speed time to value

Application accelerators
Whether the data being analyzed includes text, machine data or social data, the accelerators included in BigInsights help organizations realize value more quickly by supplying pre-written application components for a variety of common big data use cases.

Rich development tools
Developers can quickly develop and deploy big data applications from within the familiar Eclipse interface. Pre-built wizards and numerous implementation examples help speed development and improve application quality, enabling applications to be deployed to the BigInsights console from within Eclipse.

Performance optimized

Adaptive MapReduce
Adaptive MapReduce is a drop-in replacement for Apache MapReduce that can optionally be enabled in BigInsights Enterprise Edition. It provides high-performance scheduling and flexible management of MapReduce workloads. In an independently audited report, BigInsights with Adaptive MapReduce was found to deliver, on average, four times the application performance of open-source MapReduce.
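
For readers less familiar with the kind of workload being scheduled, the MapReduce programming model can be sketched in a few lines of plain Python. This is only a single-process illustration of the map, shuffle and reduce steps using a word count; the function names are hypothetical, and it says nothing about how Apache MapReduce or Adaptive MapReduce schedule or distribute work.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

On a real cluster the map and reduce steps run as many parallel tasks across machines; deciding where and when those tasks run is the scheduling work that the paragraph above refers to.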

Faster processing of streaming data
For clients needing fast and reliable processing of real-time data feeds, a limited-use license of InfoSphere Streams included in BigInsights extends open-source Hadoop, delivering faster and more efficient processing of streaming data.

Enterprise-grade management

Management console
A comprehensive web-based interface included in BigInsights simplifies cluster management, service management, job management and file management. Administrators and users can share the same interface, launching applications and viewing a variety of configurable reports and dashboards.

Built-in security
BigInsights was designed with security in mind. It supports Kerberos authentication and provides data privacy, masking and granular access controls, with auditing and monitoring functions to help ensure that the environment stays secure.

Fault-tolerant POSIX file system
GPFS FPO provides an optionally deployable POSIX file system that is fully compatible with HDFS. GPFS allows both Hadoop and non-Hadoop applications to share the same file system, avoiding replicated data, reducing costs, and simplifying workflows that frequently copy data in and out of Hadoop. Users can take advantage of enterprise-grade features such as snapshots, off-site block replication and hierarchical storage management, which allows infrequently accessed data to be transparently migrated to lower-cost storage tiers.

Seamless data integration

Code-less integration of data
IBM InfoSphere DataStage enables code-less creation of data integration logic and jobs that are reusable across the enterprise. It also enables data governance, including data lineage, business rule and policy management, and data quality.

A unified view of data
Watson Explorer powers a unified, contextually relevant view of all data-driven information, including data on Hadoop.

Entity matching with Big Match
When integrating data from multiple sources, matching like data quickly emerges as a major challenge. Available as an optional add-on to BigInsights, InfoSphere Big Match for Hadoop uses statistical learning algorithms and probabilistic matching to provide fast and efficient linking of data sources for more complete and accurate information.
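
To make the idea of probabilistic matching concrete, here is a minimal sketch of a generic, textbook-style approach (Fellegi-Sunter scoring) in Python. It is not Big Match's algorithm or API: the field names, the m/u probabilities and the threshold are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities, for illustration only.
# m: probability the field agrees when two records are the same entity
# u: probability the field agrees when two records are different entities
FIELD_WEIGHTS = {
    "name": {"m": 0.95, "u": 0.01},
    "city": {"m": 0.90, "u": 0.10},
    "phone": {"m": 0.85, "u": 0.001},
}


def normalize(value):
    """Lowercase and trim so trivial formatting differences do not block agreement."""
    return str(value).strip().lower()


def match_score(rec_a, rec_b):
    """Sum log2 likelihood ratios over the fields present in both records."""
    score = 0.0
    for field, p in FIELD_WEIGHTS.items():
        a = normalize(rec_a.get(field, ""))
        b = normalize(rec_b.get(field, ""))
        if not a or not b:
            continue  # a missing value is treated as no evidence either way
        if a == b:
            score += math.log2(p["m"] / p["u"])
        else:
            score += math.log2((1 - p["m"]) / (1 - p["u"]))
    return score


def is_match(rec_a, rec_b, threshold=5.0):
    """Link two records when the combined evidence meets the (assumed) threshold."""
    return match_score(rec_a, rec_b) >= threshold
```

Agreement on a rarely shared field such as a phone number adds far more evidence than agreement on a common one such as a city, which is why two records can be linked despite small differences in spelling or formatting.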

Complementary products and capabilities

IBM BigInsights is part of a rich portfolio of data management and analysis tools that can help organizations get the most out of their data, regardless of its form and where it resides. IBM BigInsights includes limited-use licenses for Watson Explorer, InfoSphere Streams and Cognos Business Intelligence. IBM also offers a wide range of analytics products, information integration and governance offerings, and other solutions that complement the IBM BigInsights capabilities.
