IBM delivers Hadoop with InfoSphere BigInsights


Get a comprehensive Hadoop distribution

IBM InfoSphere BigInsights is a secure, resilient and high-performance Hadoop distribution that is based on open standards and includes the rich tools that Hadoop users expect. IBM's value-added features are implemented so that you can choose whether to use the IBM enhancements or standard Hadoop functionality.

Deployment options

BigInsights on Cloud

BigInsights on Cloud is a Hadoop-as-a-service offering delivered on IBM’s world-class, global SoftLayer cloud infrastructure. It provides the rich features of InfoSphere BigInsights without the cost, complexity and risk of managing the infrastructure.

InfoSphere BigInsights

InfoSphere BigInsights comes in a Standard edition, for those who want to get started with the basics of Hadoop, and an Enterprise edition, which enables massive scale-out analysis of a wide variety of unconventional information types and formats.

IBM features included in BigInsights

IBM InfoSphere BigInsights extends open-source components with value-added capabilities that customers can choose to take advantage of without compromising on openness or adherence to standards.

Best-in-class SQL-on-Hadoop

Compatibility and performance with Big SQL
IBM Big SQL delivers unmatched simplicity, performance and standards compliance. Unlike other SQL-on-Hadoop implementations, Big SQL works with what you have. It runs against native Hadoop data sources and provides federated access to third-party databases, preserving your investments in tools, applications and expertise.

Easy-to-use tooling for business users

Spreadsheet-style access with BigSheets
IBM BigSheets is a web-based analysis and visualization tool with a familiar, spreadsheet-like interface and rich graphing capabilities. Non-technical users can load, filter, analyze and visualize large datasets from both in and out of Hadoop, boosting productivity and avoiding the need for programming or scripting.

Data exploration with Watson Explorer
When open-source tooling is not enough, BigInsights Enterprise Edition extends the capabilities of Hadoop with IBM Watson Explorer, combining content and data from many systems throughout the enterprise and presenting it to users via a single, intuitive interface.

Built-in advanced analytics: descriptive, predictive, prescriptive

IBM InfoSphere BigInsights Big R
Big R enables data scientists to use the popular R language to explore, visualize, transform and model big data right from within the R environment without the need to program using MapReduce.

Text analytics
A sophisticated text analytics capability unique to BigInsights allows developers to easily build high-quality applications able to process text in multiple written languages, and derive insights from large amounts of native textual data in various formats.

Social Data Analytics
BigInsights provides the capability to ingest and process large volumes of social media data from various sources. A ready-to-use Twitter data feed is included with select configurations of IBM BigInsights on Cloud to help organizations get productive with social data quickly.

Machine Data Analytics
BigInsights provides the capability to ingest and process large volumes of machine data from sources such as system log files, sensor data, GPS devices and more. Data scientists can easily apply advanced machine learning algorithms to collected data seamlessly from within the R language environment.

Accelerators that speed time to value

Application accelerators
Whether the data being analyzed includes text, machine data or social data, the accelerators included in BigInsights help organizations realize value more quickly by providing pre-written application components for a variety of common big data use cases.

Rich development tools
Developers can quickly develop and deploy big data applications from within the familiar Eclipse interface. Pre-built wizards and numerous implementation examples help speed development and improve application quality, and applications can be deployed to the BigInsights console directly from Eclipse.

Performance optimized

Adaptive MapReduce
Adaptive MapReduce is a drop-in replacement for Apache MapReduce that can optionally be enabled in BigInsights Enterprise Edition. It provides high-performance scheduling and flexible management of MapReduce workloads. In an independently audited report, BigInsights with Adaptive MapReduce was found to deliver, on average, four times the application performance of open-source MapReduce.

Blistering-fast SQL
Big SQL provides both strict SQL language compliance and exceptional performance. Users can access native Hadoop sources using their choice of tools, including Hive, or they can use Big SQL on the same datasets and take advantage of its native high-performance MPP query engine. In an audited result, Big SQL was shown to outperform Hive on a standard set of queries by a factor of five.

Faster Processing of Streaming Data
For clients needing fast and reliable processing of real-time data feeds, a limited-use license of InfoSphere Streams included in BigInsights extends open-source Hadoop, delivering faster and more efficient processing of streaming data.

Enterprise-grade management

Management console
A comprehensive web-based interface included in BigInsights simplifies cluster management, service management, job management and file management. Administrators and users can share the same interface, launching applications and viewing a variety of configurable reports and dashboards.

Security built-in
InfoSphere BigInsights was designed with security in mind. It supports Kerberos authentication and provides data privacy, masking and granular access controls, with auditing and monitoring functions to ensure that the environment stays secure.

Fault-tolerant POSIX file system
GPFS FPO, included in BigInsights Enterprise Edition, provides an optionally deployable POSIX file system that is fully compatible with HDFS. GPFS allows both Hadoop and non-Hadoop applications to share the same file system, which avoids replicated data, reduces costs and simplifies workflows that frequently copy data in and out of Hadoop. Users can take advantage of enterprise-grade features such as snapshots, off-site block replication and hierarchical storage management, which enables infrequently accessed data to be transparently migrated to lower-cost storage tiers.
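As a rough illustration of what POSIX access means for a non-Hadoop application, the sketch below inspects a data file using only ordinary file calls from the Python standard library, with no Hadoop client involved. The function name `summarize_file` and the temporary directory, which stands in for a mounted cluster file system, are assumptions made for this example and are not part of BigInsights.

```python
import os
import tempfile


def summarize_file(path):
    """Return (size_in_bytes, line_count) using only standard file calls."""
    size = os.stat(path).st_size          # plain stat() metadata lookup
    with open(path, "rb") as handle:      # plain open()/read(), no Hadoop API
        line_count = sum(1 for _ in handle)
    return size, line_count


if __name__ == "__main__":
    # A temporary directory stands in for a (hypothetical) mounted file system.
    with tempfile.TemporaryDirectory() as mount_point:
        sample = os.path.join(mount_point, "events.log")
        with open(sample, "wb") as out:
            out.write(b"start\nrunning\nstop\n")
        print(summarize_file(sample))
```

The same function would work unchanged on any path that the operating system exposes as a regular file, which is the practical benefit of a POSIX interface.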

Seamless data integration

Code-less integration of data
IBM DataStage for BigInsights Enterprise Edition enables code-less creation of data integration logic and jobs that are reusable across the enterprise. It also enables data governance, including data lineage, business rule and policy management, and data quality.

A unified view of data
Watson Explorer powers a unified view of all data-driven information, including data on Hadoop, giving users a comprehensive, contextually relevant picture.

Probabilistic matching with Big Match
When integrating data from multiple sources, matching like data quickly emerges as a major challenge. Available as an optional add-on to BigInsights, InfoSphere Big Match for Hadoop uses statistical learning algorithms and probabilistic matching to provide fast and efficient linking of data sources for more complete and accurate information.
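To make the idea of probabilistic matching concrete, here is a minimal sketch of the general technique in the Fellegi-Sunter style: each field contributes a log-likelihood weight depending on whether it agrees, and the summed weight is compared against thresholds. This is not the Big Match implementation; the field names, the m/u probabilities and the thresholds are all assumptions chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative assumptions only):
#   m = P(field agrees | records refer to the same entity)
#   u = P(field agrees | records refer to different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "birth_year": (0.98, 0.02),
}


def normalize(value):
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return str(value).strip().lower()


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the log2 likelihood ratios over all fields."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if normalize(rec_a.get(field)) == normalize(rec_b.get(field)):
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def classify(weight, upper=5.0, lower=0.0):
    """Map a weight to a decision using two assumed thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"
```

Records that agree on every field score well above the upper threshold, records that disagree everywhere score well below the lower one, and partial agreement can land in the band between them, which is typically sent for review. In practice the probabilities would be estimated from data rather than set by hand.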

Complementary products and capabilities

IBM InfoSphere BigInsights is part of a rich portfolio of data management and analysis tools that can help organizations get the most out of their data, regardless of its form or where it resides. InfoSphere BigInsights Enterprise Edition includes limited-use licenses for Watson Explorer, InfoSphere Streams and Cognos Business Intelligence. IBM also offers a wide range of analytics products, information integration and governance offerings, and other solutions that complement the InfoSphere BigInsights capabilities.

Get started with Hadoop for the Enterprise