There is data mining, and then there is predictive data mining, also called predictive analytics. Businesses use data mining to paint a picture of what has happened in the past and what is happening now. They use predictive analytics to go a step further: to make reliable forecasts of what is likely to happen in the future, and to act accordingly.
IBM® SPSS® Modeler, a predictive intelligence workbench, gives them a powerful and comprehensive set of tools to evaluate trends and their likely outcomes. SPSS Modeler delivers results for almost any user, from expert data miners to business and data analysts. It does not require programming skills or a knowledge of advanced statistics, and is widely used in business areas such as customer relationship management (CRM), inventory management and resource planning, employee and customer acquisition and retention, asset management and many other predictive modeling applications.
SPSS Modeler is available in different editions, each of which provides enhanced features and functionality. View Editions Comparison for an overview of the functionality of each edition.
SPSS Modeler - Features and benefits
Data understanding
- A wide range of interactive graphs. Regions or elements of a graph can be selected and viewed, or selected for analysis, while visual link analysis reveals associations in the data.
- IBM SPSS Statistics graphs and reporting tools can be accessed directly from the interface.
Data preparation
- Data sources include Cognos Business Intelligence, IBM DB2, Oracle, Microsoft SQL Server, Informix®, Neoview, Netezza, MySQL (Sun) and Teradata, as well as mainframe data through zDB2 and IBM Classic Federation Server support. Delimited and fixed-width text files, SPSS Statistics files, SAS files, IBM SPSS Data Collection data sources and XML are also supported.
- A range of data-cleaning options is available to remove or replace invalid data, automatically impute missing values, and mitigate the effect of outliers.
- Automatic data preparation methods interrogate and condition data for analysis.
- Data preparation and conditioning methods include field filtering, naming, derivation, binning, re-categorization, value replacement and field re-ordering; record selection, sampling, merging and concatenation; sorting, aggregation and balancing; and data restructuring, partitioning and transposition.
- String functions include string creation, substitution, search and matching; whitespace removal and truncation.
- Data management and transformations performed in SPSS Statistics can be accessed directly from the interface.
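To make a few of the preparation steps listed above concrete, here is a minimal Python sketch of missing-value imputation, outlier mitigation and binning. It is an illustration only, not how SPSS Modeler implements these operations; the function names, the median-fill rule, the clipping range and the sample data are all assumptions chosen for the example.

```python
from statistics import median

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers(values, lower, upper):
    """Limit each value to the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Made-up sample field: two missing values and one implausible age.
ages = [23, None, 35, 41, 29, 230, None, 52]
cleaned = clip_outliers(impute_missing(ages), lower=0, upper=100)
bins = equal_width_bins(cleaned, n_bins=4)
print(cleaned)
print(bins)
```

Clipping is only one way to mitigate outliers; removing or flagging the affected records are equally common choices and depend on the analysis.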
Data modeling and evaluation
- Advanced data mining algorithms deliver the best results from the data.
- Interactive model and equation browsers and an advanced statistical output viewer.
- Variable importance graphs show the relative impact of data attributes on predicted outcomes.
- Several models can be combined, or one model used to analyze another.
- Automatic (binary and numeric) classification and clustering can be used in place of individual algorithms.
- Custom algorithms can be integrated with the Component-Level Extension Framework.
- Integration of SPSS Statistics allows the use of R to extend analysis.
- SPSS Modeler includes a dedicated RFM scoring algorithm to provide Recency, Frequency, and Monetary value scores.
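The idea behind RFM scoring can be sketched in a few lines of Python. This is a generic, rank-based illustration and not the algorithm SPSS Modeler ships; the helper names, the three-bin scoring and the sample transactions are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, today):
    """Summarise (customer, date, amount) rows into (recency_days, frequency, monetary)."""
    last_seen, freq, spend = {}, defaultdict(int), defaultdict(float)
    for customer, when, amount in transactions:
        last_seen[customer] = max(last_seen.get(customer, when), when)
        freq[customer] += 1
        spend[customer] += amount
    return {c: ((today - last_seen[c]).days, freq[c], spend[c]) for c in last_seen}

def rank_score(values, n_bins, higher_is_better=True):
    """Map each key to a score from 1 (worst) to n_bins (best) by rank."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {k: 1 + (i * n_bins) // len(ordered) for i, k in enumerate(ordered)}

def rfm_scores(transactions, today, n_bins=3):
    """Return {customer: (R, F, M)} where a higher score is better."""
    table = rfm_table(transactions, today)
    r = rank_score({c: v[0] for c, v in table.items()}, n_bins, higher_is_better=False)
    f = rank_score({c: v[1] for c, v in table.items()}, n_bins)
    m = rank_score({c: v[2] for c, v in table.items()}, n_bins)
    return {c: (r[c], f[c], m[c]) for c in table}

# Hypothetical purchase history.
transactions = [
    ("A", date(2024, 1, 5), 120.0),
    ("A", date(2024, 3, 1), 80.0),
    ("B", date(2023, 11, 20), 40.0),
    ("C", date(2024, 2, 10), 300.0),
    ("C", date(2024, 2, 25), 50.0),
    ("C", date(2024, 3, 10), 75.0),
]
scores = rfm_scores(transactions, today=date(2024, 3, 31))
print(scores)
```

A recent, frequent, high-spending customer ends up with high scores on all three dimensions, which is what makes RFM useful for ranking customers for retention or marketing campaigns.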
Modeling algorithms
- Decision tree algorithms, including interactive tree building (C&RT, C5.0, CHAID & QUEST).
- An interactive rule-building algorithm (Decision List).
- Clustering and segmentation algorithms (K-Means, Kohonen, Two Step, Discriminant, Support Vector Machine).
- Data reduction algorithms (Factor/PCA, Feature Selection).
- Linear equation modeling (Regression, Linear, GenLin).
- Bayesian model with incremental learning (self-learning response model).
- Time-series forecasting models.
- Neural Networks (multi-layer perceptrons with back-propagation learning, and radial basis function networks).
- An advanced algorithm for wide datasets (Support Vector Machine).
- Graphical probabilistic models (Bayesian networks).
- Survival analysis to estimate the likely time to an event (Cox regression).
- Cluster-based algorithm for detecting unusual results (anomaly detection).
- Nearest neighbor modeling and scoring algorithm (KNN).
- Association discovery algorithm with advanced evaluation functions (Apriori).
- Association algorithm that supports multiple consequents (CARMA).
- Sequential association algorithm for order-sensitive analyses (Sequence).
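As an illustration of what an association discovery algorithm does, the sketch below performs an Apriori-style level-wise search for frequent itemsets and then derives single-consequent rules with their confidence. It is a simplified teaching version, not the Apriori node in SPSS Modeler; the function names, thresholds and shopping baskets are assumptions.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(baskets)
    baskets = [frozenset(b) for b in baskets]
    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    while current:
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        # Prune step: every k-item subset of a candidate must itself be frequent.
        current = {c for c in current
                   if all(frozenset(s) in level for s in combinations(c, size - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for single-consequent rules."""
    found = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                found.append((set(antecedent), item, confidence))
    return found

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
itemsets = frequent_itemsets(baskets, min_support=0.4)
for antecedent, consequent, confidence in rules(itemsets, min_confidence=0.7):
    print(sorted(antecedent), "->", consequent, round(confidence, 2))
```

The prune step is what keeps the search tractable: a candidate is counted only if all of its smaller subsets were already frequent.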
Deployment
- Model export to SQL or PMML (the XML-based standard format for predictive models).
- Data export to delimited text files, Excel, SPSS Statistics, SAS, Cognos Business Intelligence packages, and operational databases.
- Enhanced deployment with IBM SPSS Collaboration and Deployment Services for innovative analytics management, process automation and deployment capabilities.
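To give a sense of what a PMML export contains, the following Python sketch writes a small linear regression model as a PMML-style XML document using only the standard library. It is a hand-built illustration, not output from SPSS Modeler, and it has not been validated against the PMML schema; the function name, field names and coefficients are hypothetical.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"  # namespace of PMML version 4.4

def linear_model_to_pmml(target, intercept, coefficients):
    """Build a minimal PMML RegressionModel document for a linear model."""
    pmml = ET.Element("PMML", version="4.4", xmlns=PMML_NS)
    ET.SubElement(pmml, "Header", description="hypothetical example model")
    dictionary = ET.SubElement(pmml, "DataDictionary")
    model = ET.SubElement(pmml, "RegressionModel",
                          functionName="regression", modelName="example")
    schema = ET.SubElement(model, "MiningSchema")
    table = ET.SubElement(model, "RegressionTable", intercept=str(intercept))
    # Declare every field once in the data dictionary and the mining schema.
    for name in [*coefficients, target]:
        ET.SubElement(dictionary, "DataField",
                      name=name, optype="continuous", dataType="double")
        usage = "target" if name == target else "active"
        ET.SubElement(schema, "MiningField", name=name, usageType=usage)
    # One NumericPredictor per input field carries its coefficient.
    for name, value in coefficients.items():
        ET.SubElement(table, "NumericPredictor", name=name, coefficient=str(value))
    return ET.tostring(pmml, encoding="unicode")

# Made-up model: sales = 1.5 - 0.8 * price + 0.3 * ad_spend
document = linear_model_to_pmml("sales", 1.5, {"price": -0.8, "ad_spend": 0.3})
print(document)
```

Because the model is described as data rather than code, any scoring engine that reads PMML can apply it without access to the tool that trained it.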
In addition to the core features and benefits, SPSS Modeler Premium provides advanced text analytics functionality.
Text mining and analysis: understanding and preparation
- Text extraction from files, operational databases and RSS feeds (for example, blogs, web feeds).
- Native language extractor options for Dutch, English, French, German, Italian, Portuguese, Spanish and Japanese; almost any other language can be translated using third-party translation software.
- Domain-specific extraction of concepts such as uniterms, expressions, abbreviations, acronyms and other terms.
- Sophisticated linguistic algorithms and embedded or user-specified linguistic resources for calculating synonyms.
- Concept naming by person, organization, term, product, location and other user-defined types.
- Non-linguistic extraction of addresses, currencies, times, phone numbers and social security numbers.
- Pre-built and customizable templates and libraries for sentiment analysis, CRM, security and intelligence, market intelligence, life sciences and IT.
- Pre-packaged Text Analytics Packages (TAPs) are provided for the most common business applications, or custom packages can be created.
- Concept clustering algorithms create clusters based on term co-occurrence, providing an at-a-glance view of main topics and their relationships.
- Text classification algorithms intelligently group text documents and records based on content.
- Advanced concept selection and deselection for use in predictive modeling.
- Text-based and visual reports to interrogate concept relationship, occurrence, frequency and type.
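Clustering concepts by term co-occurrence, as mentioned in the list above, can be illustrated with a short Python sketch. It counts how often two concepts appear in the same document and groups concepts connected by sufficiently frequent pairs. This is a simplified stand-in, not the clustering method used in SPSS Modeler Premium; the function names, the threshold and the sample concepts are assumptions.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of concepts."""
    pairs = Counter()
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            pairs[(a, b)] += 1
    return pairs

def cluster_by_cooccurrence(documents, min_count):
    """Group concepts linked by pairs that co-occur in at least min_count documents."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in cooccurrence_counts(documents).items():
        if count >= min_count:
            parent[find(a)] = find(b)  # merge the two groups
    clusters = {}
    for concept in list(parent):
        clusters.setdefault(find(concept), set()).add(concept)
    # Concepts without any strong link are left out of the result.
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical concepts extracted from five customer comments.
documents = [
    ["battery", "charger", "price"],
    ["battery", "charger"],
    ["price", "delivery"],
    ["delivery", "price", "refund"],
    ["refund", "delivery"],
]
concept_clusters = cluster_by_cooccurrence(documents, min_count=2)
print(concept_clusters)
```

Raising `min_count` yields smaller, tighter clusters; lowering it merges topics that are only loosely related.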
Text mining
- Text link analysis identifies links and associations between, for example, people and events or diseases and genes.
- Sentiment identification and extraction (for example, likes and dislikes) from text in Dutch, English, French, German and Spanish.
- Content from URLs within blogs can be identified and extracted.
- Opinions, semantic relationships and linked events can be used in deployable predictive models.
- Interactive graphs reveal complex relationships by showing the semantic links between concepts.
IBM SPSS Modeler Professional Server and Modeler Premium Server editions are high-performance data mining tools for mining data more efficiently and effectively when the data comprises millions of records. Each edition offers all the features of the desktop data mining software, plus specialized capabilities that deliver faster performance, more efficient administration and greater security in enterprise environments.
- In-database algorithms for InfoSphere Warehouse (Logistic Regression, Naïve Bayes, Time-Series and Radial Basis Function).
- In-database algorithms for Netezza (Netezza Decision Tree and Netezza K-Means Clustering).
- Extended InfoSphere / DB2 integration, including access to the full range of InfoSphere data mining algorithms, DB2 compression and partitioning.
- Integration into IBM Smart Analytics Systems, an integrated platform that provides broad analytics capabilities on a powerful warehouse foundation with IBM server and storage.
- zLinux system support enables enterprise customers to quickly and cost-effectively deploy advanced analytics on System z machines running a Linux operating system.
- Microsoft SQL Server in-database mining algorithms (Decision Tree, Association Rules, Linear Regression, Clustering, Sequence Clustering, Naïve Bayes, Time-Series and Neural Network).
- Oracle in-database mining algorithms (Decision Tree, General Linear Model, Orthogonal Partitioning Clustering, K-Means, Apriori, Minimum Description Length, Support Vector Machine, Naïve Bayes, Adaptive Bayes, Non-Negative Matrix Factorization and Artificial Intelligence).
- SQL-pushback to push data transformations and select modeling algorithms directly into operational databases.
- Parallel execution of streams and multiple models.
- Secure sockets layer encryption for transmitting sensitive data securely between SPSS Modeler Professional desktop and server editions.

