
Features of Salford Predictive Modeler

General Features

Feature V 8.3 V 8.2 V 8.0
Modeling Engine: CART (Decision Trees) yes yes yes
Modeling Engine: MARS (Nonlinear Regression) yes yes yes
Modeling Engine: TreeNet (Stochastic Gradient Boosting) yes yes yes
Modeling Engine: RandomForests for Classification yes yes yes
Additional Modeling Engines: Regularized Regression (LASSO/Ridge/LARS/Elastic Net/GPS) yes yes yes
Reporting ROC curves during model building and model scoring yes yes yes
Model performance stats based on Cross Validation yes yes yes
Model performance stats based on out of bag data during bootstrapping yes yes yes
Reporting performance summaries on learn and test data partitions yes yes yes
Reporting Gains and Lift Charts during model building and model scoring yes yes yes
Automatic creation of Command Logs yes yes yes
Built-in support to create, edit, and execute command files yes yes yes
Reading and writing datasets in all current database/statistical file formats yes yes yes
Option to save processed datasets into all current database/statistical file formats yes yes yes
Select Cases in Score Setup yes yes yes
TreeNet scoring offset in Score Setup yes yes yes
Setting of focus class supported for all categorical variables yes yes yes
Scalable limits on terminal nodes: a special mode that enforces the ATOM and/or MINCHILD node-size settings yes yes yes
Descriptive Statistics: Summary Stats, Stratified Stats, Charts and Histograms yes yes yes
Activity Window: Brief data description, quick navigation to most common activities yes yes yes
Translating models into SAS®-compatible language yes yes yes
Data analysis Binning Engine yes yes yes
Automatic creation of missing value indicators yes yes yes
Option to treat missing value in a categorical predictor as a new level yes yes yes
64 bit support; large memory capacity limited only by hardware yes no no
License to any level supported by RAM (32 MB to 1 TB) no yes yes
License for multi-core capabilities yes yes yes
Using built-in BASIC Programming Language during data preparation yes yes yes
Automatic creation of lag variables based on user specifications during data preparation yes yes yes
Automatic creation and reporting of key overall and stratified summary statistics for user supplied list of variables yes yes yes
Display charts, histograms, and scatter plots for user selected variables yes yes yes
Command Line GUI Assistant to simplify creating and editing command files yes yes yes
Translating models into SAS/PMML/C/Java/Classic and ability to create classic and specialized reports for existing models yes yes yes
Unsupervised Learning - Breiman's column scrambler yes yes yes
Scoring any Automate (pre-packaged scenario of runs) as an ensemble model yes yes yes
Summary statistics based missing value imputation using scoring mechanism yes yes yes
Impute options in Score Setup yes yes yes
GUI support of SCORE PARTITIONS (GUI feature, SCORE PARTITIONS=YES) yes yes no
Quick Impute Analysis Engine: One-step statistical and model based imputation yes yes yes
Advanced Imputation via Automate TARGET. Control over fill selection and new impute variable creation yes yes yes
Computation of more than 10 different types of correlation yes yes yes
Save OOB predictions from cross-validation models yes yes yes
Custom selection of a new predictors list from an existing variable importance report yes yes yes
User defined bins for Cross Validation yes yes yes
Modeling Pipelines: RuleLearner, ISLE yes yes yes
Cross-Validation models can be scored as an Ensemble yes yes no
An alternative to variable importance based on Leo Breiman's scrambler yes yes no
Data Binning Results display (GUI feature) yes yes no
Data Binning Analysis Engine bins variables using model-based binning (via Automate BIN) or weights-of-evidence coding yes yes no
BIN ROUND, ADAPTIVEROUND methods (BIN METHOD=ROUND/ADAPTIVEROUND) yes yes no
Controls for number of Bins and Deciles (BOPTIONS NBINS, NDECILES) yes yes no
EVAL command and GUI display (GUI feature) yes yes no
Summary stats for the correlations (Correlation Stats tab) (GUI feature) yes yes no
TONUMERIC: create contiguous integer variables from other variables yes yes no
Automated imputation of all missing values (via Automate Target) yes yes no
Save out of bag predictions during Cross Validation yes yes no
Use TREATMENT variables when scoring uplift models (SCORE EVAL) yes yes no
Use TREATMENT variables when evaluating uplift model predictions (EVAL) yes yes no
Automation
Generate detailed univariate stats on every continuous predictor to spot potential outliers and problematic records (Automate OUTLIERS) yes yes no
Automate ENABLETIMING=YES|NO to control timing reporting in Automates yes yes no
Build two models reversing the roles of the learn and test samples (Automate FLIP) yes yes yes
Explore model stability by repeated random drawing of the learn sample from the original dataset (Automate DRAW) yes yes yes
For time series applications, build models based on sliding time window using a large array of user options (Automate DATASHIFT) yes yes yes
Explore mutual multivariate dependencies among available predictors (Automate TARGET) yes yes yes
Explore the effects of the learn sample size on the model performance (Automate LEARN CURVE) yes yes yes
Build a series of models by varying the random number seed (Automate SEED) yes yes yes
Explore the marginal contribution of each predictor to the existing model (Automate LOVO) yes yes yes
Explore model stability by repeated repartitioning of the data into learn, test, and possibly hold-out samples (Automate PARTITION) yes yes yes
Explore nonlinear univariate relationships between the target and each available predictor (Automate ONEOFF) yes yes yes
Bootstrapping process (sampling with replacement from the learn sample) with a large array of user options (Random Forests-style sampling of predictors, saving in-bag and out-of-bag scores, proximity matrix, and node dummies) (Automate BOOTSTRAP) *not available in RandomForests yes yes yes
"Shifts" the "crossover point" between learn and test samples with each cycle of the Automate (Automate LTCROSSOVER) yes yes yes
Build a series of models using different backward variable selection strategies (Automate SHAVING) yes yes yes
Build a series of models using the forward-stepwise variable selection strategy (Automate STEPWISE) yes yes yes
Explore nonlinear univariate relationships between each available predictor and the target (Automate XONY) yes yes yes
Build a series of models using randomly sampled predictors (Automate KEEP) yes yes yes
Explore the impact of a potential replacement of a given predictor by another one (Automate SWAP) yes yes yes
Parametric bootstrap process (Automate PBOOT) yes yes yes
Build a series of models for each strata defined in the dataset (Automate STRATA) yes yes yes
Build a series of models using every available data mining engine (Automate MODELS) yes yes yes
Model is built in each possible data mining engine (Automate EVERYTHING) no no yes
Run TreeNet for Predictor selection, Auto-bin predictors, then build a series of models using every available data mining engine (Automate GLM) yes yes yes

CART - Features

The CART methodology is based on landmark mathematical theory introduced in 1984 by four world-renowned statisticians at Stanford University and the University of California at Berkeley.
Patented extensions to the CART modeling engine are specifically designed to enhance results for market research and web analytics.
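
For orientation outside of SPM, the grow-then-prune recipe at the heart of CART (recursive partitioning followed by cost-complexity pruning) is also available in simplified open-source form, without surrogate splits or the patented extensions. A minimal Python sketch using scikit-learn; the synthetic data and the pick-best-on-test selection rule below are illustrative assumptions, not SPM behavior:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative synthetic data with a simple learn/test partition.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_learn, X_test, y_learn, y_test = train_test_split(X, y, random_state=0)

    # Grow a full tree, then compute the CART-style cost-complexity
    # pruning path (a sequence of nested subtrees indexed by alpha).
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_learn, y_learn)

    # Refit at each candidate alpha and keep the subtree that performs
    # best on the held-out test sample -- "optimal tree selection".
    best = max(
        (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_learn, y_learn)
         for a in path.ccp_alphas),
        key=lambda tree: tree.score(X_test, y_test),
    )
    print(best.get_n_leaves(), "leaves, test accuracy", best.score(X_test, y_test))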

Feature V 8.3 V 8.2 V 8.0
User defined linear combination lists for splitting yes no no
Constraints on trees yes no no
Automatic addition of missing value indicators yes no no
Enhanced GUI reporting yes no no
User controlled Cross Validation yes no no
Out-of-bag performance stats and predictions yes no no
Profiling terminal nodes based on user supplied variables yes no no
Comparison of Train vs. Test consistency across nodes yes no no
RandomForests-style variable importance yes no no
Linear Combination Splits yes yes yes
Optimal tree selection based on area under ROC curve yes yes yes
User defined splits for the root node and its children yes yes yes
Translating models into Topology yes yes yes
Edit and modify the CART trees via FORCE command structures yes yes yes
RATIO of the improvements of the primary splitter and the first competitor yes yes no
Scoring of CV models as an Ensemble yes yes no
Report impact of penalties in root node yes yes no
New penalty against biased splits, PENALTY BIAS (PENALTY / BIAS, CONTBIAS, CATBIAS) yes yes no
Hotspot detection for Automate UNSUPERVISED yes yes yes
Hotspot detection for Automate TARGET yes yes yes
Hotspot detection to identify the richest nodes across the multiple trees yes yes yes
Differential Lift Modeling (Netlift/Uplift) yes yes yes
Profile tab in CART Summary window yes yes yes
Multiple user defined lists for linear combinations yes yes yes
Constrained trees yes yes yes
Ability to create and save dummy variables for every node in the tree during scoring yes yes yes
Report basic stats on any variable of user choice at every node in the tree yes yes yes
Comparison of learn vs. test performance at every node of every tree in the sequence yes yes yes
Build a Random Forests model utilizing the CART engine to gain alternative handling of missing values via surrogate splits (Automate BOOTSTRAP RSPLIT) yes yes yes
Automation      
Generate models with alternative handling of missing values (Automate MISSING_PENALTY) yes yes yes
Build a model using each splitting rule (six for classification, two for regression) (Automate RULES) yes yes yes
Build a series of models varying the depth of the tree (Automate DEPTH) yes yes yes
Build a series of models changing the minimum required size on parent nodes (Automate ATOM) yes yes yes
Build a series of models changing the minimum required size on child nodes (Automate MINCHILD) yes yes yes
Explore accuracy versus speed trade-off due to potential sampling of records at each node in a tree (Automate SUBSAMPLE) yes yes yes
Generates a series of N unsupervised-learning models (Automate UNSUPERVISED) yes yes yes
Varies the RIN (Regression In the Node) parameter through the series of values (Automate RIN) yes yes yes
Varying the number of "folds" used in cross-validation (Automate CVFOLDS) yes yes yes
Repeat cross-validation process many times to explore the variance of estimates (Automate CVREPEATED) yes yes yes
Build a series of models using a user-supplied list of binning variables for cross-validation (Automate CVBIN) yes yes yes
Check the validity of model performance using Monte Carlo shuffling of the target (Automate TARGETSHUFFLE) yes yes yes
Build two linked models, where the first one predicts the binary event while the second one predicts the amount (Automate RELATED). For example, predicting whether someone will buy and how much they will spend yes yes yes
Indicates whether a variable importance matrix report should be produced when possible (Automate VARIMP) yes yes yes
Saves the variable importance matrix to a comma-separated file (Automate VARIMPFILE) yes yes yes
Generate models with alternative handling of missing values (Automate MVI) yes yes no
Vary the priors for the specified class (Automate PRIORS) yes yes yes
Build a series of models by progressively removing misclassified records, thus increasing the robustness of trees and possibly reducing model complexity (Automate REFINE) yes yes yes
Bagging and ARCing using the legacy code (COMBINE) yes yes yes
Build a series of models limiting the number of nodes in a tree (Automate NODES) yes yes yes
Build a series of models trying each available predictor as the root node splitter (Automate ROOT) yes yes yes
Explore the impact of favoring equal sized child nodes by varying CART's end cut parameter (Automate POWER) yes yes yes
Explore the impact of penalty on categorical predictors (Automate PENALTY=HLC) yes yes yes

MARS - Features

The MARS modeling engine builds its model by piecing together a series of straight-line segments, each allowed its own slope.
The MARS model is designed to predict numeric outcomes such as the average monthly bill of a mobile phone customer or the amount that a shopper is expected to spend in a website visit. Areas where the MARS engine has exhibited very strong results include forecasting electricity demand for power generating companies, relating customer satisfaction scores to the engineering specifications of products, and presence/absence modeling in geographical information systems (GIS).
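
The "straight lines" are hinge basis functions of the form max(0, x - knot) and max(0, knot - x). A toy Python/numpy sketch of the idea, fitting such a basis by ordinary least squares over a fixed knot grid; real MARS (and the SPM engine) selects knots and interactions adaptively, so the fixed knots here are purely an illustrative assumption:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, 500)
    y = np.sin(x) + 0.1 * rng.standard_normal(500)  # illustrative nonlinear target

    def hinge_basis(x, knots):
        """Intercept plus the pair max(0, x-k), max(0, k-x) for each knot."""
        cols = [np.ones_like(x)]
        for k in knots:
            cols.append(np.maximum(0.0, x - k))
            cols.append(np.maximum(0.0, k - x))
        return np.column_stack(cols)

    knots = np.linspace(1.0, 9.0, 9)      # fixed grid; MARS would search for these
    B = hinge_basis(x, knots)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    y_hat = B @ coef                      # piecewise-linear fit, one slope per segment
    print("RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))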

Feature V 8.3 V 8.2 V 8.0
Updated GUI interface yes no no
Model performance based on independent test sample or Cross Validation yes no no
Support for time series models yes no no
Save MARS basis functions in Score Setup yes yes yes
MARS basis functions are added to the output dataset during scoring yes yes no
Ridge parameter in MARS yes yes no
Automation      
Build a series of models varying the maximum number of basis functions (Automate BASIS) yes yes yes
Varying the number of "folds" used in cross-validation (Automate CVFOLDS) yes yes yes
Repeat cross-validation process many times to explore the variance of estimates (Automate CVREPEATED) yes yes yes
Build a series of models using a user-supplied list of binning variables for cross-validation (Automate CVBIN) yes yes yes
Build a series of models varying the smoothness parameter (Automate MINSPAN) yes yes yes
Build a series of models varying the order of interactions (Automate INTERACTIONS) yes yes yes
Build a series of models varying the modeling speed (Automate SPEED) yes yes yes
Explore the impact of penalty on categorical predictors (Automate PENALTY=HLC) yes yes yes
Explore the impact of penalty on missing values (Automate PENALTY=MISSING) yes yes yes
Build a series of models using varying degree of penalty on added variables (Automate PENALTY MARS) yes yes yes

TreeNet - Features

The TreeNet modeling engine offers a degree of accuracy usually not attainable by a single model or by ensembles such as bagging or conventional boosting. Unlike neural networks, the TreeNet methodology is not sensitive to data errors and needs no time-consuming data preparation, pre-processing, or imputation of missing values.
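
TreeNet implements Jerome Friedman's stochastic gradient boosting: a long sequence of small trees, each fit to a random subsample of the learn data and each correcting the residual errors of the ensemble built so far. A rough open-source analogue via scikit-learn (not the SPM engine itself); every parameter value below is an illustrative assumption:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
    X_learn, X_test, y_learn, y_test = train_test_split(X, y, random_state=0)

    # Small trees + slow learning rate + row subsampling are the three
    # ingredients that make the boosting "stochastic".
    model = GradientBoostingClassifier(
        n_estimators=500,    # trees in the sequence
        learning_rate=0.05,  # shrinkage applied to each tree's contribution
        max_depth=3,         # keep individual trees weak
        subsample=0.5,       # random half-sample per tree
        random_state=0,
    ).fit(X_learn, y_learn)

    print("test accuracy:", model.score(X_test, y_test))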

Feature V 8.3 V 8.2 V 8.0
One-Tree TreeNet (CART alternative) yes no no
RandomForests via TreeNet (RandomForests regression alternative) with Interaction Control Language (ICL) support yes no no
Enhanced partial dependency plots yes no no
RandomForests-style randomized splits yes no no
Spline-based approximations to the TreeNet dependency plots yes yes yes
Exporting TreeNet dependency plots into XML file yes yes yes
Interactions: an interaction penalty that inhibits TreeNet from introducing new variables (and thus interactions) within a branch of a tree yes yes yes
Auto creation of new spline-based approximation variables: one-step creation and saving of transformed variables to a new dataset yes yes yes
Flexible control over interactions in a TreeNet model (ICL) yes yes yes
Interaction strength reporting yes yes yes
Interactions: Generate reports describing pairwise interactions of predictors yes yes yes
Interaction Control Lists (ICL): complete control over structural interactions allowed or disallowed during model building yes yes no
Interactions: compute interaction statistics among predictors, for regression and logistic models only yes yes no
Subsample separately by target class. Specify separate sampling rates for the target classes in binary logistic models yes yes yes
Control number of top ranked models for which performance measures will be computed and saved yes yes yes
Advanced controls to reduce required memory (RAM) yes yes yes
Extended Influence Trimming Controls: ability to limit influence trimming to focus class and/or correctly classified yes yes yes
Differential Lift Modeling (Netlift/Uplift) yes yes yes
Delta ROC Uplift as a performance measure yes yes no
Uplift Profile tab for Uplift Results yes yes no
TreeNet Newton Split Search and Regularization penalties (RGBoost) (TN NEWTON=YES, RGBL0, RGBL1, RGBL2) yes yes no
Save information for further processing and individual tree predictions yes yes no
TreeNet Monotonicity Controls yes yes no
Added Sample with Replacement option to GUI dialog yes yes no
Hessian to control tree growing in TreeNet yes yes no
Newton-style splitting is available for TreeNet Uplift loss yes yes no
QUANTILE specifies which quantile will be used during LOSS=LAD yes yes yes
POISSON: Designed for the regression modeling of integer COUNT data yes yes yes
GAMMA distribution loss, used strictly for positive targets yes yes yes
NEGBIN: Negative Binomial distribution loss, used for counted targets (0,1,2,3,…) yes yes yes
COX: the target (MODEL) variable is the non-negative survival time while the CENSOR variable indicates censoring yes yes yes
Tweedie loss function yes yes yes
Table showing "Top Interaction Pairs" yes yes no
Control over number of bins reported in Uplift tables yes yes no
Translation of models with INIT option yes yes no
Random selection of predictors: a random subset for each tree, then a further random subset from that list at each node yes yes no
Save detailed 2-way interaction statistics to a file yes yes no
Control the depth of each tree yes yes no
Modeling Pipelines: RuleLearner, ISLE yes yes yes
Build a CART tree utilizing the TreeNet engine to gain speed as well as alternative reporting, and control over interactions using ICL yes yes yes
Build a RandomForests model utilizing the TreeNet engine to gain speed as well as partial dependency plots, spline approximations, variable interaction statistics, and control over interactions using ICL yes yes yes
RandomForests inspired sampling of predictors at each node during model building yes yes yes
TreeNet Two-Variable dependency plots (3D plots) on-demand based on pairwise Interaction scores yes yes no
TreeNet One-Variable dependency plots based on interaction scores yes yes no
TreeNet in RandomForests mode for Classification yes yes no
Random split selection (RSPLIT) yes yes no
Median split selection (MSPLIT) yes yes no
Automation      
Build a series of models changing the minimum required size on child nodes (Automate MINCHILD) yes yes yes
Varying the number of "folds" used in cross-validation (Automate CVFOLDS) yes yes yes
Repeat cross-validation process many times to explore the variance of estimates (Automate CVREPEATED) yes yes yes
Build a series of models using a user-supplied list of binning variables for cross-validation (Automate CVBIN) yes yes yes
Check the validity of model performance using Monte Carlo shuffling of the target (Automate TARGETSHUFFLE) yes yes yes
Indicates whether a variable importance matrix report should be produced when possible (Automate VARIMP) yes yes yes
Saves the variable importance matrix to a comma-separated file (Automate VARIMPFILE) yes yes yes
Build a series of models by varying subsampling fraction (Automate TNSUBSAMPLE) yes yes no
Build a series of models by varying the quantile value when using the QUANTILE loss function (Automate TNQUANTILE) yes yes no
Build a series of models by varying the class weights between UNIT and BALANCED in N Steps (Automate TNCLASSWEIGHTS) yes yes no
Build two linked models, where the first one predicts the binary event while the second one predicts the amount (Automate RELATED). For example, predicting whether someone will buy and how much they will spend yes yes yes
Build a series of models limiting the number of nodes in a tree (Automate NODES) yes yes yes
Convert (bin) all continuous variables into categorical (discrete) versions using a large array of user options (equal width, weights of evidence, Naïve Bayes, supervised) (Automate BIN) yes yes yes
Produces a series of three TreeNet models, making use of the TREATMENT variable specified on the TreeNet command (Automate DIFFLIFT) yes yes yes
Build a series of models varying the speed of learning (Automate LEARNRATE) yes yes yes
Build a series of models by progressively imposing additivity on individual predictors (Automate ADDITIVE) yes yes yes
Build a series of models utilizing different regression loss functions (Automate TNREG) yes yes yes
Build a series of models using varying degree of penalty on added variables (Automate ADDEDVAR) yes yes yes
Explore the impact of influence trimming (outlier removal) for logistic and classification models (Automate INFLUENCE) yes yes yes
Stochastic search for the optimal regularization penalties (Automate TNRGBOOST) yes yes no
Exhaustive search and ranking for all interactions of the specified order (Automate ICL) yes yes yes
Varies the number of predictors that can participate in a TreeNet branch, using interaction controls to constrain interactions (Automate ICL NWAY) yes yes yes
Stochastic search of the core TreeNet modeling parameters (Automate TNOPTIMIZE) yes yes no

Random Forests - Features

The Random Forests modeling engine builds a collection of many CART® trees that do not influence each other during construction. The method was developed by Leo Breiman and Adele Cutler of the University of California, Berkeley, and is licensed exclusively to Minitab.
Random Forests is best suited for the analysis of complex data structures embedded in small to moderate datasets containing fewer than 10,000 rows but potentially millions of columns.
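
For orientation, the basic Breiman-Cutler recipe (bootstrap resampling of rows plus a random subset of candidate predictors at each node) can be sketched with scikit-learn as an open-source analogue of, not a substitute for, the SPM engine. The out-of-bag score below is the same idea behind the OOB features in the table; the data shape and all parameters are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Modest rows, many columns -- the shape Random Forests handles well.
    X, y = make_classification(n_samples=2000, n_features=200,
                               n_informative=20, random_state=0)

    forest = RandomForestClassifier(
        n_estimators=500,     # many independently grown bootstrap trees
        max_features="sqrt",  # random predictor subset tried at each node
        oob_score=True,       # honest accuracy estimate from out-of-bag rows
        random_state=0,
    ).fit(X, y)

    print("OOB accuracy:", forest.oob_score_)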

Feature V 8.3 V 8.2 V 8.0
RandomForests regression yes no no
Saving out-of-bag scores yes no no
Speed enhancements yes no no
Modified version of random split point selection (RANDOMMODE, JITTERSPLITS options) yes yes no
Random Split Point is exposed in GUI yes yes no
Measures of STRENGTH and CORRELATION in the forest from Breiman's 2000 theory paper (CORR, BCORR) yes yes no
Penalty configuration for RF engine yes yes no
Preserve prototype nucleus and consider variations to the prototype algorithm (SVPROTOTYPES, PROTOREPORT) yes yes no
GUI RF Advanced tab yes yes no
In-bag / out-of-bag indicator added to the diagnostics dataset to facilitate testing (SVDIAG) yes yes no
Reporting of "raw" permutation-based variable importance yes yes no
Accuracy-based variable importance for RF, classification first yes yes no
Saving of "margins" to output dataset (SVMARGIN) yes yes no
Alternative, non-bootstrap forms of tree-by-tree sampling (SAMPLEAMOUNT, SAMPLEMODE, SAMPLEBYCLASS options) yes yes no
RF report: summarizes how many times each predictor appears in the model and the number of distinct split points yes yes no
GUI controls for new Variable Importance measures yes yes no
Flexible controls over interactions in a Random Forests for Regression model (requires TreeNet license) yes yes yes
Interaction strength reporting (requires TreeNet license) yes yes yes
Spline-based approximations to the Random Forests for Regression dependency plots (requires TreeNet license) yes yes yes
Exporting Random Forests for Regression dependency plots into XML files (requires TreeNet license) yes yes yes
Build a CART tree utilizing the Random Forests for Regression engine to gain speed as well as alternative reporting yes yes no
Automation      
Varies the bootstrap sample size (Automate RFBOOTSTRAP) yes yes yes
Vary the number of randomly selected predictors at the node-level (Automate RFNPREDS) yes yes yes
Explore the impact of influence trimming (outlier removal) for logistic and classification models (Automate INFLUENCE) yes yes yes
Exhaustive search and ranking for all interactions of the specified order (Automate ICL) yes yes yes

Regression (OLS) - Features

Feature V 8.3 V 8.2 V 8.0
Automation: Generate detailed univariate distributional reports for every continuous variable on the KEEP list (Automate OUTLIERS) no yes yes

GPS - Features

Feature V 8.3 V 8.2 V 8.0
Modeling Engines: Regularized Regression (LASSO/Ridge/LARS/Elastic Net/GPS) yes yes yes
Automation      
Build a series of models by forcing different limit on the maximum correlation among predictors (Automate MAXCORR) yes yes yes
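
The regularized-regression family listed above (LASSO, Ridge, Elastic Net) can likewise be tried in open-source form. A minimal scikit-learn sketch, with the caveat that the GPS engine itself and its MAXCORR control are specific to SPM, and that the data and parameter values below are illustrative assumptions:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNetCV

    X, y = make_regression(n_samples=500, n_features=100, noise=10.0,
                           random_state=0)

    # Cross-validated elastic net: l1_ratio=1.0 is the LASSO; small
    # l1_ratio values approach ridge; values in between blend both penalties.
    model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)

    print("chosen l1_ratio:", model.l1_ratio_, " alpha:", model.alpha_)
    print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))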