Features of Salford Predictive Modeler

General Features

Feature	V 8.3	V 8.2	V 8.0
Modeling Engine: CART (Decision Trees)
Modeling Engine: MARS (Nonlinear Regression)
Modeling Engine: TreeNet (Stochastic Gradient Boosting)
Modeling Engine: RandomForests for Classification
Additional Modeling Engines: Regularized Regression (LASSO/Ridge/LARS/Elastic Net/GPS)
Reporting ROC curves during model building and model scoring
Model performance stats based on Cross Validation
Model performance stats based on out of bag data during bootstrapping
Reporting performance summaries on learn and test data partitions
Reporting Gains and Lift Charts during model building and model scoring
Automatic creation of Command Logs
Built-in support to create, edit, and execute command files
Reading and writing datasets all current database/statistical file formats
Option to save processed datasets into all current database/statistical file formats
Select Cases in Score Setup
TreeNet scoring offset in Score Setup
Setting of focus class supported for all categorical variables
Scalable limits on terminal nodes. This is a special mode that will ensure the ATOM and/or MINCHILD size
Descriptive Statisics: Summary Stats, Stratified Stats, Charts and Histograms
Activity Window: Brief data description, quick navigation to most common activities
Translating models into SAS^®-compatible language
Data analysis Binning Engine
Automatic creation of missing value indicators
Option to treat missing value in a categorical predictor as a new level
64 bit support; large memory capacity limited only by hardware
License to any level supported by RAM (32 MB to 1 TB)
License for multi-core capabilities
Using built-in BASIC Programming Language during data preparation
Automatic creation of lag variables based on user specifications during data preparation
Automatic creation and reporting of key overall and stratified summary statistics for user supplied list of variables
Display charts, histograms, and scatter plots for user selected variables
Command Line GUI Assistant to simplify creating and editing command files
Translating models into SAS/PMML/C/Java/Classic and ability to create classic and specialized reports for existing models
Unsupervised Learning - Breiman's column scrambler
Scoring any Automate (pre-packaged scenario of runs) as an ensemble model
Summary statistics based missing value imputation using scoring mechanism
Impute options in Score Setup
GUI support of SCORE PARTITIONS (GUI feature, SCORE PARTITIONS=YES)
Quick Impute Analysis Engine: One-step statistical and model based imputation
Advanced Imputation via Automate TARGET. Control over fill selection and new impute variable creation
Correlation computation of over 10 different types of correlation
Save OOB predictions from cross-validation models
Custom selection of a new predictors list from an existing variable importance report
User defined bins for Cross Validation
Modeling Pipelines: RuleLearner, ISLE
Cross-Validation models can be scored as an Ensemble
An alternative to variable importance based on Leo Breiman's scrambler
Data Binning Results display (GUI feature)
Data Binning Analysis Engine bins variables using model-based binning (via AUTOMATE BIN), or using weights of evidence coding.
BIN ROUND, ADAPTIVEROUND methods (BIN METHOD=ROUND/ADAPTIVEROUND)
Controls for number of Bins and Deciles (BOPTIONS NBINS, NDECILES)
EVAL command and GUI display (GUI feature)
Summary stats for the correlations (Correlation Stats tab) (GUI feature)
TONUMERIC: create contiguous integer variables from other variables
Automated imputation of all missing values (via Automate Target)
Save out of bag predictions during Cross Validation
Use TREATMENT variables when scoring uplift models (SCORE EVAL)
Use TREATMENT variables when evaluating uplift model predictions (EVAL)
Automation
Generate detailed univariate stats on every continuous predictor to spot potential outliers and problematic records (AUTOMATE OUTLIERS)
Automate ENABLETIMING=YES\|NO to control timing reporting in Automates
Build two models reversing the roles of the learn and test samples (Automate FLIP)
Explore model stability by repeated random drawing of the learn sample from the original dataset (Automate DRAW)
For time series applications, build models based on sliding time window using a large array of user options (Automate DATASHIFT)
Explore mutual multivariate dependencies among available predictors (Automate TARGET)
Explore the effects of the learn sample size on the model performance (Automate LEARN CURVE)
Build a series of models by varying the random number seed (Automate SEED)
Explore the marginal contribution of each predictor to the existing model (Automate LOVO)
Explore model stability by repeated repartitioning of the data into learn, test, and possibly hold-out samples (Automate PARTITION)
Explore nonlinear univariate relationships between the target and each available predictor (Automate ONEOFF)
Bootstrapping process (sampling with replacement from the learn sample) with a large array of user options (Random Forests-style sampling of predictors, saving in-bag and out-of-bag scores, proximity matrix, and node dummies) (Automate BOOTSTRAP) *not available in RandomForests
"Shifts" the "crossover point" between learn and test samples with each cycle of the Automate (Automate LTCROSSOVER)
Build a series of models using different backward variable selection strategies (Automate SHAVING)
Build a series of models using the forward-stepwise variable selection strategy (Automate STEPWISE)
Explore nonlinear univariate relationships between each available predictor and the target (Automate XONY)
Build a series of models using randomly sampled predictors (Automate KEEP)
Explore the impact of a potential replacement of a given predictor by another one (Automate SWAP)
Parametric bootstrap process (Automate PBOOT)
Build a series of models for each strata defined in the dataset (Automate STRATA)
Build a series of models using every available data mining engine (Automate MODELS)
Model is built in each possible data mining engine (Automate EVERYTHING)
Run TreeNet for Predictor selection, Auto-bin predictors, then build a series of models using every available data mining engine (Automate GLM)

CART - Features

The CART methodology is based on landmark mathematical theory introduced in 1984 by four world-renowned statisticians at Stanford University and the University of California at Berkeley.
Patented extensions to the CART modeling engine are specifically designed to enhance results for market research and web analytics.

Feature	V 8.3	V 8.2	V 8.0
User defined linear combination lists for splitting
Constrains on trees
Automatic addition of missing value indicators
Enhanced GUI reporting
User controlled Cross Validation
Out-of-bag performance stats and predictions
Profiling terminals nodes based on user supplied variables
Comparison of Train vs. Test consistency across nodes
RandomForests-style variable importance
Linear Combination Splits
Optimal tree selection based on area under ROC curve
User defined splits for the root node and its children
Translating models into Topology
Edit and modify the CART trees via FORCE command structures
RATIO of the improvements of the primary splitter and the first competitor
Scoring of CV models as an Ensemble
Report impact of penalties in root node
New penalty against biased splits PENALTY BIAS (PENALTY / BIAS, CONTBIAS, CATBIAS)
Hotspot detection for Automate UNSUPERVISED
Hotspot detection for Automate TARGET
Hotspot detection to identify the richest nodes across the multiple trees
Differential Lift Modeling (Netlift/Uplift)
Profile tab in CART Summary window
Multiple user defined lists for linear combinations
Constrained trees
Ability to create and save dummy variables for every node in the tree during scoring
Report basic stats on any v ariable of user choice at every node in the tree
Comparison of learn vs. test performance at every node of every tree in the sequence
Build a Random Forests model utlizing the CART engine to gain alternative handling of missing values via surrogate splits (Automate BOOTSTRAP RSPLIT)
Automation
Generate models with alternative handling of missing values (Automate MISSING_PENALTY)
Build a model using each splitting rule (six for classification, two for regression) (Automate RULES)
Build a series of models varying the depth of the tree (Automate DEPTH)
Build a series of models changing the minimum required size on parent nodes (Automate ATOM)
Build a series of models changing the minimum required size on child nodes (Automate MINCHILD)
Explore accuracy versus speed trade-off due to potential sampling of records at each node in a tree (Automate SUBSAMPLE)
Generates a series of N unsupervised-learning models (Automate UNSUPERVISED)
Varies the RIN (Regression In the Node) parameter through the series of values (Automate RIN)
Varying the number of "folds" used in crossvalidation (Automate CVFOLDS)
Repeat cross-validation process many times to explore the variance of estimates (Automate CVREPEATED)
Build a series of models using a user-supplied list of binning variables for cross-validation (Automate CVBIN)
Check the validity of model performance using Monte Carlo shuffling of the target (Automate TARGETSHUFFLE)
Build two linked models, where the first one predicts the binary event while the second one predicts the amount (Automate RELATED). For example, predicting whether someone will buy and how much they will spend
Indicates whether a variable importance matrix report should be produced when possible (Automate VARIMP)
Saves the variable importance matrix to a comma-separated file (Automate VARIMPFILE)
Generate models with alternative handling of missing v alues (AUTOMATE MVI)
Vary the priors f or the specified class (Automate PRIORS)
Build a series of models by progressively removing misclassified records thus increasing the robustness of trees and posssibly reducing model complexity (Automate REFINE)
Bagging and ARCing using the legacy code (COMBINE)
Build a series of models limiting the number of nodes in a tree (Automate NODES)
Build a series of models trying each available predictor as the root node splitter (Automate ROOT)
Explore the impact of favoring equal sized child nodes by varying CART's end cut parameter (Automate POWER)
Explore the impact of penalty on categorical predictors (Automate PENALTY=HLC)

MARS - Features

The MARS modeling engine builds its model by piecing together a series of straight lines with each allowed its own slope.
The MARS Model is designed to predict numeric outcomes such as the average monthly bill of a mobile phone customer or the amount that a shopper is expected to spend in a web site visit. Areas where the MARS engine has exhibited very high-performance results include forecasting electricity demand for power generating companies, relating customer satisfaction scores to the engineering specifications of products, and presence/absence modeling in geographical information systems (GIS).

Feature	V 8.3	V 8.2	V 8.0
Updated GUI interface
Model performance based on independent test sample or Cross Validation
Support for time series models
Save MARS basis functions in Score Setup
MARS basis functions will be added during scoring to the output dataset.
Ridge parameter in MARS
Automation
Build a series of models vary ing the maximum number of basis f unctions (Automate BASIS)
Varying the number of "folds" used in cross-validation (Automate CVFOLDS)
Repeat cross-validation process many times to explore the variance of estimates (Automate CVREPEATED)
Build a series of models using a user-supplied list of binning variables for cross-validation (Automate CVBIN)
Build a series of models varying the smoothness parameter (Automate MINSPAN)
Build a series of models varying the order of interactions (Automate INTERACTIONS)
Build a series of models varying the modeling speed (Automate SPEED)
Explore the impact of penalty on categorical predictors (Automate PENALTY=HLC)
Explore the impact of penalty on missing values (Automate PENALTY=MISSING)
Build a series of models using varying degree of penalty on added variables (Automate PENALTY MARS)

TreeNet - Features

The TreeNet modeling engine adds the advantage of a degree of accuracy usually not attainable by a single model or by ensembles such as bagging or conventional boosting. As opposed to neural networks, the TreeNet methodology is not sensitive to data errors and needs no time-consuming data preparation, pre-processing or imputation of missing values.

Feature	V 8.3	V 8.2	V 8.0
One-Tree TreeNet (CART alternative)
RandomForests via TreeNet (RandomForests regression alternative) Interaction Control Language (ICL)
Enhanced partial dependency plots
RandomForests-style randomized splits
Spline-based approximations to the TreeNet dependency plots
Exporting TreeNet dependency plots into XML file
Interactions: allow interactions penalty which inhibits TreeNet from introducing new variables (and thus interactions) within a branch of a tree
Auto creation of new spline-based approximation variables. One step creation and savings of transformed variable to new dataset
Flexible control over interactions in a TreeNet model (ICL)
Interaction strength reporting
Interactions: Generate reports describing pairwise interactons of predictors
Interaction Control Lists (ICL): gives you complete control over structural interactions allowed or not allowed during model building.
Interactions: compute interaction statistics among predictors, for regression and logistic models only.
Subsample separately by target class. Specify separate sampling rates for the target classes in binary logistic models
Control number of top ranked models for which performance measures will be computed and saved
Advanced controls to reduce required memory (RAM)
Extended Influence Trimming Controls: ability to limit influence trimming to focus class and/or correctly classified
Differential Lift Modeling (Netlift/Uplift)
Delta ROC Uplift as a performance measure
Uplift Profile tab for Uplift Results
TreeNet Newton Split Search and Regularization penalties (RGBoost) (TN NEWTON=YES, RGBL0, RGBL1, RGBL2)
Save information for further processing and individual tree predictions
TreeNet Monotonicity Controls
Added Sample with Replacement option to GUI dialog
Hessian to control tree growing in TreeNet
Newton-style splitting is available for TreeNet Uplift loss
QUANTILE specifies which quantile will be used during LOSS=LAD
POISSON: Designed for the regression modeling of integer COUNT data
GAMMA distribution loss, used strictly for positive targets
NEGBIN: Negative Binomial distribution loss, used for counted targets (0,1,2,3,…)
COX where the target (MODEL) variable is the non-negative survival time while the CENSOR variable indicates
Tweedie loss function
Table showing "Top Interactions Pairs"
Control over number of bins reported in Uplift tables
Translation of models with INIT option
Random Selection of Predictors: first for tree then random subset from that list for node
Save detailed 2-way interaction statistics to a file
Control the depth of each tree
Modeling Pipelines: RuleLearner, ISLE
Build a CART tree utilizing the TreeNet engine to gain speed as well as alternative reporting, and control over interactions using ICL
Build a RandomForests model utilizing the TreeNet engine to gain speed as well as partial dependency plots, spline approximatons, variable interaction statistics, and control over interactions using ICL
RandomForests inspired sampling of predictors at each node during model building
TreeNet Two-Variable dependency plots (3D plots) on-demand based on pairwise Interaction scores
TreeNet One-Variable dependency plots based on interaction scores
TreeNet in RandomForests mode for Classification
Random split selection (RSPLIT)
Median split selection (MSPLIT)
Automation
Build a series of models changing the minimum required size on child nodes (Automate MINCHILD)
Varying the number of "folds" used in cross-validation (Automate CVFOLDS)
Repeat cross-validation process many times to explore the variance of estimates (Automate CVREPEATED)
Build a series of models using a user-supplied list of binning variables for cross-validation (Automate CVBIN)
Check the validity of model performance using Monte Carlo shuffling of the target (Automate TARGETSHUFFLE)
Indicates whether a variable importance matrix report should be produced when possible (Automate VARIMP)
Saves the variable importance matrix to a commaseparated file (Automate VARIMPFILE)
Build a series of models by varying subsampling fraction (Automate TNSUBSAMPLE)
Build a series of models by varying the quantile value when using the QUANTILE loss function (Automate TNQUANTILE)
Build a series of models by varying the class weights between UNIT and BALANCED in N Steps (Automate TNCLASSWEIGHTS)
Build two linked models, where the first one predicts the binary event while the second one predicts the amount (Automate RELATED). For example, predicting whether someone will buy and how much they will spend
Build a series of models limiting the number of nodes in a tree (Automate NODES)
Convert (bin) all continuous variables into categorical (discrete) versions using a large array of user options (equal width, weights of evidence, Naïve Bayes, supervised) (Automate BIN)
Produces a series of three TreeNet models, making use of the TREATMENT variable specified on the TreeNet command (Automate DIFFLIFT)
Build a series of models varying the speed of learning (Automate LEARNRATE)
Build a series of models by progressively imposing additivity on individual predictors (Automate ADDITIVE)
Build a series of models utilizing different regression loss functions (Automate TNREG)
Build a series of models by varying subsampling fraction (Automate TNSUBSAMPLE)
Build a series of models using varying degree of penalty on added variables (Automate ADDEDVAR)
Explore the impact of influence trimming (outlier removal) for logistic and classification models (Automate INFLUENCE)
Stochastic search for the optimal regularization penalties (Automate TNRGBOOST)
Explore the impact of influence trimming (outlier removal) for logistic and classification models (Automate INFLUENCE)
Exhaustive search and ranking for all interactions of the specified order (Automate ICL)
Varies the number of predictors that can participate in a TreeNet branch, using interaction controls to constrain interactions (Automate ICL NWAY)
Stochastic search of the core TreeNet modeling parameters (Automate TNOPTIMIZE)

Random Forests - Features

Random Forests modeling engine is a collection of many CART^® trees that are not influenced by each other when constructed. The method was developed by Leo Breiman and Adele Cutler of the University of California, Berkeley, and is licensed exclusively to Minitab.
Random Forests is best suited for the analysis of complex data structures embedded in small to moderate data sets containing less than 10,000 rows but potentially millions of columns.

Feature	V 8.3	V 8.2	V 8.0
RandomForests regression
Saving out-of-bag scores
Speed enhancements
RF modified version of random split point selection (RANDOMMODE, JITTERSPLITS options)
Random Split Point is exposed in GUI
Breiman's 2000 theory paper measures of STRENGTH and CORRELATION in the forest. (CORR, BCORR)
Penalty configuration for RF engine
RF: preserve prototype nucleus and consider variations to prototype algorithm (SVPROTOTYPES, PROTOREPORT)
GUI RF Advanced tab
in-bag / out-of-bag indicator to diagnostics dataset to faciliate testing (SVDIAG)
Reporting of "raw" permutation-based variable importance
Accuracy-based variable importance to RF, classification first
Saving of "margins" to output dataset (SVMARGIN)
Alternative, non-bootstrap forms of tree-by-tree sampling (SAMPLEAMOUNT, SAMPLEMODE, SAMPLEBYCLASS options)
RF report: summarize N times each predictor appears in model, and N distinct split points
GUI controls for new Variable Importance measures
Flexible controls over interactions in a Random Forests for Regression model (requires TreeNet license)
Interaction strength reporting (requires TreeNet license)
Spline-based approximations to the Random Forests for Regression dependency plots (requires TreeNet license)
Exporting Random Forests for Regression dependency plots into XML files (requires TreeNet license)
Build a CART tree utilizing the Random Forests for Regression engine to gain speed as well as alternative reporting
Automation
Varies the bootstrap sample size (Automate RFBOOTSTRAP)
Vary the number of randomly selected predictors at the node-level (Automate RFNPREDS)
Explore the impact of influence trimming (outlier removal) for logistic and classification models (Automate INFLUENCE)
Exhaustive search and ranking for all interactions of the specified order (Automate ICL)

Regression (OLS) - Features

Feature	V 8.3	V 8.2	V 8.0
Automation: Generate detailed univariate distributional reports for every continuous variable on the KEEP list (Automate OUTLIERS)

GPS - Features

Feature	V 8.3	V 8.2	V 8.0
Modeling Engines: Regularized Regression (LASSO/Ridge/LARS/Elastic Net/GPS)
Automation
Build a series of models by forcing different limit on the maximum correlation among predictors (Automate MAXCORR)

KNOW-HOW

Praxisorientierte Soft- und Hardware-Schulungen
für Einsteiger, Fortgeschrittene und erfahrene Anwender.

Kontakt

Lernressourcen

Features of Salford Predictive Modeler

General Features

CART - Features

MARS - Features

TreeNet - Features

Random Forests - Features

Regression (OLS) - Features

GPS - Features

Über uns

Kontakt

Zentrale

Auftragsabwicklung

Zertifikate & Gütesiegel

KNOW-HOW

Praxisorientierte Soft- und Hardware-Schulungenfür Einsteiger, Fortgeschrittene und erfahrene Anwender.

Kontakt

Lernressourcen

Features of Salford Predictive Modeler

General Features

CART - Features

MARS - Features

TreeNet - Features

Random Forests - Features

Regression (OLS) - Features

GPS - Features

Zentrale

Auftragsabwicklung

Praxisorientierte Soft- und Hardware-Schulungen
für Einsteiger, Fortgeschrittene und erfahrene Anwender.