| Feature | V 8.3 | V 8.2 | V 8.0 |
|---|---|---|---|
| Modeling Engine: CART (Decision Trees) | | | |
| Modeling Engine: MARS (Nonlinear Regression) | | | |
| Modeling Engine: TreeNet (Stochastic Gradient Boosting) | | | |
| Modeling Engine: RandomForests for Classification | | | |
| Additional Modeling Engines: Regularized Regression (LASSO/Ridge/LARS/Elastic Net/GPS) | | | |
| Reporting ROC curves during model building and model scoring | | | |
| Model performance stats based on cross-validation | | | |
| Model performance stats based on out-of-bag data during bootstrapping | | | |
| Reporting performance summaries on learn and test data partitions | | | |
| Reporting gains and lift charts during model building and model scoring | | | |
| Automatic creation of command logs | | | |
| Built-in support to create, edit, and execute command files | | | |
| Reading and writing datasets in all current database/statistical file formats | | | |
| Option to save processed datasets in all current database/statistical file formats | | | |
| Select Cases in Score Setup | | | |
| TreeNet scoring offset in Score Setup | | | |
| Setting of focus class supported for all categorical variables | | | |
| Scalable limits on terminal nodes: a special mode that enforces the ATOM and/or MINCHILD size settings | | | |
| Descriptive Statistics: summary stats, stratified stats, charts, and histograms | | | |
| Activity Window: brief data description, quick navigation to the most common activities | | | |
| Translating models into SAS®-compatible language | | | |
| Data analysis Binning Engine | | | |
| Automatic creation of missing value indicators | | | |
| Option to treat missing values in a categorical predictor as a new level | | | |
| 64-bit support; large memory capacity limited only by hardware | | | |
| License to any level supported by RAM (32 MB to 1 TB) | | | |
| License for multi-core capabilities | | | |
| Use of the built-in BASIC programming language during data preparation | | | |
| Automatic creation of lag variables based on user specifications during data preparation | | | |
| Automatic creation and reporting of key overall and stratified summary statistics for a user-supplied list of variables | | | |
| Display of charts, histograms, and scatter plots for user-selected variables | | | |
| Command Line GUI Assistant to simplify creating and editing command files | | | |
| Translating models into SAS/PMML/C/Java/Classic, with the ability to create classic and specialized reports for existing models | | | |
| Unsupervised learning: Breiman's column scrambler | | | |
| Scoring any Automate (pre-packaged scenario of runs) as an ensemble model | | | |
| Summary-statistics-based missing value imputation using the scoring mechanism | | | |
| Impute options in Score Setup | | | |
| GUI support of SCORE PARTITIONS (GUI feature, SCORE PARTITIONS=YES) | | | |
| Quick Impute Analysis Engine: one-step statistical and model-based imputation | | | |
| Advanced imputation via Automate TARGET, with control over fill selection and new impute variable creation | | | |
| Computation of more than 10 different types of correlation | | | |
| Save OOB predictions from cross-validation models | | | |
| Custom selection of a new predictor list from an existing variable importance report | | | |
| User-defined bins for cross-validation | | | |
| Modeling pipelines: RuleLearner, ISLE | | | |
| Cross-validation models can be scored as an ensemble | | | |
| An alternative to variable importance based on Leo Breiman's scrambler | | | |
| Data Binning Results display (GUI feature) | | | |
| Data Binning Analysis Engine: bins variables using model-based binning (via Automate BIN) or weights-of-evidence coding | | | |
| BIN ROUND and ADAPTIVEROUND methods (BIN METHOD=ROUND/ADAPTIVEROUND) | | | |
| Controls for number of bins and deciles (BOPTIONS NBINS, NDECILES) | | | |
| EVAL command and GUI display (GUI feature) | | | |
| Summary stats for correlations (Correlation Stats tab) (GUI feature) | | | |
| TONUMERIC: create contiguous integer variables from other variables | | | |
| Automated imputation of all missing values (via Automate TARGET) | | | |
| Save out-of-bag predictions during cross-validation | | | |
| Use of TREATMENT variables when scoring uplift models (SCORE EVAL) | | | |
| Use of TREATMENT variables when evaluating uplift model predictions (EVAL) | | | |
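The gains and lift reporting listed above ranks scored records by model score and measures how many focus-class cases are captured in each slice of the ranking. As a rough illustration of the underlying computation — a minimal sketch in plain Python, not SPM's own implementation (`gains_table` and the sample data are hypothetical):

```python
# Minimal sketch of a cumulative gains/lift table (illustrative only;
# not SPM's implementation). scores are model probabilities for the
# focus class; y holds the true 0/1 labels.
def gains_table(scores, y, n_deciles=10):
    ranked = sorted(zip(scores, y), key=lambda p: -p[0])  # best scores first
    total_pos = sum(y)
    n = len(ranked)
    rows = []
    for d in range(1, n_deciles + 1):
        cutoff = round(n * d / n_deciles)
        captured = sum(label for _, label in ranked[:cutoff])
        gain = captured / total_pos      # cumulative share of positives captured
        lift = gain / (cutoff / n)       # improvement over random targeting
        rows.append((d, cutoff, captured, round(gain, 3), round(lift, 3)))
    return rows

# Example: a well-ranked score concentrates positives near the top.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
y      = [1,   1,   1,   0,   1,   0,   0,   0,   0,   0]
for row in gains_table(scores, y, n_deciles=5):
    print(row)
```

A lift above 1.0 in the top bins is what the chart visualizes: the model finds focus-class cases faster than random selection would.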
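Two of the missing-data features above — automatic missing value indicators and treating a missing categorical value as its own level — can be sketched in a few lines. This is an illustrative stand-in in plain Python, not SPM's implementation; `add_missing_indicators`, the `_mis` suffix, and the `"MISSING"` level are hypothetical names:

```python
# Sketch of automatic missing value indicator creation, plus treating a
# missing categorical value as a new level (illustrative only).
# None marks a missing cell; rows are dicts of column -> value.
def add_missing_indicators(rows, categorical):
    out = []
    for row in rows:
        new_row = dict(row)
        for col, val in row.items():
            new_row[col + "_mis"] = 1 if val is None else 0  # 0/1 indicator column
            if val is None and col in categorical:
                new_row[col] = "MISSING"                     # missing as its own level
        out.append(new_row)
    return out

rows = [{"age": 34,   "region": "N"},
        {"age": None, "region": None}]
clean = add_missing_indicators(rows, categorical={"region"})
print(clean[1])
```

Note that the continuous column (`age`) keeps its missing value and only gains an indicator, while the categorical column (`region`) is recoded to the new level.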
Automation

| Feature | V 8.3 | V 8.2 | V 8.0 |
|---|---|---|---|
| Generate detailed univariate stats on every continuous predictor to spot potential outliers and problematic records (Automate OUTLIERS) | | | |
| Automate ENABLETIMING=YES\|NO to control timing reporting in Automates | | | |
| Build two models, reversing the roles of the learn and test samples (Automate FLIP) | | | |
| Explore model stability by repeatedly drawing the learn sample at random from the original dataset (Automate DRAW) | | | |
| For time-series applications, build models based on a sliding time window, with a large array of user options (Automate DATASHIFT) | | | |
| Explore mutual multivariate dependencies among available predictors (Automate TARGET) | | | |
| Explore the effect of learn sample size on model performance (Automate LEARN CURVE) | | | |
| Build a series of models by varying the random number seed (Automate SEED) | | | |
| Explore the marginal contribution of each predictor to the existing model (Automate LOVO) | | | |
| Explore model stability by repeatedly repartitioning the data into learn, test, and possibly holdout samples (Automate PARTITION) | | | |
| Explore nonlinear univariate relationships between the target and each available predictor (Automate ONEOFF) | | | |
| Bootstrapping (sampling with replacement from the learn sample) with a large array of user options: Random Forests-style sampling of predictors; saving in-bag and out-of-bag scores, the proximity matrix, and node dummies (Automate BOOTSTRAP; not available in RandomForests) | | | |
| Shift the crossover point between the learn and test samples with each cycle of the Automate (Automate LTCROSSOVER) | | | |
| Build a series of models using different backward variable selection strategies (Automate SHAVING) | | | |
| Build a series of models using the forward-stepwise variable selection strategy (Automate STEPWISE) | | | |
| Explore nonlinear univariate relationships between each available predictor and the target (Automate XONY) | | | |
| Build a series of models using randomly sampled predictors (Automate KEEP) | | | |
| Explore the impact of replacing a given predictor with another (Automate SWAP) | | | |
| Parametric bootstrap process (Automate PBOOT) | | | |
| Build a series of models, one for each stratum defined in the dataset (Automate STRATA) | | | |
| Build a series of models using every available data mining engine (Automate MODELS) | | | |
| Build a model in each possible data mining engine (Automate EVERYTHING) | | | |
| Run TreeNet for predictor selection, auto-bin predictors, then build a series of models using every available data mining engine (Automate GLM) | | | |
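The bootstrap Automate above repeatedly resamples the learn data with replacement and keeps track of which rows land in the bag and which stay out of it, so the out-of-bag rows can be scored as an honest test set. A conceptual sketch of one such cycle in plain Python — illustrative only, with a trivial mean predictor standing in for a real modeling engine (`bootstrap_cycles` is a hypothetical name):

```python
import random

# Conceptual sketch of bootstrap cycles in the style of Automate BOOTSTRAP:
# sample with replacement from the learn data, record in-bag and
# out-of-bag (OOB) rows, and score only the OOB rows. Illustrative only;
# the "model" here is just the in-bag mean.
def bootstrap_cycles(data, n_cycles=3, seed=17):
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(n_cycles):
        in_bag_idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        oob_idx = sorted(set(range(n)) - set(in_bag_idx))  # rows never drawn
        in_bag = [data[i] for i in in_bag_idx]
        model = sum(in_bag) / len(in_bag)                  # stand-in model: in-bag mean
        oob_preds = {i: model for i in oob_idx}            # score only OOB rows
        results.append((oob_idx, oob_preds))
    return results

data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
for oob_idx, preds in bootstrap_cycles(data):
    print("OOB rows:", oob_idx)
```

On average roughly a third of the rows fall out of bag in each cycle, which is what makes OOB scores a useful built-in performance estimate.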