
Comparing Statistical Software - great guide


http://stanfordphd.com/Statistical_Software.html

statistician

STATISTICAL SOFTWARE

Almost all serious statistical analysis is done in one of the following packages: R (S-PLUS), Matlab, SAS, SPSS and Stata. I have expertise in each of these packages, but that does not mean every package is good for every type of analysis. In fact, for most advanced areas only two or three packages are suitable, in the sense that they provide enough built-in functionality or enough tools to implement that functionality easily. For example, Markov Chain Monte Carlo, a very important area, is doable only in R, Matlab and SAS, unless you want to rely on convoluted macros written by random users on the web. The table at the end of this page compares the five packages in great detail.
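
To give a sense of what an MCMC routine actually does, here is a minimal random-walk Metropolis sampler. It is a sketch in Python, which is not one of the five packages compared here. The function `metropolis`, its parameters and the standard-normal target are illustrative assumptions, not the API of any package.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target density.

    Illustrative sketch: log_target returns the log of a density known only
    up to a constant. Each proposal is x + Normal(0, step) and is accepted
    with probability min(1, p(proposal) / p(x)).
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_prop = log_target(proposal)
        # Compare on the log scale; 1 - random() lies in (0, 1], so log() is safe.
        if math.log(1.0 - rng.random()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
        # Discard the first burn_in iterations while the chain settles.
        if i >= burn_in:
            samples.append(x)
    return samples


def log_std_normal(x):
    # Log density of N(0, 1), dropping the normalizing constant.
    return -0.5 * x * x


draws = metropolis(log_std_normal, x0=0.0, n_samples=50_000, step=2.4)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean={mean:.3f}, variance={var:.3f}")  # mean near 0, variance near 1
```

The step size trades acceptance rate against how far the chain moves per iteration. A proposal scale of roughly 2.4 is a common rule of thumb for a one-dimensional normal target.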

R & MATLAB

R and Matlab are by far the richest systems. They contain an impressive number of libraries, and that number grows every day. Even if the very specific model you want is not part of the standard functionality, you can implement it yourself, because R and Matlab are real programming languages with relatively simple syntax. As "languages," they allow you to express any idea; the question is whether you are a good writer. In terms of modern applied statistics tools, R libraries are somewhat richer than those of Matlab. R is also free. On the flip side, Matlab has much better graphics, the kind you will not be ashamed to put in a paper or a presentation.
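
As a rough illustration of implementing a procedure yourself in a general-purpose language, here is a sketch of a percentile bootstrap confidence interval (bootstrap appears in the table at the end of this page). It is written in Python purely for illustration. The name `bootstrap_ci`, its parameters and the sample values are hypothetical toy choices, not real data.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic (sketch)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement n_boot times and recompute the statistic each time.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Trim alpha/2 of the sorted replicates from each tail.
    k = int(alpha / 2 * n_boot)
    return reps[k], reps[n_boot - 1 - k]


# Toy data for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
low, high = bootstrap_ci(sample)
print(f"95% percentile bootstrap CI for the median: ({low:.2f}, {high:.2f})")
```

Because `stat` is just a function argument, the same few lines give an interval for a trimmed mean or any other statistic. That is the kind of flexibility a programming language offers over a fixed menu.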

SPSS

On the other end of the spectrum is a package like SPSS. SPSS is quite narrow in its capabilities and covers only about half of mainstream statistics. It is quite useless for the ambitious modeling and estimation procedures found in kernel smoothing, pattern recognition or signal processing. Nonetheless, SPSS is very popular among practitioners because it requires almost no programming training: all you have to do is click a few buttons and SPSS does all the calculations for you. When you need something standard, SPSS may well have it fully implemented. The SPSS output is quite detailed and visually pleasing. It contains all the major tests and diagnostic tools associated with the method, and it lets you write an informative statistics section for your empirical analysis. In short, when the method is there, it is faster to run than the equivalent functionality in R or Matlab. So I often use SPSS for standard requests from my clients, such as linear regression, ANOVA or principal components analysis. SPSS does let you program macros, but that feature is quite inflexible.
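
As a rough illustration of what sits behind a one-click ANOVA, here is a sketch of the core one-way ANOVA F statistic in Python. It is not SPSS's implementation. It omits the p-value and the diagnostics SPSS reports, and the function name and toy data are hypothetical.

```python
def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: three groups of three observations each.
f_stat = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f_stat)  # 12.0
```

Here the group means are 5, 7 and 9 around a grand mean of 7. That gives a between-group sum of squares of 24 on 2 degrees of freedom and a within-group sum of squares of 6 on 6 degrees of freedom, so F = 12 / 1 = 12.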

SAS & STATA

Somewhere in between R, Matlab and SPSS lie SAS and Stata. SAS offers more extensive analytics than Stata. It is composed of dozens of procedures with massive, massive output, often running to more than ten pages. SAS does not listen to you that much. It is like an old grandfather whom you approach with a simple question, but who instead tells you the story of his life. Many procedures report three times more than you need to know about the topic at hand, so you have to spend some time picking out the relevant output. SAS procedures are invoked using simple scripts. Stata procedures can be invoked either by clicking buttons in the menu or by running simple scripts; in its menu interface, Stata resembles SPSS. Both SAS and Stata are programming languages, so they allow you to build analytics around standard procedures. Stata is somewhat more flexible than SAS. Still, in terms of programming flexibility, neither Stata nor SAS comes even close to R or Matlab. Selected strengths of SAS compared to all other packages: large data sets, speed, beautiful graphics, flexibility in formatting the output, time series procedures and counting processes. Selected strengths of Stata compared to all other packages: manipulation of survey data (stratified samples, clustering), robust estimation and tests, longitudinal data methods and multivariate time series.

THE TABLE

The following table compares the standard procedures of the five packages in detail. By "standard" I mean built-in, or readily available from official or widely known, reliable public websites.

(For the original, properly formatted chart, see http://stanfordphd.com/Statistical_Software.html.)

TYPE OF STATISTICAL ANALYSIS                R             MATLAB        SAS           STATA         SPSS
Nonparametric Tests                         Yes           Yes           Yes           Yes           Yes
T-test                                      Yes           Yes           Yes           Yes           Yes
ANOVA & MANOVA                              Yes           Yes           Yes           Yes           Yes
ANCOVA & MANCOVA                            Yes           Yes           Yes           Yes           Yes
Linear Regression                           Yes           Yes           Yes           Yes           Yes
Generalized Least Squares                   Yes           Yes           Yes           Yes           Yes
Ridge Regression                            Yes           Yes           Yes           -             -
Lasso                                       Yes           Yes           Yes           -             -
Generalized Linear Models                   Yes           Yes           Yes           Yes           Yes
Mixed Effects Models                        Yes           Yes           Yes           Yes           Yes
Logistic Regression                         Yes           Yes           Yes           Yes           Yes
Nonlinear Regression                        Yes           Yes           Yes           -             -
Discriminant Analysis                       Yes           Yes           Yes           Yes           Yes
Nearest Neighbor                            Yes           Yes           Yes           -             Yes
Factor & Principal Components Analysis      Yes           Yes           Yes           Yes           Yes
Copula Models                               Yes           Yes           Experimental  -             -
Cross-Validation                            Yes           Yes           Yes           -             -
Bayesian Statistics                         Yes           Yes           Limited       -             -
Monte Carlo, Classic Methods                Yes           Yes           Yes           Yes           Limited
Markov Chain Monte Carlo                    Yes           Yes           Yes           -             -
Bootstrap & Jackknife                       Yes           Yes           Yes           Yes           -
EM Algorithm                                Yes           Yes           Yes           -             -
Missing Data Imputation                     Yes           Yes           Yes           Yes           Yes
Outlier Diagnostics                         Yes           Yes           Yes           Yes           Yes
Robust Estimation                           Yes           Yes           Yes           Yes           -
Longitudinal (Panel) Data                   Yes           Yes           Yes           Yes           Limited
Survival Analysis                           Yes           Yes           Yes           Yes           Yes
Path Analysis                               Yes           Yes           Yes           -             -
Propensity Score Matching                   Yes           Yes           Limited       Limited       -
Stratified Samples (Survey Data)            Yes           Yes           Yes           Yes           Yes
Experimental Design                         Yes           Yes           -             -             -
Quality Control                             Yes           Yes           Yes           Yes           -
Reliability Theory                          Yes           Yes           Yes           Yes           Yes
Univariate Time Series                      Yes           Yes           Yes           Yes           Limited
Multivariate Time Series                    Yes           Yes           Yes           Yes           -
Markov Chains                               Yes           Yes           -             -             -
Hidden Markov Models                        Yes           Yes           -             -             -
Stochastic Volatility Models                Yes           Yes           Limited       Limited       Limited
Diffusions                                  Yes           Yes           -             -             -
Counting Processes                          Yes           Yes           Yes           -             -
Filtering                                   Yes           Yes           Limited       Limited       -
Instrumental Variables                      Yes           Yes           Yes           Yes           -
Simultaneous Equations                      Yes           Yes           Yes           Yes           -
Splines                                     Yes           Yes           Yes           Yes           -
Nonparametric Smoothing Methods             Yes           Yes           Yes           Yes           -
Extreme Value Theory                        Yes           Yes           -             -             -
Variance Stabilization                      Yes           Yes           -             -             -
Cluster Analysis                            Yes           Yes           Yes           Yes           Yes
Neural Networks                             Yes           Yes           Yes           -             Limited
Classification & Regression Trees           Yes           Yes           Yes           -             Limited
Boosting Classification & Regression Trees  Yes           Yes           -             -             -
Random Forests                              Yes           Yes           -             -             -
Support Vector Machines                     Yes           Yes           Yes           -             -
Signal Processing                           Yes           Yes           -             -             -
Wavelet Analysis                            Yes           Yes           Yes           -             -
ROC Curves                                  Yes           Yes           Yes           Yes           Yes
Optimization                                Yes           Yes           Yes           Limited       -
