Comparing Statistical Software - great guide
Mo Data stashed this in Big Data Technologies
Almost all serious statistical analysis is done in one of the following packages: R (S-PLUS), Matlab, SAS, SPSS and Stata. I have expertise in each of them, but that does not mean each package is good for every type of analysis. In fact, for most advanced areas only two or three packages are suitable, either providing the functionality directly or providing enough tools to implement it easily. For example, the very important area of Markov Chain Monte Carlo is doable only in R, Matlab and SAS, unless you want to rely on convoluted macros written by random users on the web. The table at the end of this page compares the five packages in great detail.
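To give a sense of what Markov Chain Monte Carlo asks of a package, here is a minimal sketch of a random-walk Metropolis sampler. It is written in Python purely as a neutral illustration (Python is not one of the five packages compared here), and the function name `metropolis_normal`, the standard normal target and the uniform proposal are all assumptions made for the example, not anything taken from the packages themselves.

```python
import math
import random


def metropolis_normal(n_samples, step=2.0, seed=0):
    """Random-walk Metropolis sampler targeting a standard normal density.

    Returns the list of draws and the fraction of accepted proposals.
    The target, proposal and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)

    def log_density(x):
        # Log of the N(0, 1) density, up to an additive constant.
        return -0.5 * x * x

    x = 0.0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric proposal: a uniform step around the current state.
        proposal = x + rng.uniform(-step, step)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, density ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        # The current state is recorded whether or not the move was accepted.
        samples.append(x)
    return samples, accepted / n_samples


if __name__ == "__main__":
    draws, rate = metropolis_normal(20000)
    mean = sum(draws) / len(draws)
    print("acceptance rate:", rate)
    print("sample mean:", mean)
```

The loop itself is short; what a package has to supply around it is the ability to write such loops freely, evaluate arbitrary densities and store long chains, which is why a scripting language matters more here than a menu of ready-made procedures.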
R & MATLAB
R and Matlab are the richest systems by far. They contain an impressive number of libraries, which grows each day. Even if a very specific model you need is not part of the standard functionality, you can implement it yourself, because R and Matlab are really programming languages with relatively simple syntax. As "languages" they allow you to express any idea; the question is whether you are a good writer. In terms of modern applied statistics tools, R libraries are somewhat richer than those of Matlab. R is also free. On the flip side, Matlab has much better graphics, which you will not be ashamed to put in a paper or a presentation.
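As a rough idea of what "implementing it yourself" means in practice, the sketch below codes a bootstrap standard error from scratch. It uses Python only as a stand-in for a general-purpose language (the same few lines translate directly to R or Matlab), and the helper name `bootstrap_se` and its defaults are hypothetical choices for this example.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by the nonparametric bootstrap.

    `data` is a sequence of numbers and `statistic` is any function that
    maps a list of numbers to a single number (mean, median, and so on).
    """
    if len(data) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    # The spread of the resampled estimates approximates the standard error.
    return statistics.stdev(estimates)


if __name__ == "__main__":
    sample = [float(v) for v in range(1, 101)]
    print("SE of the mean:", bootstrap_se(sample, statistics.mean))
    print("SE of the median:", bootstrap_se(sample, statistics.median))
```

Because the statistic is passed in as a function, the same routine covers the mean, the median or any custom estimator, which is the kind of freedom a menu-driven package does not give you.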
At the other end of the spectrum is a package like SPSS. SPSS is quite narrow in its capabilities and allows you to do only about half of mainstream statistics. It is quite useless for the ambitious modeling and estimation procedures that are part of kernel smoothing, pattern recognition or signal processing. Nonetheless, SPSS is very popular among practitioners because it requires almost no programming training. All you have to do is click a few buttons and SPSS does all the calculations for you. When you need something standard, SPSS may have it fully implemented. The SPSS output will be quite detailed and visually pleasing. It will contain all the major tests and diagnostic tools associated with the method and will allow you to write an informative statistics section of your empirical analysis. In short, when the method is there, it is faster to run in SPSS than the equivalent functionality in R or Matlab. So I often use SPSS for standard requests from my clients, like running linear regression, ANOVA or principal components analysis. SPSS gives you the ability to program macros, but that feature is quite inflexible.
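Kernel smoothing, named above as being out of reach for SPSS, is a good example of a method that is only a few lines long once you can program. The sketch below is a Nadaraya-Watson estimator with a Gaussian kernel, written in Python as a neutral illustration; the function name `nadaraya_watson` and its arguments are assumptions made for this example.

```python
import math


def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson kernel regression estimate at a single query point.

    The estimate is a weighted average of the observed responses, with
    Gaussian weights that decay with distance from the query point.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if len(x_train) != len(y_train) or not x_train:
        raise ValueError("x_train and y_train must be non-empty and equal in length")

    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi in zip(x_train, y_train):
        # Gaussian kernel weight; the normalizing constant cancels in the ratio.
        w = math.exp(-0.5 * ((x_query - xi) / bandwidth) ** 2)
        weighted_sum += w * yi
        weight_total += w

    if weight_total == 0.0:
        raise ValueError("query point is too far from the data for this bandwidth")
    return weighted_sum / weight_total


if __name__ == "__main__":
    xs = [float(i) for i in range(11)]
    ys = [math.sin(x) for x in xs]
    print(nadaraya_watson(xs, ys, 2.5, bandwidth=0.8))
```

The bandwidth controls the trade-off: a small value follows the data closely, a large one averages over a wide neighbourhood. Choosing it well (for example by cross-validation) is exactly the kind of extension that needs a programmable environment.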
SAS & STATA
Somewhere in between R, Matlab and SPSS lie SAS and Stata. SAS offers more extensive analytics than Stata. It is composed of dozens of procedures with massive, massive output, often covering more than ten pages. The idea of SAS is not to listen to you that much. It is like an old grandfather whom you approach with a simple question, but who instead tells you the story of his life. Many procedures report three times more than you need to know about the topic, so some time has to be spent picking out the relevant output. SAS procedures are invoked using simple scripts. Stata procedures can be invoked by clicking buttons in the menu or by running simple scripts. In its menus, Stata resembles SPSS. Both SAS and Stata are programming languages, so they allow you to build analytics around standard procedures. Stata is somewhat more flexible than SAS. Still, in terms of programming flexibility, Stata and SAS do not come even close to R or Matlab. Selected strengths of SAS compared to all other packages: large data sets, speed, beautiful graphics, flexibility in formatting the output, time series procedures, counting processes. Selected strengths of Stata compared to all other packages: manipulation of survey data (stratified samples, clustering), robust estimation and tests, longitudinal data methods, multivariate time series.
The following table compares the standard procedures of the five packages in detail. By "standard" I mean built in or readily available from official or widely known and reliable public websites.
(This table is hard to read here: its blank cells were lost, so a row with fewer than five entries cannot be matched to columns. Please go to http://stanfordphd.com/Statistical_Software.html for the proper chart.)
TYPE OF STATISTICAL ANALYSIS: R, MATLAB, SAS, STATA, SPSS
Nonparametric Tests: Yes, Yes, Yes, Yes, Yes
T-test: Yes, Yes, Yes, Yes, Yes
ANOVA & MANOVA: Yes, Yes, Yes, Yes, Yes
ANCOVA & MANCOVA: Yes, Yes, Yes, Yes, Yes
Linear Regression: Yes, Yes, Yes, Yes, Yes
Generalized Least Squares: Yes, Yes, Yes, Yes, Yes
Ridge Regression: Yes, Yes, Yes
Lasso: Yes, Yes, Yes
Generalized Linear Models: Yes, Yes, Yes, Yes, Yes
Mixed Effects Models: Yes, Yes, Yes, Yes, Yes
Logistic Regression: Yes, Yes, Yes, Yes, Yes
Nonlinear Regression: Yes, Yes, Yes
Discriminant Analysis: Yes, Yes, Yes, Yes, Yes
Nearest Neighbor: Yes, Yes, Yes, Yes
Factor & Principal Components Analysis: Yes, Yes, Yes, Yes, Yes
Copula Models: Yes, Yes, Experimental
Cross-Validation: Yes, Yes, Yes
Bayesian Statistics: Yes, Yes, Limited
Monte Carlo, Classic Methods: Yes, Yes, Yes, Yes, Limited
Markov Chain Monte Carlo: Yes, Yes, Yes
Bootstrap & Jackknife: Yes, Yes, Yes, Yes
EM Algorithm: Yes, Yes, Yes
Missing Data Imputation: Yes, Yes, Yes, Yes, Yes
Outlier Diagnostics: Yes, Yes, Yes, Yes, Yes
Robust Estimation: Yes, Yes, Yes, Yes
Longitudinal (Panel) Data: Yes, Yes, Yes, Yes, Limited
Survival Analysis: Yes, Yes, Yes, Yes, Yes
Path Analysis: Yes, Yes, Yes
Propensity Score Matching: Yes, Yes, Limited, Limited
Stratified Samples (Survey Data): Yes, Yes, Yes, Yes, Yes
Experimental Design: Yes, Yes
Quality Control: Yes, Yes, Yes, Yes
Reliability Theory: Yes, Yes, Yes, Yes, Yes
Univariate Time Series: Yes, Yes, Yes, Yes, Limited
Multivariate Time Series: Yes, Yes, Yes, Yes
Markov Chains: Yes, Yes
Hidden Markov Models: Yes, Yes
Stochastic Volatility Models: Yes, Yes, Limited, Limited, Limited
Diffusions: Yes, Yes
Counting Processes: Yes, Yes, Yes
Filtering: Yes, Yes, Limited, Limited
Instrumental Variables: Yes, Yes, Yes, Yes
Simultaneous Equations: Yes, Yes, Yes, Yes
Splines: Yes, Yes, Yes, Yes
Nonparametric Smoothing Methods: Yes, Yes, Yes, Yes
Extreme Value Theory: Yes, Yes
Variance Stabilization: Yes, Yes
Cluster Analysis: Yes, Yes, Yes, Yes, Yes
Neural Networks: Yes, Yes, Yes, Limited
Classification & Regression Trees: Yes, Yes, Yes, Limited
Boosting Classification & Regression Trees: Yes, Yes
Random Forests: Yes, Yes
Support Vector Machines: Yes, Yes, Yes
Signal Processing: Yes, Yes
Wavelet Analysis: Yes, Yes, Yes
ROC Curves: Yes, Yes, Yes, Yes, Yes
Optimization: Yes, Yes, Yes, Limited