When thinking about the big data era, what are some statistical ideas we've already figured out?
If the goal is prediction accuracy, average many prediction models together.
When testing many hypotheses, correct for multiple testing.
When you have data measured over space, distance, or time, you should smooth it.
Before you analyze your data with computers, be sure to plot it.
Interactive analysis is the best way to really figure out what is going on in a data set.
Know what your real sample size is.
Unless you ran a randomized trial, potential confounders should keep you up at night.
Define a metric for success up front.
Make your code and data available and have smart people check it.
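
Two of the tips above (correcting for multiple testing, and smoothing data measured over time) are easy to show in a few lines. The sketch below is only an illustration, not a method taken from the list itself: it assumes the Benjamini-Hochberg procedure as the correction and a centered moving average as the smoother, both of which are just one reasonable choice among many. The function names, the `alpha` and `window` parameters, and the sample p-values are all hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected.

    Assumed correction: the Benjamini-Hochberg step-up procedure, which
    controls the false discovery rate at level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


def moving_average(values, window=3):
    """Smooth a sequence with a centered moving average.

    Near the ends the window is truncated, so the edge points are
    averaged over fewer neighbours.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Made-up p-values, purely for illustration.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("uncorrected hits:", sum(x < 0.05 for x in p))
    print("hits after correction:", sum(benjamini_hochberg(p)))
    print("smoothed:", moving_average([1, 2, 3, 4, 5]))
```

With these made-up p-values, five tests fall below 0.05 before correction but only two survive it, which is the point of the tip: a raw count of "significant" results overstates the evidence when many hypotheses are tested at once.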