Failure Is Moving Science Forward, by FiveThirtyEight
Jared Sperli stashed this in science
Wait, power poses don't work?!
While I paced around the green room at a recent TEDx event in Colorado, one of the other speakers offered the rest of us some advice on how to ease our nerves. “Raise your arms up in the air and make yourself big — it will help you feel powerful!” It was scientifically proven, she told us (she’d seen it in a TED talk), that adopting a so-called power pose — shoulders wide, arms strong — could raise your testosterone levels, lower your stress hormones, and make you feel more confident and commanding.
Like everyone else, I was nervous. This wasn’t my usual kind of speech; it was a performance — a scripted story that wasn’t supposed to sound scripted, told with no notes and no cues. I knew my lines by heart, but I also knew that one moment of doubt was all it would take for me to draw a blank up on stage. So just before I walked through the curtains, I took a deep breath and raised my arms overhead as if signaling victory. I don’t know if the power pose helped me, but it didn’t seem to hurt.
What I didn’t say back in the green room was that although one highly touted study had shown how adopting a power pose could alter your hormone levels and make you more bold, another group of researchers had tried to repeat the study and found no such effect. It’s possible that the power pose phenomenon was nothing more than a spurious result.
Power poses aren’t the only well-publicized finding called into question by further research.
Psychology, biomedicine and numerous other fields of science have fallen into a crisis of confidence recently, after seminal findings could not be replicated in subsequent studies. These widespread problems with reproducibility underscore a problem that I discussed here last year — namely, that science is really, really hard. Even relatively straightforward questions cannot be definitively answered in a single study, and the scientific literature is riddled with results that won’t stand up. This is the way science works — it’s a process of becoming less wrong over time.
“Scientific claims don’t gain credibility by someone saying, ‘I found it.’ They gain credibility by others being able to reproduce it,” said Brian Nosek, a psychologist at the University of Virginia, co-founder of the Center for Open Science and leader of the Reproducibility Project: Psychology.
RP:P, initiated in 2011, attempted to replicate 100 studies published in three high-profile psychology journals in 2008.
There are good reasons why real effects may fail to reproduce, and in many cases, we should expect replications to fail, even if the original finding is real.
It may seem counterintuitive, but initial studies have a known bias toward overestimating the magnitude of an effect for a simple reason: They were selected for publication because of their unusually small p-values, said Veronica Vieland, a biostatistician at the Battelle Center for Mathematical Medicine in Columbus, Ohio.
Imagine that you were looking at the relationship between height and college majors. You collected data from math majors in a small class that had a couple of unusually tall students and compared it with a similar sized philosophy class that happened to have one unusually short person in it. Comparing the two averages, the differences seem large — math majors are taller than philosophy majors (and perhaps the unusual difference between these two particular classes is what caught your attention in the first place). But most of those differences were flukes, and when you repeat the study you’re unlikely to see such an extreme difference between the two majors, especially if the second study has a larger sample. If you’re trying to figure out the true height differences, this “regression to the mean” is a good thing, because it gets you closer to the true averages.
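The height example can be played out in a small simulation. This is only a sketch under made-up assumptions: both majors are given the same true height distribution (mean 170 cm, standard deviation 8 cm, numbers invented for illustration), the "initial studies" are small classes of 15, and only the most eye-catching 5 percent of differences get a follow-up with classes of 150. The function and parameter names are hypothetical.

```python
import random
import statistics


def class_mean(rng, n, mu=170.0, sigma=8.0):
    """Average height (cm) of one simulated class of n students.

    mu and sigma are illustrative assumptions, not real data.
    """
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))


def simulate_height_gaps(n_pairs=2000, small_n=15, large_n=150, seed=0):
    """Compare eye-catching first-study gaps with gaps seen on repetition.

    Both majors share the same true mean here, so every gap is a fluke.
    """
    rng = random.Random(seed)

    # First pass: math-minus-philosophy gap in many pairs of small classes.
    first = [class_mean(rng, small_n) - class_mean(rng, small_n)
             for _ in range(n_pairs)]

    # Keep only the most striking 5% of gaps (largest absolute value),
    # mimicking the results that would catch someone's attention.
    cutoff = sorted(abs(d) for d in first)[int(0.95 * n_pairs)]
    striking = [d for d in first if abs(d) >= cutoff]

    # Repeat each striking comparison with fresh, larger classes.
    repeat = [class_mean(rng, large_n) - class_mean(rng, large_n)
              for _ in striking]

    initial_gap = statistics.fmean(abs(d) for d in striking)
    repeat_gap = statistics.fmean(abs(d) for d in repeat)
    return initial_gap, repeat_gap


if __name__ == "__main__":
    initial_gap, repeat_gap = simulate_height_gaps()
    print(f"average striking gap in first studies: {initial_gap:.1f} cm")
    print(f"average gap when repeated:             {repeat_gap:.1f} cm")
```

Under these assumptions the repeated comparisons show a much smaller average gap than the ones that were picked out for being striking, which is the regression to the mean described above.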
But the regression to the mean issue also means that even if the initial results are correct, they may not be replicated in subsequent studies. The RP:P project attempted to replicate 100 studies, 97 of which had produced results with a “significant” p-value of 0.05 or less. By selecting so many positive studies, the group set itself up for a regression to the mean phenomenon, and that’s what it found, said Steven Goodman, co-director of the Meta-Research Innovation Center at Stanford (he was not involved in the RP:P project).
Indeed, less than half of the replication studies in RP:P reproduced the original results. That reduction in positive findings could mean that the original studies were wrong, or it could represent a simple regression to the mean. It’s also possible that some of the replication studies produced false negatives, failing to find effects that were real. The paper in the journal Science that described the RP:P results concluded, “how many of the effects have we established are true? Zero. And how many of the effects have we established are false? Zero.” Still, the message that made media headlines was that all these studies were disproven, and that simply wasn’t true, Goodman said.
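To see how real effects can still fail to replicate, here is a rough simulation, not a model of RP:P itself. Every assumption is invented for illustration: each study compares two groups of 30, the true effect is 0.3 standard deviations in every study, the standard deviation is treated as known (so a simple z-test stands in for whatever tests the real studies used), and only results with p below 0.05 are "published" and then replicated once at the same sample size.

```python
import math
import random
import statistics

TRUE_EFFECT = 0.3  # assumed real difference, in standard-deviation units
N_PER_GROUP = 30   # assumed participants per group
Z_CRIT = 1.96      # two-sided z cutoff corresponding to p < 0.05


def run_study(rng):
    """Run one two-group study; return (observed effect, significant?)."""
    treated = statistics.fmean(
        rng.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP))
    control = statistics.fmean(
        rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP))
    effect = treated - control
    # z-test with known standard deviation 1.0 (a simplifying assumption).
    z = effect / math.sqrt(2.0 / N_PER_GROUP)
    return effect, abs(z) > Z_CRIT


def simulate_replications(n_studies=5000, seed=1):
    """Publish only significant studies, then replicate each one once."""
    rng = random.Random(seed)
    originals = [run_study(rng) for _ in range(n_studies)]
    published = [effect for effect, significant in originals if significant]
    replications = [run_study(rng) for _ in published]
    return {
        "published_effect": statistics.fmean(published),
        "replication_effect": statistics.fmean(e for e, _ in replications),
        "replication_rate": sum(sig for _, sig in replications) / len(replications),
    }


if __name__ == "__main__":
    for name, value in simulate_replications().items():
        print(f"{name}: {value:.2f}")
```

In this setup every published effect is real, yet the published studies overstate its size on average, and well under half of the replications reach significance. The drop comes entirely from selection on small p-values and limited sample size, not from any original finding being false.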
It's especially hard when studying humans.
Amy Cuddy wrote, “We are trying to do something quite difficult here: predict human behavior and understand subjective experiences. Psychology may not be a hard science, but it is certainly a difficult one.” It shouldn’t be so surprising that psychology studies don’t always replicate; the field faces an inherent challenge. Rather than measuring molecules or mass, it examines human motives and behavior, which are frustratingly hard to isolate.
John Oliver based his main story tonight on this premise of the irreplicability of science studies.
This story? (Thank you for saving the honor.)
That's the one! Thank you!