Is Most Published Research Wrong?

In 2011 an article was published in the reputable “Journal of Personality and Social Psychology”. It was called “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect”, or, in other words, proof that people can see into the future. The paper reported on nine experiments. In one, participants were shown two curtains on a computer screen and asked to predict which one had an image behind it; the other just covered a blank wall. Once the participant made their selection, the computer randomly positioned an image behind one of the curtains, then the selected curtain was pulled back to show either the image or the blank wall. The images were randomly selected from one of three categories: neutral, negative, or erotic. If participants selected the curtain covering the image, this was considered a hit. With two curtains and the image positioned randomly behind one of them, you would expect the hit rate to be about fifty percent. And that is exactly what the researchers found, at least for the neutral and negative images; for erotic images, however, the hit rate was fifty-three percent.
Does that mean that we can see into the future? Is that slight deviation significant? To assess significance, scientists usually turn to p-values: a statistic that tells you how likely a result at least this extreme is if the null hypothesis is true. In this case the null hypothesis would just be that people can’t actually see into the future and the 53-percent result was due to lucky guesses. For this study the p-value was .01, meaning there was just a one-percent chance of getting a hit rate of fifty-three percent or higher from luck alone. P-values less than .05 are generally considered significant and worthy of publication, but you might want to use a higher bar before you accept that humans can accurately perceive the future and, say, invite the study’s author onto your news program; but hey, it’s your choice. After all, the .05 threshold was arbitrarily selected by Ronald Fisher in a book he published in 1925.
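To make that arithmetic concrete, here is a minimal sketch of how a one-sided p-value like that is computed: under the null hypothesis every guess is a fair coin flip, so you just add up the binomial probabilities of doing at least that well by luck. The trial count below is an assumed, illustrative number; the transcript doesn’t give the actual number of guesses in the study.

```python
# Illustrative only: the number of trials is an assumption, not the study's actual data.
from math import comb

def one_sided_binomial_p(hits: int, trials: int, p_null: float = 0.5) -> float:
    """Probability of getting at least `hits` successes in `trials` fair guesses."""
    return sum(comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
               for k in range(hits, trials + 1))

trials = 1000                  # assumed number of guesses
hits = round(0.53 * trials)    # a 53% hit rate
print(one_sided_binomial_p(hits, trials))  # ~0.03 for these assumed numbers
```

Note that the same 53% hit rate gives a very different p-value at different sample sizes, which is exactly why the raw percentage alone can’t tell you whether the deviation is significant.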
But this raises the question: how much of the published research literature is actually false? The intuitive answer seems to be five percent. After all, if everyone is using p less than .05 as the cut-off for statistical significance, you would expect five of every hundred results to be false positives. Unfortunately, that grossly underestimates the problem, and here’s why.

Imagine you’re a researcher in a field where a thousand hypotheses are currently being investigated. Let’s assume that ten percent of them reflect true relationships and the rest are false, but of course no one knows which are which; that’s the whole point of doing the research. Now, assuming the experiments are reasonably well designed, they should correctly identify around 80 of the hundred true relationships. This is known as a statistical power of eighty percent, so 20 results are false negatives; perhaps the sample size was too small or the measurements were not sensitive enough. Of the 900 false hypotheses, using a p-value threshold of .05, forty-five will be incorrectly accepted as true. The rest will be correctly identified as false, but most journals rarely publish null results: they make up just ten to thirty percent of papers, depending on the field. That means the papers that eventually get published will include 80 true positives, 45 false positives, and maybe 20 true negatives. Nearly a third of published results will be wrong, even with the system working normally. Things get even worse if studies are underpowered (and analyses show they typically are), if there is a higher ratio of false-to-true hypotheses being tested, or if the researchers are biased. All of this was pointed out in a 2005 paper entitled “Why Most Published Research Findings Are False”.
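The back-of-the-envelope numbers above are easy to verify. This sketch just re-does the transcript’s worked example (1,000 hypotheses, 10% true, 80% power, p < .05, and roughly 20 published null results); every input is the video’s illustrative assumption, not an estimate for any real field.

```python
# Reproduces the worked example above; every number is the transcript's assumption.
hypotheses = 1000
prior_true = 0.10   # fraction of hypotheses that reflect true relationships
power      = 0.80   # chance a true relationship is correctly detected
alpha      = 0.05   # significance threshold

true_hyp  = hypotheses * prior_true          # 100 true relationships
false_hyp = hypotheses - true_hyp            # 900 false hypotheses

true_positives  = power * true_hyp           # 80 detected true relationships
false_negatives = true_hyp - true_positives  # 20 missed true relationships
false_positives = alpha * false_hyp          # 45 false hypotheses declared "true"

published_nulls = 20                         # the transcript's "maybe 20" published null results
published_total = true_positives + false_positives + published_nulls

print(false_positives / published_total)     # ~0.31: nearly a third of published results are wrong
```

45 out of 145 is about 31 percent, which is where the “nearly a third” figure comes from; lowering the power, the prior, or the share of published null results only pushes that fraction higher.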
So recently, researchers in a number of fields have attempted to quantify the problem by replicating some prominent past results. The Reproducibility Project repeated a hundred psychology studies but found that only thirty-six percent had a statistically significant result the second time around, and the strength of the measured relationships was, on average, half that of the original studies. An attempted verification of 53 studies considered landmarks in the basic science of cancer managed to reproduce only six, even when working closely with the original studies’ authors. These results are even worse than I just calculated.
The reason for this is nicely illustrated by a 2015 study showing that eating a bar of chocolate every day can help you lose weight faster. In this case the participants were randomly allocated to one of three treatment groups: one went on a low-carb diet, another went on the same low-carb diet plus a 1.5-ounce bar of chocolate per day, and the third group was the control, instructed just to maintain their regular eating habits. At the end of three weeks the control group had neither lost nor gained weight, but both low-carb groups had lost an average of five pounds per person. The group that ate chocolate, however, lost weight ten percent faster than the non-chocolate eaters, and the finding was statistically significant with a p-value less than .05.

As you might expect, this news spread like wildfire: to the front page of Bild, the most widely circulated daily newspaper in Europe, and into the Daily Star, the Irish Examiner, the Huffington Post, and even Shape Magazine. Unfortunately, the whole thing had been faked; kind of. The researchers did perform the experiment exactly as they described, but they intentionally designed it to increase the likelihood of false positives: the sample size was incredibly small, just five people per treatment group, and for each person 18 different measurements were tracked, including weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, and so on. So if weight loss didn’t show a significant difference, there were plenty of other factors that might have. The headline could have been “chocolate lowers cholesterol” or “increases sleep quality” or… something.
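The effect of tracking 18 outcomes can be quantified with one line of probability. This is a simplified sketch that treats the measurements as independent tests at p < .05 (real measurements are correlated, so the exact number would differ), but it shows why some headline was almost guaranteed.

```python
# Chance of at least one spurious "significant" finding across many outcomes.
# Assumes the 18 tracked measurements are independent, which is a simplification.
alpha = 0.05
n_outcomes = 18

p_any_false_positive = 1 - (1 - alpha) ** n_outcomes
print(round(p_any_false_positive, 2))   # ~0.6: more likely than not to find "something"
```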
The point is: a p-value is only really valid for a single measure. Once you’re comparing a whole slew of variables, the probability that at least one of them gives you a false positive goes way up, and this is known as “p-hacking”. Researchers can make a lot of decisions about their analysis that decrease the p-value. For example, let’s say you analyze your data and find it nearly reaches statistical significance, so you decide to collect just a few more data points to be sure. Then, if the p-value drops below .05, you stop collecting data, confident that these additional data points could only have made the result more significant if there really were a true relationship there. But numerical simulations show that relationships can cross the significance threshold just by adding more data points, even though a much larger sample would show that there really is no relationship. In fact, there are a great number of ways to increase the likelihood of significant results: having two dependent variables, adding more observations, controlling for gender, or dropping one of three conditions. Combining these strategies increases the likelihood of a false positive to over sixty percent, and that is still using p less than .05.
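The optional-stopping trick described above is easy to see in a simulation. Here is a minimal sketch (not the specific simulation referenced in the video): the data are pure noise, yet peeking at the p-value after every extra batch of observations and stopping as soon as p < .05 yields far more “significant” results than the nominal five percent.

```python
# Optional stopping on pure-noise data: the null hypothesis is true by construction.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)

def run_with_peeking(start=10, step=5, max_n=100, alpha=0.05):
    data = list(rng.normal(size=start))        # no real effect exists
    while True:
        if ttest_1samp(data, 0).pvalue < alpha:
            return True                        # "significant": stop collecting and write it up
        if len(data) >= max_n:
            return False
        data.extend(rng.normal(size=step))     # collect "just a few more" data points

runs = 2000
print(sum(run_with_peeking() for _ in range(runs)) / runs)  # well above the nominal 0.05
```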
Now if you think this is just a problem for psychology, neuroscience, or medicine, consider the pentaquark, an exotic particle made up of five quarks, as opposed to the regular three for protons or neutrons. Particle physics employs particularly stringent requirements for statistical significance, referred to as 5-sigma, or roughly one chance in 3.5 million of getting a false positive. But in 2002 a Japanese experiment found evidence for the Theta-plus pentaquark, and in the two years that followed, 11 other independent experiments looked for and found evidence of that same pentaquark with very high levels of statistical significance. From July 2003 to May 2004 a theoretical paper on pentaquarks was published, on average, every other day. But alas, it was a false discovery: later experimental attempts to confirm the Theta-plus pentaquark using greater statistical power failed to find any trace of its existence. The problem was that those first scientists weren’t blind to the data: they knew how the numbers were generated and what answer they expected to get, and the way the data was cut and analyzed, or p-hacked, produced the false finding.
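As an aside, the “5-sigma, or roughly one chance in 3.5 million” figure quoted above is just the one-sided tail probability of a normal distribution five standard deviations from the mean, which you can check in two lines:

```python
# The particle-physics 5-sigma threshold as a one-sided normal tail probability.
from scipy.stats import norm

p = norm.sf(5)        # ~2.9e-7
print(1 / p)          # ~3.5 million, i.e. one chance in 3.5 million of a false positive
```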
Now, most scientists aren’t p-hacking maliciously. There are legitimate decisions to be made about how to collect, analyze, and report data, and these decisions affect the statistical significance of results. For example, 29 different research groups were given the same data and asked to determine whether dark-skinned soccer players are more likely to be given red cards. Using identical data, some groups found there was no significant effect, while others concluded dark-skinned players were three times as likely to receive a red card. The point is that data doesn’t speak for itself; it must be interpreted. Looking at those results, it seems that dark-skinned players are more likely to get red-carded, but certainly not three times as likely. Consensus helps in this case, but for most results only one research group provides the analysis.
And therein lies the problem of incentives: scientists have huge incentives to publish papers; in fact, their careers depend on it. As the scientist Brian Nosek puts it: “There is no cost to getting things wrong. The cost is not getting them published.” Journals are far more likely to publish results that reach statistical significance, so if one method of data analysis yields a p-value less than .05, you’re likely to go with that method. Publication is also more likely if the result is novel and unexpected, which encourages researchers to investigate more and more unlikely hypotheses, further decreasing the ratio of true to spurious relationships that are tested.
Now, what about replication? Isn’t science meant to self-correct by having other scientists replicate the findings of an initial discovery? In theory, yes, but in practice it’s more complicated. Take the precognition study from the start of this video: three researchers attempted to replicate one of those experiments, and what did they find? Well, surprise surprise, the hit rate they obtained was not significantly different from chance. When they tried to publish their findings in the same journal as the original paper, they were rejected. The reason? The journal refuses to publish replication studies. So if you’re a scientist, the successful strategy is clear: don’t even attempt replication studies, because few journals will publish them, and there is a very good chance that your results won’t be statistically significant anyway, in which case, instead of being able to convince colleagues of the lack of reproducibility of an effect, you will be accused of just not doing it right. A far better approach is to test novel and unexpected hypotheses and then p-hack your way to a statistically significant result.
Now, I don’t want to be too cynical about this, because over the past 10 years things have started changing for the better. Many scientists acknowledge the problems I’ve outlined and are starting to take steps to correct them: more large-scale replication studies have been undertaken in the last 10 years; there’s a site, Retraction Watch, dedicated to publicizing papers that have been withdrawn; there are online repositories for unpublished negative results; and there is a move toward submitting hypotheses and methods for peer review before conducting experiments, with the guarantee that the research will be published regardless of the results, so long as the procedure is followed. This eliminates publication bias, promotes higher-powered studies, and lessens the incentive for p-hacking.
The thing I find most striking about the reproducibility crisis in science is not the prevalence of incorrect information in published scientific journals; after all, getting to the truth, we know, is hard, and mathematically not everything that is published can be correct. What gets me is the thought that even when trying our best to figure out what’s true, using our most sophisticated and rigorous tools of mathematics, peer review, and standards of practice, we still get it wrong so often. So how frequently do we delude ourselves when we’re not using the scientific method? As flawed as our science may be, it is far and away more reliable than any other way of knowing that we have.
This episode of Veritasium was supported in part by these fine people on Patreon and by Audible.com, the leading provider of audiobooks online, with hundreds of thousands of titles in all areas of literature, including fiction, nonfiction, and periodicals. Audible offers a free 30-day trial to anyone who watches this channel; just go to audible.com/veritasium so they know I sent you. A book I’d recommend is called “The Invention of Nature” by Andrea Wulf, a biography of Alexander von Humboldt, an adventurer and naturalist who actually inspired Darwin to board the Beagle. You can download that book, or any other of your choosing, with a one-month free trial at audible.com/veritasium. So, as always, I want to thank Audible for supporting me, and I really want to thank you for watching.

100 comments

  1. You should not trust anything unless you witness it for yourself! If you have not witnessed it, you are a repeater; most academics are repeaters, never proving anything for themselves! Direct observations can also be misleading and result in the wrong conclusion, and some conceptions are imaginary and can differ among people, so you must always be skeptical. Never trust a scientist!

  2. The problem is further compounded by a plethora of tenured hucksters working in universities. Who would bell that cat?

  3. This totally calls for a tech/software solution. Seems like the journals are to blame for a lot, as well as anything else that's picked arbitrarily. What if there was something like YouTube but for research? Or something to track data points, from which you can then make your analysis (but always more data points can be added).

    Lowering the barriers, making things more open and transparent. A website where anyone can easily see the data, the sponsors, etc.

  4. "publication is also more likely if the result is novel and unexpected"

    That's the full issue. Don't expect things to be resolved in one study, and judge findings in terms of replication history and theoretical consistency.

    As a rule of thumb, I don't really judge a finding as stable until it's been around for about 10 years.

  5. How does this video have 1k down votes? That’s 1.25% of the ratings given. That is more than most “top 10 alien abduction” videos. People really don’t want the truth they just want to hear that wine cures cancer or heart disease or whatever.

    Best video ever made people need to know this. Sometimes people talk about studies that are flawed and when I say I doubt it’s correct I get laughed at. “It was published, are you smarter than such and such scientist. Have you ever been published?” Yeah I am smarter than some of them but besides that let’s look at these test groups.

    Everyone has bias. I remember a study that said if a parent of a baby smokes outside the house the baby is more likely to die of SIDS. The conclusion being that smoke residue was causing SIDS.

    How on EARTH could you control this study? Smokers and non-smokers are very different people with different habits. Pretending that the relationship between two factors is causal when it is much more likely to be correlational is scientific malpractice. Poverty could be driving both factors, maybe genetics is driving both factors, OR ONE OF LITERALLY MILLIONS OF OTHER FACTORS!

    We will listen to anything we want to believe. “Smokers are monsters” is easier to handle than “sometimes babies die for no reason” and I get that.

    Great video keep up the good work!

  6. Science has become its own "Religion". It is blasphemy to question Science and it has become a heresy to prove Science wrong. The Scientific community has embraced paradigms for the $$$$$$$!

  7. Yes. Peer review is censorship.
    False assumptions create false conclusions. No gravity, no particles, no quantum, no Big Bang, no vacuum of space.
    Try electric universe, reciprocal theory, geocentric….one not zero…

  8. your message at the end was one that EVERYONE needs to hear in today's world, to question their biases and assumptions and give others the benefit of the doubt until they've discussed life more

  9. The Expertise of Experts – https://youtu.be/J8nk7GrB-zs

    The Authoritarian View of Knowledge: Peer Review – https://youtu.be/zR38CtjD__o

  10. Fundamentally the null hypothesis doesn't give you comfort that something is true, it just gives you comfort that you haven't yet been able to prove it false. Which usually means you just haven't tried hard enough.

  11. As long as researchers have the sign that says “Publish or Perish” on their office walls, there will certainly be a strong incentive to p-hack and such, all the more so if there is little cost for getting it wrong as compared to not publishing at all.

  12. What about statistics, are they trustworthy? I feel like mainstream media nitpicks certain statistics and avoids others, as in domestic violence against women vs men, where, if you dig deep, you'll find men are victims just as much as women.

  13. I can see the future in my dreams, like Caesar's wife and the guy that saw the 9/11 attack hundreds of years ago, so explain that

  14. It's worth asking how much Science is published every year, how widely it is read, and what the level of Scientific Literacy is for the consumers of the findings. More broadly speaking, the point Veritasium made about interpretation concerns not just the Scientific Community itself, but the culture as a whole.

    At what point does an individual consume more information than they can interpret? And what happens in the absence of interpretation? The premise of the Internet is that 'more is good,' the more information we have the better informed we will be. Is this the case?

    If 100% of all Science Publications were 100% correct 100% of the time, would the public be able to draw the right conclusions from the conclusions? If we have more information than we can handle, what happens when it turns out that information is not true?

  15. Careful: tell Huffpo they are not following journalism standards and ethics by not vetting their stories, and you will be labeled a right-wing nut before you know it!

  16. https://m.youtube.com/watch?v=VooaLRqTSPI

    A NOVA episode from 1987 asked "Do Scientists Cheat?" and concluded that 48% of all published research included trimming, padding, or cooking of data, IF the data wasn't fabricated. And that whistleblowers suffered worse than the cheaters.

  17. I'd say that mathematical proofs is a stronger "way of knowing" than the scientific empirical methodology. Of course it is for a different scope of knowledge.

    I think the scientific method is the best way to know about the natural physical world. But this video points out all the ways we fail in the nullius in verba (nothing by word).

  18. Your p value is wrong. There is waaaay more than 1% chance of getting 53% on a random outcome. Are you kidding me???

  19. This is why I draw the exact opposite conclusion from any news feature highlighting the findings of “exciting new research “

  20. This was published in 2005: https://aeon.co/essays/it-s-time-for-science-to-abandon-the-term-statistically-significant?fbclid=IwAR0fbachXAefTLRT71yxzPmMU2wwld5KXW4GW70Q2YLfIxT8_K4kwDr1PLo

  21. When there are billions to be made, wrong research can kill a lot of people.. USGOV Food Pyramid used bad studies, and that has caused many millions of cases of Type 2 diabetes, and millions of obese Americans who are now pre-Type 2. But that bad research created companies who pushed Statins, to become very very rich.

  22. so much for the scientific way. selective data acquisition. unpublishable reproduction studies. humanity needs to be rebooted with a new operating system, memory controller, universal interface….

  23. The first example given is not false. The claim isn't that esp is real, the claim is that a certain experiment had a specific result. Anyone who understands science, understands that. One quick way to identify pseudoscience or popularization is the synthetic claims that it makes.

  24. That chocolate study was done deliberately as part of the author's true experiment: he wanted to see how far his "scientific study" and its results would go before someone started questioning it. He proposed the idea that since people wanted it to be true, it would be widely accepted not just in mass media but in professional journals as well… he was right.

  25. Man, that sounds Chinese to me, I have not understood this even though I understand most of your videos. Keep it more simple !

  26. Whether everybody likes to admit it or not, everybody has preconceived notions prior to an experiment: will it work or not? If they know it won't work, they're not going to dig as deep as they possibly could to make it work; or, vice versa, they say it's going to work, then they get unsatisfactory results and keep tweaking the experiment until they get the results they want.

  27. You missed some important errors. First, the ESP study cheated. They did the same test, but did it 3 times (that they told us about. They may have used more types of photos.) When they do it that way, they increase the odds of getting a false positive. There are statistical methods to account for this… which would have nullified the 3% difference.
    Edit: Okay, at 5:30, you got to it… “p hacking” you called it. But, as I just wrote above, there are statistical rules which must be applied when they test multiple variables within one experiment, they must apply penalties to their measurements… which did not happen. So… outright lying or incompetent statistical analysis.
    **
    Perhaps more importantly, precognitive abilities have been well established over the centuries. Christians have prophesied accurately since creation. Recently, on September 3rd, for example, I warned that evil spirits were afflicting men in South Africa to go and murder and rob Christians. On the 9th, there was a riot with murder and looting. They attacked foreigners in Cape Town.
    Further, you should see my blog post “Inventions that came in dream: largest collection on the Internet.’ I’ve been adding to that post for years. Many of the most famous books, inventions, and movies came from ideas received in dreams or visions.

  28. Your final statement is WRONG. The scientific method is not “far and away” the best method of knowing we have. PRAYER of people with an established relationship with God is the best way of knowing.

  29. Finally someone makes sense of this. I always told my parents not to just believe some shady research they found on the internet. I always said that even if it got published, you need to compare the results with other research done with good methods. They never listen and keep sharing those shady articles.

  30. Another problem is that people seem to want to believe something that sounds good, even when it's too good to be true. When you try to tell them that it's probably not true, they will defend their false belief so hard and won't listen.

  31. The only scary thing is that there are "scientists" who are surprised by this. This is common knowledge for physicians and mathematicians. This happens when people do not understand what statistically significant actually means.

  32. I love veritasium, but this video was pretty much useless in its conclusion. In summary: "scientific studies are prone to being wrong, but it's the best we have so….." Yeah, useless.

  33. Yes and no, it depends whether it's physics or math. Physics is wrong, but math is always valid (but retarded). I'm one who thinks the Sun goes around the Earth, so meh.

  34. There's also the journal "Series of Unsurprising Results in Economics," which publishes results everyone would have expected.

  35. Research is a temporary measurement on certain individuals. It's not wrong. It just doesn't work on the whole world. Using the data wrong is where it all goes to sh

  36. All the things you tell us get worse when lobbies and governments pressure scientists…. Example: global warming..

  37. Glad to see you address these issues. Bias against publishing negative results is a major issue. Fraud and conflict of interest are widespread.
    Food for thought: most (all) placebo-controlled studies use sugar pills as the placebo, which is not biologically inert and can mask marginal effects, invalidating many findings.

  38. Civilization is choked by political correctness, propaganda, lies, deceit, deception, fraud, exploitation. Politicians/statistics/industrial capitalism/the profit system/advertising. ENJOY.

  39. Wrong is a dangerous word. The publisher shares responsibility with the researchers for wrong research. Who judges which research is right and wrong? How can a committee judge that when it itself contains researchers who have published wrong research?

  40. Why does he show that true positives, true negatives and false positives are published, but NOT false negatives? (As the animation at 3:15 shows)

  41. This is exactly why research is published for peer review in the first place. The studies need to be replicated and confirmed.

  42. The very first day I started googling how to write my first paper, this video appeared. Good job. The more people know, the better the results may be!

  43. This is why we have so many lunatic minority groups… If they have a ridiculous idea, it's always possible to find an equally ridiculous "study" that kind of supports it.

  44. Even better when we build research on a foundation of research, making everything wrong to the point of stupidity, aka 65% of modern scientific theory.

  45. When something is published it's proof that someone with the ability to publish things decided to publish that thing.

  46. There are significant correlations between skin color and ethnicity, correlations between ethnicity and culture/socioeconomic class, and correlations between culture/socioeconomic class and aggressiveness/attitude. A study that shows that players with darker skin are more likely to receive a red card is no indication of discrimination, especially if the skin color of the referee is taken into account, given that a darker skinned referee is just as likely to issue a darker skinned player a red card. The conclusions drawn from these studies are often heavily biased as well.

  47. In November of 2018, the journal Nature published a paper by Xu et al. of MIT claiming they built "the first heavier than air ion propelled aircraft of any kind to carry its power supply." The paper's claims were soon repeated by many of the major science magazines, and in nearly 100 news articles.
    There is just one problem, it was NOT the "first heavier than air ion propelled aircraft of any kind to carry its power supply!" I built the first aircraft to do that originally about 12 years ago. It is many times more efficient and has been legally patented since 2014. It was widely published by the US Patent office since 2015… Please see US Patent No. 10,119,527.
    There is flight footage of 7 prototypes that can be seen by clicking on my channel icon to the left.

  48. 1:32 p value is not used to decide whether a study is "worthy of publication". The main criterion for 'publication worthiness' is that the research is valid. The failure to find a significant p value merely means the Null hypothesis was not rejected, but only in the instance of that one study, which does not mean the science is wrong, or the alternative hypothesis is false.

  49. Something far and away more reliable than the scientific method? How about engineering? You know the bridge works because it is there, you can drive on it. You know the programming algorithm uses less memory because it does, every single time.
