Study delivers bleak verdict on validity of psychology experiment results

I spent a couple days digging into this stuff last year because I was curious. I would suggest reading the article published in science but also (more importantly, IMO) looking at the Reproducibility Project documents themselves (it's all open access): https://osf.io/ezcuj/wiki/home/

They have archived communications with original authors (e.g., letters explaining what the reproducing authors did differently or sometimes wrongly) that shed some light on how complicated and difficult it is to replicate certain studies. I remember one example where the original study had been done in a different language and the reproduced instructional script used (which was translated into English) was notably different than the original.
 
I also wanted to add that this is another example of how the public continuously misunderstands our research. It is not like a chemistry "experiment" that you do in a high school lab where if you do it right you get the same exact result. An acid will always turn the litmus paper pink, or was it purple? I forget. Anyway, the constructs we work with are much more complex, and there is no litmus test for any of them.
 
Yeah, while I agree that reproducibility is an issue in some areas, it is not nearly the epidemic that some people make it out to be. Not only that, but taking this out of the context of science in general is meaningless. You have to ask what reproduction of results looks like in other fields as well. Even the "hard" sciences have an incredibly hard time reproducing results at times. Heck, cancer research is irrevocably messed up because no one even knows what cell lines they are using anymore, and no one will pay the extra money to authenticate what they think they are using.
 
Apparently, a few others besides Harry3990 also questioned the approaches taken in the replication studies. You might also want to read this article refuting the bleak verdict:

http://news.harvard.edu/gazette/story/2016/03/study-that-undercut-psych-research-got-it-wrong/

"Gilbert, King, Pettigrew, and Wilson discovered that the low-fidelity studies were four times more likely to fail than were the high-fidelity studies, suggesting that when replicators strayed from the original methods of conducting research, they caused their own studies to fail."

"Readers surely assumed that if a group of scientists did 100 replications, then they must have used the same methods to study the same populations. In this case, that assumption would be quite wrong."

Great article. Thanks for sharing this! Dan Gilbert should use his TED Talk fame and do a YouTube video about this. Here's the commentary article in Science summarizing Gilbert et al.'s reanalysis of the data:
http://science.sciencemag.org/content/351/6277/1037.2.full

Here's also a nice NYT article by Lisa Barrett talking about the failure to replicate being a normal part of science: http://www.nytimes.com/2015/09/01/opinion/psychology-is-not-in-crisis.html
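
On that "normal part of science" point, here is a minimal simulation sketch in Python (with made-up numbers, not anything from the Reproducibility Project) showing how often an exact replication of a perfectly real effect still comes out "non-significant" just from sampling variability:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d = 0.4          # hypothetical true standardized effect
n_per_group = 30      # hypothetical sample size per group
n_replications = 10_000

significant = 0
for _ in range(n_replications):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    significant += p < 0.05

print(f"Share of 'successful' replications: {significant / n_replications:.2f}")

With these assumed numbers, power is only around .3-.4, so most faithful replications of a real effect would still "fail" by the usual p < .05 criterion.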
 
Definitely worth noting that this extends to far more than just psychology. All of science is much more subjective than the public thinks it is. And furthermore, there are a lot of fundamental misunderstandings about how the process works (i.e., "Study PROVES x is y"). I've said time and again that one study proves absolutely nothing, and 10 studies showing the same thing may or may not prove something. To me, science is fundamentally about using data to drive the evolution of ideas over time. It works best when thought about at a macro level, and that is important to keep in mind.

It took me a very long time to come to peace with this, and it was probably the most difficult part of graduate school for me. I was heartbroken when my thesis tanked even though I thought I was studying a robust effect. I got frustrated that the literature on a lot of topics is all over the place and felt like I couldn't draw any firm conclusions. It's not just social psychology, and it's not just psychology. Heck, neuroscience is arguably an even worse offender: have the exact same lab run the exact same imaging study twice and I virtually guarantee the regions of activation will be at least somewhat different. I'm still reasonably certain the VTA has something to do with reward and the PFC with decision-making, even if individual studies don't perfectly converge on those points.

On the subject of cancer research... this is why NIH recently changed the application process. There is a new "Authentication of Key Biological/Chemical Reagents" section meant to address exactly that issue.

I think a lot of this has to do with the screwed-up incentive structure in academia. Rewards are largely for publications and/or grants, not for the science itself. In many ways, this encourages sloppy work and fudging the numbers (which is probably present in subtle, unconscious ways in almost all research; the giant data-fabrication scandals are highly unusual and, I'd argue, less of a problem as a whole). The most successful researcher I ever worked for probably did the worst work. His data was an absolute disaster: things were entered wrong, coded wrong, there was minimal training for RAs so everyone did something different, etc. I think most of us in the field have to find a middle ground we are comfortable working in, just because there are practical limits. I try not to judge when others opt toward the big picture, but it's hard. I suspect I'll continue to be pushed in that direction over time just because of the nature of the incentives. I think it's particularly tough as a junior investigator. If I shift from my current institution to one that is less research-intensive, that will likely be the reason why - I just don't like how I feel about myself doing work that I know is sloppy.

You might think all of this sounds like I'm down on academia. I'm not, and I love it. It's important to keep in mind that these issues exist everywhere. Heck, a big part of me thinks the best thing society could do would be to abolish the stock market in its entirety. Business is ALL about fudging the numbers to build a house of cards. Incentives are for being the most profitable, not for being the best, and those are two extremely different things. In academia, I like to think we're at least open to having the discussion about whether or not that is a good thing. :)
 
A few thoughts, mostly a recapitulation of others' posts.

1. This is why an emphasis on research methods is important. It makes us not only better consumers of the literature, but also better designers and implementers of studies. I hear all the time about grad students who "hate statistics" and "hate research," but it is part of what we do to stay current in the field.

2. I don't think this is any different from other fields. I do think there is greater skepticism toward psychology as a science in the general public, however, and this is what I believe drives a lot of this reactionary process. Controversy in findings is a regular thing (e.g., is healthy fat actually "healthy"?), but when psychology experiences the same thing, the quick reaction is "see, you aren't a science."

3. This is why meta-analysis is important. I'm a fan of Sakaluk's theory in this paper, although I don't think it's always tenable; a rough pooling sketch follows the link.
http://johnsakaluk.com/wp-content/uploads/2014/06/Sakaluk_2015_JESP.pdf
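
Here is that sketch: a minimal fixed-effect meta-analysis in Python, with invented effect sizes and standard errors (nothing here comes from Sakaluk's paper). Individual studies bounce around, but the inverse-variance-weighted average is the quantity actually worth interpreting.

import numpy as np

# Hypothetical Cohen's d values and standard errors from five small studies
d = np.array([0.55, 0.10, 0.42, -0.05, 0.30])
se = np.array([0.25, 0.30, 0.20, 0.35, 0.28])

w = 1.0 / se**2                       # inverse-variance weights
d_pooled = np.sum(w * d) / np.sum(w)  # fixed-effect pooled estimate
se_pooled = np.sqrt(1.0 / np.sum(w))

low, high = d_pooled - 1.96 * se_pooled, d_pooled + 1.96 * se_pooled
print(f"Pooled d = {d_pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}]")

A random-effects model would be more defensible when studies differ in methods or populations (which is exactly the situation the replication debate is about); this is just the simplest possible illustration.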
 
Yeah, while I agree that reproducibility is an issue in some areas, it is not nearly the epidemic that some people make it out to be. [...]

Definitely worth noting that this extends to far more than just psychology. All of science is much more subjective than the public thinks it is. [...]

Honestly, I think even a lot of academics have no clue how science is conducted in other areas. A lesson on this would be so helpful for everyone and would clear up a lot of the erroneous impressions people have about science.

E.g., how different are the research methods and statistical analyses that women's studies programs use compared to sociology, compared to psychology, compared to how things are done in astronomy, etc.?
 
I also wanted to add that this is another example of how the public continuously misunderstands our research. It is not like a chemistry "experiment" that you do in a high school lab where if you do it right you get the same exact result. An acid will always turn the litmus paper pink, or was it purple? I forget. Anyway, the constructs we work with are much more complex, and there is no litmus test for any of them.
I'm not sure it's an issue of complexity (chemical interactions are pretty complex), but more one of overall experimental control, precision of definition as well as application of experimental variables, and additional confounds that might be introduced between lab and applied settings (as well as all the other stuff people have posted about general lack of understanding regarding psychological research, effect sizes, etc.).
 
It is not like a chemistry "experiment" that you do in a high school lab where if you do it right you get the same exact result.

Missed this. I think many would be shocked to learn how dissimilar actual chemistry experiments are from high school chemistry experiments (let alone psychology). I actually think we do a great disservice to youth by how we teach science, even at the undergraduate (and sometimes graduate) level. It's a system designed to train line cooks... not chefs.
 
Missed this. I think many would be shocked to learn how dissimilar actual chemistry experiments are from high school chemistry experiments (let alone psychology). I actually think we do a great disservice to youth by how we teach science, even at the undergraduate (and sometimes graduate) level. It's a system designed to train line cooks... not chefs.
Any insights you can give us on the difference? Do they have issues with reliability/validity?
 
Well, I'm not a chemist, so I'm certainly not the best poised to answer if you are looking for details. The same holds true in psychology and any other science, though. Early education in science is largely about "memorize this." Labs are about learning to follow cookbooks: I do A, B, and C, and then D happens. I knew D would happen because my chemistry book said it would. Actual chemistry... A doesn't exist, so you have to invent it. B exists in theory, but no one has ever actually tried it before and no one knows what will happen. C is known to work, but those reagents are no longer available, so you have to figure out whether Q works just as well. You think D will happen, but E happens instead. You have to backtrack and figure out why.

The above is a much better depiction of the reality of the scientific process whether we are talking about psychology, chemistry, cell biology or anything else.
 
I think Ollie is referring to "demonstration"-type projects, which are usually pretty far removed from hypothesis- and theory-driven research. At best these projects teach basic lab skills. They don't prepare people to think like scientists. I found this to be true in college as well, at least up to general organic chemistry (though that lab was more fun).
 
I actually think we do a great disservice to youth by how we teach science, even at the undergraduate (and sometimes graduate) level. It's a system designed to train line cooks... not chefs.
x1000.

The overreliance on memorizing and regurgitating instead of synthesizing and revolutionizing how we think about different aspects of science.
 
I think Ollie is referring to "demonstration" type projects which are usually pretty far removed from hypothesis- and theory-driven research.

Precisely. Lab skills are valuable, but arguably the least important and certainly the least exciting. Probably steers a lot of kids away from science in addition to contributing to public misconceptions about what it is.

Which skills/traits do we want to prioritize cultivating: critical thinking, problem-solving, reasoning, curiosity, and skepticism, or how to operate a Bunsen burner?
 
Isn't qualitative data a big part of the problem here? I mean, if you compared quantitative data in the social sciences (i.e., psychology), would you find that many differences from the physical sciences?
 
Isn't qualitative data a big part of the problem here? I mean, if you compared quantitative data in the social sciences (i.e., psychology), would you find that many differences from the physical sciences?

Yes, there are some notable differences. In psychological studies, our outcome measures usually have more variability that is affected by confounding, uncontrolled independent variables. We do the best we can, but we can never control everything, especially with human subjects. This is not unique to psychology, though, and would also apply to in vivo studies regardless of field. Living subjects are just complicated that way. You can control (and therefore account for) a lot more of the variability in vitro.

Edited to add: This is not really a bad thing at a fundamental level, just a part of doing science with more unexplained variance. Understanding those limitations is essential for interpreting data and becoming comfortable with drawing reasonable conclusions.
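
To put rough (entirely made-up) numbers on that: a standardized effect is just the raw group difference divided by the outcome's standard deviation, so every source of variance you can't control inflates the denominator and shrinks the apparent effect, even when the treatment itself hasn't changed at all.

import numpy as np

raw_difference = 5.0          # hypothetical treatment-control difference on some scale
controlled_sd = 8.0           # outcome SD with extraneous variables tightly controlled
uncontrolled_extra_sd = 12.0  # extra noise contributed by uncontrolled variables

d_controlled = raw_difference / controlled_sd
total_sd = np.sqrt(controlled_sd**2 + uncontrolled_extra_sd**2)
d_noisy = raw_difference / total_sd

print(f"d with tight control: {d_controlled:.2f}")  # about 0.62
print(f"d with added noise:   {d_noisy:.2f}")       # about 0.35

Same underlying difference, but the version with more uncontrolled variance looks like a much smaller effect and needs a much larger sample to detect reliably.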


 
Internal vs. external validity. It's why so many drugs that look very promising in early research fail miserably in early testing with living subjects. Also, as we said, reproducibility is a widespread issue across the sciences. I'd encourage some reading up on the issue; there's some interesting stuff out there.
 
Yes, there are some notable differences. In psychological studies, our outcome measures usually have more variability that is affected by confounding, uncontrolled independent variables. [...]

Right. I've only taken undergrad stats, but a lot of error/variance would mean that we would have a small effect, meaning that the treatment/model itself cannot explain the differences among groups and that it's some other factor (confounding, variance among groups, etc.)?
 
We are also limited in what we can control for and randomly assign because of the ethical considerations of using human subjects. I was reading an article in the WSJ about sleep training 3- to 4-month-old infants that concerned me because of the potential for harm and the inability to randomly assign babies to various treatment conditions.
 
Internal vs. external validity. It's why so many drugs that look very promising in early research fail miserably in early testing with living subjects. Also, as we said, reproducibility is a widespread issue across the sciences. I'd encourage some reading up on the issue; there's some interesting stuff out there.

Any suggestions?
 
Any suggestions?
Exploring what interests you is the best way. You could look at the controversies surrounding cancer cell lines. You could look at the difficulties of translating treatment research for various things to animal and human studies. Pick a topic, do a quick lit review, and you've got hundreds of articles to choose from.
 
Exploring what interests you is the best way. You could look at the controversies surrounding cancer cell lines. You could look at the difficulties of translating treatment research for various things to animal and human studies. Pick a topic, do a quick lit review, and you've got hundreds of articles to choose from.
Thanks

But does anybody have some sort of article actually talking about the research process in, let's say, chemistry or biology? I'm interested in really understanding it so I can make comparisons to psychology. I've tried to do searches and I never get anything relevant.
 
But does anybody have some sort of article actually talking about the research process in, let's say, chemistry or biology? I'm interested in really understanding it so I can make comparisons to psychology. I've tried to do searches and I never get anything relevant.

Try reading the methods section of a journal article from Cell or Organic Letters. But know that without training and experience you can understand a scientific discipline only at the most superficial level.

This might interest you, however: http://undsci.berkeley.edu/article/howscienceworks_01
 