Multiple articles from one dataset?


futureapppsy2

Assistant professor
Volunteer Staff
Lifetime Donor
15+ Year Member
Joined
Dec 25, 2008
Messages
8,125
Reaction score
7,432
I'm curious as to how you all decide if and how to "slice up" one dataset into multiple articles. On one hand, I've heard (valid) concerns about "salami publishing," where authors just try to make a dataset into as many articles as possible. On the other hand, a lot of journals seem to be leaning towards shorter and more focused articles, and sometimes, trying to put a lot of different research questions in one article can make it feel scattered.

One recent example of this, with some details changed: We published a short article on whether two specific subgroups of participants (e.g., LGB participants as a group and then bisexual participants specifically) still had worse outcomes on a target variable than heterosexual participants when we controlled for their elevated anxiety symptoms. We found that both LGB participants as a whole and bisexual participants specifically still had worse outcomes on that variable, even when we accounted for their higher rates of anxiety. We recently did another analysis using the same dataset in which we examined whole-sample outcomes on the same outcome variable, with LGB vs. heterosexual status, anxiety symptoms, and a number of other risk and protective factors as predictors. Is the second analysis "too close" to the first to ethically publish?
 
Too close to ethically publish? Meh. It's a grey area. You're probably cutting it a little closer than I would personally, but it's a judgment call for many of the reasons you state. I'm chopping my dissertation into 3-4 papers because combining 3-4 separate neural and behavioral measures into a single paper would be outrageously long. Any grant-funded project needs to be designed to generate multiple papers; otherwise you aren't going to continue getting grants for long.

My tolerance is usually proportional to the amount of effort that goes into collecting/analyzing the data. I get much more irritated about someone chopping up an N = 50 survey of Psych 101 students where they just ran some correlations than I would a multisite randomized clinical trial. Some epidemiological studies generate 400+ papers from a single dataset without seemingly crossing that line. I'd just say to use your judgment, make sure you reference your prior study, and note that it uses the same dataset.
 
Thanks, Ollie! FWIW, both articles would contribute something new to the literature (i.e., the worse outcomes in bisexual people in particular, even when controlling for their higher anxiety, and the role of LGB status as a significant predictor in the context of other risk/protective factors), so IMO, they both make different, useful contributions. We are explicitly mentioning the previous paper both in the manuscript and in the submission letter, so it's not duplicitous, I don't think. Also, it's a decently large (n = 500) dataset from a national sample in which we purposefully collected a fairly large amount of data, so it's not a survey of one Psych 101 section, though it's not an n = 10,000 CDC dataset, either. 😉
 
I think you're OK. If I were a reviewer, I might dock you a couple of points on whatever rating scale. As long as folks are open/honest about it, that is really about all you can ask. I'd equally hate to see a worthwhile contribution go unpublished out of untoward fears it didn't pass some arbitrary threshold (Type 2 publication error?), so I'd just say send it out. Some place will take it eventually if it's worth putting out there. I see it as really just being up to the journal/editor where they want to draw the line. As an individual reviewer, I usually will call the editor's attention to it (under the guise of "What does this add?") if that hasn't already been done, but I view decisions of that nature as somewhat outside my purview as a reviewer and don't think it would ever change my reject/accept recommendation on its own.
 
I agree with Ollie all around: funding issues, acknowledgment, and readability for the papers. I think the most important thing when dealing with a larger dataset is making sure to acknowledge the ways in which that dataset was used elsewhere. If something turns up as a sample-specific finding, it's good to be able to track that back (very clearly) to other results. Most do this; some do it more clearly.
 
This is definitely a worthwhile discussion.

I'm on two different large multi-site studies that have a whole process for sorting this stuff out on the front end (pretty common from what I've seen, at least for multi-site studies). It's done via email, and it's been helpful to see not only the various proposals and types of analysis being considered, but also the back and forth in regard to "What does this add?" and considerations of overlap. Utilizing a standard request form to outline scope, proposed co-authors, etc., has also helped compare apples to apples.
 
Have you looked at, or are you interested in, multicultural issues within the research data you've already collected (e.g., do broad groups of individuals, such as Asian, Latino, African-descent, and Caucasian participants, show any significant differences between them)?

I am always interested in multicultural issues, and have generally been able to pull factors out of multiple datasets related to ethnicity and group differences (or the lack thereof).
 
I've used some big datasets, like the European Social Survey ( http://www.europeansocialsurvey.org/ ), which has about 100K entries. It is quite obvious that it has been used in numerous articles.

On the other hand, I've read an interesting opinion that research should only consider statements that were presented in the original research proposal as hypotheses. Fitting data into your research post hoc was considered somehow unprofessional, because you can often find some significant correlations in a dataset, but they would not always be meaningful. A classic example was a correlation between taking a bath or a shower and altruism.
 

Eh, that's a bit extreme IMO. I do sympathize with the push for pre-registration of clinical trials and other ideas to improve the fidelity of psychological research. But I can't sign off on the idea that you should just ignore unforeseen trends in your data or never look at your data in a different way. There's a distinction to be made between reckless post-hockery to eke out another publication and exploring data to generate meaningful new hypotheses. Sometimes it's a fine line. When I'm looking at a secondary publication from a large observational study or clinical trial, I look for an indication that this is consistent with the authors' original intent (for example, a planned analysis of responder characteristics in a clinical trial). But if it's an unplanned secondary analysis, I won't necessarily rule it out depending on the quality of the research question, the credibility of the findings, etc.
 
Agreed with MamaPhD. I don't think folks need to only be writing a priori papers from large datasets; there's something to be said for evaluating the data and asking questions that either arose as a result of those a priori hypotheses, or that folks simply didn't think of ahead of time. That being said, it's important to list the restrictions, limitations, etc., of those findings, and to couch their interpretation accordingly.

And as T4C mentioned, with follow-up studies from the same dataset, it's always important to ask, "what does this add?" If the answer is basically just another line on your CV, then don't write the paper.
 

Depends on where you are. Are you still in trainee status (grad/intern/postdoc) and need to pad your CV for the next step? Definitely write the paper, even if it's a brief report. 🙂
 