PhD/PsyD Thoughts/advice on using public data sets


Cheetah89

I'm considering using a publicly available data set for my dissertation. Has anyone here had experience with using public data sets for dissertations or theses, or any research for that matter? Any tips, advice, or cautions? Any data sets you might recommend? I'm neuropsych focused.
 
My only caution would be if this was the only real experience you get with research in graduate school. So much of what people learn is through the planning and implementation of a project, not just the analysis and writeup part. If you have plenty of that experience, go nuts. If you don't have experience constructing and carrying through a research project, I'd do whatever I could to get that first.
 

That's true. In my case it was my second dissertation, and I had no interest in slogging through the recruitment process again.
 

If you're using large databases like one of those maintained by VA or DHHS, you will need a mentor who has expertise in this kind of work. You might also consider a course on large database research methods (sometimes you can find these in schools of public health). Smaller datasets may be easier to navigate but you might trade off ease of use for issues of sample size, representation, etc.
 
I once published on data from a VA database. Once.
I have a major project with VA data right now that is moving towards numerous publications. It's been beyond intensive. It feels like you need a small army of people to help unless you have someone who knows SQL well enough to draw the data out and port it into a usable form.

It's frustrating because there is so much opportunity to improve Veteran care if the data were more accessible.
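For anyone who hasn't done this before, "drawing the data out" usually means joining several normalized tables and collapsing them into one row per person for analysis. A minimal sketch in Python using the standard-library `sqlite3` module; the `patients`/`visits` tables, their columns, and the values are all hypothetical and not the structure of any real VA database:

```python
import sqlite3


def build_analysis_table(conn):
    """Collapse hypothetical patient and visit tables into one row per patient."""
    query = """
        SELECT p.patient_id,
               p.age,
               COUNT(v.visit_id) AS n_visits,   -- 0 for patients with no visits
               MAX(v.visit_date) AS last_visit  -- NULL (None) if no visits
        FROM patients AS p
        LEFT JOIN visits AS v ON v.patient_id = p.patient_id
        GROUP BY p.patient_id, p.age
        ORDER BY p.patient_id
    """
    return conn.execute(query).fetchall()


def demo():
    # In-memory database with made-up rows, standing in for a real data warehouse.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER);
        CREATE TABLE visits (visit_id INTEGER PRIMARY KEY,
                             patient_id INTEGER, visit_date TEXT);
        INSERT INTO patients VALUES (1, 54), (2, 67);
        INSERT INTO visits VALUES (10, 1, '2020-01-05'), (11, 1, '2020-03-02');
    """)
    rows = build_analysis_table(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The `LEFT JOIN` keeps patients who have no visits, so they appear with a count of 0 instead of silently dropping out of the sample, which is one of the easy mistakes to make when pulling from large relational databases.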
 
Not really sure what advice could be needed. Data is data. Depending on the dataset, you might want to read a little about sample weighting, but that is about the only true difference if you are working with an already-assembled database. If you will be querying underlying databases, that's obviously a somewhat different matter. I suppose if you are doing really high-end work, it becomes more important to have access to high-end computing resources/computing clusters with larger databases so you don't have to sit around for multiple days waiting for your analysis to finish. That's going to matter for things like machine learning algorithms for imaging data, but not for a standard regression analysis on single self-report outcomes.
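On the sample weighting point: many national survey datasets include a weight variable so that estimates describe the target population rather than the raw sample. A minimal Python sketch, assuming one hypothetical weight per respondent (the scores and weights are made up), shows how a weighted mean can differ from the unweighted one. Real complex-survey designs also use strata and cluster variables to get correct standard errors, which this sketch does not handle; a dedicated survey-analysis package is the usual tool for that.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: respondent 3 stands in for twice as many people as the others.
scores = [10, 20, 30]
weights = [1.0, 1.0, 2.0]

unweighted = sum(scores) / len(scores)      # 20.0
weighted = weighted_mean(scores, weights)   # 22.5
print(unweighted, weighted)
```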

I'm not really sure how we could "recommend" databases. It depends what your research question happens to be.

I will also include a caution against it if you are interested in an academic career. The type of project does get consideration when applying for post-docs/faculty jobs and archival analysis doesn't look great for someone planning a research career. If you aren't, it matters less.
 
Thanks very much for all the helpful information everyone!
 
I will also include a caution against it if you are interested in an academic career. The type of project does get consideration when applying for post-docs/faculty jobs and archival analysis doesn't look great for someone planning a research career. If you aren't, it matters less.

How so? As a part of a skill set, I've found it to be super useful to know how to use big national data sets. Literally just submitted a paper with one, just now. NIH has grants for secondary analyses.
 
This is available to me. Have not pursued it. Too busy actually making money. It's not that much, but it's better than zero dollars on my Friday night. 🙂 And when not on the clock, I just can't find the motivation. 🙁

"You are a poor scientist, Dr. Venkman!"
 
To clarify...doesn't look great <for a dissertation>. In other words, it would likely be viewed negatively (if noticed by the committee) if a student graduating had never done their own original data collection before. Not that it's a catastrophe - just that committees do tend to look at the scope of the projects undertaken, and someone doing a complex laboratory study or clinical trial for their diss is going to have a leg up over someone who did archival analysis. That could certainly be overcome in other ways - I'm not talking absolutes here. Someone with 20 pubs is still going to beat out someone with 1, regardless of type of project. Of course, it also depends what level we're talking about (an R1 AMC will care, a teaching college would be thrilled if you even had a publication, and in between I imagine it depends on umpteen other factors). This is for psychology departments and AMCs. In other fields (e.g. epi) it's much more common. This comes from numerous folks I've talked to making hiring decisions across multiple institutions, so it's not just me.

For other projects and once that job is secured - I don't think it matters a lick. Be a productive researcher and all will fall into place.

For the record - I've published plenty of things off other people's studies too, and I'd actually say it's virtually necessary to do so to get tenure these days. We just submitted a 2.5 million dollar grant for a massive archival project. I'm by no means against it. I just think it's important for a student planning on going into academia to demonstrate competence <running> a study as well as analyzing one.
 