MTURK


psydstudent2020

Has anyone on here ever used Amazon Mechanical Turk to collect data? What steps have you taken to ensure that you only recruit workers with >95% approval ratings?

 
I spent a bunch of money using MTURK for a study a couple of years ago, after which my tech-industry spouse pointed out that I needed to check the IP addresses of the respondents. Turns out a bunch of them were either bots from the same server or people who used multiple profiles to respond to the same survey. Costly mistake, and also made me feel quite wary of relying on MTURK in general...
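For anyone else who wants to run that check, here's a rough sketch of what it looks like if your survey platform exports respondent IPs (pandas; the file and column names are made up, so adjust them to whatever your export actually calls them):

```python
import pandas as pd

# Hypothetical survey export with one row per response and an "ip_address" column.
df = pd.read_csv("survey_export.csv")

# Flag IPs that appear more than once: could be one person with several worker
# profiles, or multiple "workers" all running off the same server.
ip_counts = df["ip_address"].value_counts()
df["duplicate_ip"] = df["ip_address"].map(ip_counts) > 1

print(df["duplicate_ip"].sum(), "responses share an IP address with another response")
```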
 
Even if they're not bots, you're bound to lose a huge chunk of data if you're doing things right and include an attention check. Many people use MTurk like an actual job, so the goal is to get through things as quickly as possible and maximize their hourly earnings.
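If it helps anyone, screening those failures out afterward is pretty painless. A quick pandas sketch, where the attention-check columns and their correct answers are placeholders for whatever your actual items are:

```python
import pandas as pd

df = pd.read_csv("survey_export.csv")

# Hypothetical attention-check items and the response that counts as "passing."
attention_keys = {"attn_1": "Strongly Disagree", "attn_2": 4, "attn_3": "blue"}

# Count how many checks each respondent failed, then keep only clean passes
# (or relax to "failed at most one" if that's your preregistered rule).
df["n_failed"] = sum((df[col] != correct).astype(int) for col, correct in attention_keys.items())
clean = df[df["n_failed"] == 0]

print(f"Dropped {len(df) - len(clean)} of {len(df)} respondents for failed attention checks")
```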
 
You can select workers with >95% approval by using that "system qualification" (on the first "Enter Properties" page when you create your study, under the "Worker Requirements" heading).
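If you ever post the HIT through the API instead of the requester website, the same worker requirement can be set programmatically. A minimal boto3 sketch, assuming your AWS credentials are already configured; the qualification type IDs below are Amazon's system qualification IDs as I understand them, so double-check them (and everything else here) against the current MTurk docs:

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# Survey hosted elsewhere (e.g., Qualtrics); the URL is a placeholder.
external_question = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.qualtrics.com/your-survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

mturk.create_hit(
    Title="10-minute research survey",
    Description="Answer a short questionnaire about your opinions.",
    Reward="1.00",
    MaxAssignments=100,
    AssignmentDurationInSeconds=30 * 60,
    LifetimeInSeconds=3 * 24 * 60 * 60,
    Question=external_question,
    QualificationRequirements=[
        {
            # HIT Approval Rate (%) greater than 95
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThan",
            "IntegerValues": [95],
        },
        {
            # Restrict to workers located in the US
            "QualificationTypeId": "00000000000000000071",
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": "US"}],
        },
    ],
)
```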

I would also recommend using multiple attention checks. In particular, I found more complex attention checks to be helpful (checks where each step is simple but there are several of them), since bots can outsmart the basic "Press Strongly Disagree if you are paying attention" type.
 
I would echo the points above about attention checks and bot checks.

Dang bots.
 
Can you still collect data in a short amount of time with attention checks and bot checks?
 
Can you still collect data in a short amount of time with attention checks and bot checks?

I ran a 10-minute study in about 2 hours and ended up with an adequate sample size. I suppose it depends on the sample size you need, the amount of funding, and the length of the study. I am not sure what the bot situation looks like now (I ran mine in late spring/early summer).

ETA: the attention and bot checks didn't add on too much time, but again it may depend on how many measures/tasks/etc. you have and how much $ the study costs. The time can add up pretty quickly on MTurk.
 
Having meaningful qualitative items can also be a good, informal validity check, as it lets you make sure that responses are reasonably coherent and actually address the question at hand. I honestly haven't had a lot of issues with this in my Mturk samples, but I found that limiting it to U.S. users really helps. Also, I think Mturk gets unfairly bashed sometimes, when in fact validity issues occur across any recruitment means (undergrads just wanting to meet that 101 research requirement, patients or families lying about medical history to get into a clinical trial, etc).
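If you end up with a lot of open-ended responses, a quick way to triage them before reading everything is to flag answers that are suspiciously short or identical across respondents (sketch below; the column name and the 15-character cutoff are arbitrary):

```python
import pandas as pd

df = pd.read_csv("survey_export.csv")

# Hypothetical open-ended item.
text = df["open_ended_1"].fillna("").str.strip()

df["flag_too_short"] = text.str.len() < 15
df["flag_copied_text"] = text.duplicated(keep=False) & (text != "")

# Anything flagged here gets read by a human before it's kept or tossed.
print(df[["flag_too_short", "flag_copied_text"]].sum())
```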
 
Can you still collect data in a short amount of time with attention checks and bot checks?
Yeah, you can, but the rate of collection varies greatly with the pay and length of the task.

Also, I think Mturk gets unfairly bashed sometimes, when in fact validity issues occur across any recruitment means (undergrads just wanting to meet that 101 research requirement, patients or families lying about medical history to get into a clinical trial, etc).
Wait... you mean my SONA participants don't care deeply about paying attention and responding in a valid manner?
 
Any large self-report sample needs several validity checks, MTURK or otherwise. I think MTURK has the same problems as other samples, but it also has a larger proportion of... enterprising individuals who will use bots to game the system. All too often this step gets skipped and few people actually look at the data to sort it out. I remember more than a few times in grad school helping undergrads run quick senior thesis projects, where they analyzed the data without cleaning it, and I showed them large chunks of their data where someone had just picked all of the left-most or right-most answers, despite the fact that some questionnaires put the positive end of the scale on the left while similar measures put it on the right. Definitely changes your analyses once you throw out about 25% of the data.
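For what it's worth, that left-most/right-most pattern is easy to flag automatically: a straight-liner's responses within a scale have zero variance, which is a dead giveaway when the scale contains reverse-scored items. A rough sketch (the item names are placeholders):

```python
import pandas as pd

df = pd.read_csv("survey_export.csv")

# Hypothetical Likert items from a single questionnaire (some reverse-scored).
items = ["q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8"]

# Respondents who picked the same option for every item have zero variance
# across the scale, which is implausible when reverse-scored items are mixed in.
df["straightlined"] = df[items].std(axis=1) == 0

print(f"{df['straightlined'].mean():.0%} of respondents straight-lined this scale")
```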
 
I second the idea that it's a little strange to consider Mturk a bad sample source when undergrad samples are still common (or, worse, whatever happens with those studies that recruit on APA listservs for general surveys [listserv recruitment is fine if you are doing research on psychologists or students, but not if it's a broad general study]).

I've found mturk data to be better when:
1. Check with your IRB to see if you can straight up boot people out of the survey, without compensation, for failing an attention check item in the first measure or for not meeting inclusion criteria (e.g., the survey is for women and they check "man" on a demographics gender item). You're required to compensate people if they choose not to answer questions, but you aren't ethically required to compensate bots or people who are not looking at survey items or reading instructions.
2. Restrict participation to people who have a 95%+ approval rating, but NOT to Masters. You need to do something like a thousand tasks to become a Master. Those are not average people.
3. Have your survey prevent "ballot stuffing"; i.e., the same terminal can only complete the survey once. This can be worked around, but the workaround would take longer than just doing a different survey.
4. Restrict participation to the U.S., BUT remember that some folks use VPNs that can bounce off Brazil, India, etc., so IP addresses are not always reliable if you log them.
5. I have gotten much better data from very short surveys and from surveys that include a qual/written component. I've actually gotten a couple of pretty nice, though basic, qualitative data sets that were parts of bigger projects.

I played around on mturk for a while as a user, to see what the user experience was. You do have to hunt for a little bit to find surveys; there are SO MANY tasks up. There are also an amazing number that are restricted to only Masters (requiring Masters seems like asking for weird data to me).
 
We've used it some. Mostly when we need quick & dirty to cobble together pilot data for something bigger. It is amazing for that purpose since we can run a study in a week-ish for virtually no money and a comically small amount of effort compared to everything else we do. We've joked about shutting the lab down for a year and setting up an assembly line to see if we can pump out one mTurk study a day.

Agree with all the recommendations above. Be warned you may get pushback from reviewers...we find it harder to publish than it probably should be. Studies amenable to being run in places like mTurk usually aren't targeting top-tier journals anyway, but even mid-tier journals seem to grade it a bit more harshly. Which I agree is bizarre, given it's probably no more biased (and potentially less biased) than many other sampling methods...
 
We've used it some. Mostly when we need quick & dirty to cobble together pilot data for something bigger. It is amazing for that purpose since we can run a study in a week-ish for virtually no money and a comically small amount of effort compared to everything else we do. We've joked about shutting the lab down for a year and setting up an assembly line to see if we can pump out one mTurk study a day.

Agree with all the recommendations above. Be warned you may get pushback from reviewers...we find it harder to publish than it probably should be. Studies amenable to being run in places like mTurk usually aren't targeting top-tier journals anyway, but even mid-tier journals seem to grade it a bit more harshly. Which I agree is bizarre, given it's probably no more biased (and potentially less biased) than many other sampling methods...

How much do researchers typically pay? I was told to do $1 per participant but that could add up quickly
 
Varies widely depending on survey length and other factors. I've seen anywhere from $0.50 to $5. We've recently gotten folks to answer 150-item surveys for $2, though I'm not sure that is optimal from a data quality perspective.

Given that both of my current projects are close to $500/participant just for incentives (not counting staff time and supplies...which blow that number up closer to $2,000/participant), mTurk still sounds good to me!
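For anyone budgeting this out, remember that Amazon adds a fee on top of whatever you pay workers. A quick back-of-the-envelope sketch; the 20%/40% fee split is my understanding of MTurk's pricing (it roughly doubles for batches of 10+ assignments), so verify against their current fee page before committing a budget:

```python
def mturk_total_cost(n_participants: int, reward_per_person: float, large_batch: bool = True) -> float:
    """Rough total cost: worker rewards plus Amazon's commission.

    Assumes a 20% fee, rising to 40% for HITs with 10 or more assignments;
    check MTurk's current pricing before relying on these numbers.
    """
    fee_rate = 0.40 if large_batch else 0.20
    return n_participants * reward_per_person * (1 + fee_rate)

# 200 participants at $1 each comes to about $280, not $200.
print(f"${mturk_total_cost(200, 1.00):,.2f}")
```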
 
So I don’t have to worry about seeing identifying information when I collect data?
Nope, you won't see any identifiable information, which makes all sorts of topics easy and IRB approval very quick.

The pay will vary depending on survey length, as Ollie mentioned. While there may be issues related to data quality, I'm not certain that paying more will recruit only folks who provide good data (you may have a greater likelihood of getting them, but you'll have an equally better likelihood of those providing bad data also participating; more money is more money as far as incentives go). The key is always including various attention/performance checks, including smart tasks to weed out bots. This is really where the hassle and manpower goes.
 
To clarify...I wasn't trying to say that paying more will increase response quality. I was trying to say that REDUCING survey length will increase response quality...though I imagine worsening response quality over long surveys is compounded when no human interaction is involved, you are making $2, and you are on a platform where many tasks can be done in 30 seconds. I do not believe this problem is in any way unique to mTurkers ;)
 
For those who have used MTURK, do you think it is better to have MTURK host the survey or to use Qualtrics and simply post the link?
 