It's hard to say. Just going off the students' letter, it sounds like the school may have initially acted on the noise, but appendix A suggests that continually accessing the same file in what's clearly an automatic refresh was one of the cited grounds for exoneration. I wonder if we aren't getting information generated at different points in the story, i.e., is the outside expert talking about the data before the first big group was exonerated using the appendix A criteria, or is this still the kind of evidence the school is using?
Yes, good point - let me try to spell out exactly what happened with these various waves of cleaning.
1. The school did its initial dragnet pull of Canvas data, which generated log files for around 40 students (we're told) whose Canvas accounts showed some activity during times they were taking exams
2. The school saw that a bunch of the log files they had initially produced looked totally absurd, and promptly threw out all the ones that were so massive/random/unrelated to exam content that anyone could see that they weren't generated by humans.
2a. What the school didn't do at this point is ask "huh - why did we just have to throw out so many log files that looked totally ridiculous? Is it possible that there are automatic refresh processes at work that we don't understand?" They may have discarded logs that featured only large numbers of repeated refreshes of one or two files, but they definitely sustained multiple cases involving logs that had 2-3 page refreshes in the span of less than ten minutes.
3. Instead, the school simply discarded the logs that didn't tell a story that was useful to them, and kept the ones that superficially appeared to tell a story of academic integrity violations. According to the school, this initial cull brought the number of accused students down to 17.
4. The school then did another wave of data cleaning, removing from the 17 remaining logs all of the data points that didn't tell a clear story of cheating. This resulted in a bunch of log files that looked far more clean-cut and less messy than they actually were - and basically hid the fact that there was tons of traffic in these logs that didn't make any sense.
5. But there was one more problem. The match-up of Canvas and Examsoft data, and the selective filtering of all data that didn't represent a clean match, appears to have been done by IT people who really didn't understand either automatic page refresh or the course material. As a result, a lot of the match-ups the IT folks thought they had found turned out to be bogus, despite their cherry-picking. They accused students of cheating if there was Canvas activity on a course page whose title seemed relevant to an exam question -- but those Canvas pages often turned out to be things like empty discussion forums and instructor announcements with no meaningful content. Actually opening and reviewing the Canvas pages cited in these logs would have fully exonerated at least 4 of the 10 students who still remain accused.
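For what it's worth, the auto-refresh signature the school apparently never checked for is trivial to detect. Here's a minimal Python sketch (the function name, thresholds, and timestamps are all hypothetical illustrations, not anything from the actual logs): a long run of hits on one file at near-constant intervals is machine behavior, not a student browsing.

```python
from datetime import datetime, timedelta
from statistics import pstdev

def looks_automated(accesses, min_hits=5, max_jitter_s=3.0):
    """Flag a run of accesses to a single file as likely auto-refresh:
    many hits at near-constant intervals suggests a background process,
    not a human. Thresholds are illustrative, not calibrated."""
    if len(accesses) < min_hits:
        return False
    gaps = [(b - a).total_seconds() for a, b in zip(accesses, accesses[1:])]
    return pstdev(gaps) <= max_jitter_s

base = datetime(2021, 5, 1, 9, 0, 0)
# A tab left open, refreshing every 30 seconds:
auto = [base + timedelta(seconds=30 * i) for i in range(12)]
# A human glancing at a page a few times:
human = [base, base + timedelta(minutes=4), base + timedelta(minutes=11)]

print(looks_automated(auto))   # True
print(looks_automated(human))  # False
```

Any competent first pass over the 40 original logs could have run something like this and asked why so many accounts tripped it, instead of just discarding the embarrassing files.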
This, by the way, is where the statistical argument comes in: the school basically went on a fishing expedition, discarding datapoints they didn't like at multiple levels of analysis, until they had generated clean-looking logs that seemed to tell a damning story. And even with all of this selective culling, they messed up a significant percentage of the time, and included many associations that weren't actually problematic at all.

And in case you're wondering: no, there's no evidence that these file access patterns were the result of students flailing around trying to find specific answers in a bunch of places. The kinds of resources that show up in the logs are just not things that any sane person would access in the hope of finding useful information during an exam. Additionally, in several of these cases there were one or more obvious Canvas files devoted to the very topic of the flagged exam question - the first place students would have looked had they actually been trying to cheat. The students did not access these obvious files, because they were apparently too busy staring at blank discussion pages.
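To see why the fishing expedition is statistically worthless, here's a toy simulation (every number is made up for illustration - it's not modeling the actual case data): if accounts generate automatic refreshes at random times all day, and you then throw away everything outside the exam window, essentially every innocent account will still show "exam-time activity."

```python
import random

random.seed(0)
N_STUDENTS = 40
REFRESHES_PER_DAY = 200               # heavy auto-refresh traffic, no cheating
EXAM_WINDOW = (9 * 3600, 11 * 3600)   # a 9:00-11:00 exam, in seconds

flagged = 0
for _ in range(N_STUDENTS):
    # Pure noise: refresh hits spread uniformly over 24 hours.
    hits = [random.uniform(0, 24 * 3600) for _ in range(REFRESHES_PER_DAY)]
    # The "cleaning" step: discard every hit outside the exam window,
    # then treat whatever survives as evidence.
    in_window = [t for t in hits if EXAM_WINDOW[0] <= t <= EXAM_WINDOW[1]]
    if in_window:
        flagged += 1

print(f"{flagged}/{N_STUDENTS} innocent accounts show 'exam-time activity'")
```

With 200 uniformly scattered hits per day, the chance that a given account has zero hits in a two-hour window is (11/12)^200, which is effectively nil - so the filter flags everyone. Selecting on the outcome and discarding the rest isn't evidence; it's how you manufacture it.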