Saturday, July 18, 2015

Garbage In, Garbage Out

In the aftermath of our twitter chat on robograding, Scott Petri tweeted at me this story from IHD about turnitin.  The gist of the story is that turnitin doesn't work as well as google searching for catching plagiarism.  The story is based on a study by a UT librarian, Dr. Susan Schorn.  The study is a classic case of Garbage In, Garbage Out research.  Here's why:

1.  The amount of cheating doesn't matter.  

Schorn complains that turnitin doesn't capture every instance of plagiarism in her mocked up essays.  (Her 2015 update is here and the 2007 original is here.)  This is a red herring.  I don't need to catch every instance of plagiarism in a paper.  I only need to catch one.  If a student only has one small instance of plagiarism, it's likely an editing error and not plagiarism.  In the final stage of editing, it's easy to drop a footnote when you are moving things around.  I'm smart enough to recognize when that happens.  Student plagiarists, in my experience, tend to take big chunks and lots of them.  If turnitin catches only one out of six instances of copying in a paper, that student is still caught.  I don't need to catch every instance in a paper, just one.  Turnitin will catch them.

2.  False positives.

Schorn didn't identify what she meant by false positives but I have a pretty good idea of what she meant.  In my experience, students' similarity index on a paper should be somewhere between 5 and 40 percent depending on the type of assignment and whether I have assigned it previously to other classes.  In a research paper, students should have footnotes and a bibliography that look like other people's footnotes and bibliographies.  All of that stuff should be a match.  It's not plagiarism but it shows up as a match.  What Schorn dislikes, I love.  My students struggle with both the mechanics of citation and remembering to do it.  When I let my students see their match scores, they are often able to fix problems, and the fixes then register as false positives.  For a ninth grader that might mean rewriting a paper to include quotes from the sources of a DBQ because their match score was 0.  For an eleventh grader with a low match score on a research paper, it's typically because they've forgotten to include their bibliography (typically done in noodletools) with their paper.  About half my 11th graders would have lost serious points for not turning in bibliographies if it weren't for turnitin.

3.  She didn't do a control group.  

I'm not asking for a double blind study here.  Look, plagiarism detection is like birth control: it only works if you use it.  Turnitin is the pill; it may not always work but it's easy to use and it's way better than nothing.  Google searching is the rhythm method.  It's easy to say you are doing it, but the actual effort required to have it be effective makes it more honored in the breach.  In her study, she google searched all six papers for multiple instances of plagiarism.  How likely are teachers to do that?  Or will they only search the ones that make their spidey sense tingle?  This study would only make sense if she had designed it so that there were three groups: a) all papers google searched, b) only papers which instructors flagged as plagiarized google searched, and c) turnitin.  Which brings me to....

4.  Time

Of all the concerns that Schorn raises, time is the one she gets most wrong.  Schorn claims that it takes less time to use google's recommended format for checking than it does to learn turnitin.  It took me about fifteen minutes to figure out turnitin and it takes me about half an hour to teach my students how to use it.  After students turn in their papers it can take maybe an hour (usually less) for turnitin to do a match score.  At the end of the day, I can look at the papers and red-flag the ones with high match scores for further investigation immediately without having to read all the papers.  This allows me to move quickly on likely cases.  There are few things my dean hates more than dealing with a plagiarism case 6 weeks after it happened.  (It can take me two to three weeks to grade a set of papers and another two to three weeks to get through our hearing process.)  Further, I can be doing other things while turnitin does the work.  With Schorn's preferred Google method, I have to type in word strings to try to catch the matches.  That means I have to do that work for each and every paper (not just the ones I suspect; I'm trying to be fair here).  That is very time intensive.  Schorn's claim that turnitin "requir[es] more... hands on instructor time than Google" is, to put it bluntly, nonsense.

5.  She didn't test for copying from other students' work.

Granted, this might be less of a problem in college (although I'm pretty sure frat files of papers aren't urban legends), but it's a big problem in high school.  Given the more structured and narrow nature of many high school assignments, and their frequency, this is where the majority of our plagiarism cases fall out.  A student fails to hand in an assignment, hands it in late after the others are graded and returned, copies a friend's (making some changes, of course), and turns in the assignment.  Google isn't going to catch that because chances are, I won't remember the other assignments well enough to notice, nor would I necessarily have a copy of the old work, nor would I necessarily have the time to hunt it down.  If I'm not using turnitin, chances are I'm not catching this case.

6.  She complains about culture, but her model is actually less student-centered.  

As I indicated earlier, I think turnitin is a great mechanism for helping students learn how to write better history essays if students are allowed access to their match scores prior to handing in their papers.  This makes students a partner in the plagiarism conversation and allows me to work with students whose scores are both too low and too high.  Note that many students whose match scores are abnormally low or high actually don't have problems with their papers, and we talk about that.  I also try to make sure every student goes through each match to see if it is something that needs a citation and that the citation is properly done.  This is a really important step, especially for high school students.  It also helps students make sure that every quote has a footnote with it.  I'll spend half a class period or more working with students on this with every paper, and I have found turnitin to be incredibly helpful to students for these issues.  The google search method puts all the burden on me as a cop policing my students, the exact complaint that Schorn makes about turnitin.  The fact is, she didn't use it properly in either of the tests and then faults the product for her own failures.

7.  Privacy concerns.  

I get that there are concerns around privacy and students.  Turnitin has issues in this area.  But Google doesn't?  Google?  I'll take turnitin for my History of Violence class projects, thank you.  I can only imagine what my search history would look like to the feds after plagiarism checking papers on ISIS, the Tamil Tigers, various violent white supremacist groups, etc.

TL;DR  Schorn doesn't understand how teachers actually use turnitin or catch plagiarists and designed a project that would make google look good and turnitin look bad.  In that she accomplished her mission.  But as they say, Garbage In, Garbage Out.


  1. How hard is it to learn to use TurnItIn?

  2. 15 minutes plus 15 more futzing around time.