Saturday, July 18, 2015

Garbage In, Garbage Out

In the aftermath of our twitter chat on robograding, Scott Petri tweeted at me this story from IHE about turnitin.  The gist of the story is that turnitin doesn't work as well as google searching for catching plagiarism.  The story is based on a study by a UT librarian, Dr. Susan Schorn.  The study is a classic case of Garbage In, Garbage Out research.  Here's why:

1.  The amount of cheating doesn't matter.  

Schorn complains that turnitin doesn't capture every instance of plagiarism in her mocked-up essays.  (Her 2015 update is here and the 2007 original is here.)   This is a red herring.  I don't need to catch every instance of plagiarism in a paper.  I only need to catch one.  If a student has only one small instance of plagiarism, it's likely an editing error and not plagiarism.  In the final stage of editing, it's easy to drop a footnote when you are moving things around.  I'm smart enough to recognize when that happens.  Student plagiarists, in my experience, tend to take big chunks and lots of them.  If turnitin catches only one out of six instances of copying in a paper, that student is still caught.  Turnitin will catch them.

2.  False positives.

Schorn didn't identify what she meant by false positives, but I have a pretty good idea.  In my experience, students' similarity index on a paper should fall somewhere between 5 and 40 percent, depending on the type of assignment and whether I have assigned it previously to other classes.  In a research paper, students should have footnotes and a bibliography that look like other people's footnotes and bibliographies.  All of that should show up as a match.   It's not plagiarism, but it registers as one.  What Schorn dislikes, I love.  My students struggle with both the mechanics of citation and remembering to do it.  When students can see their match scores, they are often able to fix problems that would otherwise register as false positives.  For a ninth grader, that might mean rewriting a paper to include quotes from the sources of a DBQ because their match score was 0.  For an eleventh grader with a low match score on a research paper, it's typically because they've forgotten to include their bibliography (usually done in noodletools) with their paper.  About half my 11th graders would have lost serious points for not turning in bibliographies if it weren't for turnitin.

3.  She didn't do a control group.  

I'm not asking for a double-blind study here. Look, plagiarism detection is like birth control: it only works if you use it.  Turnitin is the pill; it may not always work, but it's easy to use and it's way better than nothing.  Google searching is the rhythm method: it's easy to say you are doing it, but the actual effort required to make it effective means it's more honored in the breach.   In her study, she google searched all six papers for multiple instances of plagiarism.  How likely are teachers to do that?  Or will they only search the ones that make their spidey sense tingle?  This study would only make sense if she had designed it with three groups: a) all papers google searched, b) only papers which instructors flagged as plagiarized google searched, and c) turnitin.  Which brings me to....

4.  Time

Of all the concerns Schorn raises, time is the one she gets most wrong.  Schorn claims that it takes less time to use google's recommended format for checking than it does to learn turnitin.  It took me about fifteen minutes to figure out turnitin, and it takes me about half an hour to teach my students how to use it.  After students turn in their papers, it can take maybe an hour (usually less) for turnitin to produce a match score.  At that point, I can look at the papers and red-flag the ones with high match scores for further investigation immediately, without having to read all the papers first.  This lets me move quickly on likely cases.  There are few things my dean hates more than dealing with a plagiarism case six weeks after it happened.  (It can take me two to three weeks to grade a set of papers and another two to three weeks to get through our hearing process.)  Further, I can be doing other things while turnitin does the work.  With Schorn's preferred google method, I have to type in word strings to try to catch the matches, and I have to do that work for each and every paper (not just the ones I suspect; I'm trying to be fair here).  That is very time intensive.  Schorn's claim that turnitin "requir[es] more... hands on instructor time than Google" is, to put it bluntly, nonsense.
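The triage workflow I describe above is simple enough to sketch in a few lines of code. Everything here is my own illustration (the names, the 5–40 percent band, the data structure); it is not anything turnitin itself provides:

```python
# A sketch of the red-flag triage step: given (student, match_percent) pairs,
# surface the papers whose similarity score falls outside the expected range,
# worst offenders first. The 5-40% band reflects my own experience, not a
# turnitin feature.
def triage(papers, low=5, high=40):
    """papers: list of (name, match_percent) tuples.
    Returns names flagged for follow-up -- possible plagiarism on the high
    end, a missing bibliography or missing quotes on the low end."""
    flagged = [(name, score) for name, score in papers
               if score > high or score < low]
    # Sort by descending score so likely plagiarism cases come first.
    return [name for name, score in sorted(flagged, key=lambda p: -p[1])]

papers = [("Alice", 12), ("Bob", 63), ("Carol", 2), ("Dan", 38)]
print(triage(papers))  # -> ['Bob', 'Carol']
```

The point of the sketch is the asymmetry: the machine does the scoring overnight, and the teacher's attention goes only to the handful of outliers instead of every paper.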

5.  She didn't test for copying from other students' work.

Granted, this might be less of a problem in college (although I'm pretty sure frat files of papers aren't urban legends), but it's a big problem in high school.  Given the more structured and narrow nature of many high school assignments, and their frequency, this is where the majority of our plagiarism cases fall.  A student fails to hand in an assignment, hands it in late after the others are graded and returned, copies a friend's (making some changes, of course), and turns it in.  Google isn't going to catch that because, chances are, I won't remember the other assignments well enough to notice, nor would I necessarily have a copy of the old work or the time to hunt it down.  If I'm not using turnitin, chances are I'm not catching this case.

6.  She complains about culture, but her model is actually less student-centered.  

As I indicated earlier, I think turnitin is a great mechanism for helping students learn how to write better history essays if they are allowed access to their match scores prior to handing in their papers.  This makes students partners in the plagiarism conversation and allows me to work with students whose scores are either too low or too high.  Note that many students whose match scores are abnormally low or high actually don't have problems with their papers, and we talk about that.   I also try to make sure every student goes through each match to see if it is something that needs a citation and that the citation is properly done.  This is a really important step, especially for high school students.  It also helps students make sure that every quote has a footnote with it.   I'll spend half a class period or more working with students on this for every paper, and I've found turnitin incredibly helpful to students for these issues.  The google search method puts all the burden on me as a cop policing my students, the exact complaint that Schorn makes about turnitin.  The fact is, she didn't use the tool properly in either of her tests and then faulted the product for her own failures.

7.  Privacy concerns.  

I get that there are concerns around privacy and students.  Turnitin has issues in this area.  But Google doesn't?  Google?  I'll take turnitin for my History of Violence class projects, thank you.  I can only imagine what my search history would look like to the feds after plagiarism-checking papers on ISIS, the Tamil Tigers, various violent white supremacist groups, etc.

TL;DR: Schorn doesn't understand how teachers actually use turnitin or catch plagiarists, and she designed a project that would make google look good and turnitin look bad.  In that, she accomplished her mission.  But as they say, Garbage In, Garbage Out.

Sunday, July 12, 2015

Robograding: A Macguffin

This is the third in a series of cross-blogged posts on robograding prior to Monday night's twitter #sschat on computer assisted grading (7 pm ET), which I am co-hosting with Scott M. Petri.
Scott's first post is here
My first post in response to Scott is here

As Audrey Watters reminds us, there are two (among many) useful ways to think about education technology.  The first (and the subject of this post) is the macguffin.  Macguffins, if you didn't know, are plot devices in films that are key to the plot, but ultimately the film isn't about them.  The most classic example is the useless "letters of transit signed by Charles de Gaulle himself!" in Casablanca (as if de Gaulle's name carried any weight in Vichy-occupied Africa).  The macguffin isn't the story, but it hooks us into the story.  Robograding is a macguffin for a larger story.  Why do we want to use computers to assist us in our grading?  Different players in the ed-tech world have different motivations.  I started using computers because my handwriting was terrible.  Scott makes students use the apps he described in his post because he has too many students to work with them individually.  I now use computers to increase transparency and to speed my turnaround time.  ETS spends millions of dollars every year trying to develop decent robograders so they can cut the number of people they pay to grade exams by hand.  And on and on.  The reasons people use computers to assist their grading are probably as varied as the teachers themselves.  But at the core of all this is one concept: scale.

Current establishment education reform, whether from the left or the right, involves trying to de-skill teachers and scale programs that work.  As I've written elsewhere, I don't believe good teaching scales.   Taking a brilliant classroom teacher who is a good lecturer and building a MOOC around them doesn't scale to good teaching.  The best online teachers I have met are generally not great lecturers anyway.  They are people who find the disembodiment of the internet empowering and who understand how to create online environments that coax contributions from those who were silenced in the classroom in the past.

So as teachers, we need to be mindful when we think about computer assisted grading.  I like tools that accomplish ends that benefit my students and make both our lives better by allowing us to work smarter, not harder.   I dislike tools that end up making me work harder by requiring me to customize assignments to fit the tool, requiring me to teach a curriculum I didn't create, or making my students miserable by confounding them with bad writing rubrics.  If a robograder measures complexity of thought by counting syllables in the words used, then it runs afoul of Anthea's Axiom (named after my colleague in the English department): "Big Concepts, Small Words."
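To make the syllable-counting failure concrete, here is a deliberately naive "complexity" scorer of the kind the paragraph warns about. This is my own toy illustration, not the code of any actual robograder; it just counts vowel groups (rough syllables) per word:

```python
import re

# A toy stand-in for a bad robograder metric: mean rough syllable count
# per word, using vowel groups as a crude syllable proxy.
def naive_complexity(text):
    words = re.findall(r"[a-z]+", text.lower())
    syllables = [len(re.findall(r"[aeiouy]+", w)) for w in words]
    return sum(syllables) / len(syllables)

clear  = "Power tends to corrupt."  # big concept, small words
puffed = "Authoritative administration incentivizes institutional corruptibility."

# The metric rewards the puffed-up sentence over the clear one,
# which is exactly how such a scorer violates Anthea's Axiom.
print(naive_complexity(puffed) > naive_complexity(clear))  # -> True
```

Any student who figures out what the scorer rewards will start writing the second sentence instead of the first, which is the opposite of what we want to teach.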

Robograding is a Macguffin for that larger conversation around why we use computers and what we want out of them.  If we solely focus on robograding, we will end up teaching whatever the engineers who write the code for the robograders want us to write.  We need to make sure that we as teachers determine what the apps do, and not the other way around.

Saturday, July 4, 2015

We are All Cyborgs Now: Robograding, computer assisted grading, and the futures of teachers and computers in assessment

"Stories are what we live in" wrote Kerwin Klein.  So here is one story.   A couple of weeks ago, Scott Petri sent out a query on twitter about robograders.  I asked, in response, why anyone would participate in their own de-skilling.  After a bit of back and forth, we returned to our lives and families.  A few days later Scott asked if I would co-host a twitter chat on robograding and perhaps write a couple of blog posts on the subject.  I said sure.   Scott and I set up a Skype chat to touch base and he sent me a couple of ed-school articles to read.  I didn't read them.  I was confident in the rightness of my views.  Robograding, I was convinced, was the devil's tool, designed to aid the depersonalization of the classroom and further diminish the wide range of skills that go into teaching.  Robograding, to me, was the beginning of the mechanization of teaching, in the way that the sewing machine began the mechanization of clothing production.  It was a story that labor historians have been telling for a long time.   It brought up images of bubble tests and scantron sheets, something of which I wanted no part.

But that's not the story I'm telling today. 

So here's a different story.  This is one about a TA in a Big 10 grad program.  He had really bad handwriting.  He had a hard time grading blue book exams.  To increase his own efficiency and decrease his students' frustration with his handwriting, he started numbering the comments in the blue book, typing them up on his computer, and then printing them.  Each student got about one third of a page of feedback, stapled into their blue book.  And thus, in 1994, I began my forays into computer assisted grading.  Since then, I've been employing a number of computer assists to my grading.  In addition to Microsoft Word, I've used a variety of technologies.  I currently use Google Docs, Haiku LMS, and to give feedback.  I can't imagine not grading on a screen.   In short, I do a lot of my grading with computers.  I prefer to call it computer assisted grading.  Of course, I'm lying to myself.  I'm using machines to help me teach more efficiently.  I'm robograding.

So if we are going to have this conversation about robograding, it can't be judgmental.  We have to ask:  Why do we robograde?  What do we want the machines to do for us as teachers?  What do we want the machines to do for our students?  And perhaps most importantly, we must ask:  What don't we want the machines to do?  What lines don't we want them to cross?  What prices are we willing to pay?  What prices are we not willing to pay?

Many, many years ago the great historian and provocateur Virginia Scharff made me read Donna Haraway's Simians, Cyborgs, and Women.  At the time, I missed the point.  I was deeply enamored of Judith Butler's ideas on the social construction of gender, and Haraway's ruminations on gender seemed unnecessarily complex.  If gender were a social construction, then biology didn't matter.  But Haraway recognized that technology was changing our very definitions of self, enabling new discourses and constructions.  She anticipated Caitlyn Jenner and the trans-rights movement.

Just this week the great historian and provocateur Audrey Watters challenged us by asking whether it "Is Time To Give Up on Computers in Schools."  Watters reminds us that technology is never value neutral and that teachers need to take control of it lest the ed-tech overlords continue to have their way with us and our students.  Down that latter path lies the dystopian future I think of when I hear the word "robograding."

Which brings me back to Haraway.  She challenged us to ask "How might an appreciation of the constructed, artefactual, historically contingent nature of simians, cyborgs, and women lead from an impossible but all too present reality to a possible but all too absent elsewhere?"  She challenged us to unite as "Cyborgs for earthly survival!"

By having this conversation, I hope we can take control of computer assisted grading for our own purposes and for the futures of our students, our schools, and our country.

Happy Fourth of July.