Calculating the statistical probability of cheating


DW Simpson
11-23-2004, 10:48 PM
http://www.newswise.com/articles/view/508480/

Final exams are just around the corner at universities nationwide, and with them the ever-present challenge of student cheating. But this year, a professor at California State University, Sacramento has prepared a new device to assist his colleagues.

Robert G. Mogull, who teaches business statistics, has shown that it takes only a modest amount of effort to thwart cheating students. He has created a simple statistical tool that any instructor can use to detect cheating on multiple-choice Scantron exams.

His technique was published in an article that appeared in the September 2004 issue of the Journal of College Teaching and Learning titled “A Device to Detect Student Cheating.” It involves computing the likelihood of more than one student missing the same test question - the easier the test question that students jointly miss, the more likely they cheated. (A brief description of the method appears below.)

“This is really a device to confirm an instructor’s suspicions. That’s what it’s best for,” Mogull says. “Of course, I’ve learned over the years that students are exceptionally clever at cheating. They can defeat any method of detection that we can come up with. If they are really determined to cheat, they are better at it than we are at stopping them.”

Mogull was inspired by a recent real-life classroom teaching experience, in which two of his students missed the same 26 questions out of a total of 100 questions over four different exams. That got the statistician’s heart pumping - but not because he was angry.

“It was clear in my mind that they had cheated. That wasn’t the question,” Mogull says. “I was mainly intrigued. When I saw their test scores and their identically missed questions, I wanted to calculate the probability of it happening at random. The ideas for the study just sort of exploded in my mind and it became one of the easiest papers that I’ve ever written.” He computed the chance of two students missing the same 26 questions to be roughly 0.000000000000000004 percent. As for the two students, he gave both of them a failing grade for the course.

Mogull, who believes cheating is much more common than most professors will acknowledge or even recognize, does not plan to employ his cheat-detecting device broadly. He says that even though it’s very easy to apply, it’s too time consuming to use all the time. He says that the best procedure to reduce student copying is to use different test versions simultaneously so that students can’t share answers with others sitting next to them.

Mogull’s method, in short: After grading the Scantron forms and conducting an Item Analysis, you have the probability that a student missed each test question. To calculate the likelihood that two students would jointly miss the same question, you square the probability of missing the particular question. Follow that procedure for each identically missed question. Then multiply all the probabilities, subtract from one, and you have the probability that the students collaborated. Stated another way, subtract the percentage chance that they didn’t cheat from 100 percent and you have the probability that they did cheat. The paper also offers an option for finding the chance that two (or more) students would miss the same question and also mark the same wrong answer.
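The arithmetic in that summary can be sketched in a few lines of Python. This is only a reading of the press-release description, not code from the paper; the function names and the three miss rates are made up for illustration, and the two students are assumed to answer independently.

```python
from math import prod

def coincidence_probability(miss_rates):
    """Chance that two independent students both miss every listed question.

    miss_rates: class-wide miss probability for each jointly missed question,
    taken from the item analysis (the values used below are hypothetical).
    """
    return prod(p * p for p in miss_rates)

def reported_cheating_score(miss_rates):
    """One minus the coincidence probability, as the summary describes it."""
    return 1.0 - coincidence_probability(miss_rates)

# Hypothetical miss rates for three questions that both students missed.
miss_rates = [0.20, 0.35, 0.10]
print(coincidence_probability(miss_rates))
print(reported_cheating_score(miss_rates))
```

Easy questions (small miss rates) shrink the product fastest, which matches the article's remark that jointly missing easy questions is the most suspicious.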

Super Silver Haze
11-23-2004, 11:03 PM
First thing I thought of was cheating in the relationship sense of the word.

I think the Computer Science department at the university I graduated from had some kind of program that would analyze similarities between two students' code, to determine how likely it was that one had copied from the other. It was a more complicated thing than just the straight probability approach taken in the article - it would look at patterns in the writing and stuff like that. Or maybe it was just a made-up story the department put out there to scare people.

Hey, 400th post.

Paul Brand
11-24-2004, 12:08 AM
“Mogull’s method, in short: After grading the Scantron forms and conducting an Item Analysis, you have the probability that a student missed each test question.”
You should really normalize the data to conform to the supposed cheater's probability. For example, if the overall mean on a test is 65% and the cheater scored 75%, then for each question you need to make an adjustment that would effectively make it more probable for the cheater to have gotten the question correct.

“To calculate the likelihood that two students would jointly miss the same question, you square the probability of missing the particular question.”
This part seems correct.

“Follow that procedure for each identically missed question.”
Correct.

“Then multiply all the probabilities”
Wait! You need to consider the questions they both got correct as well (again squaring the probability of them individually getting the question right). Also you need to consider the questions that were inconsistent. This is very complicated. If the probabilities were assumed the same on each question, you could use a binomial distribution. Another thing to consider is the variance between the true mean and the sample mean (i.e., due to chance, the sample mean may be different than the expected or true mean, and this variance can affect the ending probability).

“, subtract from one,”
Correct.

“and you have the probability that the students collaborated.”
Incorrect, as stated in my objections above.

“Stated another way, subtract the percentage chance that they didn’t cheat from 100 percent and you have the probability that they did cheat.”
Correct.

“The paper also offers an option for finding the chance that two (or more) students would miss the same question and also mark the same wrong answer.”
That's an excellent consideration.

“He computed the chance of two students missing the same 26 questions to be roughly 0.000000000000000004 percent.”
I would have thought the number would be a bit lower than this (I would think about 2-3 times the number of zeroes). Oh well. It doesn't seem that they considered the probability of selecting the same wrong answers here. Or maybe they just didn't consider the questions they both got correct.

Also, it should be considered that some students are naturally correlated with each other. Perhaps they studied together and are strong in similar subject matters. There's a bunch of unquantifiable factors to consider. Nevertheless, I think the mathematical analysis is a good idea, and gives a good basis to weigh suspicions of cheating. The unquantifiable material can make a difference of a factor in the 100s even, but the above example is very clear cut evidence, even if it doesn't consider the unquantifiable variables.
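The binomial idea mentioned in this post can be sketched as follows, under strong simplifying assumptions: every question has the same miss rate, the two students answer independently, and questions are independent of each other. The function name and the 0.3 miss rate are hypothetical; only the 100 questions and 26 joint misses come from the article.

```python
from math import comb

def prob_at_least_joint_misses(n_questions, miss_rate, k):
    """P(two independent students jointly miss at least k of n questions).

    With a common per-question miss rate p, each question is a joint miss
    with probability p**2, so the count of joint misses is Binomial(n, p**2).
    """
    q = miss_rate ** 2
    return sum(
        comb(n_questions, j) * q**j * (1 - q) ** (n_questions - j)
        for j in range(k, n_questions + 1)
    )

# 100 questions and 26 joint misses as in the article; 0.3 is an assumed miss rate.
print(prob_at_least_joint_misses(100, 0.3, 26))
```

This counts any 26 or more joint misses rather than one specific set of 26 questions, and it does not condition on each student's total score, so it answers a somewhat different question than the product of squared miss rates.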

Maxprime
11-24-2004, 12:24 AM
Had it @ the UT CS department - works very well.

Travis
11-24-2004, 01:00 AM
http://www.newswise.com/articles/view/508480/

Final exams are just around the corner at universities nationwide, and with them the ever-present challenge of student cheating. But this year, a professor at California State University, Sacramento has prepared a new device to assist his colleagues.

w00t! go hornets!

The Drunken Actuary
11-24-2004, 01:04 AM
400th post.:roll:

Westley
11-24-2004, 02:50 AM
I think the Computer Science department at the university I graduated from had some kind of program that would analyze similarities between two students' code, to determine how likely it was that one had copied from the other. It was a more complicated thing than just the straight probability approach taken in the article - it would look at patterns in the writing and stuff like that. Or maybe it was just a made-up story the department put out there to scare people.

I know for a fact that when I was in school, my uni made this claim and it was a complete lie.

Westley
11-24-2004, 02:52 AM
400th post.:roll: :lol:

persephone_ashes
11-24-2004, 07:57 AM
I had a class once where we had to do this horrible stock valuation problem that involved linear interpolation (which the non-actuaries in the class weren't all that familiar with). The 3 people sitting next to me and I formed a study group and really worked on that particular problem. Sure enough, only 4 people got that question right on the exam. On the linear interpolation part we didn't have the same answer to the 4th decimal place or something, but we did have the same work. Only 4 out of around 35 got that question right, and the four of us sat together, so that probably looked really suspicious to the rest of the class. But we sat together because we were friends, and because we were friends we studied together. Studying together would increase the probability of giving the same (right or wrong) answers to the same questions. I don't really think you can develop some definitive model that looks only at the answers on people's tests without taking other circumstances into account.

1695814
11-24-2004, 12:29 PM
do away with mc & go completely to wa.

turpin
11-24-2004, 12:52 PM
Another problem is that this procedure assumes independence between questions. There may be issues with people who share similar misunderstandings of the subject matter systematically getting the same answers wrong. Similarly, people who studied together may be more likely to make similar errors even if they aren't cheating, because they are using similar ideas and methodologies and share some common misconceptions.

I raise this issue because it's happened to me: people would come to me to fill in the gaps in their knowledge, and if I wasn't 100% correct, they ended up missing the same questions I did. :roll: :shake:
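A small simulation can illustrate the shared-misconception point. It is a toy model, not anything from the paper: `p_shared` is an assumed chance that two study partners hold the same wrong idea about a question, and `p_own` is an assumed chance that each slips up independently.

```python
import random

def joint_miss_rate(p_shared, p_own, trials=200_000, seed=0):
    """Simulate two honest study partners answering one question many times.

    Returns (A's miss rate, B's miss rate, rate at which both miss).
    """
    rng = random.Random(seed)
    a_miss = b_miss = both = 0
    for _ in range(trials):
        shared = rng.random() < p_shared      # both learned the same wrong idea
        a = shared or rng.random() < p_own    # A misses via the shared or an own error
        b = shared or rng.random() < p_own    # B likewise, with an independent own error
        a_miss += a
        b_miss += b
        both += a and b
    return a_miss / trials, b_miss / trials, both / trials

pa, pb, pboth = joint_miss_rate(p_shared=0.15, p_own=0.10)
print("product of individual miss rates:", pa * pb)
print("observed joint miss rate:        ", pboth)
```

In this toy setup the joint miss rate comes out well above the product of the individual rates, so squaring the class-wide miss rate would understate how often honest study partners miss the same question.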

The Mad Hatter
11-24-2004, 01:25 PM
turpin is onto one of the biggest flaws in this approach. There is almost certainly high correlation among particular questions. For example, questions 2 and 7 might be the only questions where the student needs to know how to take a derivative. T

There is another issue along the lines of the birthday problem.
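One reading of the birthday-problem remark is that a class contains many pairs of students, so a coincidence that is rare for any single pair becomes fairly likely for some pair. A rough sketch, treating pairs as independent (which is only an approximation) and using made-up numbers:

```python
def prob_some_pair_flagged(n_students, pair_coincidence_prob):
    """Approximate chance that at least one pair shows the coincidence.

    Treats the n*(n-1)/2 pairs as independent, which is only an approximation.
    """
    pairs = n_students * (n_students - 1) // 2
    return 1.0 - (1.0 - pair_coincidence_prob) ** pairs

# Hypothetical: a class of 35 and a 1-in-1000 coincidence for any given pair.
print(prob_some_pair_flagged(35, 0.001))
```

With 35 students there are 595 pairs, so screening every pair makes a chance match far more likely than checking one pair that was already under suspicion.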

JMO
11-24-2004, 03:16 PM
Go rent the movie "Stand and Deliver." Or even "Tuskegee Airmen", although the testing issue there is peripheral.

Actuary321
11-24-2004, 03:31 PM
I would like to see this method used on Actuarial Exams to see what the probability is that I cheated with someone clear across the country. I would bet among actuarial exam takers (because of the questions) you would be able to find papers that would indicate a very high probability of cheating where it didn't exist.

Incredible Hulctuary
11-28-2004, 12:57 PM
Not only do some students study together, but there may be particular topics that were not taught very well, or were taught at a time when other distractions prevented the students from attending class or studying the specific topics properly. Or some questions may be just plain difficult for everybody. So the point about missing the easy questions is very important; questions that were missed by nearly the whole class (assuming the whole class wasn't also cheating) should be given less weight or removed from consideration.
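To see how much each jointly missed question contributes, one can look at the size of its squared miss rate on a log scale and drop questions that nearly everyone missed. The 0.8 cutoff, the function name, and the sample miss rates are all assumptions for illustration.

```python
from math import log10

def evidence_per_question(miss_rates, max_miss_rate=0.8):
    """Return (miss_rate, -log10(miss_rate**2)) for questions at or below the cutoff.

    A larger second value means a joint miss on that question is rarer by
    chance; questions above the (hypothetical) cutoff are dropped entirely.
    """
    return [(p, -2 * log10(p)) for p in miss_rates if p <= max_miss_rate]

# Hypothetical miss rates: an easy, a medium, and a very hard question.
for p, weight in evidence_per_question([0.05, 0.5, 0.95]):
    print(p, round(weight, 2))
```

A question missed by 95% of the class would add almost nothing even if it were kept, since 0.95 squared is about 0.90; the cutoff just makes that choice explicit.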

DW Simpson
05-01-2007, 12:16 PM
http://www.actuarialoutpost.com/actuarial_discussion_forum/showthread.php?t=108421

Actuary321
05-01-2007, 02:05 PM
So Claude, are you going to bump every cheating thread?

DW Simpson
05-01-2007, 02:11 PM
So Claude, are you going to bump every cheating thread?

Nope, just this one came to mind, so I thought I'd link them together. Thanks!

Examinator
05-01-2007, 09:08 PM
I have a better idea. Let me calculate how much time you've just wasted.

carrot
05-02-2007, 10:13 AM
The statistical probability of cheating is 0.33.

Copy (2) of carrot
05-02-2007, 10:14 AM
The statistical probability of cheating is 0.36.

Size17
05-02-2007, 10:42 AM
Simplicity = useless???


Robert G. Mogull, who teaches business statistics, has shown that it takes only a modest amount of effort to thwart cheating students. He has created a simple statistical tool that any instructor can use to detect cheating on multiple-choice Scantron exams....

“Of course, I’ve learned over the years that students are exceptionally clever at cheating. They can defeat any method of detection that we can come up with. If they are really determined to cheat, they are better at it than we are at stopping them.”...

The ideas for the study just sort of exploded in my mind and it became one of the easiest papers that I’ve ever written.” ....

He says that even though it’s very easy to apply, it’s too time consuming to use all the time. He says that the best procedure to reduce student copying is to use different test versions simultaneously so that students can’t share answers with others sitting next to them.

SirVLCIV
05-02-2007, 11:59 AM
Old thread, but I agree that the independence assumption is mistaken (independence of questions and independence of test-takers).

Further, I doubt the binomial distribution is a valid distribution for this task.

Actuary321
05-02-2007, 12:08 PM
Old thread, but I agree that the independence assumption is mistaken (independence of questions and independence of test-takers).

Further, I doubt the binomial distribution is a valid distribution for this task.
You are absolutely right on the non-independence, but how significant is the dependence? Is it going to be significant enough that the test is going to be of no use?

I don't think I would go into any situation with a statistical test and accuse someone of cheating. But I might use that statistical test to look closer at certain individuals. When this latest story broke, I heard an interview with some guy from ACT (I think). He said that on on-line exams, they actually use realtime statistical methods to determine who to examine closer with the cameras that are monitoring the room.

Kai Su Teknon
05-02-2007, 12:18 PM
Old thread, but I agree that the independence assumption is mistaken (independence of questions and independence of test-takers).


It wouldn't be too difficult to incorporate correlation among questions via some sort of Gaussian copula method.
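A minimal sketch of what such a copula approach might look like, assuming a one-factor Gaussian copula with a single correlation `rho` shared by all questions. The function name, `rho`, and the miss rates are hypothetical, and this is one possible setup rather than anything proposed in the paper.

```python
import random
from math import prod, sqrt
from statistics import NormalDist

def prob_miss_all(miss_rates, rho, trials=100_000, seed=0):
    """Monte Carlo estimate that one student misses every listed question.

    Each question gets a latent standard normal score that shares a common
    factor with weight sqrt(rho); the question is missed when the score
    falls below the threshold matching its marginal miss rate.
    """
    rng = random.Random(seed)
    thresholds = [NormalDist().inv_cdf(p) for p in miss_rates]
    hits = 0
    for _ in range(trials):
        factor = rng.gauss(0.0, 1.0)  # shared "weak on this topic" factor
        if all(
            sqrt(rho) * factor + sqrt(1 - rho) * rng.gauss(0.0, 1.0) < t
            for t in thresholds
        ):
            hits += 1
    return hits / trials

rates = [0.3, 0.3, 0.3, 0.3]  # hypothetical miss rates for four related questions
print("independent questions:", prod(rates))
print("copula, rho = 0.5:    ", prob_miss_all(rates, 0.5))
```

If the two students are still treated as independent of each other, the chance that both miss the whole set would be the square of this estimate, which can be much larger than the product of squared per-question miss rates.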

foghorn
05-02-2007, 12:19 PM
My daughter just called from college. She had her econ final today, and noticed the girl beside her copying her answers. My daughter then put the correct answers on her scantron form, but circled the incorrect answers on her test booklet, the one the girl was copying from. Made me ctm, thinking of the girl comparing her grade to my daughter's.

procrastinator
05-02-2007, 12:21 PM
I am rather confused by this methodology. Let's say there is a 10 question quiz, and two students each miss one question. If it happens to be the same question, then, the probability that they cheated is 1-(.1)*(.1)=.99? That is clearly wrong in so many ways.
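The numbers in this example are easy to check. The sketch below assumes each student misses exactly one of the 10 questions, chosen uniformly at random and independently of the other student; that assumption is added here for illustration.

```python
from fractions import Fraction

n_questions = 10
p_miss = Fraction(1, n_questions)

# Score produced by the procedure as summarized in the thread.
method_score = 1 - p_miss * p_miss

# Chance that two independent students, each missing exactly one question
# picked uniformly at random, happen to miss the same one.
same_question = sum(p_miss * p_miss for _ in range(n_questions))

print(method_score)   # 99/100
print(same_question)  # 1/10
```

Under that assumption, two honest students land on the same missed question one time in ten, which is hard to square with a 99% "probability of cheating".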

hotkarl
05-02-2007, 12:22 PM
The next time I go to the local school to teach kids math (part of a project our company participates in), I will hand them a paper with 3 problems with the following answer choices:

1. 5^2

a. 3
b. 7
c. 10
d. 25

2. 4^2
a. 2
b. 6
c. 8
d. 16

3. 6^2

a. 4
b. 8
c. 12
d. 36

We'll see what the probability of cheating is based on their answers.

Single White Shemale
05-02-2007, 12:24 PM
My daughter just called from college. She had her econ final today, and noticed the girl beside her copying her answers. My daughter then put the correct answers on her scantron form, but circled the incorrect answers on her test booklet, the one the girl was copying from. Made me ctm, thinking of the girl comparing her grade to my daughter's.

Wow, that's meen. :swear:

foghorn
05-02-2007, 12:41 PM
Wow, that's meen. :swear:

I know. But at least she got her back by having her copy the wrong answers.

Size17
05-02-2007, 01:00 PM
I am rather confused by this methodology. Let's say there is a 10 question quiz, and two students each miss one question. If it happens to be the same question, then, the probability that they cheated is 1-(.1)*(.1)=.99? That is clearly wrong in so many ways.

Not quite... If it is all True/False, then you have a .5 chance of missing any one question. Then the probability they cheated is one minus the probability that student A missed it, student B missed it, and it was the same question: 1-(.5)(.5)(.1)(.1), which is even crazier.