Plagiarism Checker (PC) operates by combing through a text and asking search engines to identify whether a particular sequence of words in the text can be found on the internet. Finding a particular string of words does not necessarily mean that a text or part of a text is plagiarised. But if identical word sequences accumulate in length and if the sources of these similarities are the same then we can be fairly confident that a work has been copied.
Obviously, it is easy to find multiple records of short sequences of words. Set phrases, technical terms, clichés, metaphors, proverbs, sayings, idioms and other collocations contribute to overall similarity in language and cannot be regarded as plagiarism. The free version of PC offers several "word depths" to demonstrate that, as word sequences grow longer, the chances of finding that particular sequence become smaller, especially when original compositions are examined. This table illustrates the principle of using word depths:
| Word Depth | Search No | Search Frame |
|---|---|---|
| 3 | 1 | The quick brown |
| 3 | 2 | quick brown fox |
| 3 | 3 | brown fox jumped |
| 3 | 4 | fox jumped over |
| 3 | 5 | jumped over the |
| 3 | 6 | over the lazy |
| 3 | 7 | the lazy dog. |
| 4 | 1 | The quick brown fox |
| 4 | 2 | quick brown fox jumped |
| 4 | 3 | brown fox jumped over |
| 4 | 4 | fox jumped over the |
| 4 | 5 | jumped over the lazy |
| 4 | 6 | over the lazy dog. |
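The search frames in the table can be reproduced with a sliding window over the words of the text. The following is a minimal sketch, not PC's actual code; the function name `search_frames` is hypothetical, and splitting on whitespace is an assumption about how words are delimited.

```python
def search_frames(text, word_depth):
    """Return every run of `word_depth` consecutive words in `text`.

    Assumption: words are separated by whitespace and punctuation
    stays attached to its word, as in the table ("dog.").
    """
    words = text.split()
    return [
        " ".join(words[start:start + word_depth])
        for start in range(len(words) - word_depth + 1)
    ]


sentence = "The quick brown fox jumped over the lazy dog."
for depth in (3, 4):
    for number, frame in enumerate(search_frames(sentence, depth), start=1):
        print(depth, number, frame)
```

A nine-word sentence yields seven frames at a word depth of 3 and six frames at a word depth of 4, which matches the rows of the table.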
Another contributing factor to finding "hits" on the web is its sheer size: the more material, the more likely that chance combinations will turn up. In addition, a sequence counts as found even if search engines contain only one record of it, which may artificially inflate overall similarity.
Once PC is started, the results from the search engine begin to appear on the screen. Sequences not found are printed in normal, dark blue type; sequences found appear in bold red. At the end of the search, the following information will be displayed:
Because short word depths (3-5) are used, passages longer than the word depth may appear in red. In the above example, at a word depth of 3, if the phrases "The quick brown" and "fox jumped over" were found in separate documents, then all six words will appear as a hit. Increasing the word depth to 4, however, would result in a miss if the sequence "The quick brown fox" was not found.
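The way separate short hits merge into one longer red passage can be sketched as follows. This is an illustration under assumptions, not PC's implementation: `mark_hits` is a hypothetical name, and the set `found_frames` stands in for the search engine's answers.

```python
def mark_hits(words, word_depth, found_frames):
    """Flag every word covered by at least one frame that was found.

    `found_frames` is a set of frames assumed to have been reported
    as found by a search engine (hypothetical stand-in).
    """
    flagged = [False] * len(words)
    for start in range(len(words) - word_depth + 1):
        frame = " ".join(words[start:start + word_depth])
        if frame in found_frames:
            for index in range(start, start + word_depth):
                flagged[index] = True
    return flagged


words = "The quick brown fox jumped over the lazy dog.".split()

# Word depth 3: two separate hits cover the first six words.
depth3 = mark_hits(words, 3, {"The quick brown", "fox jumped over"})

# Word depth 4: no four-word frame was found, so nothing is flagged.
depth4 = mark_hits(words, 4, set())
```

Here `depth3` flags the first six words and leaves "the lazy dog." unflagged, while `depth4` flags nothing.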
The overall similarity is the number of words in passages found on the Internet, expressed as a percentage of the total number of words examined. In general, the overall similarity of original works decreases as the word depth increases, whereas plagiarised works show a consistently high overall similarity. However, an unusually high overall similarity at shorter word depths can also give a clue as to whether a work might be plagiarised. To determine how much similarity counts as "unusually high", the data (words examined, words found and word depth) are collected in our database and analysed.
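As a small worked example of this calculation, the sketch below computes the overall similarity from a list of per-word flags. The function name `overall_similarity` and the list-of-booleans representation are assumptions made for illustration.

```python
def overall_similarity(flagged):
    """Words found as a percentage of words examined.

    `flagged` holds one boolean per examined word: True if the word
    lies in a passage that was found (hypothetical representation).
    """
    if not flagged:
        return 0.0
    return 100.0 * sum(flagged) / len(flagged)


# Six of nine words found, as in the word depth 3 example.
flags = [True] * 6 + [False] * 3
similarity = overall_similarity(flags)
print(round(similarity, 1))
```

With six of nine words found, the overall similarity is about 66.7%; with no words found it is 0%.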
Overall similarity, like an average, does not tell us much on its own. If you have one hand in a bucket of ice-cold water and the other in boiling water, then, on average, you're fine. In reality, you're not. A figure like 40% overall similarity at a word depth of 5 means little until we know how the overall similarities at word depth 5 are distributed. If only very few works had an overall similarity of 40%, then we would begin to suspect something unusual was going on and investigate further, but if it was the average in the long run, no one would worry. The percentile measures where a work's overall similarity ranks compared with other works in the same language and at the same word depth. As soon as the database contains more than 100 cases of comparable word depth and language, percentile values will be calculated.
The value of a percentile ranges from 0 to 100. The smaller the value, the less similarity there is relative to works in the same language and at the same word depth. In all probability, a smaller value also correlates with more original work. A higher value indicates that some, or all, of the work may be plagiarised.
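The text does not say exactly how PC computes the percentile. One common definition, assumed here, is the percentage of comparable cases whose overall similarity is lower than that of the work being examined. The name `percentile` and the parameter `min_cases` are hypothetical; the caller is assumed to have already filtered the cases to the same language and word depth.

```python
def percentile(value, comparable_values, min_cases=100):
    """Percentage of comparable cases with a lower overall similarity.

    Returns None unless there are more than `min_cases` comparable
    cases, mirroring the 100-case threshold described in the text.
    """
    if len(comparable_values) <= min_cases:
        return None
    below = sum(1 for other in comparable_values if other < value)
    return 100.0 * below / len(comparable_values)


# Made-up illustration data: 200 earlier cases with similarities 0.0 .. 99.5.
history = [i / 2 for i in range(200)]
rank = percentile(40.0, history)
too_few = percentile(40.0, history[:50])
```

In this made-up data set, 80 of the 200 cases lie below 40%, so `rank` is 40.0, and `too_few` is `None` because only 50 comparable cases are available.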
The stupidest way to plagiarise is: Ctrl-A, Ctrl-C, Ctrl-V. When an educator reads a student's work and recognises that it does not accord with that student's abilities or interests, or that it was written for an audience which the student could not have intended, then the alarm bells go off. If your entire occupation with a text was three presses of a button, then you are unlikely to know how glaringly apparent plagiarism can be.
Minor plagiarism can occur when students learn things by rote and forget where they learned them from (source amnesia). While this is possible, it does not mean that it is excusable. Scholars working with and from texts are usually careful to have their texts in front of them when they are working with them and they do not normally rely on memory alone to determine what a particular source has to say.
Science differs from tradition as a way of discovering and passing on knowledge in important ways: it does not rely on word of mouth, memory or imitation, and it is accountable. Giving credit where credit is due is a part of this process, and being skeptical is another part. Great people may have had great ideas, but they have had their share of foolish ones as well, so we can never accept authority alone as a measure of the value of an idea. Newton developed calculus and formulated the law of gravitation, but he also wasted his time on alchemy.
You may think that your instructors are stupid, and some of them may actually be. But they would also be very lonely if they ever plagiarised a student's work. Not that it never happens, but when it does, most students take the hint and avoid handing in anything, or at least anything original, to the offending professor.
Some of your fellow students may have to put in a great deal of work and achieve only mediocre results. Science's rewards may not always go to the hardest-working but they almost never go to the laziest.
The Internet is brimming full of information, some of it useful. But perhaps one of the more useful things it can offer is a mirror on what the rest of humanity has already written on your subject. Do you think about your topic the same way that everyone thinks about it? Or differently? Discover how! Search engines can play an important role in helping you to avoid plagiarism and perhaps even to put you on a better track. In the words of Georg Christoph Lichtenberg, one of the great scholars of the 18th century:
"I cannot say that it will be any better when it is any different, but to be better it will have to be different!"