Software Finds Plagiarism In Research
shmG writes "Researchers from the Virginia Bioinformatics Institute have created a seek-and-destroy program — for plagiarism. Called ET Blast, it's designed to find plagiarism in scientific papers. It does a full-text analysis, and then looks for similar publications in several databases. 'We have better literature,' Garner said. 'There are abstracts and full papers, and a database called Crisp, where you compare stuff to every grant the NIH gets. It's compared to any research that's been funded.'"
What about ... (Score:4, Interesting)
What about academic "recycling".
I remember being told a long time ago that some researchers will basically make several permutations of the same paper to submit to a bunch of different places. It's essentially the same paper, with nothing new in it, but if you can get several places to publish it, you can pad out your publications list.
Re:What about ... (Score:5, Insightful)
if you resubmit your own work, it's not plagiarism.
You can't plagiarize yourself [Re:What about ...] (Score:2)
> if you resubmit your own work, it's not plagiarism.
Correct! It's amazing to see how many people don't understand this point, but it's correct: you can't plagiarize yourself, because plagiarism is the act of passing somebody else's work off as being yours.
I hate it when researchers report the same work in many different papers, but although it is a violation of research reporting standards, and in some cases a violation of an intellectual property contract... it's not plagiarism.
Re: (Score:3, Informative)
I actually ran into this in grad school. When writing a tech-related paper, I referenced one of my past papers on the same subject as a source. My professor made it clear I had to cite myself to avoid "self-plagiarism". I thought it quite possibly the stupidest thing I had ever heard in my life, and it was coming from a celebrated PhD at a major New England university.
Re:You can't plagiarize yourself [Re:What about .. (Score:5, Interesting)
Yes, but maybe the problem is that we don't have good terms to differentiate between appropriate reuse of one's own writing and unacceptable reuse.
For instance, it's a violation of academic ethics to try to publish the exact same paper in multiple places. You're effectively trying to increase your publication count without adding anything new to the body of knowledge. It's still not plagiarism, since it's your own work, but it is unethical.
Not citing previous work when writing a paper is also wrong, though not in the same way. It can be either an honest mistake, lazy, or downright unethical (e.g. not citing the work of someone you don't like). Not citing your own previous work in the area is similarly wrong. Not because it would be plagiarism, but because citations are vital to help others understand the context, significance, and background to the present work. So you should cite yourself when appropriate, just as you would cite others.
And lastly, there are times where re-using your own material is absolutely acceptable. For instance when releasing a new edition of a book, it just makes sense to tweak the things that need changing. It doesn't make sense to rewrite every sentence to avoid 'plagiarizing' yourself. Similarly if you write a review article of a certain field, it just makes sense to re-use some of the text from a previous review (now outdated) that you wrote. (There may or may not be secondary copyright concerns, depending on the various contracts in place.) It isn't plagiarism, and it isn't wrong.
Perhaps academia needs to develop terms to cleanly differentiate between these cases. Or alternately people need to be more specific when they are talking about appropriate vs. inappropriate behavior. Abusing "plagiarism" as a catch-all for "unethical publication" confuses the issue.
Re: (Score:1)
I was saying to myself, wait, this post is identical to the previous one... duh.
But, since you're posting as anonymous, it doesn't increase your publication count to republish it. Fail.
(And, anyway, "Anonymous Coward" is already the most-cited author on slashdot.)
Re: (Score:1)
It always used to make me chuckle to find textbook references cited as "personal observation" in journal articles written by one of my university's professors. Most scientists can't get away with that. But if you are as much of a bigwig in your field [murdoch.edu.au] as he was, I guess it's not as arrogant as it might seem.
Re: (Score:1)
"Okay, I give myself permission to copy work from myself....there.....now it isn't plagiarism."
Re: (Score:2)
You're just not being creative enough. You can come up with a different answer, for example "1+1 is 1.999..." or "1+1 is 1, for sufficiently large values of 1" etc.
Re: (Score:2)
Oh, I get that you may do a series of experiments all with some commonality. That is fine.
I'm specifically talking about people who essentially recycle the same paper several times with no material changes to any of the resear
you may be violating copyright. (Score:1)
like i said
Re: (Score:2)
It is, however, fraud in most cases, since most scientific journals require that papers submitted to them be research that is unpublished and not currently submitted for publication elsewhere.
Too bad (Score:2)
Re: (Score:2, Informative)
> if you resubmit your own work, it's not plagiarism.
Let me clarify the issue for those not accustomed to the rules of scientific publishing.
There IS such a thing as self-plagiarism, and it's not necessarily a minor offense. At its core, if you submit essentially the same work to multiple venues with the intent to pass each off as an independent body of work when they are not, then there is intent to deceive and that is an ethical breach of conduct. Worst case scenario, the author list and abstract have been changed just enough that it leads others to believe th
not so cut and dried (Score:1)
Most publications are group work. Maybe the first author wrote the entire work without input, using only the results of others. And maybe every other author made significant changes or critiques. Those words can't be reused – unless they include every previous author in the new list. Reuse an introduction a few times and the author list is going to get pretty long. Anyway, it is copyright violation to use previously published phrases and images in a publication for a different publisher. That is
Re: (Score:2)
rewriting your own articles isn't classified as stealing.
Re: (Score:2)
> I remember being told a long time ago that some researchers will basically make several permutations of the same paper to submit to a bunch of different places. It's essentially the same paper, with nothing new in it, but if you can get several places to publish it, you can pad out your publications list.
So what? You can't plagiarize yourself. Researchers put out multiple, nearly identical papers all the time, especially those published in conference proceedings. (For example, this guy [stanford.edu] just got elected vice president of the American Physical Society.) It's also very common to recycle review material from one paper you have written to use in another.
This is entirely distinct from university academic misconduct policies which require papers and so forth submitted in fulfillment of course requirements to
Re: (Score:2)
For example, my advisor gave a (Revi
Re: (Score:2)
In the dejavu subsite [vt.edu], most publications share an author, so yes, it does "recycling" / self-plagiarism.
does it translate to chinese? (Score:2)
Would be nice to widen it to IP & Copyright infringement.
How is this different from Turnitin? (Score:5, Informative)
This sounds almost exactly like turnitin.com where when one uploads a paper to it, it searches almost anything it can get ahold of and will list any text in any academic journal that is copied verbatim.
Re: (Score:1, Informative)
Those cunts @ turnitin archive *YOUR* paper for eternity (without payment and without any recourse for redress) to achieve network effects and enhance their service.
Re: (Score:1)
> This sounds almost exactly like turnitin.com where when one uploads a paper to it, it searches almost anything it can get ahold of and will list any text in any academic journal that is copied verbatim.
An apt analogy. Imagine the following scenario: you are simultaneously enrolled in two classes that both require a lengthy essay which constitutes a large portion of your final grade. You find the two assignments to have similar enough parameters and decide to submit the same essay to both teachers without any prior approval for the double-dipping, thus making it appear you have spent more effort than you actually have. You are only "plagiarizing yourself", so no harm, right?
Doubtful.
Self-plagari
Re: (Score:2, Interesting)
There is no harm; you have done the required work. Using your work in more than one place doesn't harm anyone. Assertions to the contrary are ridiculous.
Re: (Score:2)
If you want to know the difference between this and turnitin, you'd have to read the article, it specifically mentions a few differences...
Spam detector? (Score:2)
Even better if it will show papers that are suspiciously similar to pharmaceutical companies' advertising literature.
Red faces all round then.... (Score:1)
Re: (Score:3, Funny)
Is this where the author of something passes it off as his own? I agree, that's a terrible thing.
False positives/negatives (Score:1)
There's nothing about "destroying" in the article (Score:3, Insightful)
Shocking plagiarism already found (Score:2, Funny)
They found a research paper on hydrogen stole 2 thirds from an existing paper on water.
plagiarism differs in science vs. English Lit. (Score:5, Insightful)
I once had an English teacher who said, "If you have more than five consecutive words matching a source, without a citation then it's plagiarism." Perhaps that's how freshman writing assignments are graded, but it's silly when applied to scientific papers. Pick up any math paper on number theory, and you're bound to find the sentence "Let p be an odd prime number." without citation, but that would hardly qualify as plagiarism. Yet, syntactic matching appears to be exactly what this program is doing.
What constitutes "plagiarism" in a scientific paper is very different from plagiarism in journalism or English literature. In scientific writing, it is expected that authors will use the same flat, impersonal style and repeat definitions and the results of others to save the reader the time of having to look them up. So, simple pattern matching between science papers will result in a great many false positives. In science (and math) writing what matters is the new result which the author is claiming. It seems to me that it would be nearly impossible for a computer program to detect the distinction.
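To make the false-positive concern concrete, here is a minimal Python sketch of the kind of naive matcher the "five consecutive words" rule implies: it flags any run of five words shared by two texts. The function name `shared_ngrams` and the sample sentences are made up for illustration, and nothing here is taken from the tool in the article.

```python
def shared_ngrams(text_a, text_b, n=5):
    """Return the set of n-word runs that appear in both texts."""
    def ngrams(text):
        # Lowercase and split on whitespace; punctuation stays attached to words.
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(text_a) & ngrams(text_b)


# Two hypothetical, independently written opening sentences.
paper_a = "Let p be an odd prime number and let g be a generator"
paper_b = "Throughout, let p be an odd prime number greater than three"

matches = shared_ngrams(paper_a, paper_b)
for run in sorted(matches):
    print(" ".join(run))
```

Both sentences share the stock phrasing, so a matcher like this flags them even though neither author copied the other. That is the false-positive problem, not evidence of plagiarism.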
Re: (Score:2)
> I once had an English teacher who said, "If you have more than five consecutive words matching a source, without a citation then it's plagiarism." Perhaps that's how freshman writing assignments are graded, but it's silly when applied to scientific papers.
No. Just... no. It is not "silly," it is insulting, in either freshman English lit or scientific papers. Any teacher who defines plagiarism that way has a lot more to learn than he has to teach.
Re: (Score:2)
> > I once had an English teacher who said, "If you have more than five consecutive words matching a source, without a citation then it's plagiarism." Perhaps that's how freshman writing assignments are graded, but it's silly when applied to scientific papers.
> No. Just... no. It is not "silly," it is insulting, in either freshman English lit or scientific papers. Any teacher who defines plagiarism that way has a lot more to learn than he has to teach.
Perhaps so, but I could see where such a rule could come from, and it could instill a discipline of making sure things are properly cited. Without any other context, obviously the rule is rubbish, but I could see it as an excellent rule to live by when taking freshman courses in writing/composition.
Re:plagiarism differs in science vs. English Lit. (Score:5, Insightful)
> Perhaps so, but I could see where such a rule could come from, and it could instill a discipline of making sure things are properly cited. Without any other context, obviously the rule is rubbish, but I could see it as an excellent rule to live by when taking freshman courses in writing/composition.
But that's half the problem. The rule may come from a desire to instill discipline, but it's just a bad rule, because it teaches that plagiarism of ideas isn't plagiarism at all, and that stringing five words together in a way that's been used before is, and that rewriting something in your own words makes it no longer plagiarism.
Demand students live by a childish rule, and you will at best be someone they have to ignore as they try to actually learn things.
Re: (Score:2)
> because it teaches that plagiarism of ideas isn't plagiarism at all, and that stringing five words together in a way that's been used before is, and that rewriting something in your own words makes it no longer plagiarism.
While I agree with your general premise about childish rules... Just no.
Plagiarism is taking someone else's words and claiming them as your own.
You seem to be infected by the IP bug.
Fortunately for the rest of us, one cannot plagiarize ideas. Reformulating a concept in your own words does not count as plagiarism, nor should it.
Regards.
Re: (Score:1)
There is a grayer area than that. If I rewrite your book, with a paragraph-by-paragraph correspondence, the same plot, the same characters with names and appearances slightly changed, it is still changed. A book called Earl of the Rings, about a hibbit from the Shaw taking a broach to be destroyed in Mt Gloom would probably be plagiarism (unless it changed enough to become parody).
Re:plagiarism differs in science vs. English Lit. (Score:4, Interesting)
> You seem to be infected by the IP bug.
> Fortunately for the rest of us, one cannot plagiarize ideas. Reformulating a concept in your own words does not count as plagiarism, nor should it.
You seem to be infected by a different sort of IP bug.
Plagiarism is not the same thing as copyright infringement (though it's not uncommon for the same act to involve elements of both). One can plagiarize public domain sources. One can plagiarize ideas.
Plagiarism is what happens when a writer presents other people's work (their words or their ideas) as his own, without giving due credit to the source. Pretending that you thought of something when you're actually just copying another author's reasoning is intellectual dishonesty, and squarely within the realm of plagiarism.
If you copy someone's words verbatim, there is an added obligation to specifically identify the copied passage by blockquoting, using quotation marks, or otherwise clearly setting off the passage from the rest of your writing. If you're just paraphrasing, there's no obligation to use quotation marks (that would be silly) but there remains a need to properly name your source (through footnotes or other means). Rewriting someone else's work in your own words is otherwise still very much plagiarism.
Re:plagiarism differs in science vs. English Lit. (Score:4, Insightful)
Furthermore, when a scientist has spent a number of years on a long-term research plan, the condensed versions of what he is studying become so well rehearsed that they get memorized. I have stock phrases that I use when I want to describe this or that aspect of my work because, after giving dozens of presentations about it, they are the ones that work best. They are the most highly polished and refined. They communicate the idea well. And so, they often get trotted out with every manuscript or grant application. My students and post-docs learn to use the same phrasing because, flatly, it works.
None of the instances of those phrases or full sentences require attribution because they are all from the same motherspring of thought. We are the writers. And, as you might imagine, this might well produce a raft of false positives to a system that blindly compares text.
Re: (Score:2)
> ... Yet, syntactic matching appears to be exactly what this program is doing.
> What constitutes "plagiarism" in a scientific paper is very different from plagiarism in journalism or English literature. In scientific writing, it is expected that authors will use the same flat, impersonal style and repeat definitions and the results of others to save the reader the time of having to look them up. So, simple pattern matching between science papers will result in a great many false positives. In science (and math) writing what matters is the new result which the author is claiming. It seems to me that it would be nearly impossible for a computer program to detect the distinction.
Hours of speculation and typing can save one minute of reading TFA. From the article:
"Unlike other plagiarism detectors, it does not use phrases or similar words to check for copying. Helio Text actually looks at the entirety of the text."
So no, it does not. It uses instead some sort of similarity metric computed from analyzing the entire text. This is possibly similar to the text distance metrics used in vector space search engine models (see: en.wikipedia.org/wiki/Vector_space_model ). They will be publi
Re: (Score:2)
> > ... Yet, syntactic matching appears to be exactly what this program is doing.
> > What constitutes "plagiarism" in a scientific paper is very different from plagiarism in journalism or English literature. In scientific writing, it is expected that authors will use the same flat, impersonal style and repeat definitions and the results of others to save the reader the time of having to look them up. So, simple pattern matching between science papers will result in a great many false positives. In science (and math) writing what matters is the new result which the author is claiming. It seems to me that it would be nearly impossible for a computer program to detect the distinction.
> Hours of speculation and typing can save one minute of reading TFA. From the article:
> "Unlike other plagiarism detectors, it does not use phrases or similar words to check for copying. Helio Text actually looks at the entirety of the text."
> So no, it does not. It uses instead some sort of similarity metric computed from analyzing the entire text. This is possibly similar to the text distance metrics used in vector space search engine models (see: en.wikipedia.org/wiki/Vector_space_model ). They will be publishing a paper online in PLoS ONE.
I did RTFA. However, there is no code, no algorithm description, no indication whatsoever in TFA describing exactly how their program operates. From the vague references in TFA it appears that this is nothing more than a glorified, article+abstract-wide, pattern matcher. Perhaps it is a little more clever and uses something similar to Google's page ranking algorithm via applying distance metrics to textual spaces. However, that is also a form of syntactic analysis rather than a context analysis. Barrin
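For readers unfamiliar with the vector space model mentioned above, a minimal Python sketch follows. It assumes plain whitespace tokenization and raw term counts, and `cosine_similarity` is an illustrative name. The article gives no algorithm, so this is not a description of how the tool itself works.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    # Each text becomes a bag of words: word -> number of occurrences.
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity("the odd prime p divides n", "n divides the odd prime p"))
print(cosine_similarity("the odd prime p divides n", "gene expression in yeast cells"))
```

Reordering the words leaves the score unchanged, so a measure like this compares vocabulary across the whole text rather than matching phrases. Whether that counts as more than syntactic analysis is exactly the point being argued in this thread.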
Re: (Score:2)
> ...you're bound to find the sentence "Let p be an odd prime number."
Actually, I kind of doubt you see that exact phrase very often. Although, you're certainly more likely to see it than "Let p be an even prime number."
Re: (Score:2)
Replying to myself, yes, I know about 2 ;-)
Re: (Score:2)
> Actually it is quite common in papers that deal with primes in the first place, though the phrase is more often just "Let p be an odd prime" rather than "let p be an odd prime number".
OK. I didn't remember it phrased that way from any number theory, but that was decades ago for me. It seems a bit obtuse to me compared to calling out the exception that is being excluded. But if it's done that way, it's done that way, regardless of my opinions...
Regarding your point on phrasing, yeah, just google the two. Yours wins 169,000 to 0.
Re: (Score:1)
"Pick up any math paper on number theory, and you're bound to find the sentence 'Let p be an odd prime number.' without citation, but that would hardly qualify as plagiarism."
I wonder how often you see specifically an odd prime number... since two is the only even prime, it's really the oddest of the bunch.
Re: (Score:2)
> I wonder how often you see specifically an odd prime number... since two is the only even prime, it's really the oddest of the bunch.
The answer is:
"About 48,200 results (0.53 seconds)"
Re: (Score:1)
What a stupid rule.
Re: (Score:2)
"Syntactic matching appears to be exactly what this program is doing". At least you openly admit that you are only assuming you know how the fuck it works. Given that they are working in Bioinformatics, and that it's called "ET BLAST", I'm going to go out on a limb and say that it works similarly to how BLAST works. When you are computing the similarity matrix for a protein (or DNA), well, you could just put those two amino acid sequences (or basepair) side-by-side and count up where they match. Only, some am
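The naive side-by-side comparison described above can be sketched in a few lines of Python. This is only the simple match count, assuming two sequences of equal length that are already lined up. It is not BLAST, and `identity_score` and the sample sequences are hypothetical.

```python
def identity_score(seq_a, seq_b):
    """Count the positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length for this naive comparison")
    # Walk both sequences in lockstep and count exact matches.
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


# Two made-up DNA fragments that differ at two positions.
score = identity_score("GATTACA", "GACTATA")
print(score, "of 7 positions match")
```

Every mismatch costs the same here, and nothing handles insertions or deletions, which is why a plain count like this is only a starting point.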
Where's the code? (Score:2)
I poked around the site, and found the page describing some JSON APIs and things, but no links to code or developer pages.
So where's the code?
Hmm, okay, that's weird. The project is run by the Virginia Bioinformatics Institute, but the disclaimer [vt.edu] says:
This software and data are provided to enhance knowledge and encourage progress in the scientific community and are to be used only for research and educational purposes. Any reproduction or use for commercial purpose is prohibited without the prior express written permission of the University of Texas Southwestern Medical Center.
So they don't hold copyright to it? Or they didn't write it? Hmmmm....
Recycling (Score:2)
Re: (Score:2)
> It's considered unethical by the majority of scientists to recycle papers unless there is a significant update from one to the next, i.e., methods changed, or additional steps are taken which improve the results. It is not considered unethical, however, to resubmit your paper to a different conference or journal if it was rejected by another.
I know, I was referring to the former. In fact, submitting a paper to different conferences (say within the same year) is something I would *not* consider recycling.
That's all fine and good, but... (Score:3, Interesting)
... can it find dupes on Slashdot?
Re: (Score:2)
Can it load the front page?
Finding dupes on Slashdot is like finding corruption in Congress.
That's all fine and good, but... (Score:1, Redundant)
... can it find dupes on Slashdot?
How is that news? (Score:1)
amount of scientific plagiarism creeping up (Score:2)
No one has fully stated the cause for the increase. I am guessing it's better software, plus the fact that nearly all papers are in electronic databases now. A more pessimistic explanation would be that as the "Internet Generation" enters the scientific workforce, their sloppy IP habits migrate into research papers.
Publishers are using CrossCheck (Score:1)
http://www.crossref.org/crosscheck.html [crossref.org]
They already create DOIs for their published work and now can check the works before publishing.
It's like fingerprint analysis. (Score:2)
In fingerprint analysis, the computer spits out a possible match. It's up to the human to determine whether or not that match is valid. It's the same with this stuff.
Can you get false positives? (Score:1)
Very often, much of the introductory and methodology sections may be recycled or adapted from previous publications and only the results and conclusions are scientifically novel.
It's no more... (Score:1)
Awesome (Score:2)
Is it really a plagiarism tool? (Score:1)
Secondly, the About [vt.edu] page doesn't talk plagiarism at all. What it says is: "eTBLAST is a unique search engine for searching biomedical literature. Our service is very different from PubMed
ET Blast (Score:2)
"Ouch."
Didn't I read this same article last week? (Score:1)
Old hat (Score:1)
My English prof back in 2000 had this software already. :)
However, my final paper was "borrowing" quite heavily and he didn't find out. Maybe this version works better?
Software finds plagiarism in VBI's own material in (Score:1)