Google Admits to Using Sohu Database

CowboyNeal posted more than 7 years ago | from the cut-and-paste dept.

Google 209

prostoalex writes "A few days ago a Chinese company, Sohu.com, alleged that Google improperly tapped its database for its Pinyin IME product, stirring controversy over whether the two databases were similar simply as a result of the normal research process. Today Google admitted that its new product for the Chinese market 'was built leveraging some non-Google database resources.' 'The dictionaries used with both software from Google and Sohu shared several common mistakes, where Chinese characters were matched with the wrong Pinyin equivalents. In addition, both dictionaries listed the names of engineers who had developed Sohu's Sogou Pinyin IME.'"
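The evidence described in the story rests on a simple idea. Two independently compiled dictionaries may each contain errors, but they are unlikely to contain the same errors. A minimal Python sketch of that comparison follows. The dictionary contents, the `reference` table of correct readings and the helper name `shared_errors` are hypothetical illustrations, not Sohu's or Google's actual data or tools.

```python
def shared_errors(dict_a, dict_b, reference):
    """Return words that both dictionaries map to the same wrong pinyin.

    Each argument maps a Chinese word to its pinyin reading.
    `reference` is assumed (hypothetically) to hold the correct readings.
    """
    common = []
    for word, correct in reference.items():
        a = dict_a.get(word)
        b = dict_b.get(word)
        # Count only entries where both dictionaries agree AND are wrong.
        if a is not None and a == b and a != correct:
            common.append(word)
    return sorted(common)


# Hypothetical toy data: both dictionaries share one bad reading for 重庆.
reference = {"银行": "yin hang", "重庆": "chong qing", "长城": "chang cheng"}
dict_a = {"银行": "yin hang", "重庆": "zhong qing", "长城": "chang cheng"}
dict_b = {"银行": "yin xing", "重庆": "zhong qing", "长城": "chang cheng"}

print(shared_errors(dict_a, dict_b, reference))  # ['重庆']
```

One coincidental overlap proves little. A real comparison would need a trusted reference and would have to weigh how rare each shared error is.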

Is this... (3, Insightful)

Hsensei (1055922) | more than 7 years ago | (#18669419)

Google doing evil, or sticking it to evil?

Re:Is this... (1, Funny)

Anonymous Coward | more than 7 years ago | (#18669591)

Well, considering it's from godless Communist China, they must be sticking it to evil!
Hooray!

Re:Is this... (1)

renegadesx (977007) | more than 7 years ago | (#18669805)

It's both. Do evil to combat evil. That's the American way now, didn't you get the memo?

Re:Is this... (1, Funny)

Anonymous Coward | more than 7 years ago | (#18669945)

Sorry, I believe that memo was supposed to go down the memory hole. We are at war with East-Asia...

Re:Is this... (0, Troll)

Simon Garlick (104721) | more than 7 years ago | (#18670071)

American evil is GOODER than dirty stinkin' gook evil.

Google fucks up, so bash the Americans (0)

Anonymous Coward | more than 7 years ago | (#18670167)

renegadesx got the memo, apparently.

Dictionary mistakes. (5, Funny)

Tackhead (54550) | more than 7 years ago | (#18669449)

> Today Google admitted that its new product for Chinese market 'was built leveraging some non-Google database resources.' The dictionaries used with both software from Google and Sohu shared several common mistakes, where Chinese characters were matched with the wrong Pinyin equivalents.

...including the ones for "plagiarize", "research", and apparently a new one for the 2000s under "leverage".

Leverage! Leverage!
Let no one else's work cut short your edge,
Against the truth you can surely hedge,
So don't cut short your edge,
But leverage, leverage, leverage!

(One man deserves the credit! One man deserves the blame!
And Sergei Brin Ivanovich Lobachevsky is his name!)

Re:Dictionary mistakes. (0)

Anonymous Coward | more than 7 years ago | (#18670379)

You can't do that on the internet

Re:Dictionary mistakes. (0)

Anonymous Coward | more than 7 years ago | (#18670603)

You, sir, are a genius.

Do no evil? (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#18669457)

I've said it before, and I'll say it again: Google doesn't give a shit whose content they rip off. Rampant YouTube copyright violations, intent on cloning the Library of Congress without any regard for authors'/publishers' wishes, caching of webpages, you name it. Other people's content is their business. Stealing other people's content, that is.

Re:Do no evil? (0)

Anonymous Coward | more than 7 years ago | (#18669593)

It's not stealing. Trivially. Not disputing they probably do some illegal stuff, but illegal doesn't mean wrong.

As far as I can see, Google are the greatest force for good (good: destroying copyright law!) in a long time.

Re:Do no evil? (0)

Anonymous Coward | more than 7 years ago | (#18669633)

Talk about drinking the kool-aid...

They're a search engine. They're not curing cancer or solving world hunger. No, they are not the greatest force for good in a long time.

Re:Do no evil? (1)

mattgreen (701203) | more than 7 years ago | (#18670237)

But they SAID they weren't evil, therefore that MUST make them good! Or, at least, that is how I fit into my naive worldview! Everything is either absolutely evil (Microsoft) or absolutely good (Google). There is no in-between.

Re:Do no evil? (0)

Anonymous Coward | more than 7 years ago | (#18669777)

Copyrights exist for a reason...read a book or something and figure it out.

Re:Do no evil? (0)

Anonymous Coward | more than 7 years ago | (#18669873)

Oh, they exist for a reason alright. That's why I oppose 'em! http://piratpartiet.se/ [piratpartiet.se]

Re:Do no evil? (1)

Thexare Blademoon (1010891) | more than 7 years ago | (#18670475)

It's not whether or not they exist for a reason that I question.

It's whether or not they exist for a good reason.

Oblig futurama quote (5, Funny)

pedantic bore (740196) | more than 7 years ago | (#18670263)

"The internet is about the free exchange of other people's ideas!"

Google's initial explanation (5, Funny)

Anonymous Coward | more than 7 years ago | (#18669459)

"In the future, Google invents a time machine that's used by a rogue employee to travel back in time to give Sohu this database. It's clear then that Sohu stole our database."

Re:Google's initial explanation (2, Funny)

BungaDunga (801391) | more than 7 years ago | (#18670189)

In fact, if we hadn't used their database, our employee wouldn't have been able to go back in time to give it to Sohu, and we wouldn't have been able to steal their database. QED.

Have no fear! (1)

mattgreen (701203) | more than 7 years ago | (#18669463)

I'm sure someone will step up and help them save face in this embarrassing situation! When in doubt, you can always try to change the subject, that has worked well in the previous thread. Now that I think about it, we need a RoughlyDrafted-esque site for Google, anyone up to the task?

This reminds me of (5, Interesting)

Diordna (815458) | more than 7 years ago | (#18669475)

"Stolen from Apple Computer" (whole story [folklore.org] )

"built leveraging some non-Google resources" (0)

Anonymous Coward | more than 7 years ago | (#18669477)

lol, Google may or may not be evil, but they can spin-doctor with the Microsofts of the world.

Now what could be so wrong about leveraging non-Google resources?

Turnitin.com Subscription Coming (3, Funny)

slashbob22 (918040) | more than 7 years ago | (#18669481)

I guess Google Labs will have to subscribe to Turnitin.com now.

Could be just a coincidence. (0)

Anonymous Coward | more than 7 years ago | (#18669487)

Could be just a coincidence. Doesn't quantum physics state that essentially anything is possible? /apologist

So... (5, Interesting)

Anonymous Coward | more than 7 years ago | (#18669491)

When caught making a mistake, they admit it, work to resolve it, and move on?
I think there are a few other companies who could learn from that approach ...

Re:So... (4, Insightful)

Timesprout (579035) | more than 7 years ago | (#18669611)

'Mistake' is a bit euphemistic here. The dictionary was never made public, yet Google somehow managed to acquire it. They have not complied with Sohu's requests to date. They dragged their feet over the whole issue and only came clean when there was more than sufficient proof they were infringing.

It's not the first time Google have taken a fairly liberal interpretation of someone else's copyright either.

On what do you base your judgment? (4, Insightful)

Anonymous Coward | more than 7 years ago | (#18669709)

> They have not complied with Sohu's requests to date.

One of Sohu's demands was to remove it. They did that, even prior to the cease & desist deadline, per the article. It sounds like they'll have to compensate Sohu next, which isn't overly surprising. As for where they got it, perhaps someone sold it to them? We don't know, so I'll reserve judgment about whether it was acquired in an un-Google "evil" way until we hear the rest of the story.

> It's not the first time Google have taken a fairly liberal interpretation of someone else's copyright either.

As for the copyright stance, I honestly don't care. Yes, I dislike Microsoft's hypocrisy concerning copyright, but I don't really give a damn about imaginary property at this point in time, and I don't see Google out there telling people that copyright infringement is evil, wrong, Communist and anti-American.

Frankly, I'm more inclined to distribute my works with only one request: that you do not acknowledge my authorship in any way. Of course, almost the only way to enforce that is to post AC :-)

Re:On what do you base your judgment? (5, Informative)

Daengbo (523424) | more than 7 years ago | (#18670127)

In my mind, there is some question of whether a database of facts should, in fact (hee hee), be copyrightable at all. The characters were not original. The pinyin is not original. The pinyin for each character is, in fact, well established. Why should a compilation of public-domain facts which in itself is a derivative work be copyrightable?

It reminds me of a court case a few years ago in Thailand, where a judge put several Thai fonts into the public domain, stating "No one owns the Thai alphabet. It belongs to the people."

Re:On what do you base your judgment? (2, Interesting)

QuantumG (50515) | more than 7 years ago | (#18670235)

meh, the argument for why compilations of public domain "facts" should be considered a copyrightable work is that it is work to compile those facts. Why people can't understand that not all work results in property is beyond me, but there's ya reasoning.

Re:On what do you base your judgment? (1)

Daengbo (523424) | more than 7 years ago | (#18670437)

I know the reasoning: I just don't understand it. Writing a historical novel or even a biography is different from a raw database of publicly available facts. One is art, while the other is just data entry.

And it isn't (1)

phorm (591458) | more than 7 years ago | (#18670607)

The language isn't copyrighted, and google was more than free to come up with their own dictionary/database. However, in this case they used somebody else's. The infringement is not against the language itself, but against the use of somebody's precompiled database (inclusive of errors, amusingly enough).

Re:So... (1)

inviolet (797804) | more than 7 years ago | (#18669863)

It's not the first time Google have taken a fairly liberal interpretation of someone else's copyright either.

Perhaps so. But then, Google has billions of dollars in the bank. They have no need to steal anything from anyone, and every reason not to.

Can you really suppose that anyone in Google management decided to snag Sohu's database? Google is in the database business, so they know all about the salting of databases. They had to know that any commercial database will be filled with giveaway records (e.g., in this case, the developers' names).
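Salting works by planting records that could only have come from the original compiler, such as the Sohu engineers' names mentioned in the story. A minimal sketch of checking a suspect word list for such planted entries follows, assuming the owner kept a private list of them. `find_canaries` and the sample entries (张三 and 李四 are stock placeholder names) are hypothetical.

```python
def find_canaries(suspect_entries, canaries):
    """Return the planted entries that appear in a suspect word list."""
    suspect = set(suspect_entries)
    # Set intersection: any hit is an entry with no legitimate independent source.
    return sorted(suspect & set(canaries))


# Hypothetical data: the owner salted its list with two invented entries.
canaries = ["张三", "李四"]
suspect = ["你好", "谢谢", "李四", "再见"]

print(find_canaries(suspect, canaries))  # ['李四']
```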

Probably, Google legitimately acquired the database by subcontracting with some of the locals -- locals who stole it on their own prerogative. And now that it's hit the fan, Google can't say anything in its own defense without making the situation worse.

Once everyone calms down, I'll bet we learn that a certain acquisition manager at Google got reprimanded for failing to do proper "due diligence" before approving the purchase from someone who turned out to be shady.

Re:So... (1, Insightful)

Anonymous Coward | more than 7 years ago | (#18670133)

Replace all instances of "Google" with "Microsoft" in your post and see if your argument makes any sort of sense!

Re:So... (0)

Anonymous Coward | more than 7 years ago | (#18670195)

That's what I've been telling people about Microsoft Visual J++ all along.

Can you really suppose that anyone in Microsoft management decided to snag Sun's programming language? Microsoft is in the database business, so they know all about the features of the programming languages. They had to know that any programming language will be filled with giveaway features (in Java's case, treating everything, except the primitive types, as an object).

Probably, Microsoft legitimately developed the programming language by subcontracting with some of the locals in Redmond, WA -- locals who stole it on their own prerogative. And now that it's hit the fan, Microsoft can't say anything in its own defense without making the situation worse.

Re:So... (1)

ClosedSource (238333) | more than 7 years ago | (#18670403)

That would make some sense except for the fact that J++ was presented as a Java clone from day one. Sun sued MS for violating a contract; Sun never claimed that MS had stolen anything, because nothing was stolen.

Re:So... (4, Insightful)

Breakfast Pants (323698) | more than 7 years ago | (#18669613)

Actually, when caught, they just removed the developers' names from the dictionary. When a big deal was made of it, *then* they went to town 'not doing evil'. They still haven't said how it happened; I bet they will quietly settle it, and we will never hear more.

Re:So... (0)

Anonymous Coward | more than 7 years ago | (#18669679)

Or possibly at least one free software project [tinyurl.com] ...

Re:So... (2, Insightful)

suv4x4 (956391) | more than 7 years ago | (#18670105)

When caught making a mistake, they admit it, work to resolve it, and move on?
I think there are a few other companies who could learn from that approach ...


What a great approach indeed! Steal, and if caught, deny it a little, then cover it up.

Actually I think Google learned that from someone else's company, or is Google "innovating" here? A debate for the coming generations.

Once again, GOOGLE FARTS! (0)

Anonymous Coward | more than 7 years ago | (#18669513)

and slashdot smells it, lol!

C'mon Google... (3, Funny)

Anonymous Coward | more than 7 years ago | (#18669519)

Surely after helping so many students copy their research papers you should know the number 1 rule of copying another person's work: Change the F*CKING NAME!

I wonder... (2, Interesting)

flyboy81 (698817) | more than 7 years ago | (#18669531)

Is this a single isolated incident or simply the first one of more coming from the company that does no evil?

Re:I wonder... (1)

themushroom (197365) | more than 7 years ago | (#18669723)

...while working with the Chinese guvmint?

Re:I wonder... (1)

AmberBlackCat (829689) | more than 7 years ago | (#18670559)

I guess the only thing reasonably certain is it's the first time they got caught.

Are dictionaries copyrightable? (0)

Anonymous Coward | more than 7 years ago | (#18669535)

Not in the States at least, AFAIK...

Mistakes are (1)

EmbeddedJanitor (597831) | more than 7 years ago | (#18669585)

The mistakes were the giveaway. Surely these are "creative works"?

Re:Mistakes are (0)

Anonymous Coward | more than 7 years ago | (#18669637)

just like they always are. Map makers used to insert tiny mistakes to keep other cartographers from copying their work.

Time for a slogan change? (5, Funny)

GFree (853379) | more than 7 years ago | (#18669587)

"Do no evil"

should be changed to

"Do just a tiny bit of evil"

which at this rate will probably end up as

"All your web are belong to us"

Re:Time for a slogan change? (2, Funny)

Ngarrang (1023425) | more than 7 years ago | (#18669601)

Do no evil, or don't get caught.
We redefine evil.
Emulate or Innovate, whichever is more convenient.

Re:Time for a slogan change? (5, Insightful)

LarsG (31008) | more than 7 years ago | (#18669695)

This reminds me of Animal Farm and how the commandments on the barn wall changed.

The people outside looked from Google to MS, and from MS to Google, and from Google to MS again; but already it was impossible to say which was which.

Re:Time for a slogan change? (2, Funny)

Anonymous Coward | more than 7 years ago | (#18670495)

It's not gotten to that point yet. If you want to figure out which is Google and which is MS, if you're ducking chairs or you hear the distant chant of "developers, developers, developers", it's MS.

Re:Time for a slogan change? (1)

Viceroy Potatohead (954845) | more than 7 years ago | (#18669973)

"Your search - do no evil - did not match any documents" [or]
"Did you mean: services [google.com] ?"

I think it's high time for Google to do an internetectomy to remove references to "do no evil" so they can get back to business as usual, without us calling them on it all the time.

Re:Time for a slogan change? (1)

Dragonslicer (991472) | more than 7 years ago | (#18670197)

"Do just a tiny bit of evil"
The Diet Coke of evil?

Re:Time for a slogan change? (0)

Anonymous Coward | more than 7 years ago | (#18670513)

I thought the slogan was 'don't be evil'... Now, considering the balance of good/evil that we are currently aware exists within the Google microcosm, do you think they are 'being' evil? Is it *really* necessary for one to 'do no evil' in order to not *be* evil? Or does it mean maintaining a popular perception among the critical that you are mostly good...with a few bad seeds and decisions scattered throughout... This is life, my friend. Not being evil means doing what you think is best...regardless of the rules imposed upon you. The depths of rationalization can go pretty far, but popular perception holds a strong net to catch you if you're willing to stay above ground.

So far, it seems Google hasn't needed to use the safety net of perception very often... Let's just hope it doesn't tear beneath them. They're pretty heavy.

Car stereo (3, Funny)

DogDude (805747) | more than 7 years ago | (#18669623)

So then, was the guy who stole my car stereo "leveraging some non-car thief assets"?

Re:Car stereo (2, Insightful)

iminplaya (723125) | more than 7 years ago | (#18669861)

Did he leave you an exact copy?

New tag: copyvio (1)

Matt Perry (793115) | more than 7 years ago | (#18669643)

I recommend tagging this "copyvio [urbandictionary.com] "

Pot meet Kettle (0)

Anonymous Coward | more than 7 years ago | (#18669647)

As if the Chinese aren't the biggest pirates/copycats around.

Re:Pot meet Kettle (0)

Anonymous Coward | more than 7 years ago | (#18670565)

and, apparently, cry babies to boot

Do no evil (5, Insightful)

z-j-y (1056250) | more than 7 years ago | (#18669671)

Google is going to release a statement that stealing code/data is not evil in China, and Google must fit in local cultures and abide by local laws.

Seriously, this is just pathetic. I am appalled by the Google apologists on slashdot.

Chinese input is a well-established market; Google Giant forces itself into the market with a product that is very similar to existing ones and offers no innovation. That is not evil enough? They did this by stealing data and who knows what from others. Mind you, the data is not publicly available, so Google must have committed certain crimes to obtain the data.

For those who don't see what's the big deal: the mapping from ASCII sequence to Chinese character/phrase is not trivial; actually it is what Chinese input is all about.
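To see why the mapping is the valuable part: at its simplest, an IME looks up the typed Latin letters in a table and offers the matching characters ranked by how often they are used. The sketch below is a toy that assumes a flat table of (pinyin, phrase, frequency) rows. `LEXICON`, `build_index` and `candidates` are hypothetical names with made-up weights, and they do not describe Sohu's or Google's actual engines.

```python
from collections import defaultdict

# Hypothetical toy lexicon: (typed pinyin key, phrase, frequency weight).
# A real IME dictionary holds a vast number of such entries, and
# compiling them accurately is the expensive part.
LEXICON = [
    ("shi", "是", 900),
    ("shi", "十", 300),
    ("shi", "时", 500),
    ("shiyan", "实验", 120),
    ("shiyan", "试验", 80),
    ("xian", "先", 400),
    ("xian", "西安", 90),  # also typed "xi'an"; the bare key is ambiguous
]


def build_index(lexicon):
    """Group phrases by the ASCII key the user types."""
    index = defaultdict(list)
    for key, phrase, weight in lexicon:
        index[key].append((weight, phrase))
    return index


def candidates(index, typed):
    """Return phrases for a typed key, most frequent first."""
    ranked = sorted(index.get(typed, []), key=lambda wp: wp[0], reverse=True)
    return [phrase for _, phrase in ranked]


index = build_index(LEXICON)
print(candidates(index, "shi"))   # ['是', '时', '十']
print(candidates(index, "xian"))  # ['先', '西安']
```

Even this toy shows the difficulty: "xian" could be 先 or 西安, and real engines must also split unspaced input into syllables and use context. The accuracy of the table, including which reading each character gets, is the part the story says the two products shared.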

When in Rome... (0)

Anonymous Coward | more than 7 years ago | (#18669753)

...do as the Romans do?

Re:Do no evil (2, Interesting)

maxume (22995) | more than 7 years ago | (#18669767)

There is no way to tell if the copying was done by 'Google' or if it was done by some engineer on their own. Sure, 'Google' needs to take steps to make sure that what they put out meets some sort of standard, but the backpedaling and what not is pretty much the response you would get no matter how the copying was initiated, so there isn't much reason to assume where the responsibility for the copying lies.

Re:Do no evil (2, Insightful)

QuantumG (50515) | more than 7 years ago | (#18669857)

Or done by a Chinese company which Google outsourced to. Isn't that how all corporations do their evil? Outsource it to Evil Inc. Everyone except Microsoft and Enron I guess.

Re:Do no evil (1)

homer_s (799572) | more than 7 years ago | (#18669905)

Google Giant forces itself into the market with a product that is very similar to existing ones and offers no innovation. That is not evil enough?

So, offering a 'me too' product is now evil?

Re:Do no evil (1)

The_Wilschon (782534) | more than 7 years ago | (#18670315)

When the me-tooist is a corporate giant and the me-firsters are still quite small, the me-tooist will typically crush the me-firsters merely by virtue of its size, name recognition, and ability to lose money on a market for a while in order to gain a monopoly of it.

Even if they hadn't ganked anybody's data to do it, shoehorning themselves into a market full of players much smaller than themselves is not very nice.

Gratuitous analogy: Michael Johnson steals a kid's shoes and then wears them to run at a high school track meet.

Re:Do no evil (5, Insightful)

ShawnDoc (572959) | more than 7 years ago | (#18669943)

This is a serious problem when dealing with Chinese companies. Now that Google has opened offices in China and has staffed them with native Chinese people, they're going to have a hard time enforcing western style ideas about copyright and what constitutes "doing no evil". It's a problem we've run into in the past with our Chinese operations. The way the problem was "solved", by removing the engineers' names but still clearly using the other company's engine (they didn't remove the identical bugs), is something I have seen happen in the past with our R&D team in China, when we've found them using code they "borrowed" either from open source code or from an engineer's past employer. I've never seen it handled in public like this, however. Google is going to need to take some serious QA steps in their Chinese offices to keep stuff like this from happening again, or else risk their Chinese office ruining the entire company's reputation.

Re:Do no evil (0)

Anonymous Coward | more than 7 years ago | (#18670179)

They're not using the same engine. They're using (mostly) the same *data*, that was mined from the competitor's program.

Of course it's still a bad thing.

Re:Do no evil (2, Insightful)

ReallyEvilCanine (991886) | more than 7 years ago | (#18670075)

I'm appalled, too. I'm also surprised. What I'm not is a Google apologist. I still stand by the crux of my comment [slashdot.org] based on my work in I18N and with IMEs.


Google must have committed certain crimes to obtain the data.
No, or at least, "Not necessarily intentionally". The dictionary could've been indexed via the spiders. It could've been indexed via the desktop search app. There are lots of ways that Google could've got the information. Anyone who works for Google, knows the deep ins and outs of their data handling, and reads and posts on this site ain't gonna tell. As I wrote in the last comment, Google is information. They get it from everywhere, and they know how to store, sort and use it. It may well have been intentional theft, but I don't think Google the corporation has reached the point where they actually believe "All Data Are Belong To Us".

Re:Do no evil (1)

Achromatic1978 (916097) | more than 7 years ago | (#18670319)

The dictionary could've been indexed via the spiders.

The database wasn't bulk browseable.

It could've been indexed via the desktop search app.

I certainly hope not. I would be horrified to find that my desktop search database was being uploaded to Google.

The information was NOT publicly available. Making it out as though Google just happened upon the database because "Google is information" (?!?) just reeks of a new way to spin.

About that do no evil stuff.... (1)

pcause (209643) | more than 7 years ago | (#18669673)

Ok, so we do do some evil, but just with our competitor's code. That isn't so bad, is it?

Exactly how did they get a copy of the DB? (1)

WoTG (610710) | more than 7 years ago | (#18669677)

OK, so now that Google has admitted to copying the sohu.com pinyin database... exactly how did they get a copy in the first place? Is there a publicly available file for personal use or was there some sort of web scraping or what?

I suspect that there's more to this story that we're not hearing.

Re:Exactly how did they get a copy of the DB? (5, Informative)

tooyoung (853621) | more than 7 years ago | (#18670171)

OK, so now that Google has admitted to copying the sohu.com pinyin database... exactly how did they get a copy in the first place? Is there a publicly available file for personal use or was there some sort of web scraping or what?

I suspect that there's more to this story that we're not hearing.


Exactly. Reading 95% of the comments for this story and yesterday's story, everyone seems to think that this is about stealing code. This is about Google using the same data to train an algorithm. Both algorithms make the same mistakes because they were trained using the same data, which contained incorrectly labeled information. Whether or not this data was publicly available is the issue.

For (a horribly contrived) example: let's say that I write some handwriting recognition software using a neural net. In order to train my software, I use a large database of handwriting samples that I have found on the web. However, the person who compiled this database made the mistake of labeling all of the sample images of the letter 'n' as the letter 'q', and all of the images of the letter 'q' as the letter 'n'. Person B comes along and uses the same data set to train a naïve-Bayes classifier. Guess what? Both algorithms will make the same mistakes when it comes to the letters 'n' and 'q'. Not because I stole code from Person B, but because we used the same training data.
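The point can be checked with a toy. For brevity the sketch below swaps in two deliberately simple classifiers (1-nearest-neighbor and nearest-centroid, not the neural net and naïve Bayes from the example) and trains both on the same hypothetical data set with the 'n' and 'q' labels swapped. The 2-D feature values are made up for illustration, and the names `nearest_neighbor` and `nearest_centroid` are ours.

```python
import math

# Hypothetical 2-D features for handwriting samples. The compiler of this
# data set swapped the labels for 'n' and 'q', as in the comment's example.
TRAINING = [
    ((1.0, 0.1), "q"),  # actually an 'n'
    ((1.1, 0.2), "q"),  # actually an 'n'
    ((0.1, 1.0), "n"),  # actually a 'q'
    ((0.2, 1.1), "n"),  # actually a 'q'
    ((0.5, 0.5), "o"),
    ((0.6, 0.4), "o"),
]


def nearest_neighbor(sample, data):
    """1-NN: copy the label of the closest training point."""
    return min(data, key=lambda item: math.dist(sample, item[0]))[1]


def nearest_centroid(sample, data):
    """Average each label's points, then pick the label with the closest average."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    centroids = {lbl: (sx / n, sy / n) for lbl, (sx, sy, n) in sums.items()}
    return min(centroids, key=lambda lbl: math.dist(sample, centroids[lbl]))


new_n = (1.05, 0.15)  # a freshly written 'n'
print(nearest_neighbor(new_n, TRAINING))  # q
print(nearest_centroid(new_n, TRAINING))  # q
```

The two methods share no code, yet both call the new 'n' a 'q', because the error lives in the training labels. (`math.dist` needs Python 3.8 or later.)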

I'm not defending Google at all here. If they stole the data from Sohu, they should get in trouble. Based on the fact that Google is in the web-mining business, I would guess that they just grabbed this data off of the net, and someone forgot to think about whether they had the right to use it.

this is quite troubling (2, Insightful)

martin-boundary (547041) | more than 7 years ago | (#18669705)

It is clear from this example that _some_ Google engineers have not the first clue about what clean room engineering [groklaw.net] is and when it should be used. Everyone in the software industry is under pressure to produce, that doesn't mean cutting corners is acceptable.

This reminds me of the recent story about GPL code found in OpenBSD [slashdot.org] . There too, an OpenBSD developer took someone else's code and started modifying it without keeping the GPL license. He apparently thought it was ok to do this as long as all the offending functions would be renamed in the final release, but was caught checking in unmodified functions by accident.

Google is well known for using a lot of GPL software, but it is also true that they do not distribute the source code of their flagship programs to the public. Episodes like this make people wonder if they "accidentally" use some GPL code in their distributed products without telling anyone.

Re:this is quite troubling (1)

QuantumG (50515) | more than 7 years ago | (#18669925)

Uh huh. Are you trying to suggest that there is something wrong with this:

1. Take existing code under incompatible license
2. Write new functionality and integrate into your code
3. Test and develop your application until it is "ready"
4. Replace incompatible code with your own code

I mean, if you were talking about using proprietary code in the first step then I could imagine that you might have some kind of argument.. but it's GPL code man.. you're free to do whatever you want with it. Only when you distribute it are you required to place other code that it is based on under the GPL.. and if you remove the GPL licensed code then you have no such responsibility anymore.

Unfortunately the dude fucked up.. everyone does it now and then.

Re:this is quite troubling (1)

tppublic (899574) | more than 7 years ago | (#18670131)

...if you were talking about using proprietary code in the first step then I could imagine that you might have some kind of argument

No need for imagination. Go read Sega v. Accolade.

it's GPL code man.. you're free to do whatever you want with it

NO. You are Free to do whatever the license grants you the right to do. From GPL Section 2(b): "You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License." (emphasis added)

You are proposing to directly leverage GPL code to develop new code. That new code and the combined code are a derivative work of the GPL code.

Thus, to directly answer your question: Yes, there is something wrong with what you are proposing. No lawyer would want you to do it, because the work you produce in steps 2 and 3 is a derivative work of GPL code, and thus must be licensed under the GPL to avoid copyright infringement.

Re:this is quite troubling (1)

QuantumG (50515) | more than 7 years ago | (#18670157)

Yeah, you're on crack if you think that new code you write is a derivative work just because you have read some GPL code.

Re:this is quite troubling (1)

martin-boundary (547041) | more than 7 years ago | (#18670455)

.. but it's GPL code man.. you're free to do whatever you want with it.
Of course you can. But if you modify _it_, then the end product is covered under the GPL. Let's take your example:

1. Take existing code under incompatible license
No problem there. At this point you have a copy of the GPL'd code, and no code of your own. You can do anything you like with the code.

2. Write new functionality and integrate into your code
At this point you have a derivative of the original GPL'd code. No problem there, you can do anything you like with the code.

3. Test and develop your application until it is "ready"
That's fine too.

4. Replace incompatible code with your own code
Here you're taking some GPL code, and modifying it. The result is GPL code. It doesn't matter if your modification consists of "removing" the "original" GPL'd code, the code you're modifying is still GPL, so the result is GPL.

Now granted, it looks confusing that you can end up with a GPL'd code which looks like you've all written it yourself, but that's because in this scenario the developer was sloppy about the disclaimers. If he'd been more pedantic, he would have seen where the mistake lies, as follows:

In step 1., the code is marked with the GPL copyright disclaimer on each source file. To get from step 1. to step 2., whenever you copy a GPL function into a new source file, you must _also_ copy the accompanying disclaimer into the new source file. Now your new source file has the GPL copyright disclaimer (pedantic, but necessary). Next you modify your source file any way you like, but you can't remove the GPL disclaimer, even though you can remove and change all the code below it. At some point, all the code below is your own, but the GPL disclaimer is still there and valid, because it was present throughout the development. If you now remove the disclaimer and put a BSD one in instead, you're clearly breaching the copyright.

So if you act pedantically, you can't fail to see where you're stuck with the GPL. Also, if you're pedantic, you can easily see how to get around the issue: create a set of source files which act as a _proxy_ for the copied GPL'd functions and aren't directly mixed with your other code; then you'll be able to split off the GPL code in the end. Besides, it makes the whole code more modular.

FWIW, I agree that the BSD guy made a mistake he paid dearly for, but if we as developers are going to play the copyright game and make a fuss when others abuse it, then we must play it _correctly_ and not _sloppily_.

Re:this is quite troubling (1)

QuantumG (50515) | more than 7 years ago | (#18670479)

At this point you have a derivative of the original GPL'd code. No problem there, you can do anything you like with the code.
No. If you distribute it, *then* you are obligated to release your code under the GPL, *but not before*.

No symmetry (1)

mangu (126918) | more than 7 years ago | (#18670317)

This reminds me of the recent story about GPL code found in OpenBSD. There too, an OpenBSD developer took someone else's code and started modifying it without keeping the GPL license.


That's just like that old story about the resort where there were girls looking for husbands and husbands looking for girls. It's not a symmetrical situation. If BSD coders feel it's all right to give their work away for free to commercial companies, it doesn't mean GPL coders should be forced to do the same. Even if the BSD people disagree about the way GPL people licence their code, they should take care to respect the other point of view.

Ironic (5, Funny)

smackt4rd (950154) | more than 7 years ago | (#18669775)

So now American companies are pirating Chinese software? Oh, the irony! :)

Any surprise this was done in China? (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#18669837)

Google may be filled with the best engineers, but once you move out of North America, they know nothing about ethics or morality. I'm not surprised one bit that this was done by Chinese engineers. I hope those engineers get fired, and Google probably needs to do a lot better at spreading a sense of ethics across all parts of the company, even in those 3rd world countries.

Re:Any surprise this was done in China? (3, Insightful)

BiggerIsBetter (682164) | more than 7 years ago | (#18670093)

Google may be filled with the best engineers, but once you move out of North America, they know nothing about ethics or morality.

I'm curious how much time you've spent outside of North America, because I'm pretty sure 92% of the world population would disagree with you.

Seppuku (0)

Anonymous Coward | more than 7 years ago | (#18669879)

Google should just convince someone plausibly responsible to commit seppuku, with the promise that their family would be taken care of in thanks for removing their shame.

Their new spokesperson ... (2, Funny)

myster0n (216276) | more than 7 years ago | (#18669985)

... Theo De Raadt says that the Chinese are INHUMAN.

*ducks*

Re:Their new spokesperson ... (1, Funny)

Anonymous Coward | more than 7 years ago | (#18670095)

In Soviet Russia the inhumans say that Theo De Raadt is Chinese.

Tough Luck (0, Troll)

Plekto (1018050) | more than 7 years ago | (#18670001)

I say Google should stop being apologetic and say "so what". After all, China has no respect for U.S. copyrights and patents and steals from us every day.

Were the errors intentional? (3, Informative)

SuperBanana (662181) | more than 7 years ago | (#18670025)

If you ask around in the GIS/mapping community, it's known that the [street] map data providers (Delorme, Garmin, etc.) will insert garbage data here and there. A street name is slightly wrong, or they have a mystery street that doesn't exist in the real world. They use it to try to tell if/when someone steals their data. If Zyugyz Road in Somecity, CA shows up in someone else's map, the legal team fires at will.

It's kind of weird, considering that most mapping companies do little more than get their hands on town/county/state GIS data for cheap, massage it a bit, then charge assloads of money for it.
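The trap-street trick boils down to planting records nobody could have surveyed independently and then testing other datasets for them. A minimal sketch, assuming a dataset is just a list of (street, city) pairs; the helper names and all data below are hypothetical, with the fake street borrowed from the comment above.

```python
import random


def publish_with_decoys(streets, decoys, seed=None):
    """Mix fictitious decoy streets into the real data before publishing."""
    published = list(streets) + list(decoys)
    random.Random(seed).shuffle(published)  # decoys shouldn't stand out by position
    return published


def decoys_found(suspect, decoys):
    """Return the decoys that turn up in someone else's dataset."""
    suspect_set = set(suspect)
    return [d for d in decoys if d in suspect_set]


real = [("Main St", "Somecity, CA"), ("Oak Ave", "Somecity, CA")]
decoys = [("Zyugyz Road", "Somecity, CA")]
published = publish_with_decoys(real, decoys, seed=1)

# A competitor that copied the published data inherits the decoy...
copied = list(published)
# ...while one that surveyed the streets itself does not.
surveyed = [("Main St", "Somecity, CA"), ("Oak Ave", "Somecity, CA")]
print(decoys_found(copied, decoys))    # [('Zyugyz Road', 'Somecity, CA')]
print(decoys_found(surveyed, decoys))  # []
```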

Re:Were the errors intentional? (1)

Dan East (318230) | more than 7 years ago | (#18670147)

The same happens with government medical data. Take the ICD9 database, for example. It is distributed in a database format not conducive to programmatic access. For instance, there are hundreds of codes with the description "Other". Such a description only makes sense in the context of all its parent levels, and spelling those out produces an extremely long, redundant description. Companies will simply reformat the data, claim copyright and profit.

Dan East
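The reformatting described here is mostly a walk up the code hierarchy. A rough sketch, assuming each code records its parent; the codes and labels below are hypothetical placeholders, not real ICD-9 entries, and `full_description` is an illustrative helper.

```python
def full_description(code, labels, parent, sep=": "):
    """Spell out a code's label together with all its ancestors, root first.

    labels -- dict mapping each code to its short label (possibly just "Other")
    parent -- dict mapping each code to its parent code; top-level codes are absent
    """
    parts = []
    while code is not None:
        parts.append(labels[code])
        code = parent.get(code)
    return sep.join(reversed(parts))


# Hypothetical three-level hierarchy, not real ICD-9 data.
labels = {
    "X10": "Disorders of a hypothetical organ",
    "X10.1": "Infections",
    "X10.18": "Other",
}
parent = {"X10.1": "X10", "X10.18": "X10.1"}
print(full_description("X10.18", labels, parent))
# Disorders of a hypothetical organ: Infections: Other
```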

Re:Were the errors intentional? (0)

Anonymous Coward | more than 7 years ago | (#18670279)

Yes, same here. In this Google/Sohu IME story, the Sohu developers inserted their names into the Sogou Pinyin IME, so those names come up as the first choice for a particular keystroke sequence, even though none of them are common names. Then people found that the same names come up as the first choice in Google's IME. The only way this can happen is if Google *copied* Sohu's IME dictionary.

As you are probably not a user of a Chinese pinyin IME, here is an example to help you understand the situation: imagine that you have an English IME, you type in "sb", and you get "SuperBanana" as the first choice. What would you think?
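The comparison described above is about ranking, not just membership: an independent dictionary might well contain a name, but it would almost never rank it first for a short key sequence. A minimal sketch, assuming each dictionary maps a keystroke sequence to an ordered candidate list; the dictionaries and the `shared_first_choices` helper are hypothetical, reusing the English "sb" analogy.

```python
def shared_first_choices(dict_a, dict_b, probes):
    """Return probe keystrokes where both dictionaries rank the same word first.

    dict_a, dict_b -- map a keystroke sequence to its ordered candidate list
    probes         -- sequences whose first choice would be unusual, such as
                      ones mapped to the original developers' names
    """
    hits = {}
    for keys in probes:
        a, b = dict_a.get(keys, []), dict_b.get(keys, [])
        if a and b and a[0] == b[0]:
            hits[keys] = a[0]
    return hits


original = {"sb": ["SuperBanana", "somebody"]}
suspect = {"sb": ["SuperBanana", "somebody"]}
independent = {"sb": ["somebody", "SuperBanana"]}
print(shared_first_choices(original, suspect, ["sb"]))      # {'sb': 'SuperBanana'}
print(shared_first_choices(original, independent, ["sb"]))  # {}
```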

Shame! (3, Funny)

BluBall (16231) | more than 7 years ago | (#18670047)

Following the protocols established by the recent OpenBSD/Linux Broadcom driver fiasco, the proper response would be to denounce Sohu for having been ripped off by Google.

Shame on you Sohu! This is inhuman!

Right! Google is evil! (3, Insightful)

SEE (7681) | more than 7 years ago | (#18670097)

After all, we know that all Google employees are under Total Management Mind Control, and that Google Knows Everything Everyone's Doing. It's not even remotely possible that a handful of Google employees in China could shadily cut corners (using an already-extant database instead of compiling one from their own company's data) without Sergey Brin and Larry Page having personally authorized it from Mountain View, or that it would actually take a bit of time for upper management to investigate an issue when it's uncovered.

Not a big deal (0, Flamebait)

gaz_hayes (1085925) | more than 7 years ago | (#18670145)

Good, Google admitted it. I bet Google contracted a Chinese company to supply them the database, though. Apart from that, basically every piece of IP the USA has ever created has been copied by the Chinese and profit has been made. But that doesn't make it right, and Google needs to come 100% clean, because if we start doing what the Chinese do to us, then there will be no more good people left in the world...

Re:Not a big deal (1)

Achromatic1978 (916097) | more than 7 years ago | (#18670357)

Tell you what, grab an M16 and man the borders. What the fuck piece of xenophobic, nationalistic tripe is this? "no more good people left in the world"?

Please tell me (1)

Mazin07 (999269) | more than 7 years ago | (#18670177)

How is Google's pinyin IME better than the tons of other pinyin IMEs out there? I tried it, and apart from having a search button, it doesn't seem to be a whole lot better than the Microsoft Pinyin IME that comes with Windows.

How does Google plan to set themselves apart from the rest of the competition and, even better, how does this fit into the "big picture"? Will the mass of adopters suddenly begin using Google search because it's built into their IME?

Re:Please tell me (2, Funny)

hackingbear (988354) | more than 7 years ago | (#18670477)

The advanced feature will be:

When you are typing your term paper using this IME, the IME will automatically google the Web and find other papers on the same topic, so you can stop thinking and typing and instead copy from those papers with a click of a button.

Tutorial on Chinese input (5, Informative)

microbee (682094) | more than 7 years ago | (#18670255)

There are a lot of misunderstandings about how IMEs work and how Google copied non-public databases. So let me explain.

An IME accepts keyboard input and converts it into characters of another language. There are many different input methods that decide how to generate Chinese characters from an English keyboard, and pinyin is one of them (and the most popular one).

Pinyin is popular because it's simple and has almost no learning curve. However, it suffers from aliasing. For example, "shi" under pinyin will convert into "是", "时", "事", ... In general, the same sequence could map to many different words (sometimes several dozen), and you usually need to select among them by pressing 1, 2, 3, ... (the input bar displays the candidates for you to choose from, sometimes requiring page-down). A naive implementation of pinyin is thus very slow and cumbersome to use.
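To make the aliasing and paging concrete, here is a minimal sketch of the naive approach: one syllable, a fixed candidate order, and number keys to pick from a page. The tiny dictionary, the page size of 5, and the helper names are assumptions for illustration; real IMEs ship dictionaries that are orders of magnitude larger.

```python
PAGE_SIZE = 5  # candidates shown per page of the input bar (assumed)

# Hypothetical, tiny dictionary: one pinyin syllable, many characters.
DICTIONARY = {
    "shi": ["是", "时", "事", "十", "使", "世", "市", "式", "室", "试", "师", "石"],
}


def candidate_page(pinyin, page=0, page_size=PAGE_SIZE):
    """Return the numbered candidates shown on one page of the input bar."""
    candidates = DICTIONARY.get(pinyin, [])
    start = page * page_size
    return {i + 1: ch for i, ch in enumerate(candidates[start:start + page_size])}


def select(pinyin, key, page=0, page_size=PAGE_SIZE):
    """Pick a candidate by pressing a number key (1-based) on a given page."""
    return candidate_page(pinyin, page, page_size)[key]


print(candidate_page("shi"))      # {1: '是', 2: '时', 3: '事', 4: '十', 5: '使'}
print(select("shi", 2, page=1))   # 市 (one page-down, then key 2)
```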

A good implementation uses the following approaches:
1. Order words by how frequently they have been used in the past, so the most frequently used words are shifted to the front, making selection much faster. Typically they should fit on the first page (no scrolling required).
2. Allow abbreviated input for common phrases. This enters a whole phrase at once, with each character requiring only its first letter. It speeds up input significantly.
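Both approaches can be sketched together in a few lines. This is a toy under stated assumptions: the `TinyIME` class, its entries, and the use of a plain usage counter are hypothetical, and production IMEs rely on much richer language models and dictionaries.

```python
from collections import Counter


class TinyIME:
    """Toy pinyin IME: frequency-ordered candidates plus initial-letter phrases."""

    def __init__(self, entries):
        # entries: list of (syllables tuple, word), e.g. (("shi", "jian"), "时间")
        self.entries = entries
        self.uses = Counter()

    def candidates(self, typed):
        """Match full pinyin ("shijian") or initials ("sj"); most-used first."""
        matches = []
        for syllables, word in self.entries:
            full = "".join(syllables)
            initials = "".join(s[0] for s in syllables)
            if typed in (full, initials):
                matches.append(word)
        # sorted() is stable, so ties keep dictionary order.
        return sorted(matches, key=lambda w: -self.uses[w])

    def choose(self, typed, index):
        """Commit the candidate at 1-based `index` and record the use."""
        word = self.candidates(typed)[index - 1]
        self.uses[word] += 1
        return word


ime = TinyIME([(("shi",), "是"), (("shi",), "时"),
               (("shi", "jian"), "时间"), (("shi", "jie"), "世界")])
print(ime.candidates("shi"))  # ['是', '时']
ime.choose("shi", 2)          # the user picks 时 once...
print(ime.candidates("shi"))  # ['时', '是']  ...and it moves to the front
print(ime.candidates("sj"))   # ['时间', '世界']  two-letter phrase input
```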

So the quality of a pinyin input method depends heavily on how well it can guess and prioritize candidates, and thus on the dictionary being used. And generating this dictionary (keeping it both contemporary and accurate) takes a lot of time.

The dictionary is typically distributed together with the input method (or it wouldn't work). You could obtain sohu's dictionary by just installing its input method, and Google has likely obtained it this way. However, I don't think it's in an open-standard format, so Google probably has done certain reverse-engineering to be able to actually use it in its own software.

That shouldn't be copyrightable (4, Interesting)

wrook (134116) | more than 7 years ago | (#18670265)

I've been thinking about this. Throwing the evilness of Google aside for a moment, why should someone be able to copyright a listing of the phonetic pronunciation of an alphabet?

Let's just imagine how I might create this list. I would have to hire people who speak Chinese. Then I would ask them to record the pronunciation of each character that they know. This is pretty easy because in Chinese most characters have only one pronunciation (per dialect, anyway). There are about 3500 characters that you need to know in order to be literate. And all of these people would have learned these at school.

But how did they learn them? Well, they had a textbook and they memorized the list from the textbook.

Wait. I can't just memorize a list from one book and put it in another book. That's copyright infringement. In order for it not to be copyright infringement, I need to make sure that my sources all memorized the pronunciations from different sources. That's going to be difficult.

But let's say I do that. Now I have a list of the 3500 most common characters. And with that, I've probably got 99% of everything that's in a newspaper. But that's probably not good enough. I probably want a list of, say, 60,000 characters. Otherwise it's pretty useless in a general sense. Uncommon characters are uncommon, but you *will* bump into the words over time.
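A claim like "3500 characters cover 99% of a newspaper" is straightforward to check against real text. A small sketch, assuming you have the text as a string and the known characters as a set; the `coverage` helper and the sample sentence are made up for illustration, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is counted.

```python
def coverage(text, known):
    """Fraction of Chinese characters in `text` that appear in `known`."""
    # Count only CJK Unified Ideographs; punctuation and Latin letters are skipped.
    hanzi = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    if not hanzi:
        return 0.0
    return sum(ch in known for ch in hanzi) / len(hanzi)


print(coverage("我们是学生。", {"我", "们", "是"}))  # 0.6 (3 of 5 characters)
```

Run over a large news corpus with a 3500-character list, the same function would give the figure the comment estimates.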

So where do I find these characters? Can I hire some guy that knows them all? It would be very difficult. The best place to look is in a book. But wait... what am I going to do? Every time I find a character my people don't know, look it up in a book? Why don't I just copy it from the book in the first place? That's just copyright infringement again.

Really, the task of creating this list authoritatively without infringing copyright is monumental. Probably the *only* way to do it is with a community project where people just submit the pronunciations they know.

But if I'm going to have a community project like this, what the heck do I need copyright for? What am I protecting? If everyone is going to contribute, everyone should benefit.

So, personally, I don't think one should have copyright on this kind of material (same thing for spelling). It's just not in the public interest. This goes doubly so now that we have the internet and creating these kinds of projects is very inexpensive.

OK, I've gone on long enough... But one more rant. What's with this "do no evil" thing? Isn't that setting the bar a little low? If I told my parents that I'd work hard not to be evil, I think they'd be somewhat disappointed in me. If Google actually wanted to "do some good" rather than "do no evil", they could start a community project to collect this data and share it with the world.

Sigh... I guess we'll have to wait for some guy in his garage (but here's betting that someone has already started something).

Re:That shouldn't be copyrightable (1)

progprog (1016317) | more than 7 years ago | (#18670445)

Really, the task of creating this list authoritatively without infringing copyright is monumental. Probably the *only* way to do it is with a community project where people just submit the pronunciations they know.

It's not just about pronunciations, it's about the choices that appear and the order they appear in.

Take the term "guanxi" (meaning "connections"). One term, two characters. In a good dictionary, the correct characters for this term will be the default choice offered after typing all six letters. A garbage dictionary would have no concept of common terms and might put the characters for "can wash" before the characters for "connections".

Ordering the terms appropriately is important, since a pinyin spelling maps to multiple characters. There is a huge difference in efficiency when the exact term you want is within the first couple of "hits", which is something Google may have some experience with...
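One rough way to put a number on "huge difference in efficiency" is an expected-keystroke count. This sketch assumes a simplified cost model (one page-down per full page skipped, plus one key to pick), a page size of 5, and made-up usage frequencies; the `w1`..`w5` fillers and the "(can wash)" entry are placeholders, not real dictionary contents.

```python
def selection_cost(rank, page_size=5):
    """Keystrokes to pick the candidate at 1-based `rank`:
    one page-down per full page skipped, plus one key to select."""
    return (rank - 1) // page_size + 1


def expected_cost(ordering, frequency, page_size=5):
    """Average selection keystrokes, weighted by how often each word is wanted."""
    total = sum(frequency.values())
    return sum(frequency.get(word, 0) * selection_cost(rank, page_size)
               for rank, word in enumerate(ordering, start=1)) / total


# Hypothetical usage counts for two homophones of "guanxi".
frequency = {"关系": 90, "(can wash)": 10}
fillers = ["w1", "w2", "w3", "w4", "w5"]
good = ["关系"] + fillers + ["(can wash)"]
bad = ["(can wash)"] + fillers + ["关系"]
print(expected_cost(good, frequency))  # 1.1
print(expected_cost(bad, frequency))   # 1.9
```

Real keystroke savings depend on the IME's actual page size and candidate lists, but even this toy model shows how ranking alone changes the cost.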

There *is* such a community project - SCIM [wikipedia.org] . I wonder why Google didn't use/extend SCIM's database instead.

Easter Eggs save the day. (1)

DarkLegacy (1027316) | more than 7 years ago | (#18670461)

> In addition, both dictionaries listed the names of engineers who had developed Sohu's Sogou Pinyin IME.

And you thought Easter Eggs were just there for kicks. ;)

Finally we steal some IP from them! (2, Funny)

gatkinso (15975) | more than 7 years ago | (#18670471)

TURN ABOUT IS FAIR PLAY.

Ok fine, we have stolen from them before... but Beef and Broccoli don't count.

google is SO in troubbblllllle (1)

mycall (802802) | more than 7 years ago | (#18670609)

It's the facts of life.