New Online Dictionaries Automate Away the Linguistic Middleman
An article in The New York Times highlights two growing collections of words online that effectively bypass the traditional dictionary publishing system of slow aggregation and curation. Wordnik is a private venture that has already raised more than $12 million in capital, while the Corpus of Contemporary American English is a project started by Brigham Young professor Mark Davies. These sources differ from both conventional dictionary publishers and crowd-sourced efforts like the excellent Wiktionary in their emphasis on avoiding human intervention rather than fostering it. Says Wordnik founder Erin McKean in the linked article, 'Language changes every day, and the lexicographer should get out of the way. ... You can type in anything, and we'll show you what data we have.'
Isn't that called Googling? (Score:4, Insightful)
Re: (Score:2)
Indeed. [google.com]
Re: (Score:2)
The difference should be in the prioritizing of results. The first few pages from Google might give only hits based on the most common meaning of a word, while Wordnik, according to TFA, should group citations by meaning.
In practice, this didn't seem to work for the words I tried.
Re: (Score:2)
Yes here: http://www.wordnik.com/words/Fascism [wordnik.com]
and here: http://www.wordnik.com/words/Anti-Semitism [wordnik.com]
Re:Isn't that called Googling? (Score:5, Insightful)
Gee, it sure looks like they're returning random search engine results next to... oh look, a list of opinions as proffered by so-called "linguistic middlemen."
I like how the top example for how 'magic' is used in English isn't even purely English, but a bullet point about features in the Zend framework. I'll make a habit of saying "__magic()" in everyday speech more often!
I think the worst outcome of this is that PHP now somehow has influence on the evolution of a natural language. I do not believe I am alone in feeling terrified by this prospect.
Re: (Score:2)
Then maybe what you want is DeepDict. E.g., magic is used like http://gramtrans.com/deepdict/lookup.php?word=magic&class=N&lang=eng&top=200 [gramtrans.com] - it is not free, though all words starting with 's' are currently open to viewing for anyone.
It yields info such as: black magic, Orlando magic, ceremonial magic ... magic kingdom, magic roundabout, magic flute ... practice magic, radiate magic ... magic of animation ... etc
(disclaimer: I work on the DeepDict project)
CCAE isn't that nontraditional (Score:3)
CCAE is an annotated corpus more than a dictionary. It counts words, word co-occurrences, etc. It's also manually annotated with parts of speech and other such things, not fully automated. Its scope is bigger and more recent than what was possible before computers, but the general idea is ancient: 18th-century classicists would manually compile frequency and word co-occurrence tables for ancient languages to try to get an understanding of their structure.
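For anyone who hasn't seen what "counts words, word co-occurrences" looks like in practice, here is a minimal Python sketch. It is not how CCAE is actually built; the sample sentence, the function names, and the window size are all made up for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurrences(tokens, window=2):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        # Look only forward, so each pair of positions is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


# Toy corpus, invented for this example only.
sample = "black magic and stage magic are not the same magic"
tokens = tokenize(sample)
word_counts = Counter(tokens)                  # frequency table
pair_counts = cooccurrences(tokens, window=1)  # adjacent-word pairs
```

A real corpus adds part-of-speech tags and far more text on top of this, but the frequency table and the pair table are the same idea the 18th-century classicists were compiling by hand.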
Re: (Score:2)
Having access to a good corpus is really helpful, but once you start hitting the 2k word count, additional entries aren't really that helpful to anybody other than hardcore linguists. At that point it's generally more helpful to have information about what words frequently travel together and where they're likely to appear in a sentence.
Re: (Score:2)
It depends what you're doing. I've spent a while dealing with the Scottish Corpus of Texts and Speech, and there the size is around four million words. If you're doing anything based upon dialects, size does make a very real difference, because you're interested in the density of usage by area. Personally, even in a non-linguistic context, I find it useful to know whether someone in x is likely to know (by virtue of using) a word y.
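To make "density of usage by area" concrete, here is a rough Python sketch. It assumes the corpus can be reduced to (region, text) pairs; the region labels, sample sentences, and the usage_density name are hypothetical and are not taken from the Scottish Corpus.

```python
from collections import defaultdict


def usage_density(documents, word, per=1000):
    """Return occurrences of `word` per `per` tokens, grouped by region."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for region, text in documents:
        tokens = text.lower().split()
        totals[region] += len(tokens)
        hits[region] += tokens.count(word.lower())
    # Skip regions with no tokens to avoid dividing by zero.
    return {r: per * hits[r] / totals[r] for r in totals if totals[r]}


# Invented sample data, for illustration only.
docs = [
    ("north", "the wee dog ran to the wee house"),
    ("south", "the small dog ran to the small house"),
    ("north", "a wee bit more"),
]
density = usage_density(docs, "wee")
```

With only a handful of tokens per region the numbers swing wildly, which is the point about size: the rate only settles down once each area has a lot of text behind it.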
Good idea? (Score:2)
At the risk of being elitist, I wonder if I should adjust my use of language to that of the average American.
Re: (Score:2)
It's inevitable; language always adjusts to popular usage eventually, even with guards in place that act as filters.
Though I still cringe when people say they "could care less."
Not that all rules set in place by self-anointed authorities make sense. I never understood why end-of-sentence punctuation should appear inside quotations, especially if it might not match what was quoted, like making a question out of a sentence.
Re:Good idea? (Score:5, Funny)
Though I still cringe when people say they "could care less."
That begs the question if inappropriate use of "begs the question" is like, worse, like, than like using the word like, like in as the first like word after every like lung inhalation. I think that is a full 360 degree reversal from your suggestion.
Re: (Score:2)
To be quite honest, it's not an, uh, a very uncommon pattern of speech, if I may say so, to interject one's spoken English with, discourse... discourse particles, and, well, other minor disfluencies, which do--- which do vary by social class, but more in, uh, word choice than in what you might call actual frequency.
Re: (Score:3)
Though I still cringe when people say they "could care less."
That begs the question if inappropriate use of "begs the question" is like, worse, like, than like using the word like, like in as the first like word after every like lung inhalation. I think that is a full 360 degree reversal from your suggestion.
I live in the corner of a quad of homes that creates an interesting
amplifying effect of sounds, within the area. So that a house that
is completely on the other side, hundreds of feet away, you can
clearly hear people talk. [Yeah, it DOES suck].
So, the other day, I heard this teen-thing speaking to her folks
and about the 20th like, I was gonna "say loudly" since that's
all one has to do...
"Like will you shut the fuck up"
But tis the season and all that crap.
-AI
Re: (Score:1)
For all intensive purposes, yes
Re: (Score:2)
I never understood why end-of-sentence punctuation should appear inside quotations, especially if it might not match what was quoted, like making a question out of a sentence.
So I'm not the only one? Yeah!!! Although I believe that you may have a question mark outside of quotes if the sentence (and not the quoted material) is a question.
Re:Good idea? (Score:5, Insightful)
To be honest, I find it visually more pleasant. After looking at code that passes strings around as arguments in C-style imperative languages all day, it's nice to see something without a big gap on the baseline (this "is," an "example", for you.) Since the quotation mark is already floating up and away from the letters, it's less jarring to see it separated from the word than a comma or period. (This is more or less the modern aesthetic justification for keeping it the traditional way. However, modern typographers don't always agree with traditionalists: watch what happens when you point out that the "single" space used to separate sentences prior to the invention of the typewriter was actually larger than a standard double space.)
Re: (Score:1)
What about punctuation for other languages such as and or the Spanish inverted question mark at the beginning and ? at the end of a question
Re: (Score:2)
What were the other symbols you tried
Re:Wikitionary? (Score:5, Informative)
"The Free Dictionary" appears to be just a spammy repackaging of Wikipedia content. Lots of their articles even have a footer saying they're licensed under the GFDL from Wikipedia.
Great Idea (Score:1)
Let's eliminate the making-sense and explaining that human beings can do. The absurdity of most spell check and voice recognition "did you mean" suggestions doesn't give me much hope that it's all just a matter of having enough data. Yes, Google can seem almost prescient, but only if thousands of other people are looking for the same things as I am. When I could really use a hint, Google never comes up with something useful. On the contrary, then I have to coax it not to replace my carefully selected search
What are these guys? (Score:2)
What are these guys? All we get is what they're not:
traditional dictionary publishing system
slow aggregation
curation
crowd-sourced effort
human intervention
I'm guessing they are also not street taco vendors, Catholic priests or Christmas tree salesmen. Great, that really narrows it down. So, what are they? I mean in terms of workflow, or data diagrams, or even user experience. And who are their users, anyway? Unless they provide a really good reason, the rest of the world will continue to use wikipedia/wikimedia products, google (let's face it, mostly google), and the urban dictionary (dare I invoke encycl
So then ... (Score:3)
Re: (Score:3)
He has altered the English language. Pray he does not alter it further.
Lexicographers out of the way (Score:4, Informative)
Obviously, I'd suppose you still needed a few lexicographers to come up with the system.
And to maintain it, right?
The problem seems to be when you've put 95% of lexicographers out of a job, who's going to train the next bunch, and will it be cost-effective at a university level to have a graduate program in such for 1 or 2 individuals?
Re:Lexicographers out of the way (Score:5, Funny)
Obviously, I'd suppose you still needed a few lexicographers to come up with the system.
And to maintain it, right?
The problem seems to be when you've put 95% of lexicographers out of a job, who's going to train the next bunch, and will it be cost-effective at a university level to have a graduate program in such for 1 or 2 individuals?
Syntax error on line(s): 1 thru 1
Ambiguous contraction in "I'd".
Syntax error on line(s): 1 thru 1
Mixed tense in "still needed".
Note: Root word "need" satisfies the expression.
Syntax error on line(s): 3 thru 3
Incomplete sentence.
Syntax error on line(s): 5 thru 5
Expected colon after "be" in "to be when".
Syntax error on line(s): 5 thru 5
Expected capitalization of "when" in "to be when".
Syntax error on line(s): 5 thru 5
Extraneous comma.
Note: This message is generated only once for multiple errors.
Point taken: Screw the Lexicographers!
Re: (Score:2)
I would like to subscribe to your newsletter. Do you provide an Outlook plugin?
Re: (Score:2)
Syntax error: unknown token "thru"
words will not escape us anymore? (Score:1)
So if I type in "anything" I won't get just an interpreted response
but really -- what... everything?
bjd
Wordnik is a dictionary aggregator (Score:4, Funny)
I wonder what kind of sales pitch it takes to get $12 million for a free web dictionary.
'Just imagine if we could provide 100 definitions from other people for the word "butt", how much is that worth to you?'
Re: (Score:2)
Totally agree, and it seems that their data is not cross-checked at all:
http://www.wordnik.com/words/internet [wordnik.com]
antonyms
Words with the opposite meaning:
World Wide Web
WTF ?
Internet != WWW (Score:2)
Telivision (Score:5, Insightful)
Google seems to do a good job of detecting spelling errors and automatically updating its dictionary, and of course it also shows you websites where that word is used. I don't really see what Wordnik provides.
Re: (Score:2)
I second this notion. I frequently use the define: $searchTerm query with Google.
For example: telivision [google.com],
or: Wordnik [google.com]
Compare the latter to the same search on Wordnik: Wordnik [wordnik.com]
Bonus: Those Google links are wrapped in TLS, so no one sees the query terms or results in transit. https://www.wordnik.com/ [wordnik.com] takes you to their developer site...
$12m in venture capital to invent Urban Dictionary (Score:1)
What a horrible summary (Score:2)
You make it sound like they're completely removing the human elements. And just, a corpus by nature does that, as they're only really involved in setting the bounds of the collection and letting the authors speak for themselves. Wordnik, on the other hand, allows *anyone* to contribute, but they're not allowe
Re: (Score:2)
Collecting random words on the web into a dictionary is like getting rid of standards altogether, or saying that every piece of software out there, no matter what it does, is standards compliant. W
continuity (Score:1)
Current generation nonsense, it's high time we return to Latin. Ita et vos per linguam nisi manifestum sermonem dederitis, quo modo scietur id quod dicitur? eritis enim in aëra loquentes.
But I'd accept Old English.
Re: (Score:2)
Do you really mean to tell me that you only use words as they're defined in the dictionary? And if so, which dictionary? Because as we all know, there's lots of different standards out there. And then there's versioning of the standards, and those implementations that aren't quite compliant (in language, those would be regional dialects). Language is not as cut and dried as you think it might be.
But your suggestion is actually done in other countries -- the French have a government group that officially
Wordnik (Score:2)
Regarding Wordnik, I don't think Rick Santorum is going to be a fan of their site.
Historical accretion (Score:1)
All of the interviewed persons as well as the author of the NY Times article leave a major issue unmentioned, and that is historical word use. As a very enthusiastic user of the Oxford English Dictionary (yes, it has the place of honour in my living room), each time I look up a word in the venerable OED I am amazed at the thick and variegated strata of historical meaning, and the gradual shifting in it, even for words we think of as "simple".
To wit, neither the Wordnik nor the CCAE person mentioned these