http://www.forbes.com/entrepreneurs/2005/11/14/google-ibm-amazon-cx_lh_1115sentiment.html
Two Thumbs Up
Leah Hoffmann, 11.15.05, 10:00 AM ET
Ever searched Google for advice on vacation destinations? Unless
you're traveling to an obscure village in Siberia, chances are your
search turned up thousands of relevant Web sites. Looking at all of
them would be tedious and time consuming. But what if there were a way
to sort them based on whether or not they're favorable?
As it turns out, there is. "Sentiment analysis," as the field of
research is known, is a hot topic among computer scientists these
days. The goal is to create computer programs that can determine
whether a document is positive or negative. And corporations like IBM,
Microsoft, Google and Amazon.com are paying attention to the
results. Successful applications could help automate market and
product research and dramatically alter the future of a simple
Internet search.
The most common approaches begin by identifying certain "indicator
words" within a text--words like "good," "bad" or "beautiful"--that
convey positive or negative emotions. But that's not as easy as it
sounds.
"The variety of words that people use for subjective expressions is
staggering," says Janyce Wiebe, a professor of computer science at the
University of Pittsburgh. Wiebe and her colleagues have already
assembled a dictionary of some 8,000 indicator words and phrases.
"The dictionary tells you whether a word is positive or negative when
it's taken out of context," Wiebe explains. "The challenge is to
figure out whether it's positive or negative in each individual
instance."
There are a number of different ways to accomplish this. Wiebe uses a
program that can--with assistance from humans who "train" it to
recognize the right answers--learn how context impacts the meanings of
words.
Peter Turney, a researcher at the National Research Council of Canada's
Institute for Information Technology, uses his own dictionary of
indicator words to assign each of a document's adjectives a positive
or negative value. He then averages these values to create an overall
opinion score. Accuracy rates for these methods range from 70% to 85%,
depending on the kind of document.
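The averaging idea can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not Turney's actual system: the
tiny LEXICON and the opinion_score function are hypothetical, and the
sketch scores any word found in the lexicon rather than tagging adjectives
first.

```python
import re

# Hypothetical toy lexicon: +1.0 for positive words, -1.0 for negative ones.
# A real indicator-word dictionary would hold thousands of entries.
LEXICON = {
    "good": 1.0,
    "beautiful": 1.0,
    "bad": -1.0,
    "tedious": -1.0,
}


def opinion_score(text, lexicon=LEXICON):
    """Average the values of all indicator words found in the text.

    Returns 0.0 (neutral) when the text contains no indicator words.
    """
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    # Keep only the words the lexicon knows about.
    values = [lexicon[w] for w in words if w in lexicon]
    if not values:
        return 0.0
    return sum(values) / len(values)


if __name__ == "__main__":
    print(opinion_score("A good, beautiful beach but a bad hotel"))
```

A score above zero suggests a favorable document and a score below zero an
unfavorable one. Because each word is scored out of context, a sketch like
this cannot tell "not good" from "good."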
If you're trying to classify documents within a specific
field--reviews for a particular product, for instance--you can
increase accuracy by customizing your indicator words. (After all, a
"hot" refrigerator is bad, but a "hot" nightclub is not.) That's the
approach taken by Norwegian search company Fast Search and Transfer
ASA (FAST), which unveiled a customizable sentiment analysis program,
Marketrac, last year. FAST has already licensed Marketrac to more than
a dozen clients, who use it to keep track of everything from online
hotel reviews to the speeches of European business leaders. Licensing
fees for the program are more than $100,000 per year.
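One simple way to picture the customization of indicator words is a
general-purpose dictionary with per-domain overrides layered on top. The
Python sketch below is an assumption for illustration only; it does not
describe how Marketrac works, and the names BASE, DOMAIN_OVERRIDES and
lexicon_for are hypothetical.

```python
# Hypothetical general-purpose lexicon. "hot" is neutral out of context.
BASE = {"good": 1.0, "bad": -1.0, "hot": 0.0}

# Hypothetical per-domain overrides, following the article's example:
# a "hot" refrigerator is bad, a "hot" nightclub is good.
DOMAIN_OVERRIDES = {
    "refrigerator": {"hot": -1.0},
    "nightclub": {"hot": 1.0},
}


def lexicon_for(domain):
    """Return a copy of the base lexicon with the domain's overrides applied."""
    merged = dict(BASE)  # copy so the shared base lexicon is never mutated
    merged.update(DOMAIN_OVERRIDES.get(domain, {}))
    return merged


if __name__ == "__main__":
    print(lexicon_for("refrigerator")["hot"])
    print(lexicon_for("nightclub")["hot"])
```

Keeping the overrides separate from the base dictionary means a new product
category needs only a short list of exceptions rather than a whole new
dictionary.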
Other potential applications in the field of sentiment analysis
include automated flame detectors for online bulletin boards, tracking
systems for stock market reports and programs that monitor movie or
product reviews. You may also one day be able to do a simple Web
search to find out what people are saying about a given issue.
"When people begin expressing their emotions, their language gets very
florid and very complicated, very quickly," says Lillian Lee, a
computer science professor at Cornell University. It will be a few
years before scientists are able to refine their algorithms and get
greater accuracy with a range of documents. Once they do, however,
it's a safe bet that companies will be giving them a thumbs up.