Text Analytics: A Primer

Marketing scientist Kevin Gray asks Professor Bing Liu to give us a quick snapshot of text analytics.

by Sharon Dexter

Project / IT Manager at GreenBook

By Kevin Gray and Bing Liu

KG: I see “text analytics” and “text mining” used in various ways by marketing researchers and often used interchangeably. What do these terms mean to you?

BL: My understanding is that the two terms mean the same thing. People from academia use the term text mining, especially data mining researchers, while text analytics is mainly used in industry. I seldom see academics use the term text analytics. There is another closely related term, called natural language processing (NLP). Text mining and text analytics usually refer to the application of data mining and machine learning algorithms to text data. NLP covers that and also other more traditional natural language tasks such as machine translation, syntax, semantics, etc. But there is really no clear demarcation between the terms.

KG: Can you give us a brief history of text analytics/mining and how it has evolved over time?

BL: It comes from three research areas: information retrieval, data mining, and natural language processing (NLP). Information retrieval started in the 1970s. It mainly deals with text retrieval. That is, given a query, which can be a few keywords or a full document, we want to find related documents from a text collection or corpus. Web search engines are giant information retrieval systems.
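The retrieval task described above (given a query, find related documents in a corpus) can be sketched in a few lines. This is a minimal illustration, not how any production search engine works: it assumes TF-IDF weighting with cosine similarity as the ranking method, and the three sample documents, the function names, and the smoothed IDF formula are all hypothetical choices made for the example.

```python
import math
from collections import Counter

PUNCT = ".,!?;:()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def tfidf(tokens, idf):
    """Weight each known term by its count times its inverse document frequency."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def build_index(docs):
    """Compute a smoothed IDF table and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [tfidf(toks, idf) for toks in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, docs, idf, vectors):
    """Return (document, score) pairs with a nonzero score, best match first."""
    q = tfidf(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(docs[i], s) for s, i in scored if s > 0]

# Hypothetical three-document corpus, for illustration only.
docs = [
    "Web search engines rank pages for a query.",
    "Sentiment analysis mines opinions from product reviews.",
    "Parsing assigns syntactic structure to a sentence.",
]
idf, vectors = build_index(docs)
results = search("opinions in reviews", docs, idf, vectors)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

Only the second document shares terms with the query, so it is the only one returned. Query words that never occur in the corpus, such as "in" here, are simply ignored.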

Traditional data mining uses structured data such as database tables. In the late 1990s, researchers started to use text as data, which gave rise to text mining. Early text mining basically applied data mining and machine learning algorithms to text data without using NLP techniques such as parsing, part-of-speech tagging, summarization, etc.
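To make "machine learning on text without NLP" concrete, here is a rough sketch of that style of text mining: each document is reduced to a bag of word counts, with no parsing or tagging, and a standard classifier is trained on those counts. The choice of multinomial Naive Bayes with add-one smoothing is an assumption made for illustration, and the four training sentences, the labels, and the class name are hypothetical.

```python
import math
from collections import Counter, defaultdict

def bag_of_words(text):
    """Treat a document as an unordered multiset of lowercase tokens."""
    return Counter(text.lower().split())

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        words = bag_of_words(text)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus the smoothed log likelihood of each known word.
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word, n in words.items():
                if word in self.vocab:
                    score += n * math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training set, for illustration only.
train_texts = [
    "stock prices fell sharply today",
    "the team won the final match",
    "investors sold shares as markets dropped",
    "the striker scored two goals",
]
train_labels = ["finance", "sports", "finance", "sports"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("markets and stock prices"))
print(model.predict("the striker won"))
```

The classifier sees only word frequencies. Word order, grammar, and meaning play no role, which is the limitation that later pushed text mining toward NLP techniques.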

NLP has a much longer history. It started in the 1950s and its objective is to make computers understand human language. As text mining research expanded its scope in the past 10 years or so, it started to use natural language processing techniques such as parsing, part-of-speech tagging, coreference resolution, etc. Judging from topics covered in natural language processing conferences, text mining has now become a part of natural language processing. My own research started with traditional data mining. I then worked on sentiment analysis or opinion mining, which led me to natural language processing.

KG: How is it used in marketing research and other fields?

BL: Text analytics has been used widely in marketing and many other fields. I am most familiar with the application of sentiment analysis. In marketing, marketers often want to know consumer opinions about their company’s products and their competitors’ products. Such opinions can be obtained by analyzing online reviews or other forms of social media postings about those products. Based on these opinions, marketers can formulate their marketing messages to suit different segments of the market. Public opinions are also very useful in many other application domains, e.g., stock market prediction, consumer sentiment prediction, political election prediction, etc.
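As a rough illustration of how opinions about competing products might be pulled from reviews, the sketch below scores each review against a small word list and sums the scores per product. It is a deliberately simple lexicon-based approach chosen for the example, not a description of Professor Liu's methods. The word lists, the one-word negation rule, and the product names and reviews are all hypothetical.

```python
from collections import defaultdict

# Tiny hand-made lexicons; real opinion lexicons contain thousands of entries.
POSITIVE = {"good", "great", "love", "excellent", "reliable"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Count positive minus negative words, flipping a word that directly follows a negator."""
    score = 0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if word in NEGATORS:
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False  # the negation only reaches the very next word
    return score

def score_by_product(reviews):
    """Sum review scores for each product name."""
    totals = defaultdict(int)
    for product, text in reviews:
        totals[product] += sentiment_score(text)
    return dict(totals)

# Hypothetical reviews, for illustration only.
reviews = [
    ("PhoneA", "Great screen and excellent camera."),
    ("PhoneA", "Battery life is not good."),
    ("PhoneB", "Terrible support and slow updates."),
    ("PhoneB", "I love the design."),
]
totals = score_by_product(reviews)
print(totals)
```

A rule this crude misses phrases such as "not very good", sarcasm, and opinions about specific product features, so it should be read as a starting point only.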

KG: What are the major technical challenges text analytics faces?

BL: It all depends on what task one is interested in. Some tasks are done reasonably well, e.g., named entity recognition. But many other tasks still need a lot of improvement in accuracy. The ultimate challenge is natural language understanding. Although researchers have worked on it for a long time, progress has not been great. Current text analytics techniques are still mainly based on traditional linguistic rules and statistical machine learning and data mining algorithms. These methods are still not able to achieve true understanding. Due to this problem, most text analytics tasks still have relatively low accuracy.

KG: What role does Artificial Intelligence play in text analytics?

BL: Advanced text analytics is a part of artificial intelligence (AI). Progress in other AI areas such as machine learning and data mining is making a big impact on text analytics. I would say that the main progress of text analytics in the past twenty years has come from better machine learning techniques.

KG: Are there misperceptions or misunderstandings many people seem to have about text analytics?

BL: I am not aware of any big misperceptions or misunderstandings about text analytics in academia. I am not sure about industry. The only thing that I know is that people can have very high expectations about text analytics, but it is a very challenging problem if you want to do it well and accurately.

KG: Lastly, looking ahead ten years, what do you think text analytics will be able to do that it cannot do now? Are there some things that will be impossible for text analytics for the foreseeable future?

BL: Let’s talk about natural language processing rather than text analytics, as advanced text analytics requires natural language processing. As machine learning techniques such as deep learning progress, we will certainly see better text analytics algorithms with much better accuracy than we can achieve today. But understanding natural language like we humans do is very unlikely in the foreseeable future because natural language is highly abstract. Every sentence we write has a great deal of commonsense knowledge behind it that we assume the reader knows. Clearly, a computer program does not know this. Learning, representing, and reasoning about commonsense knowledge is a major challenge.

KG: Thank you, Bing!

_________________

Kevin Gray is president of Cannon Gray, a marketing science and analytics consultancy.

Bing Liu is a full professor of Computer Science at the University of Illinois at Chicago (UIC). He received his PhD in Artificial Intelligence from the University of Edinburgh and is the author of numerous books and articles on sentiment analysis and opinion mining, machine learning and related subjects.


