
January 13, 2015

Lessons from Automating Social Media Monitoring

As people use social media differently, as APIs change, social media monitoring programs can behave strangely.

by Jeffrey Henning

Chief Research Officer at Researchscape

Social media monitoring can be fragile.

I’ve been recapping the top 5 research links of the week since I coined the #MRX hashtag back in July of 2010. Originally I did it by hand, and then in June of 2011 I had one of my sons automate the process for me. He had to make minor tweaks to it whenever the Twitter API changed. (API stands for Application Programming Interface, the defined way for other programs to interact with Twitter, as opposed to fetching and parsing its web pages.)

As people use social media differently, as programs interact with it differently, as APIs change, social media monitoring programs can behave strangely. For instance:

  • At some point along the way, we stopped giving stories from The New York Times the clout they deserved, as their paywall interfered with how we looked up the URL. (Sadly, this is still a problem.)
  • When we wrote the system, Twitter didn’t have embedded images; people used third-party tools for that. Now that Twitter supports embedded images (e.g., https://pbs.twimg.com/media/B1SpTxTCYAAHKin.png:large from a tweet), the URL of an image would sometimes confuse our system.
  • When the system was originally written, emoji was not as prevalent. The initial implementation could not handle emoji and a variety of other obscure characters, so it ignored tweets with these symbols. Once emoji was added to the default keyboards of both iPhone and Android in 2011 and 2013 respectively, these symbols got used more often and our system would ignore more and more tweets.
  • Worst of all, at some point our system started producing bad data and I didn’t notice: it occasionally highlighted a top story that wasn’t a top story at all. This wasn’t caused by an explicit change to the Twitter API but by a change in how data was returned. Sometimes, for no reason that we can tell, a URL would be returned as “t.co/…” and would inflate the count of another story. (Twitter uses its own URL shortener, t.co, even if you’ve already used a third-party URL shortener to get around the 140-character limit.)
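One way to guard against the URL problems listed above is to screen each link before counting it. The sketch below is a minimal illustration in Python, not the system described in this post; the function name `is_countable_url`, the `MEDIA_HOSTS` set, and the specific rules are all assumptions chosen to mirror the failure cases above (embedded images and truncated “t.co/…” links).

```python
from urllib.parse import urlparse

# Hosts that serve images embedded in tweets rather than shared
# stories (an assumed, incomplete list for illustration).
MEDIA_HOSTS = {"pbs.twimg.com"}


def is_countable_url(url: str) -> bool:
    """Return True if a URL looks like a story link worth counting.

    A hypothetical screening rule set, not the #MRX system's actual logic.
    """
    url = url.strip()
    # Truncated display URLs end in an ellipsis and point nowhere specific.
    if url.endswith(("…", "...")):
        return False
    # urlparse only fills in the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    # .hostname is already lowercased by urllib.parse.
    host = parsed.hostname or ""
    if not host:
        return False
    # Embedded images are attachments, not stories.
    if host in MEDIA_HOSTS:
        return False
    # A t.co link with no path token cannot be resolved to a story.
    if host == "t.co" and not parsed.path.strip("/"):
        return False
    return True
```

Rejecting a suspicious link outright, rather than guessing which story it belongs to, trades a slightly lower count for not crediting the wrong story, which was the more damaging failure here.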

Because of the shift to tweets containing once-rare Unicode characters such as emoji, my son ended up rewriting our system from scratch. The system now also outputs additional diagnostics so I can verify its accuracy.
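As a rough illustration of both points, here is a hedged Python sketch of a filtering pass that never drops a tweet for containing emoji and tallies why each skipped tweet was skipped, so a human can audit the run. The tweet shape (dicts with `"text"` and `"urls"`), the function name `summarize_skips`, and the reason labels are hypothetical; they are not Twitter’s payload format or the actual diagnostics our system prints.

```python
from collections import Counter


def summarize_skips(tweets, is_countable):
    """Filter tweets and tally why each one was skipped.

    `tweets` is assumed to be a list of dicts with a "text" string and a
    "urls" list (a hypothetical shape, not Twitter's API payload).
    `is_countable` is any function that accepts or rejects a single URL.
    """
    reasons = Counter()
    kept = []
    for tweet in tweets:
        text = tweet.get("text", "")
        # Python 3 strings hold any Unicode code point, so emoji need no
        # special handling. Characters outside the Basic Multilingual
        # Plane (where most emoji live) are recorded, never dropped.
        if any(ord(ch) > 0xFFFF for ch in text):
            reasons["non_bmp_text"] += 1
        urls = tweet.get("urls", [])
        if not urls:
            reasons["no_url"] += 1
            continue
        good = [u for u in urls if is_countable(u)]
        if not good:
            reasons["no_countable_url"] += 1
            continue
        kept.append({**tweet, "urls": good})
    reasons["kept"] = len(kept)
    return kept, reasons
```

Printing `reasons` at the end of each weekly run gives a quick sanity check: a sudden jump in one skip category is often the first sign that something upstream has changed.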

The algorithm seems to be working well now. Of course, “algorithm” is just a formal word for automating a sometimes arbitrary process. We’ve implemented certain heuristics (another formal word, for rules of thumb!). Really, a program is just an embodiment of judgment calls. Some of ours:

  • Tracking influence – We give everyone who tweets an influence score, based on some factors. There’s been a lot of research into measuring influence, and we have our own method for estimating it. If @lennyism retweets a link, it counts for more than if a new Twitter user retweets that same link.
  • Handling spam – If you retweet the same link three times to #MRX over a week, we’ve always counted it only once. We’ve implemented some new rules to better handle bots and other spammy activity that we’ve seen, including closely related accounts retweeting a link. Beating spam is a constant battle.
  • Determining which link is canonical – We resolve shortened URLs so that we are counting the underlying link, not the different representations of it. For instance, we treat it/1D5E09h, bit.ly/1vpik1H, and lnkd.in/d8EVzD3 all as http://greenbookblog.org/2014/12/30/embracing-change-in-mr-a-year-end-perspective/. And we’ve added a few special rules to account for different versions of hyperlinks to the same pages. Rediscovering how hard it is simply to track URLs makes me realize how error-prone tracking brand names must be!
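The three heuristics above can be combined into a single scoring pass. The Python sketch below is one plausible way to do it, not our actual method: it assumes the shortened URLs have already been resolved into a `canonical` lookup table (in practice you would follow each link’s redirects), that `influence` is a precomputed per-user weight with unknown users defaulting to 1.0, and that the input covers a single week. The function name `top_links` and the additive scoring formula are illustrative assumptions.

```python
from collections import defaultdict


def top_links(shares, canonical, influence, n=5):
    """Rank links by influence-weighted, de-duplicated shares.

    shares: iterable of (user, url) pairs covering one week (assumed).
    canonical: dict mapping shortened or variant URLs to one canonical
        URL; building it (following redirects) is not shown here.
    influence: dict of user -> weight; unknown users default to 1.0.
    """
    seen = set()
    scores = defaultdict(float)
    for user, url in shares:
        # Count the underlying link, not the shortened representation.
        target = canonical.get(url, url)
        # A user sharing the same link repeatedly counts only once.
        if (user, target) in seen:
            continue
        seen.add((user, target))
        # Weight each share by the sharer's influence score.
        scores[target] += influence.get(user, 1.0)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```

Note that de-duplication happens after canonicalization: a user who posts the same story once via bit.ly and once via lnkd.in is still counted only once. Detecting closely related accounts would need a further rule, such as grouping users before the `seen` check.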

Fortunately, in our case, our program produces a report for a human to read and analyze, rather than simply spitting out its results to Twitter. So a human can catch the things that the automation didn’t. For instance, I manually exclude references not related to market research (I am so not looking forward to the February release of the Bollywood film Mr. X!). I skip over any expired links, such as invitations to webinars that have already taken place. And I curse any spammers who get past the system.

The lesson? Social media monitoring automation requires vigilance and updates, even for a hobbyist project like tracking the top 5 research stories of the week. Implementing custom brand trackers requires even more diligence – you should schedule regular audits to double-check the results. The myth of social media is that the data is free and therefore the analysis is as well. The data is free, but the analysis can be time-consuming and tricky. If social media is fragile, your monitoring must be robust.

Disclaimer

The views, opinions, data, and methodologies expressed above are those of the contributor(s) and do not necessarily reflect or represent the official policies, positions, or beliefs of Greenbook.
