CoderRank is a Big Idea

CoderRank suggests a universal law: for every categorization problem, some humans will do better than others.


Editor’s Note: This post is part of our Big Ideas Series, a column highlighting the innovative thinking and thought leadership at IIeX events around the world. Stu Shulman will be speaking at IIeX North America (June 13-15 in Atlanta). If you liked this article, you’ll LOVE IIeX NA. Click here to learn more.

By Stu Shulman

CoderRank is a big idea. CoderRank is to text analytics what Google’s PageRank has been to search. Just as Google recognized that not all web pages are created equal (links from some pages carry more weight than others), I argue that not all human coders are created equal: the accuracy of some coders’ observations is invariably higher than that of others.

The major idea is that when training machines for text analysis, greater reliance should be placed on the specific inputs of those humans most likely to create a valid observation. I proposed a unique way to measure and rank humans on trust and knowledge vectors, and called it CoderRank. The U.S. Patent and Trademark Office agreed it was a novel approach to machine learning and issued a patent on March 1, 2016. Not bad for a political scientist.
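The weighting principle can be sketched in code. This is a minimal illustration, not the patented method: the coder names, accuracy scores, and labels below are hypothetical, and CoderRank’s actual trust and knowledge vectors are richer than a single accuracy number. The sketch assumes each coder has a track record against adjudicated “gold” labels, and that a group label is chosen by a vote weighted by that track record.

```python
from collections import defaultdict

def coder_accuracy(history):
    """Fraction of a coder's past labels upheld by an adjudicator.

    history: list of (coder_label, gold_label) pairs.
    """
    correct = sum(1 for label, gold in history if label == gold)
    return correct / len(history)

def weighted_vote(labels, accuracy):
    """Pick a group label, weighting each coder by past accuracy.

    labels:   {coder: label} for one item (e.g., one Tweet)
    accuracy: {coder: historical accuracy score}
    """
    scores = defaultdict(float)
    for coder, label in labels.items():
        scores[label] += accuracy[coder]
    return max(scores, key=scores.get)

# Hypothetical data: coder 'a' is more reliable than 'b' or 'c',
# but two weaker coders can still outvote one strong coder.
accuracy = {"a": 0.95, "b": 0.60, "c": 0.55}
labels = {"a": "hockey", "b": "bird", "c": "bird"}
print(weighted_vote(labels, accuracy))  # 'bird' wins: 0.60 + 0.55 > 0.95
```

The point of the sketch is simply that a coder’s vote counts in proportion to demonstrated validity, rather than one coder, one vote.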

In 2011 I read a few very important and influential books. These books brought years of laboratory experiments into sharper focus, contributing directly to the big idea. One was What Would Google Do?: Reverse-Engineering the Fastest Growing Company in the History of the World by Jeff Jarvis. I already knew about PageRank and the history of search technology through other books; however, Jarvis introduced me to a compelling way to think about where value is created in distributed software systems. What Google does is let end users and builders of systems create value on top of their web-based infrastructure.

Another source of inspiration for CoderRank was Everything Is Miscellaneous: The Power of the New Digital Disorder by David Weinberger. Weinberger writes compellingly about the difference between organizing books in a library using the rigid Dewey Decimal system versus the way we filter information in online databases, shaped by different observations, different people, different systems, and different reasons. A key takeaway is that every observation matters, but some matter more depending on the context.

There is no more important book in the formation of this big idea than James Gleick’s The Information: A History, A Theory, A Flood. The story of information conceptualization is fascinating. In every epoch, the innovators built new tools for collecting, measuring, and processing information. From Plato’s deep concerns about the frustrating effects of categorization disagreements, through the dawn of machine learning, Gleick surfaces fundamental problems with information management. The problems with categorization cannot easily be ignored or planned out of existence. However, identifying the best tools, methods, and measurements fits squarely within the long history of information.

The big idea of CoderRank builds on these known experimental and theoretical problems. It suggests a universal law: for every categorization problem, some humans will do better than others. How we deal with this fact is a challenge for data scientists and qualitative researchers going forward.


4 Responses to “CoderRank is a Big Idea”

  1. CoderRank Is A Big Idea | DiscoverText says:

    June 14th, 2016 at 10:09 am

    […] part of the recent IIeX North America, I wrote a short article about the recent patent award granted by the USPTO for our approach to enhanced […]

  2. Brandon Nealey says:

    September 11th, 2016 at 10:27 pm

    Hey Stu,

    This is great work. I’m curious about any potential paths to equality, and currently give my energy to dreaming up a solution. Sentient expression in any form leading to nothing but a universal recognition of its value is my current go-to, and beyond that, I wish for a shift that removes the need for measurement, equality included. That would be something! Do your thoughts ever go to disruption of CoderRank?

    Brandon Nealey

  3. Stuart Shulman says:

    October 31st, 2016 at 7:56 am

    Thanks for the supportive comments. I’m curious how you are using “equality” in this formulation. As for disruption, it is my view that CoderRank is the disruption.

  4. Text Mining Workshop Hosted by Emlyon | DiscoverText says:

    October 8th, 2017 at 9:14 am

    […] The key breakthrough led to a patent (US No. 9,275,291) being issued on March 1, 2016. We built a tool for adjudicating the work of coders. For example, if I ask 10 students to look at 100 Tweets that mention “penguins” and code whether or not they are about the NHL’s Pittsburgh Penguins, there will be imperfect agreement. Some coders will have deeper knowledge of the subject and some Tweets will be inscrutably ambiguous. Adjudication allows an expert to review the way the group labeled the Tweets and decide who was right and wrong. This method of validation creates a “gold standard,” and it allows us to score, over time, the likelihood that an individual coder will create a valid observation. Participants will learn how to apply “CoderRank” in machine learning. The major idea of the workshop is that when training machines for text analysis, greater reliance should be placed on the input of those humans most likely to create a valid observation. Texifter proposed a unique way to recursively validate, measure, and rank humans on trust and knowledge vectors, and …. […]
