“Analytics is Easy”

Posted by Kevin Gray Thursday, April 9, 2015, 10:37 am
Posted in category General Information
Analytics is a lot harder than some seem to realize.


By Kevin Gray

Erroneous thinking about analytics continues to hang on in the marketing research community.  Often it is tacit, but at times it is articulated candidly.  This is worrisome given that marketing research is a research industry and is no longer a young one.  Some, for example, see analytics as little more than cross tabs and charting that can be done by anyone who has point-and-click software installed on their PC.  This is a bit like saying that if you can talk, you can do qualitative research.  Others think it’s “just programming.”  There are other misperceptions as well, and one consequence of all this confusion is shoddy analytics which, in turn, raises doubts about the value of analytics.1  In this short article, I will demonstrate that analytics is, in fact, not easy, and why this mistaken belief is potentially costly for any marketing researcher to hold.

Cross tabulations and graphics are an indispensable part of analytics but only part of it, and marketing researchers have long had a vast assortment of sophisticated tools at their disposal.  Even basic analyses should not be undertaken in a slapdash fashion, however.  Churning out stacks of cross tabs is not unheard of in our business but is very risky because even with big data there always will be fluke results.  Instead of placing our bets on shotgun empiricism, as researchers, we should plan cross tabulations and other analyses when designing the research, and interpret the patterns of our findings in the context of other pertinent information, not simply highlight isolated results.  The Improbability Principle: Why Coincidences, Miracles, and Rare Events Happen Every Day by David Hand, a past president of the Royal Statistical Society, is a great read and I can recommend it to marketing researchers.
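
To make the multiple-comparisons point concrete, here is a minimal sketch in Python (the study size, number of tables, and variable names are hypothetical choices of mine, not from the article): it runs chi-square tests on a stack of cross tabs built from pure noise, and at the usual 5% significance level roughly 5% of them come back “significant” purely by chance.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n_respondents, n_tables = 1000, 200   # hypothetical study size and tab count
false_positives = 0

for _ in range(n_tables):
    # Two unrelated categorical variables: there is no real association to find.
    var_a = rng.integers(0, 3, n_respondents)   # e.g., a 3-level demographic
    var_b = rng.integers(0, 4, n_respondents)   # e.g., a 4-level attitude item
    table = np.zeros((3, 4), dtype=int)
    np.add.at(table, (var_a, var_b), 1)         # build the cross tab
    _, p_value, _, _ = chi2_contingency(table)
    if p_value < 0.05:
        false_positives += 1

print(f"'Significant' tables found in pure noise: {false_positives} of {n_tables}")
```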

Another example of substandard analytics can be found in mapping.  Nowadays mapping, in practice, frequently seems to mean junior research execs or even clerical personnel mass-producing correspondence analysis maps, usually with the software’s default settings.  The maps are nearly always brand maps; user maps and other kinds of mapping are underutilized, in my opinion.  Moreover, though correspondence analysis is a wonderful technique, it is just one of many appropriate for mapping, and biplots, MDPREF, MDS, factor analysis, discriminant analysis, canonical mapping or other methods may be better suited to the problem at hand.  What’s more, I still see maps being interpreted incorrectly.
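
As an illustration of what a correspondence analysis routine is doing behind its default settings, here is a bare-bones sketch using nothing but NumPy; the brand-by-attribute counts are made up for the example, and real work would use a proper CA implementation plus careful interpretation of the resulting coordinates.

```python
import numpy as np

# Hypothetical brand x attribute frequency table (counts of brand-attribute associations).
counts = np.array([
    [120,  30,  45,  60],
    [ 40, 100,  35,  20],
    [ 60,  25,  90,  55],
    [ 30,  70,  20, 110],
], dtype=float)

P = counts / counts.sum()                             # correspondence matrix
r = P.sum(axis=1)                                     # row (brand) masses
c = P.sum(axis=0)                                     # column (attribute) masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals

U, sing, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = (U * sing) / np.sqrt(r)[:, None]         # brand principal coordinates
col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]      # attribute principal coordinates

# The first two dimensions give the familiar 2-D "brand map".
print("Brand coordinates:\n", row_coords[:, :2].round(3))
print("Attribute coordinates:\n", col_coords[:, :2].round(3))
print("Inertia explained:", (sing**2 / (sing**2).sum())[:2].round(3))
```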

A somewhat more elaborate but, nonetheless, debatable practice is psychographic segmentation with what has been called the tandem approach.  Though it began to be seriously questioned many years ago, this method is still quite popular and, put simply, consists of K-means or hierarchical cluster analysis of factor scores derived from attitudinal ratings.  “Tandem” refers to the dual use of factor and cluster analysis in the segmentation.  The psychographic statements respondents rate are often improvised, making matters worse.  Poor questionnaire design plagues many kinds of marketing research, and items that make little sense to respondents or mean different things to different people will sink a segmentation whatever statistical methods are used.  In the tandem approach, segments obtained from the cluster analysis are cross tabulated with demographics and other data in the hope that meaningful and actionable segments will materialize.  They often do not and, accordingly, I sometimes call this the “Factor, Cluster & Pray” method.
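
For readers unfamiliar with the mechanics, here is a minimal sketch of the tandem approach using scikit-learn; the number of factors, the number of clusters, and the simulated ratings are arbitrary placeholders of mine, which is precisely part of the problem with the method as commonly practiced.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Stand-in for 500 respondents rating 20 psychographic statements on a 1-7 scale.
ratings = rng.integers(1, 8, size=(500, 20)).astype(float)

# Step 1: factor-analyze the ratings (here, arbitrarily, 5 factors).
z = StandardScaler().fit_transform(ratings)
factor_scores = FactorAnalysis(n_components=5, random_state=1).fit_transform(z)

# Step 2: cluster respondents on their factor scores (here, arbitrarily, 4 segments).
segments = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(factor_scores)

# Step 3: cross tabulate the segments with demographics and pray for actionability.
print("Segment sizes:", np.bincount(segments))
```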

Regression is perhaps the most widely used statistical method of them all, but it is also deceptively complex.  Many books have been written detailing how regression analysis can be badly abused, and Frank Harrell’s Regression Modeling Strategies is the most comprehensive and hard-hitting I’ve read.  Marketing researchers seem to make the same sorts of mistakes people working in other disciplines do, though perhaps more often.  Some examples are using highly correlated predictors, neglecting residual analyses, ignoring correlations across time (e.g., in weekly sales data) or space (e.g., regions of a country), categorizing the dependent variable, and confusing correlation with causation.
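
As a small illustration of one of those pitfalls, the sketch below (simulated data and illustrative names of my own choosing) fits an OLS model with two nearly collinear predictors and checks variance inflation factors and residuals with statsmodels; real projects would take the diagnostics much further.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 200
# Two nearly collinear predictors (e.g., two overlapping awareness measures).
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
y = 2.0 * x1 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Variance inflation factors flag the multicollinearity (values well above 10 are trouble).
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print("VIFs:", np.round(vifs, 1))

# A simple residual check; patterns here suggest misspecification or dependence.
residuals = fit.resid
print("Residual mean/std:", residuals.mean().round(3), residuals.std().round(3))
```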

Another concern I have, in fact, pertains to causation.  Whenever we say things like “This sort of consumer does this because of that,” we are making a statement about causation, whether or not we are conscious of it.  Causal analysis is a subject even bigger than regression, and one bible is Experimental and Quasi-Experimental Designs for Generalized Causal Inference (Shadish et al.).  Trying to establish causation can be likened to walking through a minefield, to paraphrase a comment once made to me by a Marketing professor with a PhD in Statistics.  We need to tread carefully!
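
A tiny simulation shows why such statements are dangerous.  In the sketch below (entirely made-up data and variable names), a confounder drives both “exposure” and “purchase,” so a naive regression finds a sizable effect of exposure even though, by construction, exposure has no causal effect at all.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
# A hypothetical confounder (say, household income) drives both ad exposure and purchase,
# while exposure itself has NO causal effect on purchase in this simulation.
income = rng.normal(size=n)
exposure = 0.8 * income + rng.normal(size=n)
purchase = 0.8 * income + rng.normal(size=n)

naive = sm.OLS(purchase, sm.add_constant(exposure)).fit()
adjusted = sm.OLS(purchase, sm.add_constant(np.column_stack([exposure, income]))).fit()

print("Naive 'effect' of exposure:        ", naive.params[1].round(2))     # spuriously large
print("Effect after adjusting for income: ", adjusted.params[1].round(2))  # near zero
```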

The next time you’re in a very brave mood, ask your senior finance director if they are no better at their job than they were 10 years ago.  Common sense should tell us that experience counts, particularly in highly technical professions.  Formal education only lays the groundwork for statisticians, and even veterans are constantly learning new things and new tricks.  The list of viable analytic options continues to grow (for examples see Analytics Revolution), and we’ve reached the point where we now have so many tools that skill levels are becoming diluted.  Over-specialization, on the other hand, is also something we need to be wary of, and some less-experienced analysts lean on a pet method for nearly any situation: if all you have is a hammer, everything looks like a nail…

Now, here comes the bad news: The math stuff can actually be the easiest part of analytics!  Every so often I’m asked questions such as “If I give you 10 million customer records, what technique would you use?”  To characterize questions like these as naive would be too diplomatic, as they reveal little grasp of the fundamentals of research.  The Cross Industry Standard Process for Data Mining (CRISP-DM), illustrated in the diagram below, will help make clear what I mean by this.

 

[CRISP-DM process diagram: the six phases defined below, arranged as an iterative cycle around the data]

 

Here are very succinct definitions of each CRISP-DM component, courtesy of Wikipedia.2 

Business Understanding:

This initial phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition, and a preliminary plan designed to achieve the objectives.

Data Understanding:

The data understanding phase starts with an initial data collection and proceeds with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses for hidden information.

Data Preparation:

The data preparation phase covers all activities to construct the final dataset (data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection as well as transformation and cleaning of data for modeling tools.

Modeling:

In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often needed.

Evaluation:

At this stage in the project you have built a model (or models) that appears to have high quality, from a data analysis perspective. Before proceeding to final deployment of the model, it is important to more thoroughly evaluate the model, and review the steps executed to construct the model, to be certain it properly achieves the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.

Deployment:

Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use it. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data scoring (e.g. segment allocation) or data mining process. In many cases it will be the customer, not the data analyst, who will carry out the deployment steps. Even if the analyst deploys the model it is important for the customer to understand up front the actions which will need to be carried out in order to actually make use of the created models.

Bravo!  Properly understood, analytics is not just cross tabs, visualization or programming, or even fancy statistical techniques.  It is a process intended to enhance decision-making.  The first step listed above, Business Understanding, is often the most demanding and, along with Data Understanding and Data Preparation, can absorb the bulk of a project’s time and energy.  CRISP-DM was not developed specifically for marketing research but is applicable to our business and drives home the point that analytics is a multifaceted, iterative process which involves more than narrow technical skills…or the ability to use a mouse.  Serious errors can occur anywhere, anytime and even simple mistakes can have important consequences.

So, the next time someone even suggests that analytics is easy, I’d advise you to be on guard.  It just ain’t so.

_________________________________________________________________________

Notes

1 Some other reactions I have come across are that analytics is “too complicated,” that it isn’t needed, or that it doesn’t work.

2 For a brief summary of CRISP-DM see Wikipedia: http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining.  For a more in-depth look, see Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management (Linoff and Berry), a popular, non-technical introduction to Data Mining and Predictive Analytics.


14 Responses to ““Analytics is Easy””

  1. Michael Lowenstein says:

    April 9th, 2015 at 10:50 am

    Terrific, comprehensive post. Confusing correlation with causation is perhaps the biggest threat to this often risk-averse, complacent profession: http://customerthink.com/correlation_is_not_causation_big_data_challenges_and_related_truths_that_will_impact_business_s/

    A related issue is utilizing metrics which have little or no connection to, or impact in driving, actual business goals simply because they are established and “accepted”, as opposed to applying metrics which actually work: http://customerthink.com/want-or-need-higher-customer-satisfaction-loyalty-and-recommendation-scores-the-real-question-is-why/

  2. Steve Needel says:

    April 9th, 2015 at 11:41 am

    Nicely done, Kevin. Statistical analysis is pretty easy these days, but understanding what it means (and what it doesn’t mean) is the hard part. I’ve always liked Niels Bohr’s comment, “It is difficult to predict, especially the future”.

  3. Kevin Gray says:

    April 9th, 2015 at 6:54 pm

    Thank you, Michael and Steve!

    NOTE TO EDITOR: For some reason, the CRISP-DM diagram image I refer to in the post isn’t visible. I’ve sent it separately in a later mail, so perhaps that can be fixed. It’s really cool! 🙂

  4. Kevin Gray says:

    April 10th, 2015 at 7:15 am

    We now have so many tools, many of which are highly complex and won’t easily fit into user-friendly GUIs, that analytics is much more challenging than just a few years ago. This would be true even if the volume, variety and velocity of our data had not changed. There are downsides to progress as well as the obvious benefits.

  5. Eric Rusiecki says:

    April 13th, 2015 at 10:27 am

    So true! People think that because big data is available, we can make a simple plug-and-chug model for every client. The point made about customizing the model for the client is very important and cannot be overlooked. I am excited to see how Agent Based Modeling (http://www.prorelevant.com/agent-based-modeling-vs-marketing-mix-modeling/) can be used by analytics firms to make a customized living model for their customers.

  6. Michael Wolfe says:

    April 15th, 2015 at 8:53 am

    Early in my career, my boss, a prominent person in MR by the way, told me that surveys and focus groups were the lifeblood of MR above everything else. I considered that symbolic of the divorce between MR and data science and predictive analytics that I think still exists in the minds of many today. As a result, I ceased identifying myself as an MR person from that point forward. This is too bad, particularly in light of the perception by many that MR is obsolete, irrelevant and struggling to grow and have a seat at the table. Also, in light of the fact that analytics and data science are hot growth professions, this is certainly one explanation of what ails the MR profession in general.

  7. Velimir Kanev says:

    April 15th, 2015 at 10:04 am

    Thanks for the interesting article, Kevin! And the reco for further readings, great insights and always important reminders.

  8. Paul Snyderman says:

    April 15th, 2015 at 10:31 am

    Kevin – thanks for a great article. I think the core problem for Market Researchers is found in the CRISP-DM visual. Understanding the industry and the data requires time, experience and study. In the context of the mantra that market research be done “faster and cheaper” we end up with faux research. Big cross tab deliverables and massive PowerPoint decks – often created by high speed systems. It looks like market research, but it really is nothing like the insight that we should be trying to deliver.

  9. Kevin Gray says:

    April 15th, 2015 at 4:45 pm

    Thanks to you all for your comments! I’ve been concerned about the future of MR for a very long time now. One reason stems from what Mike noted – that many have seemed completely blind to other data sources and analytics that have been a core part of MR for decades. This blindness also underscores that many MR agencies have been suppliers and no more, with little understanding of the marketing decisions clients are trying to make and how they are (and have been) making them. MR in general has low “brand equity” among many clients…small wonder.

  10. Jennifer says:

    April 15th, 2015 at 6:58 pm

    Hi Kevin, Thank you for this article! I deal with this every day. Agencies are promising faster/cheaper research, and I see studies broken up at each step, and passed around to vendors and juniors across agencies. There is often no project oversight, error is introduced at every step, studies are designed illogically, insights are lost, and the data is wrong. The broken model is very frustrating for those of us with ethics and a passion for doing good research. Please continue to write more articles like this.

  11. Kevin Gray says:

    April 16th, 2015 at 5:25 pm

    I know what you mean, Jennifer. Instead of Quick and Dirty, Not at All would be better! Why pay for suspect information and then use it to make bad decisions? Either take MR and analytics seriously or don’t use them.

  12. Chris Robinson says:

    June 7th, 2015 at 11:30 pm

    Kevin a very good summary as usual. What always amazes me as a statistician is how little people actually recall from their one year of statistical analysis. A well known market research firm in Asia was trotting out brand maps for its clients with the input being mean scores for brands across attributes. When I pointed out how nonsensical this was, their first response was very defensive, obviously assuming if it can be coded as software it must be okay. When I simply referenced the variability of skews in distributions for each attribute they started to get it.

    Truth is market research has been light years ahead of the data industry with its need for speed of interrogation, so in that sense we have been ahead of the game. But to assume cross tab capabilities equals data insights is madness.

    You kind of reference structural modeling as a powerful but dangerous tool. I would agree, with the one proviso, it does make you at least postulate models that get tested by various iterations. I do think tools like this can bring you closer to data insights, even if we are market researchers!

  13. Kevin says:

    June 9th, 2015 at 4:44 pm

    Thanks, Chris! I’m a big fan of SEM too and agree that any abuser-friendly software carries with it risks. However, some R users seem really to be programmers who don’t know what they’re doing either. 🙂 Sparse documentation and crowd-sourced user support also are risky! Certainly, stats education and training has fallen short, and not just in MR. Much work to be done…

  14. Townsend Analytics Jobs | Fresher CV says:

    September 17th, 2015 at 8:02 pm

    […] "Analytics is Easy" | GreenBook – By Kevin Gray. Erroneous thinking about analytics continues to hang on in the marketing research community. Often it is tacit, but at times articulated … […]
