Analytics and Machine learning



From the ask-anything thread, here's a reader's question:

Have we reached a point/have you seen anybody be successful / have you seen anybody really try to build a data lake of customer information and use machine learning to derive correlations from it? All I'm looking for is a tool and a data set that will say:

It turns out people buy more stuff online when it rains. Is this correlation

[x] Worthless ? [ ] Curious? [ ] Interesting? [ ] Fascinating? [ ] Actionable?

The data suggests that sending email #256b after customers of type 83h have viewed product 87254 results in a 298.4% increase in sales. Is this correlation

[ ] Worthless ? [ ] Curious? [ ] Interesting? [ ] Fascinating? [ X ] Actionable?


Are we getting any closer to turning that corner??

It's a great ask and gets to what is perhaps the central issue around analytics and analytics technology in the enterprise; namely, what's the role of the analytics warehouse and how seriously should we take the claims of big data machine learning advocates?

I'll start with a short answer to the first and most direct question, and then I'll expand on the reader's two sub-questions to cast some light on the nature of that answer.

Have I seen clients build a data lake and get value out of it? Unequivocally yes. Our best clients – the ones really getting value out of digital analytics – are nearly all using some form of advanced analytics warehouse / data lake and doing so very successfully.

Does that analytics value come from machine learning? Rarely. The vast majority of analytics value has come from traditional statistical techniques or from straight algorithmic selections (programmatic or SQL access to the detail data). I actually believe there is much value to be had from non-traditional techniques, but that's more my theory than proven field fact, and I am absolutely not a big fan of the massive correlation approach that I see most commonly advocated by machine learning folks (though I don't want to paint with too broad a brush here – there are lots of different flavors of machine learning and the term itself is a bit ambiguous).

So let's tackle your examples in more detail:

It turns out people buy more stuff online when it rains. Is this correlation

[x] Worthless ? [ ] Curious? [ ] Interesting? [ ] Fascinating? [ ] Actionable?

When I first read and replied to this, I assumed the reader meant that the correlation was random and unexpected. But I realized afterward that to the reader it was more in the nature of the obvious. Either way, I'm going to disagree with the reader's selection here (though later I'm going to agree with what I take to be his broader point). Weather is important, and while it might be obvious to the reader, it's left out of analytics and optimization on a routine basis.

As it happens, weather is often a highly predictive and essential variable when it comes to retail models. In Paul's bakery store baking models, for example, weather was the single biggest variable factor. It had a huge impact on whether people would buy a warm cookie or not. It also has a big impact on whether people will shop in store and, of course, the degree to which they might shift that behavior online.

And weather impacts aren't limited to retail, where people buy online because they can't get to the store and are going stir-crazy.

I remember from personal experience an interesting case where we were analyzing the PPC campaigns for an internet site focused on real-time traffic. When we did the analysis, we found (big surprise) that giant storms in the North UK drove massive increases in site traffic. That may seem obvious, and it is. But here's the thing: they weren't regulating their PPC buys that way. They had a simple fixed budget. That meant that on beautiful summer days they were spending the same amount as on blizzard days. Their budgets for PPC and their daily caps were keeping them from expanding their buys in December, so they were simply losing out on the opportunity to capture more (and, by our measurement, more engaged and valuable) customers. We found other important local effects (like closures), and by shifting their buying model to something more local we were able to dramatically improve their overall performance in PPC. Actionable.
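To make the budgeting point concrete, here is a minimal sketch of splitting a fixed PPC budget across days in proportion to expected demand rather than evenly. It is an illustration only, not the model we built for that client: the function name `allocate_daily_budgets`, the demand index values and the budget figure are all hypothetical.

```python
def allocate_daily_budgets(total_budget, demand_index, daily_cap=None):
    """Split total_budget across days in proportion to a demand index.

    demand_index maps a day label to a relative expected-traffic score
    (hypothetical values; in practice it would come from a forecast).
    """
    total_index = sum(demand_index.values())
    if total_index <= 0:
        raise ValueError("demand index must sum to a positive value")

    budgets = {}
    for day, index in demand_index.items():
        share = total_budget * index / total_index
        if daily_cap is not None:
            # A hard cap recreates the problem described above: storm days
            # get clipped and the unspent money is not redistributed.
            share = min(share, daily_cap)
        budgets[day] = round(share, 2)
    return budgets


# Made-up index: 1.0 = a normal day, higher = storm-driven traffic.
demand_index = {"clear_day": 1.0, "rainy_day": 1.5, "blizzard_day": 3.5}
budgets = allocate_daily_budgets(600.0, demand_index)
print(budgets)
# {'clear_day': 100.0, 'rainy_day': 150.0, 'blizzard_day': 350.0}
```

A flat budget would have put 200 on each of these days. The weighted split moves spend toward the days when the audience is actually there.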

We often find retailers' PPC vendors ignoring weather – and that's almost always a BAD idea.

Weather matters in all sorts of places and around all sorts of use cases. I'd make a small wager that folks are less likely to shop for life insurance on beautiful Saturday afternoons than on rainy, dreary ones – and that may matter when it comes to thinking about when I drop (and pay for) a display ad. But how many display campaigns in financial services are optimized by weather? A pretty small percentage, I think.

Understanding when people do something has real meaning in digital (and, even more, outside digital), and weather is an important part of when they do something.

So I'm going to disagree with the reader's immediate answer (Worthless). Whether because analysts don't think about the obvious or because program managers don't act on it, finding out that people buy more stuff online when it rains is useful and actionable – not least in allowing you to amp up your PPC buys to capture the mostly offline customers (driven online by being rain-bound and open to capture by new brands) of competitors who aren't working hard enough to incorporate weather.

On the other hand, I'm on board with what I take the reader's deeper point to be (and sorry for hijacking the point into a bunch of “weather matters” examples – especially since I probably missed the original irony). When Paul's bakery built their model, when we did our PPC analysis or built our utilities model, we didn't use machine learning techniques (massive correlation of variables to discover important relationships) to happen on weather as an important variable. We knew it was likely to be significant and we modeled it the old-fashioned way to understand the depth and importance of its impact.
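As a rough illustration of what "the old-fashioned way" can look like, here is a minimal sketch that takes a variable we already suspect matters (did it rain?) and fits an ordinary least-squares line to measure the size of its effect. The data is made up for the example and the names `fit_simple_ols`, `rained` and `online_orders` are hypothetical. A real model would include other known drivers such as seasonality and promotions.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up daily data: 1 = it rained, 0 = it stayed dry.
rained = [0, 0, 1, 0, 1, 1, 0, 1]
online_orders = [100, 104, 131, 98, 127, 135, 102, 129]

baseline, rain_lift = fit_simple_ols(rained, online_orders)
print(f"dry-day baseline: {baseline:.1f}, rainy-day lift: {rain_lift:.1f}")
# dry-day baseline: 101.0, rainy-day lift: 29.5
```

With a 0/1 predictor the slope is simply the difference between the rainy-day and dry-day averages. The hypothesis came from knowing the business, and the model's job is only to size it.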

The real question is how many important variables there are that analysts don't know about, and whether it's worth randomly assembling data to find them. I'm very, very, very skeptical about this. It's true that analysts don't always understand the business they're modeling very well, and maybe weather is a great example of that. To solve this problem, you can:

  1. Hire analysts who understand your business

  2. Train your analysts in the business via temporary immersion.

  3. Track down every conceivable exogenous data source and use machine learning.

If you're reading this and you picked 3, you're probably a salesperson for a technology vendor or a data science consultancy. Can unexpected correlations and important variables sometimes be discovered? Of course. But most businesses actually have a pretty decent understanding of the key factors driving performance even if they can't describe exactly how those key factors relate or interact. When that's the case, massive correlation is just a big, truly massive, very impressive waste of time.

Here are some thoughts that seem so basic and obvious that I'm almost embarrassed to write them down, except that I often meet people who don't seem to grasp them:

  1. There is an infinite (truly) set of possible external data points – so machine learning is always guided – it's just slightly less guided than when we were limited to 50 variables instead of 5000.

  2. Most of the work in analysis isn't in doing correlations. It's in assembling, cleaning and lining up the data and then in making sense of what possible correlations mean. Discovering correlations in the data via massive analysis of variables doesn't really save that much time.

  3. Data points often don't line up in ways that make it easy or even practical to auto-correlate them. It can take a truly massive amount of work to align certain kinds of operational data with marketing data in an intelligible and potentially interesting manner. Machine learning does NOT help with this and is dependent on this exercise to be successful with these data types.

  4. In the vast majority of cases, there just aren't hidden variables that no one ever thought of that are somehow the secret key to your business. This kind of “grail” thinking is pervasive in all walks of life and it's amusing but disheartening to see that analysts are as susceptible as everyone else to this type of delusion. There are plenty of consultancies who will gladly waste your money chasing that dream, but you would be much better off investing in lottery tickets or hiring an astute court astrologer.

  5. It's much more likely that there ARE important variables that everybody knows exist but nobody has access to or can measure. Figuring these out is far, far more important than hunting for unlikely correlations in data.
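To make "massive correlation" concrete, here is a minimal sketch of the mechanics: correlate every candidate variable against a target and keep the strongest. In this illustration every candidate is pure random noise, so whatever "wins" is meaningless by construction. The function names, the 30-day window and the seed are assumptions chosen for the example, and the 50 and 5000 variable counts echo the first point above.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def strongest_noise_correlation(n_variables, n_days=30, seed=0):
    """Return the largest |r| found between a random 'sales' series
    and n_variables unrelated random series."""
    rng = random.Random(seed)
    sales = [rng.gauss(0, 1) for _ in range(n_days)]
    strongest = 0.0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_days)]
        strongest = max(strongest, abs(pearson_r(noise, sales)))
    return strongest


few = strongest_noise_correlation(50)
many = strongest_noise_correlation(5000)
print(f"strongest |r| among 50 noise variables:   {few:.2f}")
print(f"strongest |r| among 5000 noise variables: {many:.2f}")
```

Because both runs share a seed, the 5000-variable search includes the same first 50 candidates, so its strongest correlation can only match or exceed the smaller run's. Throwing more variables at the problem reliably produces a more impressive-looking winner, whether or not anything real is there. Someone still has to decide what it means.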

Which brings me to the reader's second sub-question, one that I think we can handle quickly because we are in complete agreement:

The data suggests that sending email #256b after customers of type 83h have viewed product 87254 results in a 298.4% increase in sales. Is this correlation

[ ] Worthless ? [ ] Curious? [ ] Interesting? [ ] Fascinating? [ X ] Actionable?

Yes, clearly right. And by putting these two examples forward, I assume the reader means to sneakily suggest that most of the value in analytics comes from very unsurprising places and on the tail of considerable work. We are always charmed to hear stories of sudden analytic insight, swift brilliance and amusing and unexpected correlations. But real business analytics adds value mostly by delving in patient and disciplined detail into what we think is probably true.

It's the difference between the real, day-to-day practice of science and the “genius” models that dominate public imagination. I'm not a big believer in the genius models, even for true genius. Mostly, I suspect it's a lot more work than people like to think.

But if I'm not confident how genius works, I am sure that genius is not a strategy.

If you want to build an effective analytics team, the right strategy is to focus your attention on the problems you know matter and the data you think is probably important. Know your business? Absolutely always. Get the data you think you need? Definitely. Massive correlations of stuff that you doubt makes a difference? Occasionally... maybe.





 
