Credit: J. Albert Bowden II

Why big data is a blunt instrument

Much has been promised by the data scientists who say they understand the deeper meanings and patterns of mega numbers. Sceptics, including this one, say they have yet to deliver.

by Matthew Gwyther
Last Updated: 23 Jun 2016

In recent years many claims have been made for data. Few of them have been small. Most things in the excitable and frothy world of data are not just big. They are mega. ‘Big data’ is the thing of the moment. If you don’t get it you are so way behind the curve that your company will be missing out. Companies that are able to get a grip on big data will vanquish the data-incompetent. Soon, data, it is promised, will be as vital a factor in success as brand power, assets and human capital.

OK, nobody seems precisely sure what big data actually means beyond a sense that there is competitive advantage to be gained for everyone in the storage and analysis of vast amounts of impenetrable information. But never mind. For now big data is the new oil and, if you manage to strike it, you are quids in.

So what does it mean? Probably the most celebrated and frequently cited example of big data’s power involves a pregnant teenager from Minneapolis and the US retail chain Target.   

Supermarkets woke up to the value of making extensive records of their customers’ behaviour and purchases way back in the 1990s. It was Tesco, Britain’s top grocer, that paved the way with its legendary Clubcard, invented by the husband and wife team behind Dunnhumby. Dunnhumby is such a valuable part of Tesco’s business that the now stricken company is being forced to sell it. It has attracted a number of bids but none anywhere close to an asking price that was said to be £1.5bn.

Too often big data can be a never-ending rabbit hole. Credit: DARPA

Tesco kept schtum about how well Clubcard was doing for them because in the UK we get anxious when our grocers come over all Big Brother (meant in the original Orwell 1984 sense) with us. But predictive analytics, as it is known in the trade, is big business for large retailers.

Anyway, to return to Target. The chain came unstuck when an assiduous reporter from The New York Times managed to make one of Target’s big data guys sing like a canary about how the techniques worked. Target had been trying for ages to establish when its customers were pregnant because there is so much cash to be made in selling them nappies, bum rash cream and all the other associated baby paraphernalia. After years of data study, looking at tell-tale sales of vitamins, scent-free soap and large bags of cotton balls they had their eureka moment and came up with the complex algorithm.

The supposedly true story of how this can go wrong was told in a delicious narrative by the NYT hack Charles Duhigg, in which an angry father walked into a Target store to confront the management:

‘My daughter got this in the mail!’ he said. ‘She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?’

The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologised and then called a few days later to apologise again.

On the phone, though, the father was somewhat abashed. ‘I had a talk with my daughter,’ he said. ‘It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.’

Outing teenagers who fail to understand the benefits of contraception is one skill. But the claims for big data are many and varied. Some even involve non-procreative sex. One of the more unlikely came in a recent Harvard Business Review paper with the mildly preposterous title ‘Data Scientist: The sexiest job of the 21st century.’ Never mind helping Scarlett Johansson get dressed in the morning. Forget being a paediatric surgeon in Aleppo. No, it’s ‘I’m a [brief pause] data scientist, actually’ that is going to really help you score in a bar. Really win friends and influence people.

After this show-stopping revelation you naturally follow up with a couple of witty data-savvy gags like: ‘There are only two types of people, those who can extrapolate from incomplete data’ and ‘There are 10 types of people in the world, those who understand binary and those who don’t.’

The HBR article went on to describe these individuals, often armed with PhDs in physics, who will want vast salaries plus stock options before they sign on. Not only that but they want to be ‘on the bridge’ - a reference to Star Trek and Spock, to whom every human deferred when some data needed evaluating. Data scientists are frequently ardent Trekkies.

‘Data scientists today,’ crooned the HBR, ‘are akin to Wall St quants of the 1980s and 1990s. In those days people with backgrounds in physics and math streamed to investment banks and hedge funds where they could devise entirely new algorithms and data strategies.’

Yes. And look where those complex algorithms, which the quants’ Master of the Universe bosses couldn’t quite get their heads round, got us. The Great Crash and the worst depression in Western civilisation ever. Nothing like handing over a complex sales tool for a stack of useless sub-prime mortgages to a big data genius.

Incidentally, even in the wild, pre-2008 days of unrestrained investment banking nobody described a quant as a ‘Master of the Universe’. Goldman Sachs and the rest kept them well fed and watered but very much in their boxes.    

The HBR article didn’t help its case by citing an early LinkedIn employee as an example of a master of the sexy data craft. LinkedIn is many things, some of them occasionally useful, but sexy it isn’t. If Facebook and Twitter are the Javier Bardem and Penelope Cruz of social media, LinkedIn is the Michael Gove - earnest and moving forward too eagerly into your personal space.

But let us just pause for a moment. This is too sceptical. Too unfair and it avoids some incontrovertible facts.

The central argument about data’s power in the world of business is that it makes sellers far wiser and more effective when it comes to pitching stuff to potential buyers. It gives them valuable knowledge about potential customers - what they think, what they’ve been buying and therefore what they might be persuaded to buy in the future.

These days what big organisations desperately lack with their customers is intimacy. They often don’t really know them that well at all. They are numbers, not names. The rise of the web has meant the death of the face-to-face. The personal. How different this is from retailing in the old days.

The reason old-fashioned shopkeepers were - and some still remain - great at pleasing and retaining their customers is the data they keep in their heads. Knowledge combined with some emotional intelligence. Way back when, your butcher recalled the last time he sold you a piece of fillet steak and you said it was great. He also recalled you tried the sausages with sage and apple and your kids didn’t like them. (He also knew if you were Jewish or Muslim not to offer you pork sausages in the first place.) So he thought of a better idea because you told him so to his face. You had a human relationship. Thus, when fillet steak is in, he keeps a nice piece especially under the counter just for you.

Some of this emotional intimacy can be retained by the power of brands and the hold that they have over the non-rational part of customer behaviour. But that is another story and one in which big data has next to nothing to contribute.

Having those you don’t know knowing lots about you unnerves many people. I have no intrinsic fear of the use of data in marketing. I don’t get vexed by the question, ‘Whose data is it, anyway?’ Used properly it saves me - and those who are selling - time and money. I don’t mind people trying to sell me stuff that I like and might want. I get bored and irritated by being offered stuff I don’t like and will never want. It wastes my precious time. That is why the 30-second ad during the News At Ten, which in the old days was the main artillery piece of FMCG purveyors, doesn’t really work any more.

That is why, as a non-gambler, if I have to watch Ray Winstone one more time utter his ‘Bet! Nahhr!’ during a sports intermission, I might put my fist through the plasma screen. It’s a blunderbuss of a shot that takes out many unintended targets. (Actually it doesn’t even hit them. It misses. Goes way over their heads into the ether of irrelevance costing the marketer a shed load of cash in the process.)

Data is supposed to be smart. In the online age it is supposed to understand me. This is why cookies and their analysis were a dream come true to business. In theory. The business of personalised re-targeting of ads is a pretty grim one. The problem is they don’t work.

Recently I went to Beirut on a story. (Great place, incidentally, and I would heartily recommend it.) But after doing a quick search for flights online and buying them I was plagued for ages afterwards by pop-ups and ads for flights to Beirut. Big data was so dim it couldn’t work out I wasn’t going to be buying flights to Lebanon like a brand of beer for the foreseeable future. Fascinating city but I wasn’t going to commute there. I was quite happy to go once and return without any of my bits winding up in a Hezbollah trophy cabinet.

Big data has become a fad. Credit: J. Albert Bowden II

If big data is so smart why is it that - on the rare occasion I actually pay any attention whatsoever to the ad-related stuff in the right hand column on Facebook - I see ads for slippers and three-bed new-builds in Crawley? Is that really the best that planet-brained Zuckerberg and his hordes of sexy data scientists in black T-shirts can do when trying to get my purchasing attention? I actually feel pretty insulted that they think this naff stuff might be ‘my thing’. Slippers? Crawley? I mean, c’mon.

In asking this I am in no way ungrateful to Facebook. We should all feel lucky that we can post photos and videos of our kids, plus all our holiday snaps for a wide range of friends across the globe for no cost whatsoever. Have you ever stopped to think what Facebook’s servers must cost? How much it must set them back just to keep all those petabytes cool?

One cannot consider Facebook without looking at the vexed issue of data privacy. Data’s other big issue. They are so down on Facebook in the European Commission that Brussels has warned EU citizens that they should close their Facebook accounts if they wish to keep information away from the eyes of the US security services. There is no doubt that the American government’s Prism programme revealed by Edward Snowden to the world does give the guys in the Pentagon access to all those pictures of your kids.

And then there are the really bad guys. Recently, in an effort to market its services, a data protection company carried out an experiment in which it parked sensitive personal data on dark web sites. These are the dodgy venues where stolen data is bought and sold. They then watched to see how quickly cybercriminals would pass it around. It didn’t take long.

Within a fortnight the data - which included the names, addresses, phone numbers, social security numbers and credit card numbers of 1,568 fictitious people - had been viewed more than 1,000 times and downloaded 47 times by people in 22 countries on five continents. Those hard at work on their laptops included criminal gangs in Russia and Nigeria. There is a big market for this kind of information.

Moving away from business for a moment to health and the NHS. In medicine, knowledge is power. Medicine may be an art not a science but there’s a reason why doctors when they meet you take a detailed history, as they call it.

What if all our histories - not to mention all the blood tests and x-rays and diagnoses we have had since birth - could be collected together on one massive Department of Health database that would be available to any doctor treating us in A&E from Land’s End to John o’Groats? Think of the lives, the time that would be saved.

Well, the NHS tried this at the beginning of the Noughties. It was called Connecting for Health and it was a cataclysmic disaster. It burned its way through almost £20 billion of our money over nearly a decade. In April 2007 the Public Accounts Committee of the House of Commons issued a damning 175-page report on the programme. The Committee chairman, Edward Leigh, claimed, ‘This is the biggest IT project in the world and it is turning into the biggest disaster.’

This was a classic example of a massive top-down, statist endeavour going horribly wrong, with consultants laughing all the way to the bank. But what if - in an attempt to get health data together across the nation for the benefit of the NHS - it went bottom up?

Just consider, for a moment, if large numbers of people could be persuaded to do it for themselves on a voluntary basis. The internet of things promises many things. But what if a programme like Strava, beloved of so many cyclists, were combined with digital fitness bands that measure blood pressure, heart rate and the rest to make a personal digital health passport? That could be big data in serious positive action.

The world of cycling has been transformed by data, but it isn’t big. It’s quite small and personal. And cyclists are utterly engaged with it rather than resentful and alienated.

So where has big data scored some notable success? The example often quoted is that of exposing false insurance claims. When you look at this in detail the industry cites spotting dishonest claim patterns but also observing fraudsters on social media boasting about taking whiplash insurers for a ride.

A sceptical piece about big data in The New Yorker quoted Anthony Nystrom of the web software company Intridea, who suggested that selling big data is a great gig for charlatans because they never have to admit to being wrong. Said Nystrom: ‘If their system fails to provide predictive insight, it’s not their models, it’s an issue with your data. You didn’t have enough of it, there was too much noise, you measured the wrong things. The list of excuses can be long.’

We had one of the global data champs in the UK in Mike Lynch and his meteor of a company Autonomy. Autonomy was so good at digging through massive amounts of ‘unstructured’ data and finding patterns that no mortal could achieve that by 2010 it was the UK’s most successful software business.

In 2011 the ailing tech giant HP - which hadn’t had a decent new idea in several decades - bought Autonomy for an amazing $11.7bn (£7.7bn), a premium of 79% over its market price.   

Within a year, and after major ‘culture clashes’ between the Cantabrigian and Californian teams, HP wrote down the value of the company by $8.8 billion. To say there have been recriminations would be an understatement. The matter is now in the hands of dozens of lawyers on both sides of the Atlantic, who appear to be the only winners in the situation. Funny how big data has never been able to replicate the tasks of a lawyer sifting through all those files bound with pink tape that get wheeled into courtrooms. That task is left to human minds. But just think of the fees that could be saved.

Big data has very similar problems to small data. Except they are far larger. A beautiful mathematical model is one thing, but the numbers you feed into that model are fragile. Remember the ‘garbage in, garbage out’ principle? When you get right down to the most granular level you can, if you’ll forgive the mixed metaphor, miss the wood for the trees.

The Labour Party at this year’s election claimed it was master of data, leaving the Tories way behind in its digital sophistication. It was so on top of Facebook it was going to leave the analogue Cameron for dust. Look where that got them. You ignore intuition and judgement at your peril. And massive mainframe computers just don’t have these.   

We’ll never know what Albert Einstein would have made of big data. However, he is widely credited with the line: ‘Not everything that can be counted counts, and not everything that counts can be counted.’ By which, one assumes, he meant numbers - while helpful in finding our way in the world - are not the route to all wisdom and knowledge.

This article first appeared in Rapha’s magazine Mondial. Matthew Gwyther owns a Brompton, which means he is far too naff to wear Rapha kit.
