There weren’t exactly a lot of plum jobs in pure mathematics in the 1970s, but Jim Simons had one. After working as a Cold War codebreaker at the Institute for Defense Analyses in Princeton, he became chair of Stony Brook University’s mathematics department at the age of 30. By all accounts, he thrived there and, in 1976, Professor Simons received the American Mathematical Society’s prestigious Oswald Veblen Prize in Geometry for his work on invariants in curved spaces. You can imagine the bemusement of his colleagues when, two years later, he quit to become a commodities trader.
Simons had become convinced that there were mathematical structures in the financial markets, controlling the seemingly inexplicable movements of asset prices. Believing that anyone who "solved" them could make a fortune in arbitrage, he hired a diverse crew of algebraists, statisticians, game theorists and computer scientists to find patterns in vast, largely forgotten data sets of historical trades.
His competitors, who invariably came from the hard graft and street smarts school of finance, scoffed. The markets weren’t an algebra problem, they were capricious and inscrutable. Data science – the derivation of useful insights through quantitative analysis – belonged in ivory towers, not the trading pit.
Undeterred, Simons began to make rapid progress as computing power exploded, utilising an early form of machine learning to improve the algorithms in real time. By the late 1990s, his hedge fund, Renaissance, was outperforming traditional funds, in stocks as well as commodities.
By 2018, Renaissance’s four funds managed approximately $65bn, according to Gregory Zuckerman’s biography of Simons, The Man Who Solved the Market. The most successful of them, Medallion, has generated average annual gross returns of 66.1 per cent since 1988, knocking the socks off star investors like Warren Buffett, Ray Dalio and George Soros. Algorithmic trades of the type concocted by Simons now account for roughly 30 per cent of all stock market activity, a proportion that has doubled since 2013.
Science vs intuition
What can the first quant – as data scientists are known in the investment world – teach the rest of us about how to make business decisions? A great deal, if you listen to the Silicon Valley FAANGs (Facebook, Apple, Amazon, Netflix and Google) and their coterie of start-up wannabes.
Many of the world’s most successful companies of the last two decades were built on the transformative power of data, and they evangelise about it constantly. Facebook and Google, for example, have used maths and machines to devour much of the global advertising industry, a famed bastion of creativity and gut feel. Others have captured retail, recruitment, entertainment and publishing, in each case beating seasoned industry professionals at their own games, at breakneck speed. All of them insist that their success is built on science, not guesswork.
The success of data analytics hasn’t just changed what businesses can do, but something much more profound – it has changed how we think about human agency in decision-making.
"Ten or 15 years ago, I had to keep promoting the idea of using data analysis and statistical optimisation techniques, because many managers thought their intuition was good enough," says Oded Koenigsberg, professor of marketing and deputy dean of degree education at London Business School.
"There’s been a drastic change in perception in the last few years that’s almost turned it upside down. Many managers now think that decisions can be made only by algorithms."
Businesses far outside the technology sector are falling over themselves to board the data train. By 2015, there were already 2.3 million job adverts in the US requiring some measure of analytics skills, according to PwC, a figure predicted to grow to 2.7 million by 2020. The result is that we’re moving from not using data enough to becoming over-reliant on it. "If you’re doing something like running inventory, then yes you can probably use an algorithm, but if it’s a strategic decision you need to be very careful. Data analysis is a necessary ingredient of decision-making, but it is not sufficient," says Koenigsberg.
It’s easy to see how we could fall into the trap of thinking otherwise, given data science has been used to such profound effect by technology companies. And the consulting industry, which offers extensive digital transformation services, has made good money convincing any doubters of the need to get on board.
But the rise of the tech giants or quant hedge funds is misleading. Facebook and Google aren’t normal companies, for several reasons. First, they don’t just use data, they use big data. (For the avoidance of doubt, we can define data-driven insights as those derived from quantitative analysis, which could be as simple as financial forecasting models; big data insights generally involve using billions of data points to train fiendishly complex algorithms, often of the machine learning or self-improving variety. If you’re unsure whether your data is big, it’s not.)
It’s an important distinction, because in the world of numbers, size matters. So does the other abnormal feature of the tech giants – they actually know how to do analytics properly. Most do not.
"It’s interesting how there’s this obsession with big data – in bold capital letters – and investing in complicated technology, when perhaps the emphasis should be on finding the right information instead," says veteran consultant and former CFO Alastair Dryburgh. "There’s an awful lot of quite obvious, small data that’s just lying around in big companies, being ignored. And while it’s easy to start a new business system that will collect lots of data, it’s something quite different to do the hard work of finding the data in your legacy systems. I suspect that’s an awful lot less popular, but probably more valuable for most businesses."
Goldilocks and the three quants
At the heart of using data properly, whether it’s big or small, is recognising where it has value and where it does not. Take, for example, the charts below, which tell the story of Goldilocks and the Three Bears. As their creator Andrew Missingham, CEO of consultancy B+A Equals, explains, "Data is best at telling you what, when, where and (sometimes) how. It’s not great at why.
"While it is possible to tell an engaging story with data, the narrative and nuance of an individual human story can bring the insights to life much more easily. Basically, the charts don’t help us care about Goldilocks the person, or the Three Bears for that matter."
The story reveals two truths about data: not everything of value can be quantified, and not everything worth quantifying is necessarily available. Failure to appreciate this can allow all sorts of cognitive biases and errors to creep in. Consider the polling industry, which has been subjected to a lot of unjustified flak over its recent failures to predict the Brexit referendum or US presidential election results, despite devoting considerable resources to creating large, bias-free samples. (Unjustified because the polls are no more inaccurate than they’ve always been – if you’d like some data on the subject, analysis by US polling site FiveThirtyEight found the average error of American polls since 1998 to be 6 percentage points, which implies a true statistical margin of error of over 14 points.)
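That jump from a 6-point average error to a 14-point margin of error can be reproduced with some back-of-envelope arithmetic – on the hedged assumption, not spelled out in the article, that polling errors are roughly normally distributed around zero:

```python
import math

# Hedged assumption (not stated in the article): polling errors are
# roughly normally distributed around zero. For a normal distribution,
# mean absolute error = sigma * sqrt(2 / pi).
mean_abs_error = 6.0  # FiveThirtyEight's average poll error, in points

sigma = mean_abs_error / math.sqrt(2 / math.pi)  # implied std dev, ~7.5 points
margin_of_error_95 = 1.96 * sigma                # conventional 95% interval

print(f"Implied 95% margin of error: +/-{margin_of_error_95:.1f} points")
```

A 95 per cent margin of error of roughly ±14.7 points is consistent with the "over 14" figure above, though the exact value depends on the distributional assumption.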
How could the pollsters be so wrong? Maybe it was the models, or the assumptions that went into them – how likely different demographics are to vote, for example. Maybe they selected unrepresentative samples. Maybe they asked leading questions, or just the wrong questions. Whatever the cause, decades of practice haven’t led to any noticeable improvements.
You might think that big data would be a more reliable way of assessing voting intentions, perhaps by searching for signals through the noise of social media activity. But heavy social media users have been shown to be unrepresentative of the wider population and thus far, no adequate big data solution has been found to predict election results any more accurately than old-fashioned polling.
This could well be because big data has many of the same statistical pitfalls as small data – in some cases arguably more. It is all too easy with any data set to find a correlation and assume that it proves causation (eg we made our ice cream packets red just before the heatwave, and sales went up – therefore our customers prefer red packets). In big data sets, this problem increases exponentially. According to the "multiple comparisons problem", highlighted by researcher John Ioannidis, the more data points there are, the more scope there is for random correlations to be found – a problem that is exacerbated during times of rapid change, when there isn’t enough historical data to rule spurious correlations out.
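The multiple comparisons problem is easy to demonstrate. A minimal sketch (the data here is invented pure noise, standing in for, say, unrelated sales series): generate a few hundred random series and count how many pairs look "significantly" correlated:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 observations of 200 completely unrelated random series --
# e.g. daily sales figures for products with nothing in common.
data = rng.standard_normal((1000, 200))

# Correlate every series with every other and count the pairs whose
# correlation clears the p < 0.05 significance threshold for n = 1000.
corr = np.corrcoef(data, rowvar=False)
n_pairs = 200 * 199 // 2                      # 19,900 comparisons
critical_r = 0.062                            # approx. |r| for p < 0.05, n = 1000
upper = corr[np.triu_indices(200, k=1)]
spurious = int((np.abs(upper) > critical_r).sum())

print(f"{spurious} of {n_pairs} pure-noise pairs look 'significant'")
# By construction none of these correlations is real, yet roughly
# 5 per cent of the 19,900 pairs will pass the test by chance alone.
```

The more series you add, the more comparisons you make – and the more convincing-looking phantoms you find.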
This can have unfortunate consequences, for example around false positives. Dryburgh, whose consulting activities have stretched to the US intelligence community, speaks of a debate raging at the National Security Agency over the effectiveness of using big data to find potential terrorists. "If our algorithm is 99 per cent accurate, that means it will spot 99 per cent of terrorists, and if you run it against people who aren’t terrorists it will tell you they’re not terrorists 99 per cent of the time.
"But in a population of 300 million where there are, let’s say, 300 terrorists, that means you’ll get 297 terrorists on a list of three million suspects, which isn’t particularly useful – especially if you happen to be among the three million.
"The reason is there haven’t actually been enough terrorists to train an algorithm properly, so more often than not there’s still a human saying that a certain pattern of behaviour – making calls let’s say to Pakistan, China and Nigeria – is indicative of being a terrorist."
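Dryburgh’s arithmetic can be checked directly. A minimal sketch, assuming (as in his quote) that the algorithm is 99 per cent accurate for both terrorists and non-terrorists:

```python
population = 300_000_000
terrorists = 300
accuracy = 0.99  # both detection rate and true-negative rate, as quoted

true_positives = terrorists * accuracy                        # 297 real terrorists flagged
false_positives = (population - terrorists) * (1 - accuracy)  # ~3 million innocents flagged

flagged = true_positives + false_positives
precision = true_positives / flagged  # chance a flagged person is a terrorist

print(f"Flagged: {flagged:,.0f}, of whom {true_positives:.0f} are terrorists")
print(f"Chance a flagged person is a terrorist: {precision:.4%}")
```

Even a 99-per-cent-accurate test flags roughly 10,000 innocent people for every real terrorist when the condition is this rare – the classic base rate fallacy.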
This is the fundamental reason that biases and errors creep into data analysis: like any other form of thought, data insights ultimately depend on a human being – in the question the data is asked, in the assumptions that go into its collection, processing and modelling, and in the inferences that eventually lead to a decision. This applies whether the insight is produced by a statistician or by an algorithm programmed by one. As a result, says Sandra Matz, data can always tell you what you want to hear.
Matz, an assistant professor of management at Columbia University Business School in New York and an expert in using computational methods in psychology, is keenly aware of the need to avoid unintended consequences from data analytics.
Her pioneering research into psychological targeting cast light on how people could be surreptitiously profiled from their social media activity, a big data technique allegedly used by controversial ‘election management’ company Cambridge Analytica to serve persuadable voters fake news in the 2016 EU referendum and US presidential race.
"You can’t separate what someone wants to get from an algorithm from the way it’s designed. There’s always a person making decisions about what it does and what input data is included, and they can tweak those if they have a specific outcome in mind. To take an extreme example, if I only wanted to hire male candidates for a job, I’d use input variables like height and strength, and the algorithm would then just recommend men. In that sense, an algorithm is never objective," says Matz.
The most advanced machine learning is trying to bypass this human element by incorporating intuition into the training of algorithms, and by examining unstructured or contextual data such as pictures, documents and videos.
"For a long time, we’ve had quantitative analysis that uses statistical techniques to derive insights from structured data such as spreadsheets. In that scenario, it’s the data scientists choosing what’s interesting in the data. With the most modern techniques, the machine chooses what’s interesting," explains Dan Olley, chief data officer at FTSE 100 business intelligence and analytics company RELX.
Unsupervised learning, where the algorithm draws its own connections without the problem being defined by humans beforehand, can spot things that data scientists didn’t even know they were looking for, says Olley, "fundamentally blurring the lines between the human and the digital world".
Despite such a grand prediction, Olley still isn’t advocating letting the data speak for itself.
"The judgement must still fall to a human. Maybe I’m not forward-thinking enough, but I’m not comfortable handing over big decisions to the machine. The data sets aren’t complete – we haven’t encoded every action of the world into a data set. And if you think about our brains, they’re the most advanced computers on the planet, yet we all have unconscious biases because the brain takes short cuts, and that’s essentially what we’re simulating with machine learning."
It shouldn’t be surprising that there are limits to what data can do, or that its efficacy as a tool is only as great as the human beings wielding it. Yet our collective attachment to data means there is a danger, as B+A Equals’ Missingham puts it, "that its newness is blinding us to our sense. A leader’s job isn’t only to react to inputs, but to direct which inputs they have available to them."
Indeed, in several ways, being overly reliant on data can lead directly to basic errors in management.
Several years ago, for instance, Dell decided it wanted to improve its after-sales customer service experience. Quantifying how satisfied customers are is tricky – one person’s six out of 10 is another’s eight – so the company decided to use call length as a proxy. The quicker the fix, they figured, the shorter the call.
Unsurprisingly, it backfired horribly. "There are huge issues with using data as a performance measure," says Dryburgh, a former Dell customer. "Because they were being measured on average call length, the operators would do anything to get me off the phone as quickly as possible. Five times they sent an expensive engineer out, without fixing the problem. Sometimes they just cut me off midway through the call."
These problems don’t get easier as the data at your disposal gets more sophisticated. In the Profit Levers podcast, published on Management Today online, MIT Sloan School of Management senior lecturer Jonathan Byrnes describes what has happened to special forces operations now that senior commanders have access to real-time data from drones.
"The line between timely supervision and micromanagement has become blurred... a marine officer recalls, for example, that during an operation in Afghanistan, he was sent wildly diverging orders by three different senior commanders. One told him to seize the town 50 miles away, another told him to seize the roadway just outside of town, and the third told him don’t do anything beyond patrol five miles around the base. The biggest problem with top-level micromanagement in the military, just as in business, is the huge hidden opportunity cost of failing to manage at the right level, a leader ignoring the critical issues of high-level strategy and organisational capability."
Then there are the dangers of "paving over cowpaths" – using insights to make the processes you currently have incrementally more efficient, rather than seeking out new, bold, innovative ways of doing things, or doing entirely different things altogether. Amazon may have mastered using big data to connect shoppers with what they’re looking for, but it wasn’t data that gave former financial quant Jeff Bezos the idea for Amazon, or convinced him to take the risk of starting it.
Pointing out such errors in business practice or philosophy can be problematic when data has become the cultural currency for credibility. This isn’t just a problem when dealing with true believers, as Harvard Business School Professor Gary Pisano argues in his book Creative Construction.
"I don’t think the managers involved are stupid or naive. The vast majority, I believe, know that these kinds of analytics have limits," Pisano writes. "When I have asked them why they are still depending on these methods, I get two types of responses. The first is that they value the rigour of quantitative analysis... by boiling everything down to specific numerical values, there is a sense that the output is more precise.
"The second common response I hear about relying on analytical tools is, ‘There is no alternative.’" One of the great, erroneous arguments of the data-first model of business is that there is a binary choice between rigorous science and sheer guesswork. It seems a weighty argument – science doesn’t claim to be the truth, but rather the best available version of the truth, and thousands of years of history seem to back up the idea that, given enough time, science will beat intuition every time.
Understand the limitations
But science allows hypotheses to be tested, to be proved or disproved by experiment. Data science in business does not. The findings of corporate analytics departments can’t be independently verified, replicated, scrutinised or challenged, because the data sets and algorithms are almost always proprietary. No one can peer-review Google’s insights from its own data sets without being granted access to them.
And the alternative to deferring slavishly to the data is not deferring slavishly to guesswork, but to use judgement when assessing the data, along with any and all other relevant information sources.
To do that, CEOs and senior executives will need to understand data at least well enough to appreciate its value and its limitations, rather than just placing blind faith in the experts. The danger otherwise is that you will have CEOs who don’t understand data, making business-critical decisions on the advice of data scientists who often don’t understand the inner workings of business.
"I would worry if a company said the answer to all this is to hire a few data scientists, because that says to me ‘I’m going to abdicate the problem to someone else and hope they can solve it’. Data is a culture, a company-wide initiative. But equally that doesn’t mean the whole company has to become data scientists, it just needs to become data literate," says RELX’s Olley.
Being data literate means that at least you know how to ask the right questions of the data specialists, especially as the science becomes more complex. "Rather than obsessing about what’s going on in the ‘black box’, ask how I trained the model, what datasets I used, how did I know they were complete, how did I test and validate the model," says Olley.
Commercial and data people speaking a common language requires work from both sides – it’s equally important that data teams understand the business context, which could be achieved for example by giving someone board level responsibility for data and the value it brings.
The most important thing that bosses can do, in a world awash with algorithms, is to recognise that while data is very powerful and very useful, so too are intuition, experience, leadership, open-mindedness and judgement, the strengths on which they were hired. After all, even the most proudly and successfully data-led firms in the world are still run by and for human beings.
If you’re still tempted to put all your faith in the numbers, spare a thought for Jim Simons’ great rival, Long-Term Capital Management (LTCM) – the most famous and successful quant fund in the world until it suddenly collapsed in 1998, a fate that has befallen a surprising number of algorithmic hedge funds over the years. "The LTCM collapse reinforced an existing mantra at Renaissance: never place too much trust in the trading models. Yes, the firm’s systems seemed to work, but formulas are fallible," writes Zuckerman of the episode.
He gives the last word to one of Simons’ closest colleagues, Nick Patterson. "LTCM’s basic error was believing its models were truth... we never believed our models reflected reality, just some aspects of reality."
Main image: Getty Images
Graphs: Andrew Missingham (graphics Jon Butterworth)