As we start to emerge from lockdown, the challenge facing business leaders isn’t just one of short-term survival; it’s also about building resilience into their organisations – to buffer them from unexpected shocks, and to help them steer a path through an uncertain future.
Resilience has many dimensions. For some people it is about adapting quickly to changing market dynamics. For others it is a human capacity to cope with adversity. The dimension we focus on here is operational resilience.
This is the capacity of a system to absorb disturbance and retain its basic ability to function; in colloquial terms, keeping the show on the road. It includes such things as having enough people in key roles, ensuring the supply of critical components and keeping current customers happy.
To start a conversation about improving operational resilience, you need to have a point of view on what good looks like. So take a moment to reflect on this question: what’s the most resilient system you can think of?
We have asked our MBA students this question, and it’s fascinating to hear what they come up with. One interesting answer is the Internet. As lockdown hit in late March, Internet usage – especially videoconferencing – grew dramatically, and the Internet coped just fine.
The Internet’s original design principles – standardised protocols for data exchange, capacity to route packets of data through multiple pathways and so on – have proven to be very effective. When we have problems with connectivity, they are almost always local ‘last mile’ issues, not a problem with the core infrastructure.
Another response is to look at utilities like electricity, water and heating. For example, we take it for granted that electricity will be available 100 per cent of the time at the flick of a switch. But think for a second about how that reliability is achieved – with all sorts of different sources of energy, each with its own unique qualities, flowing into a grid and used in a way that ensures continuous supply but with as little waste as possible.
Widening the lens, some people talk about first-responder services, such as the police, ambulance or fire services. Their raison d’être is responding to low-probability, high-impact incidents, and for the most part they do it well, despite continuous pressure for efficiency savings. The challenge of keeping these response units fit and motivated to act at the drop of a hat, while keeping costs under control, is considerable.
There are also examples from further afield – faith systems, democracies, life itself – and it is fascinating to think about such settings, to understand the underlying principles that make some systems more resilient than others.
Overcoming the efficiency-reliability dilemma
Bringing these examples back to a business context helps us frame the challenge facing corporate leaders. You can think of the Internet, electrical utilities and first-responder services as reliable-first: they are built to ensure that even when unexpected problems occur, or when demand spikes, they are still able to function. Sometimes this is achieved by building spare capacity (e.g. the fire service), sometimes by having multiple sources of supply (e.g. the electricity grid), sometimes by stockpiling key components (e.g. personal protective equipment in the case of a pandemic).
Most business systems, in contrast, are efficiency-first: supplies are sourced from their cheapest location around the world, inventory levels are kept as low as possible, and delivery happens on a just-in-time basis.
So the challenge here is conceptually very simple: how can we develop a best-of-both-worlds flexible solution, where we build reliability into our operations without giving up on efficiency? In graphical terms, there are tactics we can deploy to reach the top-right corner of the matrix in the figure below. (Thanks to our colleague Alex Yang for helpful discussions here. Both he and Jérémie spoke about the challenges of operational resilience in webinars hosted by Julian recently.)
Of course there is no silver-bullet solution. But there are many steps you can take to resolve the trade-off to some degree. Building on insights from some of these reliable-first settings, and also from academic studies, here are five specific suggestions.
1. Focus on the weak links
Every system can be divided into subsystems or components, and with some fairly simple analysis you can identify which subsystems are creating the biggest threat to reliability, and then focus your attention on those. It turns out that often it’s not the most expensive or elaborate components that create the biggest problems.
An academic study of Ford Motor Company found that the biggest potential performance impact (of failure) came from its smaller suppliers. In the context of the pandemic, we can see something similar going on – for example, the problems in the UK healthcare response centred on the lack of personal protective equipment (PPE), not the availability of hospital beds.
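The "fairly simple analysis" mentioned above can be sketched in a few lines. This is a minimal illustration, not the method used in the Ford study: it assumes you can estimate, for each component, a yearly probability of disruption and the loss if it fails, and it ranks components by the product of the two. The supplier names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    annual_spend: float         # what you pay for it each year (hypothetical)
    failure_probability: float  # estimated chance of a disruption in a year
    impact_if_failed: float     # estimated loss if the disruption happens

    @property
    def expected_loss(self) -> float:
        # Simple risk score: likelihood multiplied by consequence.
        return self.failure_probability * self.impact_if_failed


def rank_weak_links(components):
    """Return components ordered from highest to lowest expected loss."""
    return sorted(components, key=lambda c: c.expected_loss, reverse=True)


# Hypothetical figures, chosen only to illustrate the point.
components = [
    Component("engine supplier", 50_000_000, 0.01, 20_000_000),
    Component("seat supplier", 8_000_000, 0.02, 5_000_000),
    Component("small fastener supplier", 200_000, 0.10, 15_000_000),
]

ranked = rank_weak_links(components)
for c in ranked:
    print(f"{c.name}: spend {c.annual_spend:,.0f}, "
          f"expected loss {c.expected_loss:,.0f}")
```

With these made-up numbers, the cheapest supplier comes out on top of the risk ranking, which is the pattern the paragraph above describes: spend is a poor guide to where the weak links are.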
2. Create transparency and trust through the system
If you don’t know anything about a supplier – for example its financial situation or its own internal resilience – you tend to assume the worst and build contingency plans around the possibility of it failing. Moreover, lack of visibility in supply chains often creates huge problems, for example the ‘bullwhip’ effect where small demand fluctuations downstream create huge fluctuations upstream in the chain. These problems can all be reduced with greater transparency – information sharing – among parties, because problems higher up or lower down the chain can then be anticipated and planned for.
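A toy simulation makes the bullwhip effect, and the benefit of information sharing, easier to see. The sketch below is a deliberately stylised model, not a description of any real supply chain: it assumes each tier orders what it just sold plus an adjustment to its target stock, where the target is a fixed number of periods of the latest demand (`cover_periods`, a hypothetical parameter). All numbers are invented.

```python
def upstream_orders(demand, cover_periods=2):
    """Orders a tier places on its supplier, given the demand it observes.

    Assumed rule: replace what was sold, then top target stock up or down
    by `cover_periods` times the latest change in demand.
    """
    orders = []
    previous = demand[0]
    for current in demand:
        order = current + cover_periods * (current - previous)
        orders.append(max(0, order))  # a tier cannot place a negative order
        previous = current
    return orders


def simulate_chain(retail_demand, tiers=3, cover_periods=2):
    """No visibility: each tier only sees the orders of the tier below it."""
    series = [list(retail_demand)]
    for _ in range(tiers):
        series.append(upstream_orders(series[-1], cover_periods))
    return series


def simulate_chain_with_shared_demand(retail_demand, tiers=3, cover_periods=2):
    """Transparency: every tier plans against the end-customer demand."""
    shared = [upstream_orders(retail_demand, cover_periods) for _ in range(tiers)]
    return [list(retail_demand)] + shared


def swing(series):
    """Gap between the largest and smallest quantity in a series."""
    return max(series) - min(series)


# Hypothetical retail demand: a single, modest step up from 100 to 110 units.
retail = [100] * 4 + [110] * 6

opaque_swings = [swing(s) for s in simulate_chain(retail)]
shared_swings = [swing(s) for s in simulate_chain_with_shared_demand(retail)]

print("swing per tier without visibility:", opaque_swings)
print("swing per tier with shared demand:", shared_swings)
```

In this model, a 10-unit change at the retailer grows at every step up the chain when each tier reacts only to the orders it receives, whereas the swing stops growing once all tiers plan against the same end-customer demand.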
Greater transparency also goes hand-in-hand with trust. In 1997, one of Toyota’s suppliers, Aisin, suffered a factory fire that threatened to halt production of all Toyota’s vehicles. Within days, Aisin mobilised dozens of firms in its network to manufacture replacement “P Valves” to Toyota’s exacting standards. Knowledge sharing, trust and mutual dependency among these firms all played their part in keeping the just-in-time production system on track, and they also pay off in normal times when these firms collaborate to improve quality or reduce costs.
3. Simplify your interfaces for quicker coordination
One key reason why the Internet works so well is the “interoperability” of the systems and components that feed into it. There are internationally recognised protocols for coding and sharing data, which reduce the reliance on any one part of the system. The electricity grid, likewise, takes in energy from multiple sources and has standardised ways of distributing and storing energy.
Contrast this with the automotive industry where there are few common standards across manufacturers for electronic components – which means relatively few suppliers and greater risk of disruption. This applies to people and teams, too: many organisations designed for high reliability (such as hospital emergency departments) have standardised roles (e.g., paramedics, nurses, junior doctors, consultants) and team structures (shifts, wards). As a result, individuals and teams can be easily substituted or supplemented when the need arises.
It’s not enough just to have simple and well-defined interfaces – you also need a way to coordinate things so that supply and demand can be matched quickly. Consider Uber’s scheduling system – drivers are matched with customers in seconds, and if one driver rejects the match, another steps in. Uber may not be 100 per cent reliable, but it’s a vast improvement on what we used to put up with. More broadly, many companies have opportunities to use their staffing and planning processes to allocate resources dynamically as demand changes.
4. Invest in fungible (general purpose) resources
When there is a shock to a system, there is often a huge spike in demand for certain resources. If those resources are highly specialised – for example ventilators – you need to stockpile them to cope. But if they are fungible, meaning that they can be deployed in a number of ways, capacity planning is much easier. Hospital beds were mostly pretty full at the start of the crisis, but they were quickly made available (by sending less unwell patients home) to cope with COVID-19 cases.
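The capacity-planning advantage of fungible resources can be shown with simple arithmetic. The sketch below is an illustration under stated assumptions, not real hospital data: it compares the capacity needed when each use has its own dedicated resources (sized for its own peak) with the capacity needed when one shared pool serves all uses (sized for the peak of total demand). The two uses and their demand figures are hypothetical.

```python
def capacity_needed_dedicated(demand_by_use):
    """Each use keeps its own resources, sized for its own worst period."""
    return sum(max(series) for series in demand_by_use.values())


def capacity_needed_pooled(demand_by_use):
    """One shared pool, sized for the worst period of combined demand."""
    totals = [sum(period) for period in zip(*demand_by_use.values())]
    return max(totals)


# Hypothetical demand for beds over four periods; the shock arrives in
# period three, when respiratory demand spikes and routine work is deferred.
demand_by_use = {
    "routine surgery": [40, 38, 10, 8],
    "respiratory care": [10, 12, 45, 50],
}

dedicated = capacity_needed_dedicated(demand_by_use)
pooled = capacity_needed_pooled(demand_by_use)

print("beds needed if dedicated to each use:", dedicated)
print("beds needed if shared across uses:", pooled)
```

Because the peaks for the two uses do not coincide, the shared pool needs fewer beds than the two dedicated stocks combined. The saving disappears if all uses peak at the same time, which is why this is a sketch of the logic rather than a planning rule.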
The same logic applies in other settings. Consider the world of business education: some universities had stronger digital capabilities than others before the pandemic hit, and this allowed them to scale up online learning quickly. But these investments were typically made with a general view of digital learning becoming more important, rather than specific concern over the risk of a pandemic.
There is a human side to this point as well. Investing in the general development of your employees – and cross-training them in multiple activities – is a good thing in general but particularly so as a way of coping with uncertainty. Firefighters aren’t just trained to fight fires – they are skilled at fire prevention, and a range of other emergency services as well.
5. Keep people fit and alert
It’s not enough just to have equipment and people that are available to be redeployed at a moment’s notice; they also have to be willing and able. People working in the fire service, where active fire duty is a small part of the job, spend lots of time in training and doing drills.
And people working in high-reliability settings like power stations, air traffic control or mines are trained to think holistically about the work they are doing (rather than in narrow silos). They are drilled in safety and security techniques, and they take learning from near-failures (e.g. a lost time injury) very seriously.
How do you get your employees to behave in these ways? A huge part of operational resilience is cultural, so you can encourage these behaviours through what you say and how you say it. You also need good metrics and aligned incentives. Is there a risk, for example, that a buyer in your organisation might select a low-price supplier in a dangerous location or in poor financial health because of an aggressive spend-reduction target? High-reliability organisations are very careful about what types of behaviours actually get rewarded, to ensure that profit-seeking doesn’t drive out safety.
In sum, there is a range of things you can do to improve the reliability of your operations without spending a fortune on stockpiles and alternative sources of supply. Sometimes it is about homing in on the weak points in your operations and making sure they don’t fail. Sometimes it is about designing the overall system better (greater transparency, standardised interfaces). And at other times it is about investing in the human side of the system to make sure things work as they are supposed to.
Obviously the mix of tactics depends on your specific circumstances. But hopefully the ideas and examples provided here will help you think more creatively about this important challenge.
Julian Birkinshaw and Jérémie Gallien are professors at London Business School. Copyright Julian Birkinshaw and Jérémie Gallien 2020
Julian Birkinshaw discusses resilience further in the summer 2020 edition of Management Today.
Image credit: Monty Rakusen via Getty Images