PROBLEM-SOLVING PART 4 – HOW TO SOLVE PROBLEMS IN COMPLEX SYSTEMS

In the three previous parts, I described how we act, often intuitively, to solve problems. This often works well in simple systems (I used a bike with a flat tire as an example) and in complicated systems (a car with engine problems), but it may not work as well in complex systems (an Air Traffic Control Centre with several losses of separation).

In this fourth and final part, we will look at other methods for problem-solving that could be more appropriate for complex systems. I will begin with Dave Snowden, the creator of Cynefin, a framework used to aid decision-making. According to Snowden, in complex systems we cannot know for sure what the effect of different actions will be. Every action is an experiment.

That means we have to accept that actions can give results that are either more positive or more negative than we expected, and we must be prepared for both possibilities. If the result is negative, we must be able to “dampen” the effects, by discontinuing the action or by returning to an earlier state. If the result is positive, we can move on and perhaps scale the action up.
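As a rough illustration of this idea, the sketch below treats one action as a small experiment with a prepared response for both a negative and a positive result. Everything in it is an assumption made for the illustration: the names `Experiment` and `run_step`, the numeric "observed effect" and the thresholds are hypothetical and are not part of Cynefin.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """A small, reversible change (all names here are hypothetical)."""
    name: str
    scale: float = 1.0   # how widely the change is currently applied
    active: bool = True


def run_step(exp: Experiment, observed_effect: float,
             dampen_below: float = 0.0, amplify_above: float = 0.0) -> str:
    """Decide what to do after observing the effect of one experiment.

    observed_effect < dampen_below  -> dampen: stop and roll back
    observed_effect > amplify_above -> amplify: carefully scale up
    otherwise                       -> keep observing
    """
    if observed_effect < dampen_below:
        exp.active = False
        exp.scale = 0.0
        return "dampen"
    if observed_effect > amplify_above:
        exp.scale *= 2
        return "amplify"
    return "observe"


# Hypothetical usage: the effect values are made up for the example.
exp = Experiment("small change to a working method")
print(run_step(exp, observed_effect=0.4))   # amplify
print(run_step(exp, observed_effect=-0.3))  # dampen
```

The point of the sketch is only that the "dampen" path is decided before the experiment starts, not after the negative result has appeared. In a real system the effect would of course not be a single number.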

These ideas resemble the “micro experiments” that Robert de Boer advocates in his new book “Safety Leadership”. One example is the introduction of changes in an aviation maintenance facility, described in a paper that can be found on: https://www.researchgate.net/publication/341728911_Safety_differently_A_case_study_in_an_Aviation_Maintenance-Repair-Overhaul_facility.

One conclusion from Snowden is that traditional, large change projects are doomed to fail. In a recent post on Twitter, he wrote:

“The single most fundamental error of the last three decades is to try and design an idealised future state rather than working the evolutionary potential of the here and now, the adjacent possibles – it is impossible to gain consensus in the former, easier in the latter.”

For me, this is a consequence of the way complex systems work: the effects of the actions we take in such systems are hard to predict. Still, we continue to hear descriptions of such idealised future states, and of the big projects that will realise them. Snowden apparently wants us instead to observe reality and where it is heading, and to use that understanding to carefully influence that heading. He uses the word “nudging” for this.

The term “nudging” became popular through the 2008 book “Nudge: Improving Decisions about Health, Wealth, and Happiness” by Richard H. Thaler (later a Nobel Prize winner) and Cass R. Sunstein. The book describes the active engineering of choice architecture, to help people make the choices that are best for them without removing their freedom of choice. If you Google “nudging” you will see many pictures that illustrate how this can be used!

Snowden’s Twitter post about “working the evolutionary potential of the here and now” seems to touch on another idea for improvement and change, called “appreciative inquiry” (AI). It originates from a 1987 article by David Cooperrider and Suresh Srivastva.

Traditional management of change tends to begin with a problem, followed by the identification of the causes of the problem and then the creation of a plan to address it. Working with AI means that you start by asking what is already working well. From that foundation, you create a vision of what could be and of what is needed to get there. Instead of focusing on problems, the focus is on what is good, and there are many examples of good results from using this method. One example of an organisation that has used AI for improvements is British Airways.

Erik Hollnagel, in his book “Synesis”, seems to agree with Snowden when it comes to avoiding change projects that are too large. Although he mentions the need for big plans to guide long-term development, he also says that if big plans result in large projects, there may be difficulties. One of these is evaluation. Large projects take time, and since complex systems change all the time, it will be difficult to know whether changes are the effects of the project or of other factors. Therefore, Hollnagel advocates implementing big plans via small steps with individual objectives. Such small steps can be taken more quickly, the system can be regarded as more stable during each step, and evaluation is easier. Such evaluations are important, as they can be used for planning the next small step, but also for the continuous evaluation of the big plan.

On the other hand, Hollnagel also points to another factor. The effects of change can sometimes take time to show, especially in complex systems. If the evaluation is done too early, there is a risk that not all effects of the change are visible yet. Who said managing change in complex systems should be easy?

Above all, however, Hollnagel insists that we cannot use simplified models of our systems as a way to understand them. Such shortcuts do not work well. Instead, we must spend more resources on getting to know and understand our system. One of the reasons it is so difficult to predict how a change will affect a complex system is that we know too little about it. A way forward is to use FRAM (the Functional Resonance Analysis Method), a method created by Hollnagel. I will not describe it in detail here, but it provides a way to describe a system and how its different functions are connected and affect each other. This improves our chances of understanding how our actions will affect our system, and by getting to know our system better, we can also reduce the gaps between “work-as-imagined” and “work-as-done”.

There is so much more to discuss when it comes to complex systems and how we can improve our way of managing them. However, for now, I will stop here and end this fourth and final part.

PROBLEM-SOLVING PART 3 – WHY COMPLEX SYSTEMS ARE DIFFERENT

In the two previous parts, we looked at simple and complicated systems and at the method we use to solve problems in these systems. We also looked at how we often apply the same method to complex systems. In this part, we will look at some of the things that make complex systems different from simple and complicated systems. Understanding these differences can help us see why our standard problem-solving method is not always appropriate for complex systems.

Stability

We looked at a simple system – a bike with a flat tire – and at a complicated system – a car with engine problems. These systems are both stable. If we do not address the problems we found, the problems will remain. Then we looked at a complex system; an Air Traffic Control Centre, where we had a number of recent reports on losses of separation between aircraft approaching an airport.

A complex system is, however, not stable in the way a simple or complicated system is. At the ATC Centre, imagine that we do nothing to solve our separation problem. The controllers who work with this traffic, and who perhaps were involved in the incidents, will probably discuss this with their colleagues. The situation is not pleasant, and they want to understand why the incidents occurred and how they can make sure they will not happen again. It is quite possible that they will develop an understanding and find a way to adjust their way of working to avoid the problem. Adapting and adjusting to the situation is typical of normal work. It is certainly a plausible development that there will be no more losses of separation – even without any action from the Centre management!

Of course, there are several other, also plausible, developments, including some more negative, with even more incidents. The point I am trying to make here is that a complex system is “alive”, constantly adapting, and developing. It is not stable.

Independence

Our simple and complicated systems – the bike and the car – are stable also in another sense. They are not affected by other systems. Fixing the flat tire on the bike will not have any effect on the car’s engine problems.

A complex system, however, usually consists of several other systems and is also connected to outside systems. These systems are typically tightly coupled, so that changes in one system will affect other systems. In our ATC Centre, the airspace is divided into several sectors, each sector manned by one or two controllers. Even if the incidents we saw only occurred in one sector, other sectors will be affected. One example could be that the controllers mitigate the incidents by increasing safety margins. This will have an effect on co-ordination and traffic solutions towards other sectors.

We can expand this. Our system – the ATC Centre – consists of smaller systems, but it is also part of a larger system and it borders on other systems. Our ATC Centre is not living in splendid isolation – it is just one of many ATC Centres, most of them managed by other organisations, in other countries. One change in one sector can affect the situation in other countries!
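To make the point about coupling a little more concrete, here is a minimal sketch that lists every unit a change in one sector could reach, given a map of who is coupled to whom. The sector names and the coupling map are entirely hypothetical, and a static map like this is itself a simplification: it shows only how far an effect could travel, not what kind of effect it would be or how strong.

```python
from collections import deque

# Hypothetical coupling map: which units hand traffic over to which.
COUPLED_TO = {
    "Sector A": ["Sector B", "Sector C"],
    "Sector B": ["Sector D"],
    "Sector C": ["Neighbouring Centre"],
    "Sector D": [],
    "Neighbouring Centre": ["Foreign Centre"],
    "Foreign Centre": [],
}


def possibly_affected(start: str, couplings: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk listing every unit a change in `start` could reach."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        unit = queue.popleft()
        for neighbour in couplings.get(unit, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


# A change in one sector reaches every other unit in this made-up map.
affected = possibly_affected("Sector A", COUPLED_TO)
```

In this made-up map, increased safety margins in “Sector A” could reach all five other units, including the centre in another country.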

Cause and effect

With our simple and complicated systems, we found a clear connection between cause and effect. Using our method (Plan, Do, Check) we can establish that the cause of the bike’s flat tire was a hole in the inner tube, and the cause of the car’s engine problems was old spark plugs.

In our complex system – the ATC Centre – we often assume that there are similar connections between cause and effect. One of the objectives of an incident investigation is to find the cause (or causes). Starting with the incident and going backwards in time it is often possible to see a chain of events leading to a point that we can declare “the root cause”. In our example, we found a new method, with issues concerning the description of it, leading to difficulties for the controllers and then to the losses of separation. That finding can however be something of an illusion.

With the bike and the car, the connection between cause and effect is obvious even before the problem occurs. We can say, without any doubt, that a punctured inner tube will lead to a flat tire. And if we understand a car engine, we will know that if we do not replace the spark plugs at regular intervals, we will get engine problems.

We do not have that ability with complex systems. Of course, we could have realised that our new method might be misunderstood and that it might increase the risk of a loss of separation. This is, however, not the one and only possible effect. In another scenario, the controllers could have been able to adapt, there would have been no incidents, and we would continue to have a poorly described method in our system for years ahead.

I am not suggesting that there is total randomness in a complex system, but the connection between cause and effect is only obvious in hindsight, not in advance. Instead, we can talk about “emerging effects”, where a large number of factors work together in unpredictable (or at least hard-to-predict) ways.

After the problem-solving

When we solve problems, or introduce changes, in simple or complicated systems, we can be almost certain that the change is long-lasting. Our repaired bicycle tire will work fine until it is damaged again. Our engine will run smoothly until it is again time for the spark plugs to be replaced (with a reservation for the fact that a car’s engine is complicated enough for other parts to be able to cause problems).

When we solve problems in a complex system, it is another story. Our problem-solving might look successful and our evaluation can be positive, but a few months later we might have a recurrence without any obvious reason. It can be connected to human adaptation: people can revert to older working methods that felt easier or more efficient. It can also be an adaptation to a completely different change, but in a way that looks like reverting.

A complex system is never completely stable. It changes while we are planning a change, it changes while we implement a change, and it changes after we have finalised a change. Most of that change is for the good: it is a way to make sure that the system continues to deliver what is expected, in a changing world, with limited or lacking resources, and with competing and sometimes conflicting goals.

Conclusion

There is much more to be said about complex systems. Still, I hope that I have been able to show a few reasons why traditional ways of problem-solving that work very well with simple and complicated systems might not give the intended results in a complex system.

One factor is stability, or lack of stability. While we check, monitor and analyse a complex system – it changes. We implement a new change – perhaps a new method for arriving traffic – but that is not the only change to the system. As our system changes and adapts to different circumstances, we try to predict the effects. It is good to realise how difficult that can be.

In an ATC Centre, no day is the same as another day. Airlines change their aircraft, destinations and time schedules. The weather changes (winds, thunder and icing conditions), and airlines adapt to this. There are also changes in the Centre itself: controllers do not all work in exactly the same way, and sometimes they get sick and there is a lack of resources. That changes our way of working, which affects other ATC Centres, just as they affect us.

The obvious question, with all these aggravating factors, is whether it is at all possible to solve problems in complex systems. In the next, and final, part I will point at some of the ideas that are to be found. Join me then!

PROBLEM-SOLVING PART 2 – COMPLEX SYSTEMS

In the previous part, we had a look at methods to solve problems. I used a bike with a flat tire as an example of a simple system, and a car with engine troubles as an example of a complicated system. The methods for problem-solving were similar: notice the problem, form a hypothesis and develop a plan (perhaps after asking someone who is more knowledgeable), proceed with the plan and evaluate the result.

Let’s move on to our next example. We are now the management of an Air Traffic Control Centre. This is of course a very well organised place, with highly qualified and trained staff, advanced technology, several specialised support functions, and established processes for a range of activities. The outcome is continuously monitored by collecting different kinds of data. One example is the reporting system, where operational staff can hand in mandatory or voluntary reports. Lately, we have seen an increased number of these reports dealing with a loss of separation between aircraft arriving at one of the airports serviced by the Centre.

Faced with this problem, we turn to our universal problem-solving method. We assume there is a cause to our problem, and we realise that to be able to find the cause we will need expertise. In this case, we use our incident investigation experts. They have investigated the reports, made an analysis, found the causes of these incidents and made suggestions for actions to solve the problem.

From these reports, we learn that shortly before the new incidents (losses of separation) started to occur, a new method was implemented that changed the handling of arriving aircraft to the relevant airport. Our hypothesis, based on expert opinion, is that there were issues with the way this new method was described. The air traffic controllers misinterpreted the new method, or found it difficult to apply, and the result is that the Centre had a number of losses of separation.

With this hypothesis, we can now develop a plan. We will revise, clarify, and improve the new method. This updated version will then be implemented, and the result evaluated. Hopefully, we will see that after the implementation there will be no more loss of separation and we have another proof of the validity of our problem-solving method!

There is, however, one detail to consider. An Air Traffic Control Centre is neither a simple nor a complicated system. Instead, it is a complex system and as such has a few attributes that make it different in many aspects, including how to manage problem-solving.

But our example above described a successful example of problem-solving, confirming the validity of our method, didn’t it? It may look that way, but there are a few questions to be asked: Did we really find the cause? Was the improvement we saw the result of our actions? Did we actually solve the problem once and for all?

To be able to manage problem-solving in complex systems, we need to understand a lot more about them. This is what we will look at in the next episode…

PROBLEM-SOLVING PART 1 – SIMPLE AND COMPLICATED SYSTEMS

Here I will discuss how to solve problems in simple and complicated systems. I will later continue the discussion into complex systems, about why they are different and how this affects our problem-solving methods.

My discussion will be based partly on “Cynefin”, the framework created by Dave Snowden. It will also be based on what Erik Hollnagel writes about systems in his new book “Synesis”, which is about “the Unification of Productivity, Quality, Safety and Reliability”.

Hollnagel describes several models for managing change. One example is the “PDCA wheel”, where the letters represent Plan, Do, Check and Act. It resembles the scientific method, described by Francis Bacon as early as 1620: formulate a hypothesis, carry out an experiment and then evaluate the outcome. Whether we are aware of these methods or not, I believe we use them intuitively.

A flat bicycle tire

Imagine taking the bike to go shopping and finding a flat tire. We would quickly come up with a hypothesis (the inner tube needs repairing), carry out an experiment (fix the tube) and evaluate (pump up the tire and see if it holds the air). We could also call it PDCA, and if Check works out well, we Act by going shopping.
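For readers who like to see it written down, the bike story can be sketched as a tiny loop. This is only an illustration under my own assumptions: the function `pdca`, its parameters and the `bike` dictionary are hypothetical names invented for the example, and the retry limit is not part of the PDCA wheel itself.

```python
from typing import Callable


def pdca(plan: Callable[[], str],
         do: Callable[[str], None],
         check: Callable[[], bool],
         act: Callable[[], None],
         max_rounds: int = 3) -> bool:
    """Run Plan-Do-Check-Act until Check succeeds or we run out of rounds."""
    for _ in range(max_rounds):
        hypothesis = plan()   # Plan: form a hypothesis about the cause
        do(hypothesis)        # Do: carry out the fix (the "experiment")
        if check():           # Check: did it work?
            act()             # Act: carry on with what we wanted to do
            return True
    return False


# Hypothetical bike example: the state of the bike is just a dictionary.
bike = {"tube_patched": False, "log": []}

solved = pdca(
    plan=lambda: "the tube needs repairing",
    do=lambda hypothesis: bike.update(tube_patched=True),
    check=lambda: bike["tube_patched"],   # pump up the tire; does it hold air?
    act=lambda: bike["log"].append("go shopping"),
)
```

Note that the loop quietly assumes that Check gives a clear yes or no and that a fix, once checked, stays fixed.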

A bike is what Cynefin calls a simple system. It has a clear boundary (it is not affected by other systems) and it is an ordered system. By ordered we mean that there is a clear connection between cause and effect. It is also simple in the sense that most people understand a bike and how it functions. When we get a problem with a bike, we understand it, can categorise it and find a proper way to solve it. For any problem, there is a solution, and once we have learned those solutions, they are easy to use.

A car engine

Now, imagine driving to work. You have a rather old and cheap car, and you do not bother having it serviced. As long as it starts easily and brings you to work, you are happy. Lately, however, starting it has become more and more difficult, and once started, the engine is not running well.

A person who is handy with cars might see a car as another simple system. For other people, a car is complicated and difficult to understand. The line between simple and complicated is not always very clear. A complicated system is still ordered, with a clear connection between cause and effect, but it might not be obvious what is causing a problem. To find a plan for problem-solving in a complicated system, we need to do an analysis and perhaps call for an expert. There may also be more than one solution to the problem. We ask our neighbour, who is very interested in cars. He asks us if we have changed the spark plugs recently. From the look on our face, he concludes that we now have a hypothesis and a “Plan”. After a trip to a nearby garage, we start the “Do” and replace the spark plugs. The “Check” part is easy, and with a bit of luck the engine immediately starts and continues running like a purring cat. Problem solved; we can go to work!

I hope it is easy to relate to these stories. We do this all the time, with all kinds of problems we face. We notice them, we think of the possible cause or causes, perhaps ask someone more knowledgeable and develop a plan for fixing the problem. It seems like a method with universal applicability. But what if we are facing a system that is even more complicated, or complex?

Join me in the next episode…

Are we climbing or descending now?

“Ladies and gentlemen, this is your captain speaking. Can you tell me if we are climbing or descending now?”

Is this the future for aviation, when the pilot will be on the ground as we fly? This is among the things I have asked myself during the pandemic with regard to automation.

I have attended many webinars during the pandemic, and webinars are really a kind of automation. They replace a presentation made in a classroom or a conference theatre. Instead of being on a stage, the presenter is sitting in an office or even at home with a computer and a PowerPoint presentation. This is then delivered worldwide by automation. An audience of hundreds, even thousands, is sitting at home or in offices with a computer, tablet or phone, using automation technology to take part in this presentation. There are a few problems though…

“Can you hear me?” or “Can you see my presentation now? ’Cause I can’t!” In Human Factors this is called being “out-of-the-loop”. That is, automation is doing its thing but does not give feedback to the user. I have seen presenters speaking for several minutes before someone points out that the presentation is still on the introductory slide…

Or you read in the chat window: “I only see a black screen.” or “Is there supposed to be sound now?” This could be another case of “out-of-the-loop”, where it is not obvious whether the presentation has started or not. It could also be technical limitations, like when the presenter suddenly disappears, sometimes to return quickly (“I had to change computer”) but sometimes never to come back at all. Then there is the bandwidth. Everyone is talking about it, but I’m not sure many understand it. “We are having some problems with the bandwidth. If we all turn off microphones and web cameras, it might work.”

Are we all trained to use this webinar automation? I’ve seen what I believe are examples of poor preparation. There seem to be different ways to set up a webinar, like assigning different roles. Who, for example, will be able to share their screen? My guess is that this would include the presenter. Instead I frequently hear “It seems I can’t start the presentation?” and then “Oh, I can do that for you!” and “Great! So, next slide please…” and later “I think you are one slide too far, can you show the previous slide?” It can take minutes before they agree on what slide to show.

And what role will the audience have? Should they use their web cameras and/or microphones? Once, I heard someone who had his mic on while having a loud argument with his spouse at home, thus disturbing a large part of the presentation. Other times the presenter asks “Can you hear me now?” and all we can do is nod, without a camera or a mic…

Many of these webinars are about how things will be “post-COVID”. Very optimistic usually. This pandemic is terrible but also an opportunity. Now we can finally implement all that digital technology and automation, for example in the conservative aviation business: airports and especially air traffic management. “The technical solutions are already there, ready to implement!” and “If ATM would just implement them, it could be so much faster, cheaper and better…”

Sometimes there is someone on the panel pointing out that aviation is safety critical and that could be the reason for being conservative. Using the terminology of Erik Hollnagel; it is about the ETTO – Efficiency-Thoroughness Trade-Off. In aviation it could be a good idea to rather lean towards the thoroughness side of ETTO.

Perhaps you think I’m hostile to technology. But during my many webinars I have also seen fantastic examples, where the presenting organisation is well organised, all staff are well trained, all settings are made to suit the format and the technology works flawlessly. I think they took on the ETTO by being rather thorough.

There will certainly be much more automation coming, bringing a lot of value. Still, my webinar experience tells me that when implementing automation it could be a good idea to be a bit slow, to design with the user experience in mind, to test a lot, to prepare well and to train properly.

Because, even in the splendid webinars I have attended, there is the occasional “Can you hear me? And can you see my presentation?” This is not the kind of thing I want to hear on my first pilot-less flight.