Disclaimer: This isn’t going to be a magic method that will solve every problem you’ll ever find. It’s nothing especially new, and shouldn’t be a big surprise to anyone. It’s just another tool in my arsenal, and the simplest way I have of making a big problem into a small problem.
How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?
— Arthur Conan Doyle, The Sign of the Four, 1890
This is a reframing of advice that’s given in a hundred different spaces, to solve a hundred different types of problem: break that problem down into smaller pieces. Specifically, it’s about breaking off the pieces that aren’t actually part of the problem.
This sort of approach can be applied to many types of problem - software development is my main use, but I also apply it when fixing electronics, fault finding with my car, or tackling anything complex enough that I can’t see the entire problem space at once.
So, how do we go about it?
Identify the problem space
The first step is to identify the maximum scope the problem might be within. In the example of a fault with your car, it’s reasonably obvious: the whole car. For a software problem, you’ll need to identify all of the system components that could be involved. This can include all of the product you’re working with, plus any clients, hardware or software involved in the process.
Take care not to understate the scope of where the problem may lie. A taillight not working could well be a control system problem, and a weird glitch that seems to occur in a backend web system could be caused by a bug in the way the web browser is sending data to the service.
Reduce the scope
The next step is to find a part of the problem space that can be removed. The trick here is to find a simple check to perform that can rule out a large portion of the space.
For our car taillight example, there are some very simple checks to run, like checking bulbs and fuses - the ease of testing these makes them a good first bet, but consider some other checks you can make. If you’ve got a multimeter, find an access point to check voltage - if power’s getting to the light cluster, you’ve just ruled out most of the systems in the car very quickly.
For web projects, I’d normally start by splitting into frontend and backend. If you can confirm that the data is coming through correctly from your web services, you can exclude the whole backend as the source of the problem.
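One way to picture this step is as set arithmetic. The sketch below is my own illustration rather than a formal method: it treats the problem space as a set of components, and each check as something with a rough cost and a set of components it clears when it passes. The component names, the costs, and the `Check` and `best_next_check` names are all made up for the taillight example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    cost_minutes: float   # rough effort needed to run the check (a guess)
    clears: frozenset     # components ruled out if the check passes


def best_next_check(space, checks):
    """Pick the check that clears the most remaining components per minute."""
    useful = [c for c in checks if c.clears & space]
    if not useful:
        return None
    return max(useful, key=lambda c: len(c.clears & space) / c.cost_minutes)


# Hypothetical problem space for the taillight example.
space = frozenset({"bulb", "fuse", "switch", "wiring", "control unit", "battery"})

checks = [
    Check("swap the bulb", 5, frozenset({"bulb"})),
    Check("inspect the fuse", 5, frozenset({"fuse"})),
    Check(
        "measure voltage at the light cluster",
        10,
        frozenset({"fuse", "switch", "wiring", "control unit", "battery"}),
    ),
]

first = best_next_check(space, checks)
remaining = space - first.clears   # what is left if the check passes

print(first.name)
print(sorted(remaining))
```

With these made-up numbers the voltage check comes out on top, because it clears five components for ten minutes of effort and leaves only the bulb. That only holds if the check passes; a failed check points you back at the components it would have cleared.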
Check the problem is still in scope
Consider this a “checking your working out” step. Once you’ve reduced the problem space, you should make sure that it still actually contains the problem. Mis-scoping the problem can arise in different ways. You might’ve just made a mistake - tested the wrong wire, checked the wrong value… Or the problem might be created from multiple sources - if a bottleneck with the network exacerbates a bug in the backend, removing either half of the space can cause the problem to disappear.
It can be a little awkward to do this in some scenarios - you may need to come up with an alternative way of powering the components in question, or put together some unit tests that handle the specific case (you do have a testing framework in place, right?). I don’t blame you if you skip this step, but if you can do it, it’ll avoid a lot of wasted time.
And importantly, don’t despair if the problem turns out to have slipped outside your reduced space: you’ve almost certainly learnt something new about the problem.
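For software, this check can be as small as one test that feeds the reduced space a known-good input and confirms the fault still appears. Here is a minimal sketch, assuming a hypothetical frontend helper called `render_total` with a deliberately planted bug, and a payload you have already verified by hand. None of these names come from a real project.

```python
def render_total(payload):
    """Hypothetical frontend helper under suspicion: sums the item prices."""
    # Deliberate off-by-one, planted for illustration: the last item is skipped.
    return sum(item["price"] for item in payload["items"][:-1])


def still_reproduces(component, known_good_input, expected):
    """Return True if the fault still shows up with the rest of the system cut away."""
    return component(known_good_input) != expected


# Payload captured from the backend and checked by hand (made-up values).
known_good_payload = {"items": [{"price": 10}, {"price": 5}]}

in_scope = still_reproduces(render_total, known_good_payload, expected=15)
print("problem still in scope:", in_scope)
```

If this reports that the problem is no longer in scope, the fault went away when the backend was removed, which suggests either that the scope was cut too tightly or that the problem needs both halves to show up.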
Can you reduce the problem space any further?
This one requires a little judgement. “Can you” is also “should you” and “is it useful to”. If you’ve pulled a blown bulb, you probably don’t need to identify if it’s the filament or the base that’s causing the problem, given you’ll be replacing the whole thing.
In a complex software issue where there are many layers to work with, you’ll probably find it useful to iterate a few times to identify the layer that’s the issue, before you switch to another technique. But if you’ve got a decent debugger, you might find it easier to find the component with the issue, then drop straight into code. It will depend on the problem in question, as well as your personal skills and preferences.
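When the layers form an ordered pipeline, iterating can look a lot like a binary search. This is my own framing of the idea, sketched under some stated assumptions: the input is good, the final output is bad, and once the data has gone wrong it stays wrong. The layer names, the planted bug and the `first_bad_layer` helper are all hypothetical.

```python
def first_bad_layer(layers, data, is_good):
    """Binary-search an ordered list of (name, function) layers for the first
    layer whose output fails is_good.

    Assumes the raw input passes is_good, the final output fails it, and a
    value that has gone bad never becomes good again further down.
    """
    def output_after(count):
        value = data
        for _, fn in layers[:count]:
            value = fn(value)
        return value

    lo, hi = 0, len(layers)   # output is good after lo layers, bad after hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(output_after(mid)):
            lo = mid
        else:
            hi = mid
    return layers[hi - 1][0]


# Made-up pipeline: the "convert" layer truncates the value (planted bug).
layers = [
    ("parse", float),
    ("validate", lambda v: v),
    ("convert", lambda v: int(v)),
    ("format", lambda v: f"{v:.2f}"),
]

culprit = first_bad_layer(layers, "12.50", lambda v: float(v) == 12.5)
print(culprit)
```

Each pass halves the number of layers left to look at, so even a deep stack only needs a handful of checks. If the "once bad, stays bad" assumption doesn't hold for your system, this shortcut can point at the wrong layer and you're better off checking each boundary in turn.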
Caveats
It’s useful, for example, to categorize your target as a sorcerer, if he is one; but if you get too attached to your category it’ll leave you embarrassed when he suddenly pulls a knife on you.
— Steven Brust, Issola (2001)
The biggest downside to this process is that we can very easily push the root cause outside our problem space by simply misdefining something.
“There’s clearly an electrical fault, so we can ignore mechanical faults.”
“The front-end isn’t throwing any errors, so it must be the backend.”
Always keep in mind that problems can be more complex than they first seem. Whilst it’s useful to identify problematic components, assuming that nothing else is broken just because one component is can waste a lot of time fixing the wrong issue. Fuses are the best practical example - a blown fuse is often the immediate cause of the problem, but it’s rarely the actual root cause, because something made it blow.