Problem Solving: Mindset, Techniques & Tools

I was about five years old when my uncle introduced me to trick questions. My earliest problem-solving memory is of him drawing two lines of equal length in my notebook and asking whether I could make one line longer than the other. I picked up the pencil and extended one of the lines and, like any other preschooler, counted it as a considerable achievement. His next question was, “Can you do it without using a pencil?” (Don’t stress out, this is not a story of a gifted child 😁) Well! This question left me confused; no, I didn’t have an answer for him. After a couple of minutes, he picked up an eraser and shortened one of the lines. Woohoo! Once again, one line was longer than the other. The next thing he showed me was the Müller-Lyer illusion, where one line appears longer depending on its surroundings.

What Is the Muller-Lyer Illusion?

It was nothing short of a magical experience for me. And with that, a lesson stuck with me for life: there is more than one way to solve any problem, and the solution depends heavily on the tools available and the constraints present in the environment.

As the title suggests, in this post I will discuss some techniques & tools but, above all, the importance of critical thinking and the right mindset required to find an effective solution for any given problem. Before we jump to the next section, a quick disclaimer: although the problem-solving processes, techniques, tips & tricks are pretty generic, some of the examples used in this post are specific to computer engineering. I have no experience in applying these techniques to complex problems like “Global Warming”, “World Hunger”, or “Pandemic Situation”.

You cannot fix a problem that you refuse to acknowledge

I strongly agree with those who have written on this subject before me that “Defining the problem” is indeed the first and the most important step in this process, but what if I told you that there is a step 0 that is the most crucial of all? It is “Acknowledge that the problem exists”. But…

Ever heard of “Pontiac is allergic to vanilla ice cream”? Yes, this is what a Pontiac customer reported to General Motors. Interestingly this was a legit complaint!

…This is the second time I have written you, and I don’t blame you for not answering me, because what I have to say sounds kind of crazy. […] every time I buy vanilla ice cream, when I start back from the store my car won’t start. If I get any other kind of ice cream, the car starts just fine...

… the answer: vapor lock. It was happening every night, but the extra time taken to get the other flavors allowed the engine to cool down sufficiently to start. When the man got vanilla, the engine was still too hot for the vapor lock to dissipate…

Source: https://www.cse.psu.edu/~deh25/cmpsc473/jokes99/joke09.html

This problem got resolved because Pontiac’s president and the engineer(s) were in acceptance mode. That, my friend, is a problem solver’s mindset!

Users are human beings too, and they can make mistakes; however, you still need to acknowledge the problem and provide evidence that it was a user error. Coincidentally, we encountered one such issue this week. A customer raised a security concern that our service was sending an email to an unintended recipient; it turned out to be a misconfigured group email account.

Want to solve problems but don’t know where to begin?
First OBSERVE, then ASSIST before you LEAD

Optional: watch Air Crash Investigation

Problem-solving in a nutshell

The Information Technology Infrastructure Library (ITIL) defines a problem as “the unknown cause of one or more (potential or occurring) incidents” where an incident is “an unplanned interruption or reduction in the quality of an IT service (a Service Interruption)”. This definition implies that problem solving involves finding that unknown cause and fixing it so that the said incident doesn’t happen again. Problem-solving is a skill that can be learned and applied to numerous domains. I see it as a loosely coupled, overlapping, multi-stage process:

Acknowledge

We have already discussed this in detail in the previous section, yet if I had to put it in one line, it is simply the opposite of anything that you see on programmingexcuses.

Define

A quote often attributed to Albert Einstein, one of the greatest minds in human history, goes: “If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute resolving it.”

Why is there so much emphasis on problem definition and root cause identification? Believe it or not, knowing the root cause reveals the solution in many cases.

An ideal problem statement includes the current state and the expected state; the gap in between is filled using root cause analysis and gap analysis.

At this stage, it is essential to distinguish a problem from a symptom, and a workaround from a solution. Allow me to explain this with the help of an old friend, “OutOfMemoryError”, which has the power to terminate a Java thread or application. When this happens, a background job might fail abnormally, or an end user might see a 5xx error code. Here “OutOfMemoryError” is the problem, and the abnormal termination of a thread or application, the failed API request, and the terminated background job are all symptoms. There are about seven JVM-level reasons that can result in an “OutOfMemoryError”. It is possible that the application genuinely requires a larger heap; if so, the solution involves increasing the max heap or adding more resources to the underlying hardware. A thread leak or memory leak can also cause this error, and if it is a slow leak, then a daily deployment, which creates new application instances, might keep the error from surfacing. Here, the thread leak or memory leak is the root cause, and the daily deployment is a workaround, not a solution.
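To make the root cause versus workaround distinction concrete, here is a minimal sketch of one common way a slow memory leak creeps in (a static map that only grows) next to a bounded alternative. The class name, the session data, and the cap of 1,000 entries are all hypothetical, chosen only for illustration; a real fix depends on what the application actually needs to retain.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example: names and sizes are illustrative only.
final class SessionCacheSketch {

    // Root-cause pattern: a static map that only ever grows. Every entry
    // stays reachable, so the garbage collector can never reclaim it.
    static final Map<Integer, String> LEAKY_CACHE = new HashMap<>();

    // Bounded alternative: an access-ordered LinkedHashMap that evicts the
    // least recently used entry once the cap is exceeded.
    static <K, V> Map<K, V> boundedLruCache(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = boundedLruCache(1_000);
        for (int i = 0; i < 100_000; i++) {
            cache.put(i, "session-" + i);
        }
        // The bounded cache never holds more than its cap.
        System.out.println("entries retained: " + cache.size());
    }
}
```

Restarting the application every day would hide the growth of `LEAKY_CACHE`; bounding the cache removes the cause.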

Telling users to hold the phone differently for an underlying hardware reception issue is a workaround, not a solution.

Gather

To stay on the right track, you should know which information you still need for the analysis but don’t yet have. Gathering data shouldn’t be viewed as a bounded stage; it is a perpetual task executed concurrently from the Problem Definition stage to the Monitoring stage. An application log is your best friend, provided the application is:

  • generating desired logs with an appropriate logger level
  • not failing silently

try {
	// do something
} catch (Exception ex) {
	// do nothing: the exception is swallowed and the failure leaves no trace
}

☝️ is an invitation to misery; at the same time, logging sensitive data such as passwords or access tokens is disastrous.
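A minimal sketch of the middle ground, using the JDK's built-in `java.util.logging`: log the failure with its context and stack trace, but mask anything sensitive first. The class, the `parsePort` scenario, and the "keep the last four characters" masking rule are assumptions made for this example, not a recommendation for any particular security policy.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example: the class, method names and masking rule are assumptions.
final class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    // Keeps only the last four characters so a log line can still be
    // correlated with a token without exposing it.
    static String mask(String secret) {
        if (secret == null || secret.length() <= 4) {
            return "****";
        }
        return "****" + secret.substring(secret.length() - 4);
    }

    static int parsePort(String raw, String accessToken) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException ex) {
            // Log what failed, with which input, and keep the stack trace.
            LOG.log(Level.WARNING,
                    "Invalid port '" + raw + "' for token " + mask(accessToken), ex);
            return -1; // explicit fallback instead of a silent failure
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePort("80a", "abcd1234efgh5678"));
    }
}
```

Catching the narrowest exception type that applies, rather than `Exception`, also keeps unrelated failures from being handled by accident.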

Analyze

Okay, root cause analysis is the key, but how do we do it? Surprise! There are many tested and trusted tools: the Ishikawa (fishbone) diagram, 5 Whys, Kepner and Fourie, and fault tree analysis.

Learn and Practice!

This is the stage when you poke your creativity; a 404 (Not Found) can be an outcome of missing access permission. Corrupt data due to a missing database constraint may cause a NullPointerException. A 502 (Bad Gateway) is not necessarily a server capacity issue…

Analyze the data not only to find the root cause but also to list the possible solutions. And I must say, if you have only one solution for the problem, you are not thinking thoroughly.

Note: “Won’t fix” & “Retire the problematic feature” are valid options.

Converge

Once you have discovered the root cause and identified all the possible solutions, it is time to pick, ummm, not necessarily the best, but the most pragmatic solution. Don’t worry; there is another powerful tool to the rescue: Six Thinking Hats. This tool is generally meant for a group, but I practice it even on my own. The objective is to train your brain to look at a problem from different perspectives without conflict.

Implement

Nike! I mean, Just do it!

Monitor

Something that works in a development or test environment can behave very differently in production. Always ensure that the solution is not causing any side effects.

Remember to celebrate every win!

Sharpen your saw

Complex problems require extensive analysis and often call for knowledge of specialized tools such as a memory analyzer, application and database profilers, APM (Application Performance Monitoring), a remote debugger, Lighthouse, a query execution plan… All of these tools are built by developers for developers with the sole purpose of improving the problem-solving experience.

Learn and Practice!

The law of the vital few

The Pareto principle is quite popular among troubleshooters; I have personally experienced that roughly 80% of consequences come from 20% of causes.

Here are some real query stats from a production environment

The report indicates that 20% of distinct queries account for 80% of the load on the database. So if you are interested in reducing the database load, focus on the top 20% of queries.
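As a rough sketch of how such a cut can be computed from query statistics: sort the queries by load, then walk down the list until the running total reaches 80% of the overall load. The query labels and timings below are made up for illustration and are not the production figures mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: query labels and timings are made up for illustration.
final class ParetoSketch {

    // Returns the smallest set of heaviest queries whose combined load
    // reaches the given share (e.g. 0.8) of the total.
    static List<String> vitalFew(Map<String, Long> loadByQuery, double share) {
        long total = loadByQuery.values().stream().mapToLong(Long::longValue).sum();
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(loadByQuery.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        List<String> result = new ArrayList<>();
        long running = 0;
        for (Map.Entry<String, Long> e : sorted) {
            if (running >= share * total) {
                break;
            }
            result.add(e.getKey());
            running += e.getValue();
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up total execution time per distinct query, in milliseconds.
        Map<String, Long> totalMillis = new LinkedHashMap<>();
        long[] loads = {52_000, 30_000, 4_000, 3_500, 3_000, 2_500, 2_000, 1_500, 1_000, 500};
        for (int i = 0; i < loads.length; i++) {
            totalMillis.put("Q" + (i + 1), loads[i]);
        }
        System.out.println(vitalFew(totalMillis, 0.8)); // prints [Q1, Q2]
    }
}
```

With these invented numbers, two of the ten queries carry 82% of the load, so those two are where tuning effort pays off first.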

I have noticed this pattern everywhere: in download counts of static assets, in API requests, in log messages, even in errors & exceptions… literally everywhere. Keep a close eye on the top 20% to avoid 80% of the problems.

Importance of clear and consistent communication

I vividly remember this decade-old incident when one of my colleagues (with a non-technical background) called me for help to fix a demo. I suspected that one of the services was not running, so I asked them to run telnet in the terminal and share the output. I had to spell out the command verbally… telnet localhost 9999… this conversation was quite challenging for both of us; within 5 minutes, I realized what support engineers go through every day… Alas! I was speechless for a few moments when I looked at the screenshot of the output; it had

Lesson learned: Choose the mode of communication and words wisely; it can save precious time in critical situations.

Every piece of information should be complete in every sense: add the timezone to timestamps in a timeline, put units next to durations & data sizes, specify the operating system when sharing commands, mention the version along with the library name… put those typing muscles to work.
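For timestamps, durations, and data sizes, a few small helpers can make the complete form the default. This is only a sketch using `java.time`; the helper names and the sample values are hypothetical.

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical example: helper names and the sample values are made up.
final class CompleteInfoSketch {

    // ISO-8601 with an explicit offset, so readers in any timezone
    // agree on the instant being described.
    static String timestamp(ZonedDateTime when) {
        return when.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    // Always attach the unit to a duration.
    static String millis(Duration d) {
        return d.toMillis() + " ms";
    }

    // Always attach the unit to a data size; 1 MiB = 1024 * 1024 bytes.
    static String mebibytes(long bytes) {
        return String.format(Locale.ROOT, "%.1f MiB", bytes / (1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        ZonedDateTime start = ZonedDateTime.of(2021, 3, 14, 9, 30, 0, 0, ZoneOffset.UTC);
        System.out.println("Started at " + timestamp(start));
        System.out.println("Request took " + millis(Duration.ofSeconds(2)));
        System.out.println("Heap dump size " + mebibytes(5L * 1024 * 1024));
    }
}
```

"Started at 2021-03-14T09:30:00Z" leaves far less room for misreading than "started at 9:30".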

And while working on problems, avoid using acronyms, slang, emojis, and words that require a dictionary lookup. Broken communication introduces delays and may even result in more problems.

Keep reminding yourself that NASA’s Mars Climate Orbiter was lost at Mars in 1999 because one piece of ground software produced output in English units while the navigation software consuming it expected metric units.

What about emotions

Someone rightly said that emotional intelligence allows us to respond instead of react.

If you want to solve a problem and not be a part of it, get rid of one question from your probe list: “Who did it?”, unless it is a legal matter. In my experience, the question “Who did it?” helped in less than 1% of the situations; 99% of the time, this question introduced resistance, friction, and opacity. To establish a collaborative environment, ask graceful, constructive questions:

  • What’s next?
  • How can I help?
  • Why did it happen?
  • What are the options?

Trust me; the human aspect can make the problem-solving process tricky. You may find yourself in situations where the solution requires buy-in from multiple stakeholders or approval from a third party… Once again, data is your friend, but first know your audience and then present the data from their perspective. Now, what if there is a key stakeholder who is not a domain expert yet offers to participate in problem-solving? I will share a story from this thread here and leave it to your creativity 😉.

A story from the time recounts that Piero Soderini, the head of the powerful Florentine Republic, even told the famously irascible Michelangelo that David’s nose was much too large. Michelangelo then hid some marble dust in his hand, climbed back up his ladder, and pretended to do some more “chiseling” on the offending proboscis. While he did so, he let some marble dust fall from his hand. The pompous Soderini was fooled – he examined the unchanged nose and announced it was much improved and far more “life-like.”

Summarizing realistically

“If you can’t solve a problem, then there is an easier problem you can solve: find it.”

― George Polya, Mathematical Discovery: On Understanding, Learning, and Teaching Problem Solving, Vol I

Keep calm, enjoy boba tea and solve problems!

Posted by Ruhi Hira

Senior Software Architect