What 2am banana bread can tell us about software engineering

Paradise Wright

December 4, 2024 · 5 min read

Sometimes lessons come from unusual places, and even as developers, what happens outside of work can give us insight into how to be better engineers. The following is a bit of a comedy-of-errors tale about the dangers of overconfidence and the lessons learned along the way.

Lying awake at 2am, unable to sleep and smelling the bananas in the kitchen getting riper by the minute, I did what any reasonable person would do and attempted to make banana bread. Attempted, of course, because articles are rarely written when something mundane goes as expected.

Banana bread is a relatively easy recipe, one I had made dozens of times before. However, when I checked the oven 30 minutes later, something was not right: the bread was stubbornly lying in the bottom of the pan instead of fluffing up into a proud loaf. Clearly, a mistake had been made. And herein lies the first lesson:

Constant monitoring, and alerting when something isn’t right. If you’re able to detect defects early, you can often prevent the problems from getting out of control.

When running software a sudden lack of (log) volume can be an indication of something seriously wrong, which is often an overlooked symptom when compared to a sudden increase in volume. The underlying assumption however is that, like baking, there is a certain amount of volume expected.

In the case of bread a rise is expected, but only because you know that of course it will! But if it was the first time you had ever set foot in a kitchen, would you be able to tell if it was normal or an indication of a pending calamity? This is the same for metrics;

you need to understand the system before you can tell when something is unusual
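As a rough illustration of both points, here is a minimal sketch of a volume check that only works once a baseline is known. The function name `volume_alert`, the per-minute log counts, and the z-score threshold of 3 are all assumptions made for this example, not part of any particular monitoring tool.

```python
from statistics import mean, stdev


def volume_alert(baseline_counts, current_count, z_threshold=3.0):
    """Compare the current log volume against a known baseline.

    Returns "drop", "spike", or None. All names and the default
    threshold are illustrative assumptions.
    """
    if len(baseline_counts) < 2:
        # Without a baseline there is no way to tell what "normal" is.
        raise ValueError("need at least two baseline samples")

    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)

    if sigma == 0:
        # Perfectly flat baseline: any deviation at all is unusual.
        if current_count < mu:
            return "drop"
        if current_count > mu:
            return "spike"
        return None

    z = (current_count - mu) / sigma
    if z <= -z_threshold:
        return "drop"   # the loaf that did not rise
    if z >= z_threshold:
        return "spike"
    return None


# Hypothetical log lines per minute during normal operation.
baseline = [980, 1010, 995, 1005, 1020, 990]
status = volume_alert(baseline, 120)
```

The point of the sketch is that a drop is treated as seriously as a spike, and that the check refuses to run at all when it has no history to compare against.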

So as soon as I was alerted that something was wrong, it was time to debug. Unfortunately an oven door stood between the problem and me, which was vaguely reminiscent of attempting to debug a remote host with nothing more than a VNC connection to the rack.

Since running hands-on diagnostics was out of the question, my only option was to turn to the documentation. When I baked my first loaf I followed the recipe exactly. I carefully measured the ingredients, set the temperature just right, and double-checked each step. But after getting familiar with the process I started to rely on the recipe less and less.

When you do a process often enough it becomes repetitive, and when you stop paying attention, that is when the mistakes creep in.

Research shows that checklists improve outcomes. Doctors and pilots use them, and so should we as engineers. Going back to the recipe it became immediately obvious that I had missed a crucial ingredient. Now to simply put the fix into production…

In normal circumstances you’d make sure to understand the remediation steps for a problem, and clearly communicate with your stakeholders about the process. In this case I patiently told my stomach that it would probably not get exactly what was promised, and that the loaf would just have to finish baking before we could check.

Sometimes solving problems takes time, and sometimes there is nothing you can do about that.

If a server is restarting, no amount of bash magic will make the platters spin faster or the network interface come up sooner, but the one thing you can control is expectations.

Clear communication during the incident helps make sure that everyone is aligned and aware of the timelines and steps being taken.

Once it had finally made it out of the oven, it was time to reflect on these lessons, because each failure is a learning opportunity, and a good postmortem is about identifying the faults in the system, not assigning blame. It would be easy to say that starting to bake at 2am while already sleep-deprived was simply a bad idea, but a better approach is to hold a postmortem:

Always ask: how was it possible for the mistake to be made in the first place, and how could it be avoided in the future? What factors led to the situation?

The root causes of this banana bread outage include the fact that a checklist was skipped because it was a routine procedure. Ingredients were missed because they were not stored together. Banana bread was happening at 2am because, well, the desire was not dealt with sooner in the evening.

In this case I made some systemic changes by writing out the recipe and tacking it to the cupboard where I store the baking ingredients so that it would always be immediately visible. I also moved the baking powder to the same shelf as the flour, as they would always be used together.
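The software equivalent of tacking the recipe to the cupboard door is to encode the checklist so that a skipped step is reported rather than silently missed. The following is only a sketch: the function `missing_steps` and the ingredient list are hypothetical and stand in for whatever routine procedure is being protected.

```python
def missing_steps(checklist, completed):
    """Return the checklist items, in order, that have not been completed."""
    return [step for step in checklist if step not in completed]


# Hypothetical checklist; the exact ingredients are an assumption.
RECIPE = ["flour", "baking powder", "bananas", "sugar", "eggs"]

# Simulate the 2am run where one ingredient never made it into the bowl.
done = {"flour", "bananas", "sugar", "eggs"}
missing = missing_steps(RECIPE, done)

if missing:
    print("Do not put this in the oven yet, missing:", ", ".join(missing))
```

Keeping the check next to the procedure itself, rather than in someone's memory, is the same design choice as keeping the baking powder next to the flour.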

And so while baking and engineering are sometimes worlds apart, there are lots of things we can learn from each other:

Constant monitoring, and alerting when something isn’t right. If you’re able to detect defects early, you can often prevent the problems from getting out of control.
You need to understand the system before you can tell when something is unusual.
When you do a process often enough it becomes repetitive, and when you stop paying attention, that is when the mistakes creep in.
Sometimes solving problems takes time, and sometimes there is nothing you can do about that.
Clear communication during the incident helps make sure that everyone is aligned and aware of the timelines and steps being taken.
Always ask: how was it possible for the mistake to be made in the first place, and how could it be avoided in the future? What factors led to the situation?
