The secret to being a top programmer: Don’t get stuck

Programmers have to solve dozens of problems every day. Most of these are trivial issues, but eventually something crops up that stumps you for 30 minutes or more. These add up to multiple hours of painful, unproductive time.

The system I use to avoid getting stuck is the same one used by top detectives, diagnosticians and scientists. It’s how they deal with the unknown and get to a solution when time is against them.

A programmer who systematically avoids getting stuck is at least 100× more productive than one who sits looking confused for 30 minutes, or worse, one who panics and runs random commands late into the night. If you can learn to accurately diagnose issues quickly, it will save you hours every day.

I’ve pulled together a complete list of techniques that I use to solve problems, and roughly ordered them to balance the time they take with the likelihood that they lead to a solution (they eventually get pretty desperate). They aren’t all applicable to every issue, so skip as appropriate.

See if you can guess what the system is before the big reveal at the end.

How not to get stuck on trivial issues

1. Do not ask ChatGPT

The reason is explained in step 32.

2. Use your brain

Take a step back and ask yourself if what you’re doing makes sense. You may have some intuition as to what the potential flaws in your code are, and this may lead you directly to the problem. If you jump in too soon, you’ll end up wasting time.

3. Find the error message

Find any and all error/warning logs if they exist. The full stack trace is the most useful thing to find.

This is stating the obvious, but I’ve met so many people who don’t do this.

Use the stack trace to find the line of your code that the issue is happening on. Go to that line and look for obvious mistakes.

If your logs don’t capture the error, then you must change your logging config and replicate the error; otherwise, you are just guessing.

If you don’t see an error message, find a log that indicates what’s happening. Reconfigure your code so that it logs something if it’s successful and logs something else (or even better, throws an error) if it’s unsuccessful. Without logs, you’re completely in the dark.
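As a minimal sketch of that idea in Python, here is a hypothetical load_records step (the name and its parsing rule are made up for illustration) that logs on success and raises on failure, using only the standard logging module:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def load_records(raw_lines):
    """Hypothetical step: keep the non-blank lines as records."""
    records = [line.strip() for line in raw_lines if line.strip()]
    if not records:
        # Fail loudly: an exception with context beats a silent empty result
        logger.error("load_records produced no records from %d input lines", len(raw_lines))
        raise ValueError("load_records: no records found in input")
    logger.info("load_records succeeded: %d records", len(records))
    return records
```

Raising, rather than quietly returning an empty list, means the failure shows up as a stack trace at the point where things first went wrong.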

4. Search your error message verbatim

Never forget to do this, even if you think the error is unique to you. You might get lucky and find the exact solution in the first Stack Overflow answer or GitHub issue. This is a huge time saver in situations where the error message has nothing to do with the actual issue. Even if you don’t find the exact fix, you will get some clues.

Even if you have no error message, Googling your issue might lead you to an explanation for very common issues, e.g. “error when creating vault approle role id”.

Check the comments too! There’s often a lot of wisdom there.

If nothing comes up, that is also a clue: either you’ve misunderstood what the issue is, or your issue is a more fundamental one, and what you’re trying to do may be impossible.

Too many results? That’s a clue. Your issue is a generic programming issue (or skill issue), such as mistakenly treating a string as if it’s a list.
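Here’s a minimal illustration of that string-versus-list mistake in Python, using a hypothetical normalise_tags helper. It’s a classic source of confusion because nothing crashes: a string is iterable, so the loop happily runs over its characters.

```python
def normalise_tags(tags):
    # Expects a list of strings, e.g. ["Python", "Debugging"]
    return [tag.lower() for tag in tags]


print(normalise_tags(["Python", "Debugging"]))  # ['python', 'debugging']
print(normalise_tags("Python"))  # ['p', 'y', 't', 'h', 'o', 'n']: no error, just wrong


def normalise_tags_strict(tags):
    # Reject a bare string explicitly instead of silently splitting it into characters
    if isinstance(tags, str):
        raise TypeError("tags must be a list of strings, not a single string")
    return [tag.lower() for tag in tags]
```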

Side note: Occam’s Razor

This search may lead you to many theories, but don’t spend too long investigating and trying out each potential fix. You might fall into the trap of thinking you’ve identified the cause, and then spend all your energy trying to build a solution that doesn’t actually work for you.

You should time-box each attempted fix to about 15 minutes to prevent yourself from spending multiple days going down a particular rabbit hole.

Avoid overcommitting to any theory until you’ve tried all the simplest and quickest ideas. Occam’s Razor holds that the simplest explanation is usually the right one, and this is especially true in programming.

5. Replicate it locally

This is essential to narrow down the scope of the problem and speed up the debugging process. If you can replicate it locally, you know it’s in the code; if not, it’s most likely in the environment.

If you don’t do this, you’ll have to wait for your deployment pipeline to finish every time you want to test a change. This could easily be slowing you down by roughly 100× (time to test code locally = 10 s, time to deploy and test = 15 minutes). That’s 17 minutes to try 100 different things, versus over 24 hours.
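To make the arithmetic explicit, here it is in Python using the figures above (10 seconds per local test, 15 minutes per deploy-and-test cycle, 100 attempts). Strictly, the ratio works out at 90×, which rounds to the 100× figure:

```python
local_test_seconds = 10          # time to test one change locally
deploy_test_seconds = 15 * 60    # time to deploy and test one change
attempts = 100                   # number of things you try

local_total_minutes = attempts * local_test_seconds / 60    # about 16.7 minutes
deploy_total_hours = attempts * deploy_test_seconds / 3600  # 25.0 hours
slowdown = deploy_test_seconds / local_test_seconds         # 90.0

print(f"{local_total_minutes:.1f} min vs {deploy_total_hours:.1f} h ({slowdown:.0f}x slower)")
# 16.7 min vs 25.0 h (90x slower)
```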

6. Read docstrings and trace through the source code

Go to the source code for the functions you’re running (in PyCharm, Cmd+B on macOS or Ctrl+B on Windows/Linux does this) and read the docstrings. Use Ctrl+F to find where the variables you’re providing get used.

The aim now is to make sure that you haven’t misunderstood what your code does, and rule that out as the possible cause.
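Outside an IDE, Python’s standard inspect module offers a rough equivalent. In this sketch, json.dumps and its sort_keys parameter are only stand-ins for whatever function and argument you’re actually investigating:

```python
import inspect
import json

# Read the docstring of the function you're calling
print(inspect.getdoc(json.dumps))

# Find the file it's defined in, then pull out its source
print(inspect.getsourcefile(json.dumps))
source = inspect.getsource(json.dumps)

# A rough Ctrl+F: show each source line that mentions the argument you're passing
for number, line in enumerate(source.splitlines(), start=1):
    if "sort_keys" in line:
        print(number, line)
```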

7. Check values and types

Print out your variables and their types, e.g. print(foo, type(foo)), or use the debugger. Look for anything unexpected or unfamiliar to you.
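A small helper along these lines can make that check quicker. It’s a sketch, and the name describe is made up. The important part is using repr() rather than plain print output, because repr() shows quotes around strings, so '3' and 3 no longer look identical:

```python
def describe(value):
    """Return a value's repr alongside its type name, e.g. "'3' (str)"."""
    return f"{value!r} ({type(value).__name__})"


config = {"retries": "3", "timeout": None, "verbose": False}
for name, value in config.items():
    print(name, describe(value))
# retries '3' (str)    <- looks like a number, but it's a string
# timeout None (NoneType)
# verbose False (bool)
```

On Python 3.8 and later, print(f"{foo=}") is a quick alternative that prints both the variable name and its repr.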

8. Catch unexpected behaviour and fail loudly

Add assert statements for all your assumptions, e.g. assert data is not None. This can really help if your codebase is huge and functions are getting called in many places.
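For example, in a hypothetical average_order_value function, each assumption becomes an assert with a message that says what went wrong. One caveat: Python strips assert statements when run with the -O flag, so treat them as a debugging aid rather than a replacement for real input validation.

```python
def average_order_value(orders):
    # Each assert documents an assumption and fails loudly, with context, if it's violated
    assert orders is not None, "orders is None: did the upstream query fail?"
    assert len(orders) > 0, "orders is empty"
    assert all(isinstance(o, (int, float)) for o in orders), f"non-numeric order in {orders!r}"
    return sum(orders) / len(orders)


print(average_order_value([10, 20, 30]))  # 20.0
```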

9. Check for typos

More generally, check that you used the right names for everything, especially file paths.

10. Lint your JSON

Similar to typos, it’s easy to miss a mistake in a JSON file. If you have JSON inputs or outputs, I recommend pasting them into jsonlint.vearne.cc.
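If you’d rather not paste data into a website (for example, because it contains anything sensitive), Python’s built-in json module gives a similar check. This is a minimal sketch. lint_json is a hypothetical helper name, and the exact error wording varies between Python versions.

```python
import json


def lint_json(text):
    """Return None if text is valid JSON, otherwise a message locating the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno mark where the parser gave up, at or just after the real mistake
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# A trailing comma: easy to miss by eye, and invalid in JSON
print(lint_json('{"name": "vault", "roles": ["reader",]}'))
```

From the command line, python -m json.tool data.json performs a similar validation.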

11. Rule out brand new bugs from imported packages

Look up the packages you’re using and see if they have had any new releases recently. If a new version of a package was recently released, try downgrading. I often try this first when my code suddenly breaks when there haven’t been any changes.

For containers, use an LTS or stable image tag instead of “latest”.
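To see exactly which versions you have installed, and so which recent releases to compare against, the standard library’s importlib.metadata can report them. This is a sketch, and the package names in the example call are placeholders for your own dependencies:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it isn't installed."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["pip", "requests", "pandas"]))
```

Compare the output with the release history on each package’s PyPI page. If something was released in the last few days, pin the previous version in your requirements file and retest.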

12. Rule out all bugs from imported packages

Similarly, look up the source repos of each 3rd party package you use and look at recent issues. Someone else might have just raised the same issue you have.

13. Uninstall and reinstall everything

Create a fresh virtual environment or, if it’s a Node.js issue, delete node_modules, then reinstall.

This might feel drastic, but I suggest doing it early on, as it seems to solve 99% of the most confusing issues I come across.

14. Upgrade everything (to a stable version)

Many issues happen simply because we’re using a really old version of something, and the reason why we can’t find any help online is that virtually nobody else still uses it.

If this feels too drastic to you, consider that you’ll likely face more issues in the future if you continue to use outdated versions of things.

15. Rule out service outage

If your code relies on an API hosted by a 3rd party, check their status page or Downdetector page. If more than ~5 people are having issues, then you can relax. Make a cup of tea and hope they resolve it quickly.

16. The eyeball test

If you haven’t narrowed down the issue to a single line of code yet, print absolutely everything out and trace it through by eye to make sure it’s doing what you expect. This is where explaining your code to a rubber duck can help too, although I prefer to use a colleague.

If your code is too complex for this, you might need to visualise what your code is doing in some other way. Try outputting some metrics to a CSV and plotting it.
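A minimal sketch of that with Python’s csv module. The metric names (queue_length, latency_ms) are hypothetical. Record whatever your code is actually doing at each step, then plot the file in a spreadsheet or notebook.

```python
import csv


def write_metrics(path, rows):
    """Write one CSV row per step so the run can be plotted afterwards."""
    fieldnames = ["step", "queue_length", "latency_ms"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Collect a row inside the loop you're investigating, then write them all at the end:
# rows.append({"step": i, "queue_length": len(queue), "latency_ms": elapsed_ms})
# write_metrics("metrics.csv", rows)
```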

17. Check the problem isn’t you

Make sure the code you’ve written is similar to the current recommended usage. You might have inherited code from an old project, or copied code from ChatGPT that is very outdated and no longer compatible with the version of the library you’re using. Take a look at official examples, and disregard any Stack Overflow answers that are over 5 years old.

This is especially likely with popular packages like pandas or SQLAlchemy, both of which have had major breaking changes in recent years.

18. Enable verbose logging

If you still haven’t narrowed down the problem, enable verbose logging. This is done differently depending on the library/program, so Google how to do it if you don’t already know. If you’re testing your own code, add loads of extra logging.

You might want to configure your logger to show the file name and line number with every log.
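With Python’s standard logging module, a format string along these lines does that (filename and lineno are standard LogRecord attributes). Many libraries also log through the same module under their own package name, so you can turn their output up individually. urllib3, which requests uses under the hood, is one example. Treat this as a sketch to adapt:

```python
import logging

LOG_FORMAT = "%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s"

# force=True (Python 3.8+) replaces any handlers that were configured earlier
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT, force=True)

# Keep everything else at INFO, but get full detail from one library
logging.getLogger("urllib3").setLevel(logging.DEBUG)

logger = logging.getLogger(__name__)
logger.info("starting sync for %d accounts", 3)
```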

19. Time to think again

At this point, you’ve spent a large chunk of time trying to solve this issue, and it’s time to consider if there’s a quicker, easier solution to the problem your code is trying to solve.

Do you actually need this code to work? Could you hack together a solution with a manual change or brute force? Is there a simple workaround? Did the client ask for something far more complex than what they actually need?

Ask yourself “why is this so difficult?”.

20. Atomic tests

Create stand-alone scripts to run the specific section of code that you have a problem with locally (also, turn this into a unit test afterwards). You may need to refactor your code and make it more functional in order to achieve this.

The aim is to systematically rule things out until you’ve narrowed down the problem to a single possible explanation. Don’t make the mistake of trying things at random and hoping that everything magically starts working.
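As a sketch, suppose the suspect code is a small parsing function (parse_price here is entirely hypothetical). Copying it into a stand-alone file with a few unittest cases lets you run it in seconds, and the same file becomes the unit test afterwards:

```python
import unittest


def parse_price(text):
    """Hypothetical function under suspicion, copied out of the larger pipeline."""
    return round(float(text.replace("£", "").replace(",", "")), 2)


class ParsePriceTest(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_price("12.50"), 12.5)

    def test_currency_and_thousands_separator(self):
        self.assertEqual(parse_price("£1,234.99"), 1234.99)

    def test_rejects_empty_string(self):
        # float("") raises ValueError, so bad input fails loudly
        with self.assertRaises(ValueError):
            parse_price("")


if __name__ == "__main__":
    unittest.main()
```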

21. Rule out missing permissions

If you’re interacting with a service that requires authentication, it’s best to double-check your role/permissions. Some APIs can unhelpfully return 404 for requests that lack the proper permissions.

Make sure your credentials have been refreshed recently (log in again), and your IP address is on any allow lists required by the service.

22. Ensure you’re using the right identity

Are you signed in with the right account, or using the right API key for this environment, or is the service account being used the one you think it is?

23. Attempt to rule out unknown network rules, just in case

Network admins often set up rules that block requests carrying the default User-Agent headers of common Python packages (e.g. python-requests), to stop attackers from using scripts. Try overriding the header.

It’s good to know this one so that it’s not an unknown unknown (I talk about these later). This has actually come up twice in my career, and stumped me for at least an hour both times.
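In Python, overriding the header is a one-line change. This sketch uses the standard library’s urllib.request. With requests, you’d pass the same headers dict to requests.get. The User-Agent value and URL are placeholders, and you’d need to test whether a given network rule accepts them.

```python
import urllib.request


def build_request(url, user_agent="my-team-diagnostics/1.0"):
    """Build a GET request with an explicit User-Agent instead of the library default."""
    # urllib's default is "Python-urllib/3.x"; requests sends "python-requests/<version>"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/api/health")
# urllib normalises header names with str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))

# To actually send it:
# with urllib.request.urlopen(req, timeout=10) as response:
#     print(response.status)
```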

24. Rule out malformed requests

Attempt to run the same API request with curl or Postman. This will tell you if your Python client is doing something different.

25. Rule out environment variables

Print your environment variables with env. Make sure there’s nothing that could be interfering with your code. Similarly, you may need to check any secrets that get loaded.
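If you want to share your environment with a colleague or a support team, a small Python equivalent of env that masks anything secret-looking is safer than pasting raw output. This is a sketch, and the list of name hints treated as secret is an assumption you should extend:

```python
import os

# Assumed hints: any variable whose name contains one of these gets its value masked
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def dump_env(environ=None):
    """Return sorted NAME=value lines like env, masking values that look secret."""
    environ = os.environ if environ is None else environ
    lines = []
    for name, value in sorted(environ.items()):
        if any(hint in name.upper() for hint in SECRET_HINTS):
            value = "***"  # safe to paste into a ticket or chat
        lines.append(f"{name}={value}")
    return lines


if __name__ == "__main__":
    for line in dump_env():
        print(line)
```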

26. Rule out cached results, randomness, and race conditions

If you don’t see your error every time, then that’s a clue that it’s probably caused by one of these.

Ensure that nothing is being cached. Clear all caches (particularly in browsers).

If your code uses random number generators, check that they behave as expected (you may need to set or change the seed).
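In Python, one way to do that is to give the code its own random.Random instance with an explicit seed, so a failing run can be reproduced exactly. The sample_users function below is a hypothetical example. NumPy code can do the same with numpy.random.default_rng(seed).

```python
import random


def sample_users(users, k, seed=None):
    # A dedicated generator with a fixed seed makes the "random" choice repeatable;
    # seed=None falls back to fresh randomness
    rng = random.Random(seed)
    return rng.sample(users, k)


users = ["ana", "ben", "chen", "dev", "eli"]
print(sample_users(users, 2, seed=42))  # the same two users on every run
```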

If you have any asynchronous functions, eventually consistent databases, or anything else that could vary based on the speed of your code, make sure you aren’t experiencing a race condition.

27. Remove anything else that could cause strangeness

Here’s my attempt to list everything that causes extremely strange bugs:

  • Make sure venv folders aren’t included in build directories
  • Rename files that have the same name as Python modules (e.g. “csv.py”), because they mess with imports
  • Check that your code isn’t affected by timezones or daylight saving time
  • Make sure none of your variables are secretly generators (which can only be iterated once) or ‘getter’ functions that return a new value every time they’re called, instead of a stored value
  • Try disallowing redirects when making API requests
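Two of those are easy to check from Python directly. This sketch shows a generator silently yielding nothing on its second pass. It also uses importlib.util.find_spec to see which file an import would load, which exposes a local csv.py shadowing the standard library module. module_origin is a hypothetical helper name.

```python
import importlib.util

# 1. A generator can only be consumed once; the second pass silently sees nothing
doubled = (n * 2 for n in range(3))
print(sum(doubled))  # 6
print(sum(doubled))  # 0, because the generator is already exhausted
# Fix: materialise it with list(...) if you need to iterate more than once


# 2. Check which file an import resolves to
def module_origin(name):
    """Return the file path a top-level import would load, or None if it can't be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


# Should point inside your Python installation, not at a csv.py in your project
print(module_origin("csv"))
```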

28. Read the docs

At this point, you may be out of ideas. You’re only going to make progress if you can gather new information that leads you to a new hypothesis.

Sit down, get comfortable, and actually read the documentation end-to-end. The more core knowledge you have about a tool, the easier it will be to think of solutions. The answer could be hidden in there somewhere.

API references can contain more information than Python docstrings.

READMEs can contain more info than wikis/official docs. Slack channels can contain even more key information.

Be aware, though, that the docs could be full of red herrings, and could be outdated.

29. Test all possible scenarios

It might seem like we’ve ruled out every possible cause, but if we don’t know what the issue is yet then that can’t be true.

Remember, there are only two possibilities at this point:

  1. There is an obscure and interesting cause of this issue that will be extremely satisfying to find and fix. It’s also becoming increasingly likely that it’s a combination of multiple obscure things.
  2. You have fundamentally misunderstood the issue.

Write any unit tests that you’re missing until you can start ruling out any unexpected behaviour. You may need to simulate different environments. Here it can help to profile your code and add even more logs.

30. Ask a colleague

A fresh pair of eyes could reveal a new path to a solution. Also, people like being asked to help solve a mystery.

Side note: The XY Problem

When asking for help, many programmers make a mistake called the XY Problem. See xyproblem.info (it’s a 2-minute read).

It’s important to share what you know about your problem, not your current theory, and it’s just as important to explain what your end goal is and why. There could be a much better way of doing it.

31. Ask a more experienced colleague

With more senior developers, there’s a higher chance they’ve seen the exact same issue before.

32. Now it’s time to ask ChatGPT

If the docs turn up nothing, ask an LLM, because they have read all of the docs. There could be a single line in some random page somewhere that you would never be able to find. Be cautious with LLM results, though, because they tend to give completely made-up solutions rather than honestly telling you “I don’t know”. You should avoid doing this before trying to fully understand things; otherwise you won’t know if the AI is just making stuff up, and you’ll waste time.

Worse than this, if you don’t understand the problem, you’ll be sent further in the wrong direction. Contrary to popular belief, there is such a thing as a bad question, and stupid questions get stupid answers. An LLM will never tell you that you have the wrong approach or that you’ve misunderstood the issue. This is why you should avoid asking the LLM until you’ve exhausted all other options.

33. Call your IT service desk

If you are forced to use systems managed by a separate IT or platform team, you may be able to get answers from them. You might feel weird about e-mailing someone you don’t know, but you don’t want to risk waiting indefinitely in the case where they hold the only solution to your problem. You may also find that, as you type out your request, you realise there are parts of your code you don’t fully understand, or things you aren’t 100% sure about because you haven’t tested them. The support request acts like a rubber duck, helping you reflect on the issue.

Be sure to include the full context with your request. The first thing the IT person is going to think is “They haven’t even tried Googling it”. Tell them the exact error message, line of code, library versions, etc., and let them draw their own conclusions. Also mention how long you’ve already been stuck on this, and how important it is.

34. Go back to fundamentals

Things are getting desperate. It’s time to rule out all unknowns.

Trace through the entire application and find gaps in your knowledge. Read all the docs again, then read Wikipedia articles, textbooks, and the source code of 3rd party packages. Your aim is to understand everything that goes on in your application at a fundamental level.

35. Repeat

At this point, the solution is probably an unknown unknown. You can’t get to it because you don’t even know the right question to ask. Your base assumptions may be wrong.

The only way you’re going to get to a solution is by first gathering more information.

Don’t despair: there’s actually a method to this (one we’ve been following this whole time):

  1. Search for clues/information
  2. Hypothesise
  3. Experiment to rule out possible causes (test the null hypothesis).
    • Rule out a misunderstanding of what’s happening
    • Rule out your tools or environment being the issue
    • Rule out inputs being the issue
    • Rule out individual pieces of functionality
  4. Search for unknown unknowns: check assumptions; gain understanding
  5. Repeat

This is the scientific method, which is the mystery system I was referring to at the start. It’s a system that is guaranteed to get you to an answer, as long as you can keep collecting evidence and testing hypotheses.

Step 3 is key; we must focus on ruling things out, not on magically stumbling upon an answer.

Focus on clues along the lines of “this works and this doesn’t, and we don’t know why”. This is the most useful type of clue. For example: “the connection to this service works and this one doesn’t, and they are mostly the same.”
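When you have a working case and a broken case that are mostly the same, listing exactly what differs is often the fastest route to the next hypothesis. Here is a minimal Python sketch, with hypothetical connection settings standing in for your real ones:

```python
MISSING = "<missing>"


def diff_configs(working, broken):
    """Return {key: (working_value, broken_value)} for every setting that differs."""
    return {
        key: (working.get(key, MISSING), broken.get(key, MISSING))
        for key in sorted(working.keys() | broken.keys())
        if working.get(key, MISSING) != broken.get(key, MISSING)
    }


# Hypothetical settings for two services: the first connects, the second doesn't
working = {"host": "svc-a.internal", "port": 443, "tls": True, "timeout": 30}
broken = {"host": "svc-b.internal", "port": 443, "tls": True, "timeout": 30,
          "proxy": "http://proxy:3128"}

for key, (good, bad) in diff_configs(working, broken).items():
    print(f"{key}: works with {good!r}, fails with {bad!r}")
```

Each difference it reports is a candidate cause you can rule in or out with a single experiment.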

36. Time to give up

This does not mean you quit your job and retrain as a baker. It’s time to go back to the drawing board and design a new solution to whatever your code was meant to do.

--

Matt Simmons is a Principal Data Engineer at Datasparq.
