Better Logging in Python

 

or as I like to call it: Kurt Loblog's Log Blog

 

TL;DR

There are countless log blog posts out there about the benefits of good logging, how to log well, and how much to log. Going through them all can be a real log blog slog. Wouldn't it be cool if you could log like this:

logger.info("Something happened!", job=job.id, user=user.id)

And get an easily searchable log line like:

something_log_w_border.png

What if I told you: you can using Python!

 

Why This is Important

Good application logs can make a world of difference when tracking down bugs. This is especially true when you run into a bug that happens rarely enough that it's very difficult to reproduce. Almost every mature programming language has a robust logging system. Unfortunately these logging systems can often be fairly complex to configure. Some log blog posts suggest logging in JSON. This can be very powerful if you are using a log management system. It also makes your logs more voluminous. If you aren't using a log management system, it can make your logs look very ugly.

Two commandments from The 10 Commandments of Logging stand out to me as particularly important for making logs easy to search through:

  6. Thou shalt log with context

  7. Thou shalt log in machine parsable format

If you log with context, then you have something to search for so you can limit your search to lines that are relevant to the job, event, or user you're tracking down. If you log in machine parsable format, then you can use tools to do the searching for you. These two also stood out to me as things that could be made easier with the right tools. This blog post describes the techniques and tools that we use to make logging painless while keeping the resulting logs searchable.

Here’s How We Make Good Logging Easier

For the CCDL's refine.bio project, I wanted to make sure that all the logs that we generated across our distributed system were consistent and searchable. However we don't use a log management system for all of our logs, so I didn't want to log in JSON. I also wanted to be able to easily add the identifier of whatever entity I was tracking down into the context for each log message. After searching for python libraries that would let me do this I found daiquiri. Daiquiri's extras feature was nearly the perfect straw with which to stir my logging drink:

import logging

import daiquiri
import daiquiri.formatter

daiquiri.setup(level=logging.INFO, outputs=(
    daiquiri.output.Stream(formatter=daiquiri.formatter.ColorFormatter(
        fmt=(daiquiri.formatter.DEFAULT_FORMAT +
             " [%(subsystem)s is %(mood)s]"))),
        ))

logger = daiquiri.getLogger(__name__, subsystem="example")    
logger.info("It works and log to stderr by default with color!",    
            mood="happy")

ouptuts:

extra_output_w_border.png

On every call to the method you can pass additional context with a keyword parameter. Pretty cool! Unfortunately, as you can see it requires you to specify what keyword parameters you want to accept when you configure your logger. In fact you have to pass all the parameters it is expecting or else you'll end up with an exception, and an exception is the last thing you want when you're logging about an exception!

A logging exception logging an exception logging an exception logging an exception…  Source: https://media.giphy.com/media/xlTwaFb20TVjW/giphy.gif

A logging exception logging an exception logging an exception logging an exception…
Source: https://media.giphy.com/media/xlTwaFb20TVjW/giphy.gif

However with a little bit of work I was able to extend this behavior by adding the ExtrasFormatter and ColorExtrasFormatter:

import logging

import daiquiri
import daiquiri.formatter

daiquiri.setup(level=logging.INFO, outputs=[
    daiquiri.output.Stream(formatter=daiquiri.formatter.ColorExtrasFormatter(
        fmt=(daiquiri.formatter.DEFAULT_FORMAT +
             " [%(subsystem)s is %(mood)s]" +
             "%(extras)s"),
        keywords=['mood', 'subsystem'],
    ))])

logger = daiquiri.getLogger(__name__, subsystem="example")
logger.info("It works and log to stderr by default with color!",
            mood="happy",
            arbitrary_context="included"
            )
advanced_narrow_w_border.png

This Also Supports JSON Logging

It's also possible to tweak some parameters to log your context as JSON*, which was an idea mentioned in the aforementioned 10 Commandments log blog post:

import logging

import daiquiri
import daiquiri.formatter

daiquiri.setup(level=logging.INFO, outputs=[
    daiquiri.output.Stream(formatter=daiquiri.formatter.ColorExtrasFormatter(
        fmt=(daiquiri.formatter.DEFAULT_FORMAT +
             "%(extras)s"),
        keywords=['mood', 'subsystem'],
        extras_prefix=' {',
        extras_template='"{0}": "{1}"',
        extras_separator=', ',
        extras_suffix='}'
    ))])

logger = daiquiri.getLogger(__name__)
logger.info("It works and log to stderr by default with color!",
            arbitrary_context="included",
            using="json"
            )

outputs:

extra_json_w_border.png

* While it will be valid JSON, numbers will still be represented as strings.

Using the Standard Python Library

Note that if you don't want the overhead of daiquiri, you can get close with the standard python logging library:

import json
import logging

class StructuredMessage(object):
    def __init__(self, message, **kwargs):
        self.message = message
        self.kwargs = kwargs

    def __str__(self):
        return '%s >>> %s' % (self.message, json.dumps(self.kwargs))

_ = StructuredMessage   # optional, to improve readability

logging.basicConfig(level=logging.INFO, format='%(message)s')
logging.info(_('message 1', foo='bar', bar='baz', num=123, fnum=123.456))

As you can see, there's still some configuration required. You will have to construct a message object in every logging call, and you don't get the other benefits of using daiquiri (such as color support).

Other Log Blog Nuggets

If you're curious how we log in refine.bio, every log message uses the logger configured in our logging module. If you take a look you'll see we do a few other things in there too:

  •  We have a moderately complex log format that we stick some always-present context into.

  •  We unconfigure the root logger.

  •  We read the minimum log level that should be emitted from an environment variable.

  •  We send our logs to sentry.io (which actually automatically parses out our context even though it's not JSON. Pretty cool.)

Conclusion

In short: write good logs! They can really make a difference, especially if you make them searchable by adding as much context as possible in a machine parsable format!

Kurt Loblog’s Log Blog Job Flog

By the way…. we’re hiring!!!! Take a look at our careers page for a list of openings and how to apply for them.

 
Kurt Wheeler