Tech Blog: JSON logging in Python
By Tomislav Levanic on February 24, 2020
This blog post is a part of the Velebit AI Tech Blog series where we discuss good practices for a scalable and robust production deployment.
Formatting logs: Collecting logs: Visualizing logs:
JSON logging everywhere
If you haven’t already, be sure to check out our previous blog post where we discussed how we configured uWSGI to use JSON logging. There was one important detail we practically skimmed over. We assumed application logs would already be JSON formatted. This is, however, not the case for Python, and now we will discuss the exact way our team tackled that specific issue.
Custom-built JSON formatter
Existing solutions
Before starting, we decided that we wanted a solution filling out these requirements
- an easy way to log LogRecord attributes in JSON log using arbitrary keys
- log additional static values, ie.
app_name
, etc. - log values provided as argument
extra
in logging methods
The first thing we did was, of course, a search for other open-source solutions for JSON logging. There are existing solutions out there, however, none of them seemed a good fit for our use case. A good example is python-json-logging which offers a wide variety of features we do not intend to use, and more importantly, fails at our first and second requirement.
Our solution
Finally, we built a custom JSONFormatter allowing us simple configuration by passing the
argument fields
to replace Python’s LogRecord keys with our own, for example:
{
"message": "message",
"time": "created",
"log_level": "levelname",
"process": "process",
"process_name": "processName",
"thread": "thread",
"thread_name": "threadName",
"traceback": "exc_text"
}
where we evaluate values as LogRecord attributes and log them under their keys. Similarly,
we can also pass always_extra
key during formatter initialization to log additional static
values. Just like python-json-logging does as well, we also log values provided as argument
extra in logging methods.
Since we want to use the same logging for all of our Python microservices, we also need a simple way to set up our log formatting. This is where the function useful.logs.setup with sane defaults comes in. This makes JSON logging a breeze.
import logging
import useful.logs
useful.logs.setup()
_log = logging.getLogger(__name__)
if __name__ == '__main__':
_log.info("testing", extra={"app": "AppName", "metric": 1.2})
Edge cases
It is good to have JSON logging in cases you explicitly call logging methods. There are, however, situations where you simply do not catch all possible exceptions and Python writes the exception traceback to stderr. Python provides a mechanism for solving this issue. We do it by overriding a function sys.excepthook which will serve as an alternative “logging” function called in case of uncaught exceptions. Here we can override writing to stderr and instead use our standard JSON logging.
import logging
import sys
def except_logging(exc_type, exc_value, exc_traceback):
logging.error("Uncaught exception",
exc_info=(exc_type, exc_value, exc_traceback))
sys.excepthook = except_logging
This is better, but what happens when it is impossible to catch the exception? Yes, such cases do exist. One example would be an exception during garbage collection. Again, the Python sys module gives us the solution! Simply set the hook sys.unraisablehook in a similar manner as sys.excepthook.
import logging
import sys
def unraisable_logging(args):
exc_type, exc_value, exc_traceback, err_msg, _ = args
default_msg = "Unraisable exception"
logging.error(err_msg or default_msg,
exc_info=(exc_type, exc_value, exc_traceback))
sys.unraisablehook = unraisable_logging
Sadly, we are not done yet. We still have to handle exceptions raised by threading.Thread.run() and as of Python version 3.8.0 we can override the hook threading.excepthook the same way we did sys.excepthook.
import logging
import threading
def threading_except_logging(args):
exc_type, exc_value, exc_traceback, _ = args
logging.error("Uncaught threading exception",
exc_info=(exc_type, exc_value, exc_traceback))
threading.excepthook = threading_except_logging
The final issue is with the Python (as of writing, the latest stable version is 3.8.1)
multiprocessing
module, which implements and calls its own exception handling routine that
writes to stderr instead of calling the sys.excepthook. As the answer on the StackOverflow
suggests, you can “fix” this issue by overriding method multiprocessing.Process.run
which
will manually catch all exceptions and call sys.excepthook. We do not use multiprocessing
in any of our microservices and have thus decided not to implement any solution to
this problem since we haven’t found a clean way to do it.
Besides multiprocessing, all other edge cases mentioned above are handled inside useful.logs.setup.
Request tracing id
When building REST APIs in a microservice environment, it is very important to
have a way of tracing requests across the infrastructure. A simple way to do
it is to save some sort of identification value in the request header set by
the first service in the infrastructure (nginx in our case) and log it in every
service afterward. This requires you to have access to the request at the moment of
logging, which can be a bit impractical, especially when using a framework where
current request isn’t stored in some sort of global variable like Flask does. Since we
use Falcon that has no such implementation and we are running on uwsgi with threads, we
implemented our own useful.logs.context based on threading.local that can be
used to store request tracing id
and is accessible inside log formatter. All that
is left is a simple manner of implementing custom Falcon middleware to automatically
read the request header and store it into useful.logs.context. If you need support
for greenlets, werkzeug provides an awesome alternative implementation supporting
both threads and greenlets along with the way to “clean” the context after each
request since WSGI does not guarantee that threads won’t be reused for
multiple requests. If you are using DataDog you can use Python module ddtrace to
solve this issue.
We still have to explore how to configure nginx for JSON logging and request tracing along with log collecting infrastructure, so stay tuned!
Follow Velebit AI on LinkedIn.
Recent Blog Posts
We build AI for your needs
Partner with us to develop an AI solution specifically tailored to your business.
Contact us