Tech Blog: Collecting logs in docker clusters
By Tomislav Levanić on March 29, 2021
This blog post is a part of the Velebit AI Tech Blog series where we discuss good practices for a scalable and robust production deployment.
The series covers formatting logs, collecting logs (this post), and visualizing logs.
We deploy our microservices redundantly across a cluster of servers, and reading log files on servers directly is not an option. In order to make the log inspection job feasible, we need to collect the logs from all of these machines and send them to one central location.
As discussed in uwsgi json logging, the idea behind using JSON logs is to simplify the log collection stack. Instead of using Filebeat, Logstash, and Elasticsearch, we can simply use Fluent Bit + Elasticsearch.
Popular log collection solutions
Logstash and Beats
One of the most popular solutions out there, and with good reason - it was built by the same folks that make Elasticsearch. Logstash is the L in the ELK stack. It, however, does not do any log collection. For that purpose you can use lightweight data shippers called Beats. In our case, we are interested in Filebeat - a shipper intended to collect log data from files.
Filebeat can read log files of various formats, do basic processing, and finally send the data to Logstash or directly to Elasticsearch. Kafka and Redis outputs are also supported. If the processing options provided by Filebeat are good enough for your setup, this sounds like a really solid choice.
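As a rough sketch (not taken from our production setup), a minimal Filebeat configuration that tails JSON log files and ships them straight to Elasticsearch could look roughly like this; the paths and hosts are placeholders:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/myservice/*.json    # placeholder path to your JSON log files
    json.keys_under_root: true       # lift parsed JSON keys to the top level
    json.add_error_key: true         # mark lines that fail to parse as JSON

output.elasticsearch:
  hosts: ["elasticsearch:9200"]      # placeholder Elasticsearch address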
Of course, if you at any point get the need to do more complex processing, you can add Logstash in your log collection stack and this will most likely cover all of your processing wishes. But, as we all know, complexity has its negative sides.
It should also be noted that Filebeat must be deployed on every server in the cluster, while Logstash should be deployed separately, in a central location. This separate deployment requires additional resources to run, which potentially increases the deployment cost.
Unfortunately, for our use case Filebeat processors lack a bit of power, so we would end up using Logstash as well, and our previous experience with Logstash showed that it requires maintenance we would prefer to avoid.
Also, when using Filebeat without Logstash you basically must use Elasticsearch. Logstash, on the other hand, offers a lot of output choices. There are also more lightweight options out there that decouple you from Elasticsearch as your only log storage option.
Fluentd
Fluentd is a log collector, processor, and aggregator, which makes it an alternative to both Logstash and some Beats. If you are interested in collecting stdout logs from services running inside docker containers, a neat feature you can use is the Fluentd logging driver.
Fluentd is more lightweight than Logstash, which makes it a good choice when you require strong aggregation features. There is one catch, though: while Logstash is deployed in a central location, Fluentd also does log collection, so you need to deploy it on every machine, just like Filebeat. If you truly need strong aggregation, this could easily end up requiring even more resources than a Logstash solution.
One solution to this problem would be an architecture similar to the one Logstash requires - simpler Fluentd instances for log collection without any aggregation, sending logs to separately deployed central Fluentd instances that perform the heavy aggregation. But it still feels like overkill. Can we do better?
Fluentd + Fluent Bit
A commonly used setup deploys Fluent Bit for per-server log collection and simple processing, with separate per-cluster Fluentd instances - much like the way Filebeat and Logstash are supposed to work together. Since Fluent Bit requires even fewer resources than Filebeat, this makes Fluent Bit + Fluentd a less resource-hungry (but fully featured) alternative to Filebeat + Logstash.
That sounds good, but we skipped a few steps there. What is Fluent Bit?
Fluent Bit
Fluent Bit is another project under the umbrella of Fluentd, which somewhat explains the combination with Fluentd. As stated on the project's official website, Fluent Bit is an open source and multi-platform log processor and forwarder which allows you to collect data/logs from different sources, and to unify and send them to multiple destinations.
Besides sending logs to Fluentd instances, Fluent Bit can also send them directly to various destinations, including Elasticsearch and many others.
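For illustration, shipping to a central Fluentd instance or directly to Elasticsearch is just a matter of choosing the output plugin. The hostnames and index below are placeholders, not part of our actual setup:

# ship all records to a central Fluentd instance over the forward protocol
[OUTPUT]
    Name  forward
    Match *
    Host  fluentd.internal
    Port  24224

# ...or skip the aggregator and write directly to Elasticsearch
[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch.internal
    Port  9200
    Index docker-logs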
Even though Fluent Bit does not have aggregation features on the level of Fluentd or Logstash, it is still quite powerful. Yes, you can actually do basic aggregations using Fluent Bit! Besides aggregations, Fluent Bit also provides an impressive set of processing features for such a low-resource tool.
Here at Velebit, we don’t use complex aggregation features, so the Fluent Bit feature set covers our needs quite well.
Why did we choose Fluent Bit?
Obviously, Filebeat/Logstash and Fluentd aren't the only Fluent Bit alternatives out there. So, why did we pick Fluent Bit? Well, besides it being a well-supported piece of software, there are several important reasons:
- It is extremely lightweight
- Configuration is very simple
- Powerful processing capabilities
- Out-of-the-box integration with docker using fluentd logging driver
Fluent Bit seemed like a great choice for us. It is, however, not the only choice for this job, and you should decide whether Fluent Bit or something else serves you better for your custom deployment.
1. It is extremely lightweight
In order to reduce both CPU and memory requirements, Fluent Bit developers opted for the C programming language, and this definitely shows. Below you can find the memory usage numbers for all 4 competitors, collected after running each with a simple docker run without any configuration.
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
ec60d791262d logstash 1.83% 766.9MiB / 15.2GiB 4.93% 9.79kB / 157B 2.65MB / 8.16MB 57
b97a778cdb5e fluentd 0.17% 49.6MiB / 15.2GiB 0.32% 4.5kB / 0B 3.05MB / 28.7kB 13
6b52446b0164 fluentbit 0.05% 3.516MiB / 15.2GiB 0.02% 4.5kB / 0B 0B / 0B 3
9a37bbba7996 filebeat 0.00% 22.48MiB / 15.2GiB 0.14% 5.18kB / 782B 291kB / 8.19kB 13
Besides requiring few resources, lightweight programs also have fewer moving parts, which leaves less room for bugs - and that is a great thing. In the PIDS column you can see the number of processes the kernel started inside each container, which can be another indicator of complexity.
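For reference, numbers like these can be gathered by starting each image with a plain docker run and then taking a docker stats snapshot. The image tags below are only examples and not necessarily the exact versions behind the table above.

docker run -d --name logstash docker.elastic.co/logstash/logstash:7.12.0
docker run -d --name filebeat docker.elastic.co/beats/filebeat:7.12.0
docker run -d --name fluentd fluent/fluentd:v1.12-1
docker run -d --name fluentbit fluent/fluent-bit:1.7.2

# print a one-off snapshot of CPU, memory, I/O and PID counts
docker stats --no-stream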
2. Configuration is very simple
Before getting started with configuring Fluent Bit - you can find all of the code from this post here. Let's create a simple configuration for Fluent Bit that collects logs via the fluentd docker logging driver. The fluentd logging driver produces a JSON-like structure
{
"log": "example log message",
"container_name": "/simple_log_creator_1",
"source": "stdout",
"container_id": "527c3810bed057714db1bad3d012567752d1975893a9fe426f299676efe23c1e",
// ...
"SERVICE_NAME": "log_creator"
}
where
- log - the raw string read directly from the source
- source - the origin of the logs
- container_id - the ID of the container producing the logs
- container_name - the name of the container producing the logs
- SERVICE_NAME - an additional value we can pass through the docker logging configuration when creating the container. We use this as a convenient and easy way to differentiate services. As you can imagine, you can pass various useful information this way.
When parsing log data, we want to do a little bit of processing. For example, let's rename the key SERVICE_NAME to service_name. Another thing we want to do is parse the log string itself. We prepared our services to emit JSON formatted logs, and now we want to make use of that.
Below is a configuration achieving this along with outputting logs to stdout - the “trivial” log destination.
Note: All of the files in this example should be in the same folder.
./config.conf
[SERVICE]
    # import parsers defined in a parsers file
    Parsers_File /configs/parsers.conf

# collect docker logs using the fluentd logging driver
[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

# make the SERVICE_NAME key lowercase
[FILTER]
    Name   modify
    Match  *
    Rename SERVICE_NAME service_name

# try parsing log as JSON and lift its keys to the first level
[FILTER]
    Name         parser
    Match        *
    Parser       json
    Key_Name     log
    Reserve_Data On

# send logs to stdout
[OUTPUT]
    Name  stdout
    Match *
As you can see, we import parsers from the file /configs/parsers.conf. This is a user-defined file where we can define various types of parsers - for example, parsing raw nginx logs with a regex (a sketch of such a parser follows the parsers.conf example below). For this example, the json parser is the one we want to use.
./parsers.conf
[PARSER]
    Name   json
    Format json
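If you also wanted to handle the raw nginx access logs mentioned earlier, you could add a regex parser to the same file. The pattern below follows the nginx parser shipped in Fluent Bit's default parsers.conf:

[PARSER]
    Name        nginx
    Format      regex
    Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z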
Finally, we want to create a Fluent Bit process to collect logs.
./logging.yml
version: '2.3'

services:
  fluentbit:
    image: fluent/fluent-bit:1.7.2
    container_name: fluentbit
    command: "/fluent-bit/bin/fluent-bit -c /configs/config.conf"
    ports:
      - "24224:24224/tcp"
      - "24224:24224/udp"
    volumes:
      - ./config.conf:/configs/config.conf
      - ./parsers.conf:/configs/parsers.conf
The only thing left to do is to test this setup. We will start a simple container running a shell script that outputs a JSON formatted string every second.
./service.yml
version: '2.3'

services:
  log_creator:
    image: alpine:3.12.4
    container_name: log_creator
    command: >
      sh -c '
        counter=0
        while :
        do
          sleep 1
          echo "{\"message\": \"log line $$counter\", \"something\": \"else\"}"
          counter=$$(($$counter + 1))
        done'
    environment:
      SERVICE_NAME: log_creator
    logging:
      driver: fluentd
      options:
        env: SERVICE_NAME
You can run this toy example using the following commands
docker-compose -f logging.yml up -d
docker-compose -f service.yml up -d
By running docker-compose -f logging.yml logs -f fluentbit, you can preview the results
fluentbit | Fluent Bit v1.7.2
fluentbit | * Copyright (C) 2019-2021 The Fluent Bit Authors
fluentbit | * Copyright (C) 2015-2018 Treasure Data
fluentbit | * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
fluentbit | * https://fluentbit.io
fluentbit |
fluentbit | [2021/03/29 18:37:10] [ info] [engine] started (pid=1)
fluentbit | [2021/03/29 18:37:10] [ info] [storage] version=1.1.1, initializing...
fluentbit | [2021/03/29 18:37:10] [ info] [storage] in-memory
fluentbit | [2021/03/29 18:37:10] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
fluentbit | [2021/03/29 18:37:10] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
fluentbit | [2021/03/29 18:37:10] [ info] [sp] stream processor started
fluentbit | [0] c0d7e76ab44f: [1617043036.000000000, {"message"=>"log line 0", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [1] c0d7e76ab44f: [1617043037.000000000, {"message"=>"log line 1", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [2] c0d7e76ab44f: [1617043038.000000000, {"message"=>"log line 2", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [3] c0d7e76ab44f: [1617043039.000000000, {"message"=>"log line 3", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [0] c0d7e76ab44f: [1617043040.000000000, {"message"=>"log line 4", "something"=>"else", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6"}]
fluentbit | [1] c0d7e76ab44f: [1617043041.000000000, {"message"=>"log line 5", "something"=>"else", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout"}]
fluentbit | [2] c0d7e76ab44f: [1617043042.000000000, {"message"=>"log line 6", "something"=>"else", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout"}]
fluentbit | [3] c0d7e76ab44f: [1617043043.000000000, {"message"=>"log line 7", "something"=>"else", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6"}]
fluentbit | [4] c0d7e76ab44f: [1617043044.000000000, {"message"=>"log line 8", "something"=>"else", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6"}]
fluentbit | [0] c0d7e76ab44f: [1617043045.000000000, {"message"=>"log line 9", "something"=>"else", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator"}]
fluentbit | [1] c0d7e76ab44f: [1617043046.000000000, {"message"=>"log line 10", "something"=>"else", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator"}]
fluentbit | [2] c0d7e76ab44f: [1617043047.000000000, {"message"=>"log line 11", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [3] c0d7e76ab44f: [1617043048.000000000, {"message"=>"log line 12", "something"=>"else", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6"}]
fluentbit | [4] c0d7e76ab44f: [1617043049.000000000, {"message"=>"log line 13", "something"=>"else", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout"}]
Finally, you can clean up everything by running
docker-compose -f service.yml down --remove-orphans
docker-compose -f logging.yml down --remove-orphans
3. Powerful processing capabilities
Fluent Bit covers a lot of log processing cases with its various configurable filters. However, if you need even more control and custom processing, you can use the Lua filter.
A good example is the uwsgi timestamp. uwsgi allows for microsecond-precision timestamps, but it logs the timestamp as an integer - the number of microseconds. On the other hand, when working with nginx and Python, the more natural way is to report a float - the number of seconds with arbitrary precision. We used the Lua filter to write a simple script that converts the integer number of microseconds into a float number of seconds.
-- uwsgi timestamp with microsecond precision can only be an integer in
-- microseconds, instead of a float number in seconds, with microsecond
-- precision. Here we convert that uwsgi integer timestamp to a "standard"
-- float timestamp
function parse_uwsgi_timestamp(tag, timestamp, record)
    if record["source"] == "uwsgi" or record["source"] == "uwsgi-req" then
        record["time"] = tonumber(record["time"]) / 1000000
    end
    return 1, timestamp, record
end
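To wire a script like this into the pipeline, you would register it through Fluent Bit's lua filter. The script path below is just an example location, mounted into the container the same way as the other configuration files:

[FILTER]
    Name   lua
    Match  *
    script /configs/parse_uwsgi_timestamp.lua
    call   parse_uwsgi_timestamp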
4. Out-of-the-box integration with docker using fluentd logging driver
As you could already see above, we used the fluentd logging driver without much configuration. This is very convenient and allows you to specify which docker logging driver each of your services uses. If a service uses fluentd, its logs will get collected; otherwise, they won't.
There is, however, a downside. The fluentd logging driver redirects stdout to the fluentd driver, which effectively disables the docker logs command for services using it. Users of Docker Enterprise can make use of "dual logging", which allows multiple docker logging drivers to be enabled at once, so the docker logs command keeps working alongside any other logging driver. You can read about this and other limitations of logging drivers here.
The good news is that, as of version 20.10.1, docker CE outputs logs both to stdout and to the selected logging driver, as described here. This is a bit different from full dual logging support, as you cannot use arbitrary combinations of logging drivers. For example, you cannot use the fluentd + json-file combination. Instead, you will end up with a fluentd + local driver combination, or any other driver combined with the local logging driver. In any case, it is better than nothing. Upgrade your docker!
An alternative solution would be not to use the fluentd logging driver at all. When working with the json-file logging driver, stdout is stored in a file somewhere under /var/lib/docker/containers/. Since Fluent Bit should be deployed to every instance in the cluster anyway, you can simply configure Fluent Bit to parse the JSON log files directly on the host. This is a very viable and often used option for production setups.
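A minimal sketch of that approach, assuming the default json-file log location and the docker parser from Fluent Bit's stock parsers.conf, could look like this:

# tail docker's json-file logs directly from the host filesystem
[INPUT]
    Name   tail
    Path   /var/lib/docker/containers/*/*-json.log
    Parser docker
    Tag    docker.*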
Another way to collect all the log files from docker would be to use Logspout, but this adds an additional component to your architecture. A very simple component, but still an additional one.
We opted for the fluentd logging driver solution even though at the time docker CE did not support any sort of dual logging. It just works, and in the end, we rarely end up inspecting logs directly on our VM instances anyway.
Stay tuned for Part 4, where we will continue this theme by exploring log and metric visualization options.