Tech Blog: Collecting logs in docker clusters
By Tomislav Levanić on March 29, 2021
This blog post is a part of the Velebit AI Tech Blog series where we discuss good practices for a scalable and robust production deployment.
The series covers formatting logs, collecting logs (this post), and visualizing logs.
We deploy our microservices redundantly across a cluster of servers, and reading log files on servers directly is not an option. In order to make the log inspection job feasible, we need to collect the logs from all of these machines and send them to one central location.
As discussed in uwsgi json logging, the idea behind using JSON logs is to simplify the log collection stack. Instead of using Filebeat, Logstash, and Elasticsearch, we can simply use Fluent Bit + Elasticsearch.
Popular log collection solutions
Logstash and Beats
One of the most popular solutions out there, and with good reason - it was built by the same folks that make Elasticsearch. Logstash is the L in the ELK stack. It, however, does not do any log collection. For that purpose you can use lightweight data shippers called Beats. In our case, we are interested in Filebeat - a shipper intended to collect log data from files.
Filebeat can read log files of various formats, do basic processing, and finally send the data to Logstash or directly to Elasticsearch. Kafka and Redis outputs are also supported. If the processing options provided by Filebeat are good enough for your setup, this sounds like a really solid choice.
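As a rough sketch (not taken from our production setup), a minimal Filebeat configuration that tails JSON log files and ships them straight to Elasticsearch could look roughly like this; the paths and hosts are placeholders:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/myservice/*.json    # placeholder path to your JSON log files
    json.keys_under_root: true       # lift parsed JSON keys to the top level
    json.add_error_key: true         # mark lines that fail to parse as JSON

output.elasticsearch:
  hosts: ["elasticsearch:9200"]      # placeholder Elasticsearch address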
Of course, if you at any point get the need to do more complex processing, you can add Logstash in your log collection stack and this will most likely cover all of your processing wishes. But, as we all know, complexity has its negative sides.
It should also be noted that Filebeat must be deployed on every server in the cluster, while Logstash should be deployed separately, in a central location. This separate deployment requires additional resources to run, which potentially increases the deployment cost.
Unfortunately, for our use case Filebeat processors lack a bit of power, so we would end up using Logstash as well, and our previous experience with Logstash showed that it requires maintenance we would prefer to avoid.
Also, when using Filebeat without Logstash you basically must use Elasticsearch. Logstash, on the other hand, offers a lot of output choices. There are also more lightweight options out there that decouple you from Elasticsearch as your only log storage option.
Fluentd
Fluentd is a log collector, processor, and aggregator, which makes it an alternative to both Logstash and some Beats. If you are interested in collecting stdout logs from services running inside docker containers, a neat feature you can use is the Fluentd logging driver.
Fluentd is more lightweight than Logstash, which makes it a good choice when you require strong aggregation features. There is one catch, though: while Logstash is deployed in a central location, Fluentd also does log collection, so you need to deploy it on every machine, just like Filebeat. If you truly need strong aggregation, this could easily end up requiring even more resources than a Logstash solution.
One solution to this problem would be an architecture similar to the one Logstash requires - simpler Fluentd instances for log collection without any aggregation, sending logs to separately deployed central Fluentd instances that perform the heavy aggregation. But it still feels like overkill. Can we do better?
Fluentd + Fluent Bit
A commonly used setup deploys Fluent Bit for per-server log collection and simple processing, with separate per-cluster Fluentd instances - much like the way Filebeat and Logstash are supposed to work together. Since Fluent Bit requires even fewer resources than Filebeat, this makes Fluent Bit + Fluentd a less resource-hungry (but fully featured) alternative to Filebeat + Logstash.
That sounds good, but we skipped a few steps there. What is Fluent Bit?
Fluent Bit
Fluent Bit is another project under the umbrella of Fluentd, which somewhat explains the combination with Fluentd. As stated on the project's official website, Fluent Bit is an open source and multi-platform log processor and forwarder which allows you to collect data/logs from different sources, and to unify and send them to multiple destinations.
Besides sending logs to Fluentd instances, Fluent Bit can also send them directly to various destinations, including Elasticsearch and many others.
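For illustration, shipping to a central Fluentd instance or directly to Elasticsearch is just a matter of choosing the output plugin. The hostnames and index below are placeholders, not part of our actual setup:

# ship all records to a central Fluentd instance over the forward protocol
[OUTPUT]
    Name  forward
    Match *
    Host  fluentd.internal
    Port  24224

# ...or skip the aggregator and write directly to Elasticsearch
[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch.internal
    Port  9200
    Index docker-logs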
Even though Fluent Bit does not have aggregation features on the level of Fluentd or Logstash, it is still quite powerful. Yes, you can actually do basic aggregations using Fluent Bit! Besides aggregations, Fluent Bit also provides an impressive set of processing features for such a low-resource tool.
Here at Velebit, we don’t use complex aggregation features, so the Fluent Bit feature set covers our needs quite well.
Why did we choose Fluent Bit?
Obviously, Filebeat/Logstash and Fluentd aren't the only Fluent Bit alternatives out there. So, why did we pick Fluent Bit? Well, besides it being a well-supported piece of software, there are several important reasons:
- It is extremely lightweight
- Configuration is very simple
- Powerful processing capabilities
- Out-of-the-box integration with docker using fluentd logging driver
Fluent Bit seemed like a great choice for us. It is, however, not the only choice for this job, and you should decide whether Fluent Bit or something else serves you better for your custom deployment.
1. It is extremely lightweight
In order to reduce both CPU and memory requirements, Fluent Bit developers opted for the C programming language, and this definitely shows. Below you can find the memory usage numbers for all 4 competitors, collected after running each with a simple docker run without any configuration.
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
ec60d791262d logstash 1.83% 766.9MiB / 15.2GiB 4.93% 9.79kB / 157B 2.65MB / 8.16MB 57
b97a778cdb5e fluentd 0.17% 49.6MiB / 15.2GiB 0.32% 4.5kB / 0B 3.05MB / 28.7kB 13
6b52446b0164 fluentbit 0.05% 3.516MiB / 15.2GiB 0.02% 4.5kB / 0B 0B / 0B 3
9a37bbba7996 filebeat 0.00% 22.48MiB / 15.2GiB 0.14% 5.18kB / 782B 291kB / 8.19kB 13
Besides requiring few resources, lightweight programs also have fewer moving parts, which leaves less room for bugs - and that is a great thing. In the PIDS column you can see the number of processes the kernel started inside each container, which can be another indicator of complexity.
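For reference, numbers like these can be gathered by starting each image with a plain docker run and then taking a docker stats snapshot. The image tags below are only examples and not necessarily the exact versions behind the table above.

docker run -d --name logstash docker.elastic.co/logstash/logstash:7.12.0
docker run -d --name filebeat docker.elastic.co/beats/filebeat:7.12.0
docker run -d --name fluentd fluent/fluentd:v1.12-1
docker run -d --name fluentbit fluent/fluent-bit:1.7.2

# print a one-off snapshot of CPU, memory, I/O and PID counts
docker stats --no-stream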
2. Configuration is very simple
Before getting started with configuring Fluent Bit - you can find all of the code from this post here. Let's create a simple configuration for Fluent Bit that collects logs via the fluentd docker logging driver. The fluentd logging driver produces a JSON-like structure
{
"log": "example log message",
"container_name": "/simple_log_creator_1",
"source": "stdout",
"container_id": "527c3810bed057714db1bad3d012567752d1975893a9fe426f299676efe23c1e",
// ...
"SERVICE_NAME": "log_creator"
}
where
- log - the raw string read directly from the source
- source - the origin of the logs
- container_id - the ID of the container producing the logs
- container_name - the name of the container producing the logs
- SERVICE_NAME - an additional value we can pass through the docker logging configuration when creating the container. We use this as a convenient and easy way to differentiate services. As you can imagine, you can pass various useful information this way.
When parsing log data, we want to do a little bit of processing. For example, let's rename the key SERVICE_NAME to service_name. Another thing we want to do is parse the log string itself. We prepared our services to emit JSON formatted logs, and now we want to make use of that.
Below is a configuration achieving this along with outputting logs to stdout - the “trivial” log destination.
Note: All of the files in this example should be in the same folder.
./config.conf
[SERVICE]
    # import parsers defined in a parsers file
    Parsers_File /configs/parsers.conf

# collect docker logs using the fluentd logging driver
[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

# make the SERVICE_NAME key lowercase
[FILTER]
    Name   modify
    Match  *
    Rename SERVICE_NAME service_name

# try parsing log as JSON and lift its keys to the first level
[FILTER]
    Name         parser
    Match        *
    Parser       json
    Key_Name     log
    Reserve_Data On

# send logs to stdout
[OUTPUT]
    Name  stdout
    Match *
As you can see, we import parsers from the file /configs/parsers.conf. This is a user-defined file where we can define various types of parsers - for example, parsing raw nginx logs with a regex (a sketch of such a parser follows the parsers.conf example below). For this example, the json parser is the one we want to use.
./parsers.conf
[PARSER]
    Name   json
    Format json
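If you also wanted to handle the raw nginx access logs mentioned earlier, you could add a regex parser to the same file. The pattern below follows the nginx parser shipped in Fluent Bit's default parsers.conf:

[PARSER]
    Name        nginx
    Format      regex
    Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z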
Finally, we want to create a Fluent Bit process to collect logs.
./logging.yml
version: '2.3'

services:
  fluentbit:
    image: fluent/fluent-bit:1.7.2
    container_name: fluentbit
    command: "/fluent-bit/bin/fluent-bit -c /configs/config.conf"
    ports:
      - "24224:24224/tcp"
      - "24224:24224/udp"
    volumes:
      - ./config.conf:/configs/config.conf
      - ./parsers.conf:/configs/parsers.conf
The only thing left to do is to test this setup. We will start a simple container running a shell script that outputs a JSON formatted string every second.
./service.yml
version: '2.3'

services:
  log_creator:
    image: alpine:3.12.4
    container_name: log_creator
    command: >
      sh -c '
        counter=0
        while :
        do
          sleep 1
          echo "{\"message\": \"log line $$counter\", \"something\": \"else\"}"
          counter=$$(($$counter + 1))
        done'
    environment:
      SERVICE_NAME: log_creator
    logging:
      driver: fluentd
      options:
        env: SERVICE_NAME
You can run this toy example using the following commands
docker-compose -f logging.yml up -d
docker-compose -f service.yml up -d
By running docker-compose -f logging.yml logs -f fluentbit, you can preview the results
fluentbit | Fluent Bit v1.7.2
fluentbit | * Copyright (C) 2019-2021 The Fluent Bit Authors
fluentbit | * Copyright (C) 2015-2018 Treasure Data
fluentbit | * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
fluentbit | * https://fluentbit.io
fluentbit |
fluentbit | [2021/03/29 18:37:10] [ info] [engine] started (pid=1)
fluentbit | [2021/03/29 18:37:10] [ info] [storage] version=1.1.1, initializing...
fluentbit | [2021/03/29 18:37:10] [ info] [storage] in-memory
fluentbit | [2021/03/29 18:37:10] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
fluentbit | [2021/03/29 18:37:10] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
fluentbit | [2021/03/29 18:37:10] [ info] [sp] stream processor started
fluentbit | [0] c0d7e76ab44f: [1617043036.000000000, {"message"=>"log line 0", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [1] c0d7e76ab44f: [1617043037.000000000, {"message"=>"log line 1", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [2] c0d7e76ab44f: [1617043038.000000000, {"message"=>"log line 2", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [3] c0d7e76ab44f: [1617043039.000000000, {"message"=>"log line 3", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [0] c0d7e76ab44f: [1617043040.000000000, {"message"=>"log line 4", "something"=>"else", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6"}]
fluentbit | [1] c0d7e76ab44f: [1617043041.000000000, {"message"=>"log line 5", "something"=>"else", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout"}]
fluentbit | [2] c0d7e76ab44f: [1617043042.000000000, {"message"=>"log line 6", "something"=>"else", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout"}]
fluentbit | [3] c0d7e76ab44f: [1617043043.000000000, {"message"=>"log line 7", "something"=>"else", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6"}]
fluentbit | [4] c0d7e76ab44f: [1617043044.000000000, {"message"=>"log line 8", "something"=>"else", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6"}]
fluentbit | [0] c0d7e76ab44f: [1617043045.000000000, {"message"=>"log line 9", "something"=>"else", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator"}]
fluentbit | [1] c0d7e76ab44f: [1617043046.000000000, {"message"=>"log line 10", "something"=>"else", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator"}]
fluentbit | [2] c0d7e76ab44f: [1617043047.000000000, {"message"=>"log line 11", "something"=>"else", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator"}]
fluentbit | [3] c0d7e76ab44f: [1617043048.000000000, {"message"=>"log line 12", "something"=>"else", "container_name"=>"/log_creator", "source"=>"stdout", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6"}]
fluentbit | [4] c0d7e76ab44f: [1617043049.000000000, {"message"=>"log line 13", "something"=>"else", "service_name"=>"log_creator", "container_id"=>"c0d7e76ab44ff43e71d7147a1a514b0ccb40612613f712a9b07f226bfa1956b6", "container_name"=>"/log_creator", "source"=>"stdout"}]
Finally, you can clean up everything by running
docker-compose -f service.yml down --remove-orphans
docker-compose -f logging.yml down --remove-orphans
3. Powerful processing capabilities
Fluent Bit covers a lot of log processing cases with its various configurable filters. However, if you need even more control and custom processing, you can use the Lua filter.
A good example is the uwsgi timestamp. uwsgi allows for microsecond-precision timestamps, but it logs the timestamp as an integer - the number of microseconds. On the other hand, when working with nginx and Python, the more natural way is to report a float - the number of seconds with arbitrary precision. We used the Lua filter to write a simple script that converts the integer number of microseconds into a float number of seconds.
-- uwsgi timestamp with microsecond precision can only be an integer in
-- microseconds, instead of a float number in seconds, with microsecond
-- precision. Here we convert that uwsgi integer timestamp to a "standard"
-- float timestamp
function parse_uwsgi_timestamp(tag, timestamp, record)
    if record["source"] == "uwsgi" or record["source"] == "uwsgi-req" then
        record["time"] = tonumber(record["time"]) / 1000000
    end
    return 1, timestamp, record
end
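To wire a script like this into the pipeline, you would register it through Fluent Bit's lua filter. The script path below is just an example location, mounted into the container the same way as the other configuration files:

[FILTER]
    Name   lua
    Match  *
    script /configs/parse_uwsgi_timestamp.lua
    call   parse_uwsgi_timestamp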
4. Out-of-the-box integration with docker using fluentd logging driver
As you could already see above, we used the fluentd logging driver without much configuration. This is very convenient and allows you to specify which docker logging driver each of your services uses. If a service uses fluentd, its logs will get collected; otherwise, they won't.
There is, however, a downside. The fluentd logging driver redirects stdout to the fluentd driver, which effectively disables the docker logs command for services using it. Users of Docker Enterprise can make use of "dual logging", which allows multiple docker logging drivers to be enabled at once, so the docker logs command keeps working alongside any other logging driver. You can read about this and other limitations of logging drivers here.
The good news is that, as of version 20.10.1, docker CE outputs logs both to stdout and to the selected logging driver, as described here. This is a bit different from full dual logging support, as you cannot use arbitrary combinations of logging drivers. For example, you cannot use the fluentd + json-file combination. Instead, you will end up with a fluentd + local driver combination, or any other driver combined with the local logging driver. In any case, it is better than nothing. Upgrade your docker!
An alternative solution would be not to use the fluentd logging driver at all. When working with the json-file logging driver, stdout is stored in a file somewhere under /var/lib/docker/containers/. Since Fluent Bit should be deployed to every instance in the cluster anyway, you can simply configure Fluent Bit to parse the JSON log files directly on the host. This is a very viable and often used option for production setups.
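A minimal sketch of that approach, assuming the default json-file log location and the docker parser from Fluent Bit's stock parsers.conf, could look like this:

# tail docker's json-file logs directly from the host filesystem
[INPUT]
    Name   tail
    Path   /var/lib/docker/containers/*/*-json.log
    Parser docker
    Tag    docker.*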
Another way to collect all the log files from docker would be to use Logspout, but this adds an additional component to your architecture. A very simple component, but still an additional one.
We opted for the fluentd logging driver solution even though at the time docker CE did not support any sort of dual logging. It just works, and in the end, we rarely end up inspecting logs directly on our VM instances anyway.
Stay tuned for Part 4, where we will continue this theme by exploring log and metric visualization options.