Anonymize logs produced by Docker

tl;dr: Meeting the GDPR requirement to anonymize log files produced by Docker turns out to be a non-trivial task. syslog-ng helps here: it is a replacement for syslog / rsyslogd that offers a wide range of configuration options.

Last weekend I discovered how hard it is to anonymize log files generated by Docker. While there is plenty of documentation for the major web servers (e.g. Nginx or Apache), few people seem to be trying to anonymize Docker logs.

Docker lets you configure which logging driver is used. By default, all logs are written to JSON files (driver: json-file), and you don’t get a chance to modify them along the way. The journald/systemd community seems to ignore this topic completely (even though GDPR is quite a thing…) [1]

I ended up with syslog-ng, a drop-in replacement for syslog or rsyslogd that offers good support for both custom filters and rewrite operations. A good introduction to anonymizing logs with syslog-ng can be found on moblog [2].

Setup

To separate all Docker logs from other system logs, I opted for a dedicated socket to which Docker publishes its log events. Each event is then rewritten using a regex that replaces the last octet of any IPv4 address with a zero.

First, install syslog-ng, then create a file in /etc/syslog-ng/conf.d (e.g. docker.conf) that contains the following definitions:

# Match and rewrite IP: keep the first three octets, zero the last one
rewrite r_ip {
  subst('\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b',
  "$1.$2.$3.0", value("MESSAGE"), type("pcre"), flags("global"));
};

# Open local port 1000 for docker logs
source s_net { tcp(ip(127.0.0.1) port(1000)); };
destination d_docker { file("/var/log/docker.log"); };

# Apply chain
log { source(s_net);  rewrite(r_ip); destination(d_docker); };
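
To get a feeling for what the rewrite rule is meant to do before pointing Docker at it, the same pattern can be tried outside syslog-ng. The following Python sketch is not part of the setup: it assumes that Python's re module treats this particular pattern the same way PCRE does, the function name anonymize is made up, and so are the sample log lines.

```python
import re

# One IPv4 octet, copied from the r_ip rule: 0-199, 200-249 or 250-255.
OCTET = r"(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])"
# Four octets separated by dots, delimited by word boundaries.
IPV4 = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")


def anonymize(message: str) -> str:
    """Replace the last octet of every IPv4 address in message with 0."""
    return IPV4.sub(r"\1.\2.\3.0", message)


if __name__ == "__main__":
    # Hypothetical log lines, made up for illustration.
    samples = [
        '203.0.113.42 - - "GET / HTTP/1.1" 200',
        "proxy 10.1.2.3 forwarded for 192.168.178.25",
        "no address in this line",
    ]
    for line in samples:
        print(anonymize(line))
```

Keep in mind that the pattern only covers IPv4. IPv6 addresses pass through unchanged, and anything that merely looks like a dotted quad (a four-part version number, for example) gets its last part zeroed as well.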

Now you can make syslog the default logging driver in /etc/docker/daemon.json:

{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp://127.0.0.1:1000",
    "tag": "{{.ImageName}}/{{.Name}}/{{.ID}}"
  }
}

Note: the tag is optional, but you should configure it; otherwise you’ll only get the ID of the Docker container in your logs. Other possible tags are documented [3].

As soon as you restart both dockerd and syslog-ng, the new log file will be created and all container logs will be written there.
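
To check the syslog-ng side on its own, without starting a container, one option is to push a single hand-made line into the TCP source. The Python sketch below is only an illustration: the function names, the tag anonymize-test and the sample address are made up, and it assumes syslog-ng is listening on 127.0.0.1:1000 as configured above and accepts a classic BSD-syslog (RFC 3164) style line.

```python
import socket
from datetime import datetime


def build_syslog_line(tag: str, message: str, host: str = "localhost") -> bytes:
    """Build one newline-terminated BSD-syslog (RFC 3164) style line."""
    now = datetime.now()
    # RFC 3164 timestamps use a space-padded day of month, e.g. "Jan  5".
    timestamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    priority = 14  # facility "user" (1) * 8 + severity "info" (6)
    return f"<{priority}>{timestamp} {host} {tag}: {message}\n".encode("utf-8")


def send_test_line(message: str, address=("127.0.0.1", 1000)) -> None:
    """Send a single test line to the TCP source defined in docker.conf."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall(build_syslog_line("anonymize-test", message))


# Example call (needs a running syslog-ng with the source from docker.conf):
# send_test_line("request from 192.168.178.25")
```

If the chain works as intended, the line should show up in /var/log/docker.log with the address ending in .0.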

Note: if you start a Docker container manually, you’ll be notified that no output is displayed because of the chosen logging driver. You can override the driver manually in both docker [4] and docker-compose [5].

Quick excursus: Filtering

As a bonus, you can also use the filter(...) operation to drop logs that you are not interested in. Filters are applied to fields of the log entry. Some of the available fields are:

  • host(…)
  • message(…)
  • program(…)

# Filter
filter f_foo { not message(".*Foobar.*") };

# Apply chain
log { source(s_net); filter(f_foo); rewrite(r_ip); destination(d_docker); };

Footnotes
