Commit

Add email anonymizer option in case is needed for GDPR
This new feature will anonymize emails in the Postfix logs. This allows
you to keep them indefinitely while remaining compliant with GDPR.

Based on excellent work on [this pull request](#91).

Check `README.md` for more details.
Sergio Del Río Mayoral authored and bokysan committed Dec 7, 2021
1 parent 370c126 commit a1a2082
Showing 13 changed files with 640 additions and 8 deletions.
6 changes: 3 additions & 3 deletions Dockerfile
@@ -63,9 +63,9 @@ COPY /configs/supervisord.conf /etc/supervisord.conf
COPY /configs/rsyslog*.conf /etc/
COPY /configs/opendkim.conf /etc/opendkim/opendkim.conf
COPY /configs/smtp_header_checks /etc/postfix/smtp_header_checks
COPY /scripts/*.sh /
COPY /scripts/* /scripts/

RUN chmod +x /run.sh /opendkim.sh
RUN chmod +x /scripts/*

# Set up volumes
VOLUME [ "/var/spool/postfix", "/etc/postfix", "/etc/opendkim/keys" ]
@@ -77,4 +77,4 @@ WORKDIR /tmp
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 CMD printf "EHLO healthcheck\n" | nc 127.0.0.1 587 | grep -qE "^220.*ESMTP Postfix"

EXPOSE 587
CMD [ "/bin/sh", "-c", "/run.sh" ]
CMD [ "/bin/sh", "-c", "/scripts/run.sh" ]
56 changes: 56 additions & 0 deletions README.md
@@ -26,6 +26,7 @@ Simple postfix relay host ("postfix null client") for your Docker containers. Ba
* [POSTFIX_mynetworks](#postfix_mynetworks)
* [POSTFIX_message_size_limit](#postfix_message_size_limit)
* [Overriding specific postfix settings](#overriding-specific-postfix-settings)
* [ANONYMIZE_EMAILS](#anonymize_emails)
* [DKIM / DomainKeys](#dkim--domainkeys)
* [Supplying your own DKIM keys](#supplying-your-own-dkim-keys)
* [Auto-generating the DKIM selectors through the image](#auto-generating-the-dkim-selectors-through-the-image)
@@ -331,6 +332,61 @@ Any Postfix [configuration option](http://www.postfix.org/postconf.5.html) can b
environment variables, e.g. `POSTFIX_allow_mail_to_commands=alias,forward,include`. Specifying no content (empty
variable) will remove that variable from postfix config.
#### ANONYMIZE_EMAILS
Anonymize email addresses in Postfix logs. It masks each address by putting `*` in the middle of the name and the domain.
For example: `from=<a*****************s@a***********.com>`
Syntax: `<masker-name>[;options]`
The following filters are provided with this implementation:
##### The `default` (`smart`) filter
Enable the filter by setting `ANONYMIZE_EMAILS=smart`.
The filter has no options. It is also selected when the value is `true`, `1`, `yes`, `y` or `default`. The
masker will take an educated guess at how to best mask the emails, specifically:
* It will leave the first and the last letter of the local part (if it's only one letter, it will get repeated)
* If the local part is in quotes, it will keep the quotes and mask the text between them (Warning: if the email starts with a space, this might look weird in logs)
* It will replace all the letters in between with **ONE** asterisk, even if there are none
* It will replace every character of the domain except the TLD with an asterisk
* Address-style domains will see the numbers replaced with asterisks
E.g.:
* `[email protected]` -> `d*o@*******.org`
* `[email protected]` -> `j*e@*******.solutions`
* `sa@localhost` -> `s*a@*********`
* `s@[192.168.8.10]` -> `s*s@[*.*.*.*]`
* `"multi....dot"@[IPv6:2001:db8:85a3:8d3:1319:8a2e:370:7348]` -> `"m*t"@[IPv6:***********]`
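As a rough illustration of the rules above, the following Python sketch masks a plain, unquoted address. The function name `mask_simple_address` is made up for this example and is not part of the image; quoted local parts and address-literal domains are deliberately left out.

```python
def mask_simple_address(address: str, symbol: str = "*") -> str:
    """Mask an unquoted `local@domain` address in the style of the smart filter.

    Hypothetical helper for illustration only.
    """
    local, domain = address.rsplit("@", 1)
    # Keep the first and the last character, with ONE symbol in between.
    # A one-letter local part therefore ends up with its letter repeated.
    masked_local = local[0] + symbol + local[-1]
    if "." in domain:
        # Mask every character before the last dot and keep the TLD.
        name, tld = domain.rsplit(".", 1)
        masked_domain = symbol * len(name) + "." + tld
    else:
        # No dot: treat it as a local host name and mask all of it.
        masked_domain = symbol * len(domain)
    return masked_local + "@" + masked_domain


print(mask_simple_address("sa@localhost"))
print(mask_simple_address("s@mail.example.org"))
```

The first call reproduces the `sa@localhost` example from the list; the second shows that a one-letter local part is doubled and that subdomains are masked together with the domain name.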
##### The `paranoid` filter
The paranoid filter works similarly to the smart filter but will:
* Replace the local part with **ONE** asterisk
* Replace the domain part (sans TLD) with **ONE** asterisk
E.g.:
* `[email protected]` -> `*@*.org`
* `[email protected]` -> `*@*.solutions`
* `sa@localhost` -> `*@*`
* `s@[192.168.8.10]` -> `*@[*]`
* `"multi....dot"@[IPv6:2001:db8:85a3:8d3:1319:8a2e:370:7348]` -> `*@[IPv6:*]`
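The paranoid rules are simple enough to sketch in a few lines of Python. The function name `mask_paranoid` is an assumption made for this example; it only mirrors the behaviour described above and is not the code shipped in the image.

```python
def mask_paranoid(address: str, symbol: str = "*") -> str:
    """Collapse the local part and the domain to a single symbol each.

    Hypothetical helper for illustration only.
    """
    _local, domain = address.rsplit("@", 1)
    if domain.startswith("[") and domain.endswith("]"):
        if ":" in domain:
            # Tagged address literal such as [IPv6:...]: keep only the tag.
            tag = domain.split(":", 1)[0]
            return symbol + "@" + tag + ":" + symbol + "]"
        # Plain IPv4 address literal.
        return symbol + "@[" + symbol + "]"
    if "." in domain:
        # Regular domain: keep the TLD only.
        tld = domain.rsplit(".", 1)[1]
        return symbol + "@" + symbol + "." + tld
    # Local host name without a dot.
    return symbol + "@" + symbol


print(mask_paranoid("sa@localhost"))
print(mask_paranoid("s@[192.168.8.10]"))
```

Because every part collapses to a single symbol, the output no longer reveals the length of the original name or domain.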
##### The `noop` filter
This filter doesn't do anything. It's used for testing purposes only.
##### Writing your own filters
It's easy enough to write your own filters. The simplest way would be to take the `email-anonymizer.py` file in this
image, write your own and then attach it to the container image under `/scripts`. If you're feeling adventurous, you can
also install your own Python package -- the script will automatically pick up the class name.
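A minimal sketch of what such a custom filter might look like is shown below. Everything in it is hypothetical: the class name `HashFilter`, the deliberately loose `SIMPLE_EMAIL` pattern and the optional digit-count argument are assumptions made for this example. What it takes from the bundled script is the shape of the interface: an `init(args)` method, and a `processMessage(msg)` method that returns a JSON string with a `msg` key when the line was changed and an empty JSON object otherwise.

```python
import hashlib
import json
import re

# Deliberately loose pattern for this sketch only; the bundled script
# uses a broader catch-all expression.
SIMPLE_EMAIL = re.compile(r"[^\s<>@]+@[^\s<>@]+")


class HashFilter:
    """Hypothetical filter: replaces each address with a short SHA-256 digest."""

    def init(self, args):
        # Optional first argument: number of hex digits to keep (assumed default: 12).
        self.digits = int(args[0]) if args else 12

    def _digest(self, match):
        address = match.group().lower().encode("utf-8")
        return hashlib.sha256(address).hexdigest()[: self.digits]

    def processMessage(self, msg):
        result = SIMPLE_EMAIL.sub(self._digest, msg)
        # An empty JSON object signals that the message was left unchanged.
        if result == msg:
            return json.dumps({})
        return json.dumps({"msg": result}, ensure_ascii=False)


f = HashFilter()
f.init([])
print(f.processMessage("from=<someone@example.org> size=1"))
```

Assuming the class lives in an importable module, say `my_filters.py`, it would presumably be selected with a dotted name such as `ANONYMIZE_EMAILS=my_filters.HashFilter`, since the script splits dotted names into a module and a class. Hashing keeps log lines for the same address correlatable, which may or may not be what you want from a privacy point of view.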
### DKIM / DomainKeys
**This image is equipped with support for DKIM.** If you want to use DKIM you will need to generate DKIM keys. These can
1 change: 1 addition & 0 deletions build.sh
@@ -4,6 +4,7 @@

export DOCKER_BUILDKIT=1
export DOCKER_CLI_EXPERIMENTAL=enabled
export BUILDKIT_PROGRESS=plain

if ! docker buildx inspect multiarch > /dev/null; then
docker buildx create --name multiarch
8 changes: 7 additions & 1 deletion configs/rsyslog.conf
@@ -41,12 +41,18 @@ template(name="plain" type="list") {
constant(value="\n")
}

#<email-anonymizer>
#module(load="mmexternal")
#</email-anonymizer>

if $syslogseverity <= '6' then {
# Do not log healthchecks
if ($msg contains "connect from localhost[127.0.0.1]") then { stop }
if ($msg contains "lost connection after EHLO from localhost[127.0.0.1]") then { stop }
if ($msg contains "disconnect from localhost[127.0.0.1] ehlo=1 commands=1") then { stop }
# matching logs will be saved
#<email-anonymizer>
#action(type="mmexternal" binary="/scripts/email-anonymizer.sh <anon-email-format>" interface.input="msg")
#</email-anonymizer>
action(type="omfile" DynaFile="devicelog" template="<log-format>" DirCreateMode="0755" FileCreateMode="0644")
# enable below to stop processing further this log
stop
4 changes: 2 additions & 2 deletions configs/supervisord.conf
@@ -16,14 +16,14 @@ stdout_logfile_maxbytes = 0
stderr_logfile_maxbytes = 0

[program:postfix]
command = /postfix.sh
command = /scripts/postfix.sh
autostart = true
autorestart = false
directory = /etc/postfix
startsecs = 0

[program:opendkim]
command = /opendkim.sh
command = /scripts/opendkim.sh
user = opendkim
autostart = true
autorestart = true
21 changes: 21 additions & 0 deletions scripts/common-run.sh
@@ -28,6 +28,27 @@ rsyslog_log_format() {
sed -i -E "s/<log-format>/${log_format}/" /etc/rsyslog.conf
}

anon_email_log() {
    local anon_email="${ANONYMIZE_EMAILS}"
    if [[ "${anon_email}" == "true" || "${anon_email}" == "1" || "${anon_email}" == "yes" || "${anon_email}" == "y" ]]; then
        anon_email="default"
    fi
    if [[ -n "${anon_email}" && "${anon_email}" != "0" ]]; then
        notice "Using ${emphasis}${anon_email}${reset} filter for email anonymization."
        sed -i -E "s/<anon-email-format>/${anon_email}/" /etc/rsyslog.conf
        sed -i -E '
            /^\s*#\s*<email-anonymizer>\s*$/,/^\s*#\s*<\/email-anonymizer>\s*$/{
                /^\s*#\s*<email-anonymizer>\s*$/n
                /^\s*#\s*<\/email-anonymizer>\s*$/! {
                    s/(\s*)#(.*)$/\1\2/g
                }
            }
        ' /etc/rsyslog.conf
    else
        info "Emails in the logs will not be anonymized. Set ${emphasis}ANONYMIZE_EMAILS${reset} to enable this feature."
    fi
}

setup_conf() {
local srcfile
local dstfile
207 changes: 207 additions & 0 deletions scripts/email-anonymizer.py
@@ -0,0 +1,207 @@
#!/usr/bin/env python3

"""
Filter to anonymize email addresses. It reads input line by line,
finds all emails in the input and masks them using given filter.
Big thanks to [Sergio Del Río Mayoral](https://github.com/sdelrio)
for the concept and the idea, although not a lot of the code went
into this commit in the end.
"""

import re
import logging
import typing
import json
import sys
import importlib

logger = logging.getLogger(__name__)

# BIG FAT NOTICE on emails and regular expressions:
# If you're planning on using a regular expression to validate an email: don't. Emails
# are much more complex than you would imagine and most regular expressions will not
# cover all use cases. Newer RFCs even allow for international (read: UTF-8) email addresses.
# Most of your favourite programming languages will have a dedicated library for validating
# addresses.
#
# The pattern below should, however, match (hopefully) anything that looks like an email.
# It is too broad, though, as it will also match things which are not considered valid email
# addresses. But for our use case, that's OK and more than sufficient.
# A raw string keeps the regex escapes (such as "\[" and "\.") intact.
EMAIL_CATCH_ALL_PATTERN = r'([^ "\[\]<>]+|".+")@(\[([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|[A-Za-z0-9]+:.+)\]|([^ \{}():;]+(\.[^ \{}():;]+)*))'
EMAIL_CATCH_ALL = re.compile(EMAIL_CATCH_ALL_PATTERN)
EMPTY_RESPONSE = json.dumps({})

# Postfix formats message IDs like this. Let's not mask them
# 20211207101128.0805BA272@31bfa77a2cab
MESSAGE_ID_PATTERN = r'[0-9]+\.[0-9A-F]+@[0-9a-f]+'
MESSAGE_ID = re.compile(MESSAGE_ID_PATTERN)

"""A default filter, if none other is provided."""
DEFAULT_FILTER_CLASS: str = 'SmartFilter'

"""Map filter names to friendly names"""
FILTER_MAPPINGS = {
    'default': DEFAULT_FILTER_CLASS,
    'smart': 'SmartFilter',
    'paranoid': 'ParanoidFilter',
    'noop': 'NoopFilter',
}

# ---------------------------------------- #

class Filter():
    def init(self, args: list[str]) -> None:
        pass

    def processMessage(self, msg: str) -> str:
        pass

"""
This filter does nothing.
"""
class NoopFilter(Filter):
    def processMessage(self, msg: str) -> str:
        return EMPTY_RESPONSE

"""
This filter will take an educated guess at how to best mask the emails, specifically:
* It will leave the first and the last letter of the local part (if it's only one letter, it will get repeated)
* If the local part is in quotes, it will keep the quotes and mask the text between them (Warning: if the email starts with a space, this might look weird in logs)
* It will replace all the letters in between with **ONE** asterisk
* It will replace every character of the domain except the TLD with an asterisk
* Address-style domains will see the numbers replaced with asterisks
E.g.:
* `[email protected]` -> `d*o@*******.org`
* `[email protected]` -> `j*e@*******.solutions`
* `sa@localhost` -> `s*a@*********`
* `s@[192.168.8.10]` -> `s*s@[*.*.*.*]`
* `"multi....dot"@[IPv6:2001:db8:85a3:8d3:1319:8a2e:370:7348]` -> `"m*t"@[IPv6:***********]`
"""
class SmartFilter(Filter):
    mask_symbol: str = '*'

    def mask_local(self, local: str) -> str:
        # Quoted local part: keep the quotes plus the first and last character inside them
        if local[0] == '"' and local[-1] == '"':
            return local[:2] + self.mask_symbol + local[-2:]
        else:
            return local[0] + self.mask_symbol + local[-1]

    def mask_domain(self, domain: str) -> str:
        if domain[0] == '[' and domain[-1] == ']':  # Numerical domain
            if ':' in domain[1:-1]:
                # Tagged literal, e.g. [IPv6:...]: keep the tag, mask the rest
                left, right = domain.split(":", 1)
                return left + ':' + (len(right)-1) * self.mask_symbol + ']'
            else:
                return '[*.*.*.*]'
        elif '.' in domain:  # Normal domain
            s, tld = domain.rsplit('.', 1)
            return len(s) * self.mask_symbol + '.' + tld
        else:  # Local domain
            return len(domain) * self.mask_symbol

    def replace(self, match: re.Match) -> str:
        email = match.group()

        # Return the details unchanged if they look like Postfix message ID
        if bool(MESSAGE_ID.match(email)):
            return email

        # The "@" can show up in the local part, but shouldn't appear in the
        # domain part (at least not that we know).
        local, domain = email.rsplit("@", 1)

        local = self.mask_local(local)
        domain = self.mask_domain(domain)

        return local + '@' + domain

    def processMessage(self, msg: str) -> typing.Optional[str]:
        result = EMAIL_CATCH_ALL.sub(
            lambda x: self.replace(x), msg
        )
        # An empty JSON object tells the caller that the message is unchanged
        return json.dumps({'msg': result}, ensure_ascii=False) if result != msg else EMPTY_RESPONSE

class ParanoidFilter(SmartFilter):

    def mask_local(self, local: str) -> str:
        return self.mask_symbol

    def mask_domain(self, domain: str) -> str:
        if domain[0] == '[' and domain[-1] == ']':  # Numerical domain
            if ':' in domain[1:-1]:
                left, _ = domain.split(":", 1)
                return left + ':*]'
            else:
                return '[*]'
        elif '.' in domain:  # Normal domain
            _, tld = domain.rsplit('.', 1)
            return self.mask_symbol + '.' + tld
        else:  # Local domain
            return self.mask_symbol

# ---------------------------------------- #

def get_filter() -> Filter:
    """
    Initialize the filter
    This method will check your configuration and create a new filter
    :return: Returns a specific implementation of the `Filter`
    """
    opts: list[str] = []
    clazz: typing.Optional[str] = None

    if len(sys.argv) > 1:
        clazz = sys.argv[1].strip()
        opts = sys.argv[2:]

        # Translate friendly names ("smart", "paranoid", ...) into class names
        if clazz.lower() in FILTER_MAPPINGS:
            clazz = FILTER_MAPPINGS[clazz.lower()]

    if clazz is None or clazz.strip() == '':
        clazz = DEFAULT_FILTER_CLASS

    logger.debug(f"Constructing new {clazz} filter.")

    try:
        if "." in clazz:
            # Dotted name: import the module and look up the class in it
            module_name, class_name = clazz.rsplit(".", 1)
            filter_class = getattr(importlib.import_module(module_name), class_name)
            filter_obj: Filter = filter_class()
        else:
            # Plain name: look up the class in this script
            filter_class = getattr(sys.modules[__name__], clazz)
            filter_obj: Filter = filter_class()
    except Exception as e:
        raise RuntimeError(f'Could not instantiate filter named "{clazz}"!') from e

    try:
        filter_obj.init(opts)
    except Exception as e:
        raise RuntimeError(f'Init of filter "{clazz}" with parameters {opts} failed!') from e

    return filter_obj


def process(f: Filter) -> None:
    while True:
        message = sys.stdin.readline()
        if message:
            message = message.rstrip('\n')  # Remove line feed
            result = f.processMessage(message)
            print(result)
            sys.stdout.flush()
        else:
            # Empty string: stdin has been closed
            break

process(get_filter())
13 changes: 13 additions & 0 deletions scripts/email-anonymizer.sh
@@ -0,0 +1,13 @@
#!/usr/bin/env bash
set -e

SCRIPT_DIR=$(CDPATH='' cd -- "$(dirname -- "$0")" && pwd)
##
# Email anonymizer is a filter which goes through every line reported in syslog and masks
# email addresses.
# Setting PYTHONUNBUFFERED=1 ensures that Python output buffering is disabled and output
# is sent straight through. The loop restarts the filter if it exits with an error.
##
while ! env PYTHONUNBUFFERED=1 python3 "$SCRIPT_DIR/email-anonymizer.py" "$@"; do
    sleep 1
done