Add email anonymizer option in case it is needed for GDPR
This new feature will anonymize emails in the Postfix logs. This allows you to keep them indefinitely while being compliant with GDPR. Based on excellent work on [this pull request](#91). Check `README.md` for more details.
Showing 13 changed files with 640 additions and 8 deletions.
@@ -26,6 +26,7 @@ Simple postfix relay host ("postfix null client") for your Docker containers. Ba
* [POSTFIX_mynetworks](#postfix_mynetworks)
* [POSTFIX_message_size_limit](#postfix_message_size_limit)
* [Overriding specific postfix settings](#overriding-specific-postfix-settings)
* [ANONYMIZE_EMAILS](#anonymize_emails)
* [DKIM / DomainKeys](#dkim--domainkeys)
* [Supplying your own DKIM keys](#supplying-your-own-dkim-keys)
* [Auto-generating the DKIM selectors through the image](#auto-generating-the-dkim-selectors-through-the-image)

@@ -331,6 +332,61 @@ Any Postfix [configuration option](http://www.postfix.org/postconf.5.html) can b
environment variables, e.g. `POSTFIX_allow_mail_to_commands=alias,forward,include`. Specifying no content (empty
variable) will remove that variable from postfix config.

#### ANONYMIZE_EMAILS

Anonymize email addresses in Postfix logs. It masks each address by putting `*` in the middle of the name and the domain.

For example: `from=<a*****************s@a***********.com>`

Syntax: `<masker-name>[;options]`

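As a rough illustration of the `<masker-name>[;options]` syntax, the sketch below splits a setting value into a filter name and a list of options. The helper `parse_anonymize_setting`, the use of `;` between multiple options and the sample value `my.package.MyFilter;opt1;opt2` are assumptions made for this example; the image does its own parsing in its startup scripts, which may behave differently.

```python
from typing import List, Tuple


def parse_anonymize_setting(value: str) -> Tuple[str, List[str]]:
    """Split '<masker-name>[;options]' into (name, options).

    Hypothetical helper: treating every further ';' as an option
    separator is an assumption, not documented behaviour.
    """
    name, _, rest = value.strip().partition(";")
    options = [opt for opt in rest.split(";") if opt] if rest else []
    return name.strip(), options


print(parse_anonymize_setting("smart"))
print(parse_anonymize_setting("my.package.MyFilter;opt1;opt2"))
```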
The following filters are provided with this implementation:

##### The `default` (`smart`) filter

Enable the filter by setting `ANONYMIZE_EMAILS=smart`.

The filter has no options and is enabled by setting the value to `on`, `true`, `1`, `default` or `smart`. The
masker will take an educated guess at how to best mask the emails, specifically:

* It will leave the first and the last letter of the local part (if it's only one letter, it will get repeated)
* If the local part is in quotes, it will keep the quotes and mask the text between them (Warning: if the quoted text starts with a space, this might look weird in logs)
* It will replace all the letters in between with **ONE** asterisk, even if there are none
* It will replace every character of the domain except the TLD with a star
* Address-literal domains (e.g. `[192.168.8.10]`) will have the numbers replaced with stars

E.g.:

* `[email protected]` -> `d*o@*******.org`
* `[email protected]` -> `j*e@*******.solutions`
* `sa@localhost` -> `s*a@*********`
* `s@[192.168.8.10]` -> `s*s@[*.*.*.*]`
* `"multi....dot"@[IPv6:2001:db8:85a3:8d3:1319:8a2e:370:7348]` -> `"m*t"@[IPv6:************************************]`

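The masking rules listed above can be sketched in a few lines of Python. This is a simplified, standalone illustration rather than the script shipped in the image: the function names `mask_local`, `mask_domain` and `smart_mask` and the sample address `jane.roe@mail.example.org` are made up for the example, and the sketch works on a bare address instead of searching a whole log line.

```python
def mask_local(local: str, symbol: str = "*") -> str:
    """Keep the first and last character, collapse the rest into one symbol."""
    if len(local) >= 2 and local[0] == '"' and local[-1] == '"':
        # Quoted local part: keep the quotes and the first/last quoted character.
        return local[:2] + symbol + local[-2:]
    return local[0] + symbol + local[-1]


def mask_domain(domain: str, symbol: str = "*") -> str:
    """Mask every character of the domain except the TLD."""
    if domain.startswith("[") and domain.endswith("]"):
        if ":" in domain:
            # Tagged literal such as [IPv6:...]: keep the tag, mask the address.
            tag, address = domain[1:-1].split(":", 1)
            return "[" + tag + ":" + symbol * len(address) + "]"
        # IPv4 literal: one symbol per octet.
        return "[" + ".".join([symbol] * 4) + "]"
    if "." in domain:
        name, tld = domain.rsplit(".", 1)
        return symbol * len(name) + "." + tld
    # Local domain without a TLD, e.g. "localhost".
    return symbol * len(domain)


def smart_mask(address: str) -> str:
    # "@" may appear in a quoted local part, so split on the last one.
    local, domain = address.rsplit("@", 1)
    return mask_local(local) + "@" + mask_domain(domain)


for sample in ("sa@localhost", "s@[192.168.8.10]", "jane.roe@mail.example.org"):
    print(sample, "->", smart_mask(sample))
```

Because the middle of the local part always collapses into a single asterisk, the masked local part does not reveal the original length, whereas the domain mask does.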
##### The `paranoid` filter

The paranoid filter works similarly to the smart filter but will:

* Replace the local part with **ONE** asterisk
* Replace the domain part (sans TLD) with **ONE** asterisk

E.g.:

* `[email protected]` -> `*@*.org`
* `[email protected]` -> `*@*.solutions`
* `sa@localhost` -> `*@*`
* `s@[192.168.8.10]` -> `*@[*]`
* `"multi....dot"@[IPv6:2001:db8:85a3:8d3:1319:8a2e:370:7348]` -> `*@[IPv6:*]`

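The paranoid rules can be sketched the same way. The function name `paranoid_mask` and the sample address `jane.roe@mail.example.org` are assumptions for this illustration; like any sketch of this kind, it handles a bare address only and is not the code shipped in the image.

```python
def paranoid_mask(address: str, symbol: str = "*") -> str:
    """Reduce an address to single-symbol placeholders, keeping only the TLD
    or the address-literal tag."""
    # "@" may appear in a quoted local part, so split on the last one.
    _, domain = address.rsplit("@", 1)
    if domain.startswith("[") and domain.endswith("]"):
        if ":" in domain:
            # Tagged literal such as [IPv6:...]: keep only the tag.
            tag = domain[1:-1].split(":", 1)[0]
            masked = "[" + tag + ":" + symbol + "]"
        else:
            masked = "[" + symbol + "]"
    elif "." in domain:
        masked = symbol + "." + domain.rsplit(".", 1)[1]
    else:
        masked = symbol
    return symbol + "@" + masked


for sample in ("sa@localhost", "s@[192.168.8.10]", "jane.roe@mail.example.org"):
    print(sample, "->", paranoid_mask(sample))
```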
##### The `noop` filter

This filter doesn't do anything. It's used for testing purposes only.

##### Writing your own filters

It's easy enough to write your own filters. The simplest way would be to take the `email-anonymizer.py` filter in this
image, write your own and then attach it to the container image under `/scripts`. If you're feeling adventurous, you can
also install your own Python package -- the script will automatically pick up the class name.

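A custom filter could look roughly like the sketch below. It assumes, based on the built-in filters in `email-anonymizer.py`, that a filter exposes `init(args)` and `processMessage(msg)`, and that `processMessage` returns a JSON string: `{"msg": ...}` when the line was changed and `{}` otherwise. The class name `KeepDomainFilter`, its optional mask-symbol argument and the simplified address pattern are hypothetical and exist only for this illustration.

```python
import json
import re
from typing import List

# Deliberately simple pattern for the example; the script's own pattern is broader.
SIMPLE_EMAIL = re.compile(r"([^\s<>@]+)@([^\s<>@,;]+)")


class KeepDomainFilter:
    """Hypothetical filter: hides the local part but keeps the domain readable."""

    def init(self, args: List[str]) -> None:
        # Assumed convention: the first option, if present, is the mask symbol.
        self.symbol = args[0] if args else "*"

    def _mask(self, match: "re.Match[str]") -> str:
        return self.symbol + "@" + match.group(2)

    def processMessage(self, msg: str) -> str:
        result = SIMPLE_EMAIL.sub(self._mask, msg)
        # Same convention as the built-in filters: "{}" means "line unchanged".
        if result == msg:
            return json.dumps({})
        return json.dumps({"msg": result}, ensure_ascii=False)


f = KeepDomainFilter()
f.init([])
print(f.processMessage("to=<jane.roe@example.org>, status=sent"))
print(f.processMessage("connect from unknown[10.0.0.1]"))
```

Since the script resolves dotted names through `importlib`, a filter installed as a package would presumably be selected by passing its dotted path (for instance a hypothetical `mypackage.filters.KeepDomainFilter`) as the filter name.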
### DKIM / DomainKeys

**This image is equipped with support for DKIM.** If you want to use DKIM you will need to generate DKIM keys. These can

@@ -0,0 +1,207 @@
#!/usr/bin/env python3

"""
Filter to anonymize email addresses. It reads input line by line,
finds all emails in the input and masks them using the given filter.
Big thanks to [Sergio Del Río Mayoral](https://github.com/sdelrio)
for the concept and the idea, although not a lot of the code went
into this commit in the end.
"""

import re
import logging
import typing
import json
import sys
import importlib

logger = logging.getLogger(__name__)

# BIG FAT NOTICE on emails and regular expressions:
# If you're planning on using a regular expression to validate an email: don't. Emails
# are much more complex than you would imagine and most regular expressions will not
# cover all use cases. Newer RFCs even allow for international (read: UTF-8) email addresses.
# Most of your favourite programming languages will have a dedicated library for validating
# addresses.
#
# The pattern below should, however, match (hopefully) anything that looks like an email.
# It is too broad, though, as it will also match things which are not considered valid email
# addresses. But for our use case, that's OK and more than sufficient.
# (Raw strings are used so that the backslashes reach the regex engine unchanged.)
EMAIL_CATCH_ALL_PATTERN = r'([^ "\[\]<>]+|".+")@(\[([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|[A-Za-z0-9]+:.+)\]|([^ \{}():;]+(\.[^ \{}():;]+)*))'
EMAIL_CATCH_ALL = re.compile(EMAIL_CATCH_ALL_PATTERN)
EMPTY_RESPONSE = json.dumps({})

# Postfix formats message IDs like this. Let's not mask them
# 20211207101128.0805BA272@31bfa77a2cab
MESSAGE_ID_PATTERN = r'[0-9]+\.[0-9A-F]+@[0-9a-f]+'
MESSAGE_ID = re.compile(MESSAGE_ID_PATTERN)

# A default filter, if none other is provided.
DEFAULT_FILTER_CLASS: str = 'SmartFilter'

# Map friendly filter names to filter class names
FILTER_MAPPINGS = {
    'default': DEFAULT_FILTER_CLASS,
    'smart': 'SmartFilter',
    'paranoid': 'ParanoidFilter',
    'noop': 'NoopFilter',
}

# ---------------------------------------- #

class Filter():
    """Base class for all filters."""

    def init(self, args: list[str]) -> None:
        """Configure the filter with the options passed on the command line."""
        pass

    def processMessage(self, msg: str) -> str:
        """Process one log line and return the JSON-encoded response."""
        pass


class NoopFilter(Filter):
    """
    This filter does nothing.
    """
    def processMessage(self, msg: str) -> str:
        return EMPTY_RESPONSE


class SmartFilter(Filter):
    """
    This filter will take an educated guess at how to best mask the emails, specifically:
    * It will leave the first and the last letter of the local part (if it's only one letter, it will get repeated)
    * If the local part is in quotes, it will keep the quotes and mask the text between them
      (Warning: if the quoted text starts with a space, this might look weird in logs)
    * It will replace all the letters in between with **ONE** asterisk
    * It will replace every character of the domain except the TLD with a star
    * Address-literal domains will have the numbers replaced with stars
    E.g.:
    * `[email protected]` -> `d*o@*******.org`
    * `[email protected]` -> `j*e@*******.solutions`
    * `sa@localhost` -> `s*a@*********`
    * `s@[192.168.8.10]` -> `s*s@[*.*.*.*]`
    * `"multi....dot"@[IPv6:2001:db8:85a3:8d3:1319:8a2e:370:7348]` -> `"m*t"@[IPv6:************************************]`
    """
    mask_symbol: str = '*'

    def mask_local(self, local: str) -> str:
        if local[0] == '"' and local[-1] == '"':
            # Quoted local part: keep the quotes and the first/last quoted character
            return local[:2] + self.mask_symbol + local[-2:]
        else:
            return local[0] + self.mask_symbol + local[-1]

    def mask_domain(self, domain: str) -> str:
        if domain[0] == '[' and domain[-1] == ']':  # Numerical domain
            if ':' in domain[1:-1]:
                left, right = domain.split(":", 1)
                # `right` still carries the closing bracket, hence the -1
                return left + ':' + (len(right) - 1) * self.mask_symbol + ']'
            else:
                return '[*.*.*.*]'
        elif '.' in domain:  # Normal domain
            s, tld = domain.rsplit('.', 1)
            return len(s) * self.mask_symbol + '.' + tld
        else:  # Local domain
            return len(domain) * self.mask_symbol

    def replace(self, match: re.Match) -> str:
        email = match.group()

        # Return the details unchanged if they look like a Postfix message ID
        if MESSAGE_ID.match(email):
            return email

        # The "@" can show up in the local part, but shouldn't appear in the
        # domain part (at least not that we know).
        local, domain = email.rsplit("@", 1)

        local = self.mask_local(local)
        domain = self.mask_domain(domain)

        return local + '@' + domain

    def processMessage(self, msg: str) -> str:
        result = EMAIL_CATCH_ALL.sub(self.replace, msg)
        # An empty JSON object signals that the line was left unchanged
        return json.dumps({'msg': result}, ensure_ascii=False) if result != msg else EMPTY_RESPONSE

class ParanoidFilter(SmartFilter):

    def mask_local(self, local: str) -> str:
        return self.mask_symbol

    def mask_domain(self, domain: str) -> str:
        if domain[0] == '[' and domain[-1] == ']':  # Numerical domain
            if ':' in domain[1:-1]:
                # Keep only the tag in front of the first colon, e.g. "[IPv6"
                left = domain.split(":", 1)[0]
                return left + ':*]'
            else:
                return '[*]'
        elif '.' in domain:  # Normal domain
            _, tld = domain.rsplit('.', 1)
            return self.mask_symbol + '.' + tld
        else:  # Local domain
            return self.mask_symbol

# ---------------------------------------- #

def get_filter() -> Filter:
    """
    Initialize the filter.

    This method will check your configuration and create a new filter.

    :return: Returns a specific implementation of the `Filter`
    """
    opts: list[str] = []
    clazz: typing.Optional[str] = None

    if len(sys.argv) > 1:
        clazz = sys.argv[1].strip()
        opts = sys.argv[2:]

        # Translate friendly names ("smart", "paranoid", ...) into class names
        if clazz.lower() in FILTER_MAPPINGS:
            clazz = FILTER_MAPPINGS[clazz.lower()]

    if clazz is None or clazz.strip() == '':
        clazz = DEFAULT_FILTER_CLASS

    logger.debug(f"Constructing new {clazz} filter.")

    try:
        if "." in clazz:
            # Fully qualified name: import the module and look up the class in it
            module_name, class_name = clazz.rsplit(".", 1)
            filter_class = getattr(importlib.import_module(module_name), class_name)
        else:
            # Plain name: look up the class in this script
            filter_class = getattr(sys.modules[__name__], clazz)
        filter_obj: Filter = filter_class()
    except Exception as e:
        raise RuntimeError(f'Could not instantiate filter named "{clazz}"!') from e

    try:
        filter_obj.init(opts)
    except Exception as e:
        raise RuntimeError(f'Init of filter "{clazz}" with parameters {opts} failed!') from e

    return filter_obj


def process(f: Filter) -> None:
    while True:
        message = sys.stdin.readline()
        if message:
            message = message.rstrip('\n')  # Remove line feed
            result = f.processMessage(message)
            print(result)
            sys.stdout.flush()
        else:
            # Empty string: stdin has been closed (EOF)
            break


if __name__ == '__main__':
    process(get_filter())
@@ -0,0 +1,13 @@
#!/usr/bin/env bash
set -e

SCRIPT_DIR=$(CDPATH='' cd -- "$(dirname -- "$0")" && pwd)
##
# Email anonymizer is a filter which goes through every line reported in syslog and filters
# out email addresses.
# Setting PYTHONUNBUFFERED=1 ensures that Python output buffering is disabled and output
# is sent straight to the terminal.
##
while ! env PYTHONUNBUFFERED=1 python3 "$SCRIPT_DIR/email-anonymizer.py" "$@"; do
    sleep 1
done