Duplicated Report IDs #535

Open
fabm3n opened this issue Jul 16, 2024 · 2 comments

fabm3n commented Jul 16, 2024

I just started using parsedmarc and received the same DMARC report from Google twice:
[image]

According to this Reddit post, this is a TTL issue: https://www.reddit.com/r/DMARC/comments/1bafpk5/getting_multiple_identical_reports_from_google/

In my view, parsedmarc's behaviour is also wrong here, because both reports with the same report ID were added to the Elasticsearch database:
[image]

I expected the report ID to be unique, so there should be only one document in the database.
The best approach might be to overwrite the existing document with the most recently processed DMARC report.
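
To illustrate the overwrite idea, here is a minimal sketch, not parsedmarc's actual code. It derives a stable document ID from the fields that identify a report, so saving the same report again replaces the earlier document instead of adding a second one. The `report_doc_id` and `save_report` helpers, the flat field names, the `example.com` domain, and the in-memory `index` dict (a stand-in for Elasticsearch) are all assumptions made for illustration. It also assumes one document per report; if several documents are stored per report, the key would need a per-record component as well.

```python
import hashlib

def report_doc_id(org_name, report_id, domain):
    """Build a stable document ID from the fields that identify a report."""
    key = "\x1f".join([org_name, report_id, domain])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Hypothetical stand-in for an index keyed by document ID (not Elasticsearch).
index = {}

def save_report(report):
    """Store the report under its stable ID; a later copy replaces an earlier one."""
    doc_id = report_doc_id(report["org_name"], report["report_id"], report["domain"])
    index[doc_id] = report

first = {"org_name": "google.com", "report_id": "16651655217010351577",
         "domain": "example.com", "processed_order": 1}
second = dict(first, processed_order=2)  # the same report, delivered a second time

save_report(first)
save_report(second)
print(len(index))  # one document, holding the most recently processed copy
```

With an ID like this, the store itself enforces uniqueness, so the result does not depend on a separate lookup succeeding before each write.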

fabm3n commented Jul 16, 2024

I checked the code and noticed that this should not happen, because there is a duplicate check:
https://github.com/domainaware/parsedmarc/blob/master/parsedmarc/elastic.py#L410

When I now move one of the processed mails from the archive back to the inbox, the check works:
WARNING:cli.py:100:An aggregate report ID 16651655217010351577 from google.com about DOMAIN with a date range of 2024-07-13 00:00:00Z UTC to 2024-07-13 23:59:59Z UTC already exists in Elasticsearch

I have enabled debug logging and will try to reproduce the issue.

fabm3n commented Jul 16, 2024

OK, I have already reproduced it. I cleared my index and moved both mails into the inbox.
After that, they were processed as a batch, and it looks like no duplicate check was applied:

Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:__init__.py:1433:Found 2 messages in INBOX
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:__init__.py:1441:Processing 2 messages
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:__init__.py:1445:Processing message 1 of 2: UID 6
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:     INFO:__init__.py:1085:Parsing mail from [email protected] on 2024-07-13 16:59:59-07:00
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:utils.py:388:IP address IPwas found in cache
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:__init__.py:1445:Processing message 2 of 2: UID 7
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:     INFO:__init__.py:1085:Parsing mail from [email protected] on 2024-07-13 16:59:59-07:00
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:utils.py:388:IP address IPwas found in cache
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:__init__.py:1506:Moving aggregate report messages from INBOX to Archive/Aggregate
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:__init__.py:1513:Moving message 1 of 2: UID 6
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:__init__.py:1513:Moving message 2 of 2: UID 7
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:     INFO:elastic.py:369:Saving aggregate report to Elasticsearch
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:    DEBUG:elastic.py:289:Creating Elasticsearch index: dmarc_aggregate-2024-07-13
Jul 16 13:18:29 parsedmarc parsedmarc[1023588]:     INFO:elastic.py:369:Saving aggregate report to Elasticsearch
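
One way to guard against this, whatever the reason the existing check misses it, would be to drop duplicates within the batch before anything is saved. The sketch below is an assumption-laden illustration, not parsedmarc's implementation: `report_key`, `dedupe_batch`, the flat field names, and the `example.com` domain are hypothetical. It keeps the last report seen for each identity, matching the "overwrite with the last processed report" expectation.

```python
def report_key(report):
    """Identity of an aggregate report: reporter, report ID, domain, date range."""
    return (
        report["org_name"],
        report["report_id"],
        report["domain"],
        report["begin_date"],
        report["end_date"],
    )

def dedupe_batch(reports):
    """Return one report per identity, keeping the last one processed."""
    latest = {}
    for report in reports:
        latest[report_key(report)] = report  # a later duplicate replaces an earlier one
    return list(latest.values())

# Two messages (UID 6 and UID 7 in the log) carrying the same report.
batch = [
    {"org_name": "google.com", "report_id": "16651655217010351577",
     "domain": "example.com", "begin_date": "2024-07-13 00:00:00",
     "end_date": "2024-07-13 23:59:59", "uid": 6},
    {"org_name": "google.com", "report_id": "16651655217010351577",
     "domain": "example.com", "begin_date": "2024-07-13 00:00:00",
     "end_date": "2024-07-13 23:59:59", "uid": 7},
]

unique_reports = dedupe_batch(batch)
print(len(unique_reports), unique_reports[0]["uid"])
```

Because the filter runs in memory, it does not rely on the first report already being searchable in the index when the second one is checked.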
