Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal with email tags in RIPE db changes #11

Open
bernhardreiter opened this issue Nov 27, 2020 · 3 comments
Open

Deal with email tags in RIPE db changes #11

bernhardreiter opened this issue Nov 27, 2020 · 3 comments

Comments

@bernhardreiter
Copy link
Member

bernhardreiter commented Nov 27, 2020

If an organisation changes its email address, there is a warning when running ripe_diff.py.

However, if this email address has had extra information (like being disabled or tags), there maybe additional
steps necessary. This issue should make sure that they are done by either flagging them to the update operator
or doing the changes itself (if possible).

considerations

The table email_tag is independent from the manual and automatic tables and has effects on all email addresses when it is attempted to use the address.

So if for an organisation the contact with emailA is replaced by a different contact with emailB,
what should happen?

Let us consider the cases. We have two email addresses:

  • emailA
  • emailB

They both maybe used in one or several contacts (both manual or automatic).

Case 1 both have no status info

Nothing to do (solved)

Case 2 emailA has status info, emailB does not

If emailB is to be used for the first time, it could inherit the status info of emailA automatically.
(Though it is unclear if in all cases the new email address' operator actually wants this.)

Else it is not clear because the other organisations that already have emailB did not have that status
so it cannot be transfered without surprising them. -> Flag for manual decision.

Case 3 emailA does not, emailB has status info

If emailB has status infos, it is already used by another organisation, so it is unclear. -> Flag for manual decison.

Case 4 emailA and emailB both have status info

If they are the same status info -> do nothing (solved).

Else (the info differs) and it is undecided which is the better one for the remaining orgs for emailB
and the new one. -> Flag for decision.

-> Flag

As there are only a one subcase (2.1) where the script could automatically do the right thing,
a good approach seems to be to flag the situation.

@bernhardreiter
Copy link
Member Author

bernhardreiter commented Feb 18, 2021

creating test cases

Using fresh temporary database with postgresql and download ripe data as described in the documentation.
Then we import only a part of the ripe data and use Norway as an example (fishshell).

set -l day 2021-03-03
cp -a $day $day-trunc
cd $day-trunc
gzip -d --stdout ../$day/ripe.db.inet6num.gz | head --lines=100000 | gzip > ripe.db.inet6num.gz
gzip -d --stdout ../$day/ripe.db.inetnum.gz | head --lines=100000 | gzip > ripe.db.inetnum.gz

set -l b /home/bern/dev/certbund-contact-git
env PYTHONPATH=$b: python3 $b/intelmq_certbund_contact/ripe/ripe_import.py -v --restrict-to-country NO --conninfo 'host=localhost dbname=contactdb'

# using approximately 3% of the entries
psql -h localhost contactdb -c "select email from contact_automatic TABLESAMPLE BERNOULLI (3);" --tuples-only | \
 head --lines=-1 | cut --characters=2- >sample_emails.txt

# sed 's/\(.*\)/UPDATE contact_automatic SET email=\'x-\1\' WHERE email=\'\1\';/' sample_emails.txt >sample_emails.sql
# psql -h localhost contactdb -f sample_emails.sql

# sed  's/\(.*\)/x-\1/' sample_emails.txt >sample_emails_new.txt

env PYTHONPATH=$b: python3 $b/intelmq_certbund_contact/ripe/ripe_diff.py -v --restrict-to-country NO --conninfo 'host=localhost dbname=contactdb'

bernhardreiter added a commit that referenced this issue Feb 25, 2021
As #11 has a number of cases that we want to detect
and possibly handle differently, test cases are needed.

To create them reliably and without we use a script that creates
sql commands. An alternative is have been to go through the
fody-backend, but which would require one more component to run
for building the test cases.
@bernhardreiter
Copy link
Member Author

bernhardreiter commented Feb 26, 2021

creating test cases

To find 10 email address that have been used in several contacts:

SELECT * FROM (
  SELECT COUNT(email), email FROM contact_automatic GROUP BY email ORDER BY COUNT(email) Desc) AS c 
WHERE count >1 LIMIT 10;

as single command to produce a list

psql  -h localhost contactdb -c "SELECT email
   FROM (SELECT COUNT(email), email FROM contact_automatic 
               GROUP BY email ORDER BY COUNT(email) Desc) AS c
   WHERE count >1 LIMIT 10;
 " --tuples-only | head --lines=-1 | cut --characters=2- > multiple_emails.txt

To change the email address of only one contact of several for one email, the sql command is more involved:

sed 's/\(.*\)/UPDATE contact_automatic SET email=\'x-\1\' WHERE contact_automatic_id IN (SELECT contact_automatic_id FROM contact_automatic WHERE email=\'\1\' LIMIT 1);/'

with the combination of tests/create_email_tag_cases.py , a lot more test cases can be created.

bernhardreiter added a commit that referenced this issue Feb 26, 2021
@bernhardreiter
Copy link
Member Author

There is a difficulty: the ripe_import workflow does not look at the differences, it only completely deletes the old contents and inserts the new database completely. So in order to find the organisation with changed email addresses, a ripe_diff run is needed, but ripe_diff does not writing anything to the database.

One implementation strategy could be to let the diff run create the necessary changes in the email tags to be saved in a file and executed after the import run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant