Deal with email tags in RIPE db changes #11

bernhardreiter · 2020-11-27T15:06:40Z

If an organisation changes its email address, there is a warning when running ripe_diff.py.

However, if this email address has had extra information (like being disabled or tags), there maybe additional
steps necessary. This issue should make sure that they are done by either flagging them to the update operator
or doing the changes itself (if possible).

considerations

The table email_tag is independent from the manual and automatic tables and has effects on all email addresses when it is attempted to use the address.

So if for an organisation the contact with emailA is replaced by a different contact with emailB,
what should happen?

Let us consider the cases. We have two email addresses:

emailA
emailB

They both maybe used in one or several contacts (both manual or automatic).

Case 1 both have no status info

Nothing to do (solved)

Case 2 emailA has status info, emailB does not

If emailB is to be used for the first time, it could inherit the status info of emailA automatically.
(Though it is unclear if in all cases the new email address' operator actually wants this.)

Else it is not clear because the other organisations that already have emailB did not have that status
so it cannot be transfered without surprising them. -> Flag for manual decision.

Case 3 emailA does not, emailB has status info

If emailB has status infos, it is already used by another organisation, so it is unclear. -> Flag for manual decison.

Case 4 emailA and emailB both have status info

If they are the same status info -> do nothing (solved).

Else (the info differs) and it is undecided which is the better one for the remaining orgs for emailB
and the new one. -> Flag for decision.

-> Flag

As there are only a one subcase (2.1) where the script could automatically do the right thing,
a good approach seems to be to flag the situation.

The text was updated successfully, but these errors were encountered:

bernhardreiter · 2021-02-18T15:58:27Z

creating test cases

Using fresh temporary database with postgresql and download ripe data as described in the documentation.
Then we import only a part of the ripe data and use Norway as an example (fishshell).

set -l day 2021-03-03
cp -a $day $day-trunc
cd $day-trunc
gzip -d --stdout ../$day/ripe.db.inet6num.gz | head --lines=100000 | gzip > ripe.db.inet6num.gz
gzip -d --stdout ../$day/ripe.db.inetnum.gz | head --lines=100000 | gzip > ripe.db.inetnum.gz

set -l b /home/bern/dev/certbund-contact-git
env PYTHONPATH=$b: python3 $b/intelmq_certbund_contact/ripe/ripe_import.py -v --restrict-to-country NO --conninfo 'host=localhost dbname=contactdb'

# using approximately 3% of the entries
psql -h localhost contactdb -c "select email from contact_automatic TABLESAMPLE BERNOULLI (3);" --tuples-only | \
 head --lines=-1 | cut --characters=2- >sample_emails.txt

# sed 's/\(.*\)/UPDATE contact_automatic SET email=\'x-\1\' WHERE email=\'\1\';/' sample_emails.txt >sample_emails.sql
# psql -h localhost contactdb -f sample_emails.sql

# sed  's/\(.*\)/x-\1/' sample_emails.txt >sample_emails_new.txt

env PYTHONPATH=$b: python3 $b/intelmq_certbund_contact/ripe/ripe_diff.py -v --restrict-to-country NO --conninfo 'host=localhost dbname=contactdb'

As #11 has a number of cases that we want to detect and possibly handle differently, test cases are needed. To create them reliably and without we use a script that creates sql commands. An alternative is have been to go through the fody-backend, but which would require one more component to run for building the test cases.

bernhardreiter · 2021-02-26T15:07:06Z

creating test cases

To find 10 email address that have been used in several contacts:

SELECT * FROM (
  SELECT COUNT(email), email FROM contact_automatic GROUP BY email ORDER BY COUNT(email) Desc) AS c 
WHERE count >1 LIMIT 10;

as single command to produce a list

psql  -h localhost contactdb -c "SELECT email
   FROM (SELECT COUNT(email), email FROM contact_automatic 
               GROUP BY email ORDER BY COUNT(email) Desc) AS c
   WHERE count >1 LIMIT 10;
 " --tuples-only | head --lines=-1 | cut --characters=2- > multiple_emails.txt

To change the email address of only one contact of several for one email, the sql command is more involved:

sed 's/\(.*\)/UPDATE contact_automatic SET email=\'x-\1\' WHERE contact_automatic_id IN (SELECT contact_automatic_id FROM contact_automatic WHERE email=\'\1\' LIMIT 1);/'

with the combination of tests/create_email_tag_cases.py , a lot more test cases can be created.

useful for #11

bernhardreiter · 2021-03-05T16:50:57Z

There is a difficulty: the ripe_import workflow does not look at the differences, it only completely deletes the old contents and inserts the new database completely. So in order to find the organisation with changed email addresses, a ripe_diff run is needed, but ripe_diff does not writing anything to the database.

One implementation strategy could be to let the diff run create the necessary changes in the email tags to be saved in a file and executed after the import run.

bernhardreiter added a commit that referenced this issue Feb 26, 2021

complete create_email_tag_cases.py with all cases

bed6ca6

useful for #11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deal with email tags in RIPE db changes #11

Deal with email tags in RIPE db changes #11

bernhardreiter commented Nov 27, 2020 •

edited

Loading

bernhardreiter commented Feb 18, 2021 •

edited

Loading

bernhardreiter commented Feb 26, 2021 •

edited

Loading

bernhardreiter commented Mar 5, 2021

Deal with email tags in RIPE db changes #11

Deal with email tags in RIPE db changes #11

Comments

bernhardreiter commented Nov 27, 2020 • edited Loading

considerations

Case 1 both have no status info

Case 2 emailA has status info, emailB does not

Case 3 emailA does not, emailB has status info

Case 4 emailA and emailB both have status info

-> Flag

bernhardreiter commented Feb 18, 2021 • edited Loading

creating test cases

bernhardreiter commented Feb 26, 2021 • edited Loading

creating test cases

bernhardreiter commented Mar 5, 2021

bernhardreiter commented Nov 27, 2020 •

edited

Loading

bernhardreiter commented Feb 18, 2021 •

edited

Loading

bernhardreiter commented Feb 26, 2021 •

edited

Loading