Integration Test Case for is_latest_issue using DB #925

Open
wants to merge 31 commits into
base: dev

@xavier-xia-99 xavier-xia-99 commented Jun 6, 2022

Summary

This is a fix for issue #896.

The earlier integration tests did not connect to the database; they used a MagicMock() object instead, which did not emulate the actual behaviour of the system. This PR adds a test file, is_latest_issue.py, that queries the database directly to ensure that the is_latest_issue flag is accurately reflected in signal_latest and signal_history. It also checks that new src, sig, geo_value, and geo_type values are added to the respective *_dim tables.

It also includes an edge case that was identified in the db pipeline.

Fixes:

added edge case

added prettytable for better visuals

all tests passed
@xavier-xia-99 xavier-xia-99 changed the title from "Integration Test Case for Database" to "Integration Test Case for is_latest_issue using DB" Jun 6, 2022
Collaborator

@melange396 melange396 left a comment

you should revert test_covidcast_meta_caching.py (its changes are minuscule and not part of this issue) and delete dev/local/output.txt

there are a lot of repeated strings in the SQL statements. they can be saved to variables for re-use -- which can also mean less reading for anyone doing maintenance on this.

i think i would remove the print() statements and prettytable and maxDiff=None stuff, unless you think theyre particularly useful. or maybe just reduce them to one call to your view_table() helper fn in the test_*() methods?

clean up these and the other comments, and we can take another pass at it!
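As a rough illustration of the suggestion about repeated SQL strings, the sketch below stores the table name and query once and reuses them through a small helper. It is only a sketch under assumptions: it uses an in-memory SQLite table as a stand-in for the real MySQL schema, and `HISTORY_TABLE`, `SELECT_ISSUES`, and `fetch_all` are hypothetical names, not part of this PR.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the real MySQL schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE signal_history (time_value INTEGER, issue INTEGER)")
cur.executemany(
    "INSERT INTO signal_history VALUES (?, ?)",
    [(20200414, 20200414), (20200414, 20200416)],
)

# Name the table and the query once, then reuse them in every test.
HISTORY_TABLE = "signal_history"  # hypothetical constant
SELECT_ISSUES = f"SELECT issue FROM {HISTORY_TABLE}"


def fetch_all(cursor, sql):
    """Run a query and return every row as a list of tuples (hypothetical helper)."""
    cursor.execute(sql)
    return cursor.fetchall()


issues = fetch_all(cur, SELECT_ISSUES)
```

With the query text in one place, a renamed table or column only needs to be changed once.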

sql2 = '''SELECT `issue` FROM `signal_history` where `time_value` '''
self._db._cursor.execute(sql2)
record3 = self._db._cursor.fetchall()
self.assertEqual(3,self.totalRows + 1) #ensure 3 added (1 of which refreshed)
Collaborator

i dont think this is checking what you want... totalRows doesnt get updated and is still equal to 2, which you set at the very top of this test

Author

Here is what I was trying to test: initially there are 2 rows; I update one of them and add a new one, so I wanted to make sure that there are only 3 rows in the end. If this is confusing, I am trying to think of better ways to test it.
(1 old & not updated)
(1 old but updated)
(1 added newly)
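One way the scenario described above could be checked is to count the rows actually present before and after the second import, rather than comparing two constants. The following is a minimal sketch, assuming an in-memory SQLite table with a primary key in place of the real schema; `count_rows` and `upsert` are hypothetical helpers and do not reflect the real `insert_or_update_bulk` implementation.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the key makes re-imports overwrite rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE history ("
    "geo TEXT, issue INTEGER, value REAL, PRIMARY KEY (geo, issue))"
)


def count_rows(cursor):
    """Return the number of rows currently in the table."""
    cursor.execute("SELECT COUNT(*) FROM history")
    return cursor.fetchone()[0]


def upsert(cursor, rows):
    """Insert rows, replacing any row that has the same (geo, issue) key."""
    cursor.executemany("INSERT OR REPLACE INTO history VALUES (?, ?, ?)", rows)


# Two initial rows.
upsert(cur, [("pa", 20200414, 1.0), ("11111", 20200414, 1.0)])
rows_before = count_rows(cur)

# One row updates an existing key, one row is new.
upsert(cur, [("pa", 20200414, 2.0), ("pa", 20200416, 3.0)])
rows_after = count_rows(cur)
```

Here `rows_after` is measured from the table, so an assertion such as `rows_after == rows_before + 1` actually depends on what the import did.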

self.assertEqual(20200416,max(list(record3))[0]) #max of the outputs is 20200416 , extracting from tuple

#check older issue not inside latest, empty field
sql = '''SELECT * FROM `signal_latest` where `time_value` = 20200414 and `issue` = 20200415 '''
Collaborator

did you mean issue=14? theres never an issue=15 with time_value=14

Author

Thank you for that! I also removed issue = 15 entirely from the dummy data.

#setting baseline variables
self._db._cursor.execute('''SELECT * FROM `geo_dim`''')
record = self._db._cursor.fetchall()
self.geoDimRows = len(list(record))
Collaborator

like before, these variables dont need to be attached to the self object

oldSrcSig = [
CovidcastRow('src', 'sig', 'day', 'state', 20211111, 'pa', #new src, new sig
99, 99, 99, nmv, nmv, nmv, 20211111, 1),
CovidcastRow('src', 'sig', 'day', 'county', 20211111, 'ca', #new src, new sig
Collaborator

maybe use a geo_value that doesnt look like a state abbreviation when you have a geo_type of "county" (just for reader clarity, it doesnt actually matter for the test)

Author

That's a good point! What are some sample geo_value values that are normally used?

Contributor

Any length-5 string of numbers. "11111" should suffice

Collaborator

@melange396 melange396 left a comment

Looks pretty good, just a few little things:

  • there are still some unused imports (including prettytable, so you can also remove that from requirements.txt)
  • you should revert the changes in integrations/acquisition/covidcast/test_covidcast_meta_caching.py
  • check your comments on the CovidcastRow() lines, i think some need to be updated
  • some spacing nits (usually want an empty line right before a class or method definition line, usually dont want an empty line right after; probably dont want double empty lines unless separating important or otherwise large sections of code)

self._db.insert_or_update_bulk(rows)
self._db.run_dbjobs()
#preview
self._db._cursor.execute('''SELECT * FROM `signal_history`''')
Collaborator

use self.viewSignalHistory?

#dynamic check for signal_history
self._db._cursor.execute('''SELECT `issue` FROM `signal_history`''')
record3 = self._db._cursor.fetchall()
self.assertEqual(2,totalRows + 1) #ensure 3 added (1 of which refreshed)
Collaborator

would it make more sense to check len(record3) here?

rows = [
CovidcastRow('src', 'sig', 'day', 'state', 20200414, 'pa', #
2, 2, 2, nmv, nmv, nmv, 20200414, 0),
CovidcastRow('src', 'sig', 'day', 'county', 20200414, '11111', # updating previous entry
Collaborator

thats not an update!

def test_src_sig(self):
#BASE CASES
rows = [
CovidcastRow('src', 'sig', 'day', 'state', 20200414, 'pa', #
Collaborator

Suggested change
CovidcastRow('src', 'sig', 'day', 'state', 20200414, 'pa', #
CovidcastRow('src', 'sig', 'day', 'state', 20200414, 'pa',

Comment on lines 67 to 70
self.viewSignalLatest = '''SELECT * FROM `signal_latest`'''
self.viewSignalHistory = '''SELECT * FROM `signal_history`'''
self.viewSignalDim = '''SELECT `source`, `signal` FROM `signal_dim`'''
self.viewGeoDim = '''SELECT `geo_type`,`geo_value` FROM `geo_dim`'''
Contributor

For easier maintainability, use e.g. Database.history_table instead of "signal_history" in these
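A minimal sketch of what this could look like follows. The `Database` class here is a hypothetical stand-in defined only so the example runs on its own; the attribute names `history_table` and `latest_table` appear elsewhere in this PR, but the values assigned to them below are assumptions.

```python
class Database:
    """Hypothetical stand-in for the real Database class (table names only)."""

    history_table = "signal_history"  # assumed value
    latest_table = "signal_latest"    # assumed value


# Build the shared queries from the class attributes instead of literal names.
view_signal_history = f"SELECT * FROM {Database.history_table}"
view_signal_latest = f"SELECT * FROM {Database.latest_table}"
```

If the table names ever change in `Database`, the test queries follow without edits.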

Base automatically changed from v4-schema-revisions-release-prep-prep to v4-schema-revisions-release-prep June 15, 2022 20:14
"""Integration tests for covidcast's is_latest_issue boolean."""
# standard library
import unittest
import time
Collaborator

Suggested change
import time

# standard library
import unittest
import time
import threading
Collaborator

Suggested change
import threading



# third party
from aiohttp.client_exceptions import ClientResponseError
Collaborator

Suggested change
from aiohttp.client_exceptions import ClientResponseError

import mysql.connector
import pytest
# first party
from delphi.epidata.acquisition.covidcast.logger import get_structured_logger
Collaborator

Suggested change
from delphi.epidata.acquisition.covidcast.logger import get_structured_logger

from delphi_utils import Nans
from delphi.epidata.client.delphi_epidata import Epidata
from delphi.epidata.acquisition.covidcast.database import Database, CovidcastRow
from delphi.epidata.acquisition.covidcast.covidcast_meta_cache_updater import main as update_covidcast_meta_cache
Collaborator

Suggested change
from delphi.epidata.acquisition.covidcast.covidcast_meta_cache_updater import main as update_covidcast_meta_cache

# third party
from aiohttp.client_exceptions import ClientResponseError
import mysql.connector
import pytest
Collaborator

Suggested change
import pytest

Contributor

@krivard krivard left a comment

Changes for maintainability and a question

#dynamic check for signal_history's list of issue
self._db._cursor.execute(f'SELECT `issue` FROM {Database.history_table}')
record3 = self._db._cursor.fetchall()
self.assertEqual(len(record3),totalRows + 1) #ensure 3 added (1 of which refreshed)
Contributor

If len(record3) is equal to totalRows + 1 but totalRows + 1 is not equal to 3, this will pass. Is that intended?

#ensure new entries are added in latest
self._db._cursor.execute(self.viewSignalLatest)
record = self._db._cursor.fetchall()
self.assertEqual(len(list(record)), sigLatestRows + 2) #2 original, 2 added
Contributor

A pattern like this will be easier to maintain -- we'll be able to add new tests without having to update all subsequent expressions.

Suggested change
self.assertEqual(len(list(record)), sigLatestRows + 2) #2 original, 2 added
record_length = len(list(record))
self.assertEqual(record_length, sigLatestRows + 2) #2 original, 2 added
sigLatestRows = record_length
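The rolling-baseline pattern in the suggestion above could also be wrapped in a small helper so each step states only how many rows it expects to add. This is a sketch with made-up counts; `check_growth` is a hypothetical helper, and in the real test the first argument would come from `len(cursor.fetchall())`.

```python
def check_growth(actual_count, baseline, expected_new):
    """Assert the table grew by expected_new rows and return the new baseline."""
    assert actual_count == baseline + expected_new, (
        f"expected {baseline + expected_new} rows, found {actual_count}"
    )
    return actual_count


# Illustrative counts only.
sig_latest_rows = 2
sig_latest_rows = check_growth(4, sig_latest_rows, 2)   # first batch adds 2
sig_latest_rows = check_growth(10, sig_latest_rows, 6)  # second batch adds 6
sig_latest_rows = check_growth(13, sig_latest_rows, 3)  # third batch adds 3
```

Because the baseline is refreshed after every check, later expressions never need to accumulate terms such as `+ 6 + 3`.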

self._db._cursor.execute(self.viewSignalDim)
record = self._db._cursor.fetchall()
res = [('src', 'sig'), ('new_src', 'sig'), ('src', 'new_sig')]
self.assertEqual(res , (record))
Contributor

Update sigDimRows here


self._db._cursor.execute(self.viewSignalLatest)
record = self._db._cursor.fetchall()
self.assertEqual(len(list(record)),sigLatestRows + 6) #total entries = 2(initial) + 6(test)
Contributor

Update sigLatestRows here


self._db._cursor.execute(f'SELECT `geo_type`,`geo_value` FROM `geo_dim`')
record = self._db._cursor.fetchall()
self.assertEqual(len(list(record)),geoDimRows + 3) #2 + 3 new
Contributor

Update geoDimRows here


self._db._cursor.execute(self.viewSignalLatest)
record = self._db._cursor.fetchall()
self.assertEqual(len(list(record)),sigLatestRows + 6 + 3) #total entries = 2(initial) + 6(test)
Contributor

Update sigLatestRows here

self._db.run_dbjobs()
self._db._cursor.execute(f'SELECT `issue` FROM {Database.latest_table} ')
record = self._db._cursor.fetchall()
self.assertEqual(record[0][0], 20200417) #20200416 != 20200417
Contributor

Suggested change
self.assertEqual(record[0][0], 20200417) #20200416 != 20200417
# Make sure the 4/17 issue is listed even though 4/16 was imported after it
self.assertEqual(record[0][0], 20200417)
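The behaviour this assertion expects can be pictured with a small pure-Python sketch: the latest issue for a key is the highest issue seen, regardless of the order in which issues were imported. This is not the pipeline's implementation, only an illustration of the expected outcome; `latest_issues` and the sample rows are hypothetical.

```python
def latest_issues(rows):
    """Map each (time_value, geo_value) key to the highest issue seen for it."""
    latest = {}
    for time_value, geo_value, issue in rows:
        key = (time_value, geo_value)
        if key not in latest or issue > latest[key]:
            latest[key] = issue
    return latest


imported = [
    (20200414, "pa", 20200414),
    (20200414, "pa", 20200417),
    (20200414, "pa", 20200416),  # older issue imported after the newer one
]
result = latest_issues(imported)
```

In this sketch the late-arriving 4/16 issue does not displace 4/17, which is the property the test comment describes.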

@xavier-xia-99 xavier-xia-99 force-pushed the xavier/integration/tests branch from 8a39f36 to 18a2758 Compare July 26, 2022 15:22
Comment on lines 128 to 137
<<<<<<< HEAD
res = [('src', 'sig'), ('new_src', 'sig'), ('src', 'new_sig')]
self.assertEqual(res , (record))
=======
self.sigDimRows = len(list(record))

res = set([('new_src', 'sig'), ('src', 'new_sig'), ('src', 'sig')])
self.assertEqual(res , set(record))
self.assertEqual(3, self.sigDimRows)
>>>>>>> 8a39f36b (used set to remove ordering of elements in return list)
Contributor

Looks like the conflict edit didn't stick -- are you okay to fix?
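For reference, the set-based side of the conflict compares results without depending on row order. A minimal sketch of that idea follows, with a hypothetical `record` list standing in for `fetchall()` output; which side of the conflict to keep remains the author's call.

```python
# Hypothetical fetchall() output in an arbitrary order.
record = [("src", "sig"), ("src", "new_sig"), ("new_src", "sig")]
expected = {("new_src", "sig"), ("src", "new_sig"), ("src", "sig")}

# Comparing sets ignores ordering; the length check guards against duplicates,
# which a set comparison alone would hide.
order_independent_match = set(record) == expected and len(record) == len(expected)
```

`unittest.TestCase.assertCountEqual` is another standard option when duplicates should be compared as well.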

Contributor

@krivard krivard left a comment

Needs a thorough pass through to:

  • change object members to local variables (except where they reduce duplication, like in self.viewSignalLatest)
  • either strip out redundant line comments or add context that explains the reasoning behind the operation.

self._db.run_dbjobs()
self._db._cursor.execute(self.viewGeoDim)
record = self._db._cursor.fetchall()
geoDimRows = len(list(record))
self.geoDimRows = len(list(record))
Contributor

Does geoDimRows need to be an object member, or can it be a local variable instead? Remember, test methods probably shouldn't alter the TestCase object without a good reason.
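A small sketch of the distinction, assuming a simplified test case with no database: shared query strings live on the object via `setUp`, while per-test counts stay local. The class name, the query text, and the sample rows are hypothetical.

```python
import unittest


class GeoDimExample(unittest.TestCase):
    def setUp(self):
        # Shared across tests, so an object member reduces duplication.
        self.view_geo_dim = "SELECT geo_type, geo_value FROM geo_dim"

    def test_geo_dim(self):
        # Hypothetical rows standing in for cursor.fetchall().
        record = [("state", "pa"), ("county", "11111")]
        geo_dim_rows = len(record)  # only needed here, so keep it local
        self.assertEqual(geo_dim_rows, 2)
        # The TestCase object itself was not modified by this test.
        self.assertFalse(hasattr(self, "geo_dim_rows"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(GeoDimExample)
result = unittest.TextTestRunner().run(suite)
```

Keeping counts local means one test cannot accidentally depend on state left behind by another.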


#sanity check for adding dummy data
sql = f'SELECT `issue` FROM {Database.latest_table} where `time_value` = 20200414'
self._db._cursor.execute(sql)
record = self._db._cursor.fetchall()
self.assertEqual(record[0][0], 20200414)
self.assertEqual(len(record), 1) #check 1 entry
self.assertEqual(len(record), 1) #check 1 entry present
Contributor

line comments should add context for future readers, beyond just restating the code -- for example, this modification explains why we only expect 1 entry for this query:

Suggested change
self.assertEqual(len(record), 1) #check 1 entry present
self.assertEqual(len(record), 1) #placeholder data only has one issue for 20200414

Author

I edited them accordingly, let me know if they showed up!

Base automatically changed from v4-schema-revisions-release-prep to dev September 22, 2022 14:05
3 participants