-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration Test Case for is_latest_issue using DB #925
base: dev
Are you sure you want to change the base?
Conversation
added edge case added prettytable for better vizuals all tests passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should revert test_covidcast_meta_caching.py
(its changes are minuscule and not part of this issue) and delete dev/local/output.txt
there are a lot of repeated strings in the SQL statements. they can be saved to variables for re-use -- which can also mean less reading for anyone doing maintenance on this.
i think i would remove the print()
statements and prettytable
and maxDiff=None
stuff, unless you think theyre particularly useful. or maybe just reduce them to one call to your view_table()
helper fn in the test_*()
methods?
clean up these and the other comments, and we can take another pass at it!
sql2 = '''SELECT `issue` FROM `signal_history` where `time_value` ''' | ||
self._db._cursor.execute(sql2) | ||
record3 = self._db._cursor.fetchall() | ||
self.assertEqual(3,self.totalRows + 1) #ensure 3 added (1 of which refreshed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think this is checking what you want... totalRows
doesnt get updated and is still equal to 2, which you set at the very top of this test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here was what I was trying to test: Initially, there are 2 rows and I update one of them and added a new one, so I wanted to make sure that there were only 3 rows in the end. If this is confusing, I am trying to think of better ways to test it.
(1 old & not updated)
(1 old but updated)
(1 added newly)
self.assertEqual(20200416,max(list(record3))[0]) #max of the outputs is 20200416 , extracting from tuple | ||
|
||
#check older issue not inside latest, empty field | ||
sql = '''SELECT * FROM `signal_latest` where `time_value` = 20200414 and `issue` = 20200415 ''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean issue=14
? theres never an issue=15
with time_value=14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for that! Removed issue = 15 totally too from the dummy data
#setting baseline variables | ||
self._db._cursor.execute('''SELECT * FROM `geo_dim`''') | ||
record = self._db._cursor.fetchall() | ||
self.geoDimRows = len(list(record)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like before, these variables dont need to be attached to the self
object
oldSrcSig = [ | ||
CovidcastRow('src', 'sig', 'day', 'state', 20211111, 'pa', #new src, new sig | ||
99, 99, 99, nmv, nmv, nmv, 20211111, 1), | ||
CovidcastRow('src', 'sig', 'day', 'county', 20211111, 'ca', #new src, new sig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use a geo_value
that doesnt look like a state abbreviation when you have a geo_type
of "county" (just for reader clarity, it doesnt actually matter for the test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! What are some sample geo_value
normally used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any length-5 string of numbers. "11111" should suffice
Co-authored-by: melange396 <[email protected]>
removed additional BASE_URL Co-authored-by: melange396 <[email protected]>
remove unnecessary package Co-authored-by: melange396 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, just a few little things:
- there are still some unused imports (including
prettytable
, so you can also remove that fromrequirements.txt
) - you should revert the changes in
integrations/acquisition/covidcast/test_covidcast_meta_caching.py
- check your comments on the CovidcastRow() lines, i think some need to be updated
- some spacing nits (usually want an empty line right before a class or method definition line, usually dont want an empty line right after; probably dont want double empty lines unless separating important or otherwise large sections of code)
self._db.insert_or_update_bulk(rows) | ||
self._db.run_dbjobs() | ||
#preview | ||
self._db._cursor.execute('''SELECT * FROM `signal_history`''') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use self.viewSignalHistory
?
#dynamic check for signal_history | ||
self._db._cursor.execute('''SELECT `issue` FROM `signal_history`''') | ||
record3 = self._db._cursor.fetchall() | ||
self.assertEqual(2,totalRows + 1) #ensure 3 added (1 of which refreshed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make more sense to check len(record3)
here?
rows = [ | ||
CovidcastRow('src', 'sig', 'day', 'state', 20200414, 'pa', # | ||
2, 2, 2, nmv, nmv, nmv, 20200414, 0), | ||
CovidcastRow('src', 'sig', 'day', 'county', 20200414, '11111', # updating previous entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats not an update!
def test_src_sig(self): | ||
#BASE CASES | ||
rows = [ | ||
CovidcastRow('src', 'sig', 'day', 'state', 20200414, 'pa', # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CovidcastRow('src', 'sig', 'day', 'state', 20200414, 'pa', # | |
CovidcastRow('src', 'sig', 'day', 'state', 20200414, 'pa', |
self.viewSignalLatest = '''SELECT * FROM `signal_latest`''' | ||
self.viewSignalHistory = '''SELECT * FROM `signal_history`''' | ||
self.viewSignalDim = '''SELECT `source`, `signal` FROM `signal_dim`''' | ||
self.viewGeoDim = '''SELECT `geo_type`,`geo_value` FROM `geo_dim`''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For easier maintainability, use e.g. Database.history_table
instead of "signal_history" in these
"""Integration tests for covidcast's is_latest_issue boolean.""" | ||
# standard library | ||
import unittest | ||
import time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import time |
# standard library | ||
import unittest | ||
import time | ||
import threading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import threading |
|
||
|
||
# third party | ||
from aiohttp.client_exceptions import ClientResponseError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from aiohttp.client_exceptions import ClientResponseError |
import mysql.connector | ||
import pytest | ||
# first party | ||
from delphi.epidata.acquisition.covidcast.logger import get_structured_logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from delphi.epidata.acquisition.covidcast.logger import get_structured_logger |
from delphi_utils import Nans | ||
from delphi.epidata.client.delphi_epidata import Epidata | ||
from delphi.epidata.acquisition.covidcast.database import Database, CovidcastRow | ||
from delphi.epidata.acquisition.covidcast.covidcast_meta_cache_updater import main as update_covidcast_meta_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from delphi.epidata.acquisition.covidcast.covidcast_meta_cache_updater import main as update_covidcast_meta_cache |
# third party | ||
from aiohttp.client_exceptions import ClientResponseError | ||
import mysql.connector | ||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes for maintainability and a question
#dynamic check for signal_history's list of issue | ||
self._db._cursor.execute(f'SELECT `issue` FROM {Database.history_table}') | ||
record3 = self._db._cursor.fetchall() | ||
self.assertEqual(len(record3),totalRows + 1) #ensure 3 added (1 of which refreshed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If len(record3)
is equal to totalRows + 1
but totalRows + 1
is not equal to 3, this will pass. Is that intended?
#ensure new entries are added in latest | ||
self._db._cursor.execute(self.viewSignalLatest) | ||
record = self._db._cursor.fetchall() | ||
self.assertEqual(len(list(record)), sigLatestRows + 2) #2 original, 2 added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pattern like this will be easier to maintain -- we'll be able to add new tests without having to update all subsequent expressions.
self.assertEqual(len(list(record)), sigLatestRows + 2) #2 original, 2 added | |
record_length = len(list(record)) | |
self.assertEqual(record_length, sigLatestRows + 2) #2 original, 2 added | |
sigLatestRows = record_length |
self._db._cursor.execute(self.viewSignalDim) | ||
record = self._db._cursor.fetchall() | ||
res = [('src', 'sig'), ('new_src', 'sig'), ('src', 'new_sig')] | ||
self.assertEqual(res , (record)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update sigDimRows here
|
||
self._db._cursor.execute(self.viewSignalLatest) | ||
record = self._db._cursor.fetchall() | ||
self.assertEqual(len(list(record)),sigLatestRows + 6) #total entries = 2(initial) + 6(test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update sigLatestRows here
|
||
self._db._cursor.execute(f'SELECT `geo_type`,`geo_value` FROM `geo_dim`') | ||
record = self._db._cursor.fetchall() | ||
self.assertEqual(len(list(record)),geoDimRows + 3) #2 + 3 new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update geoDimRows here
|
||
self._db._cursor.execute(self.viewSignalLatest) | ||
record = self._db._cursor.fetchall() | ||
self.assertEqual(len(list(record)),sigLatestRows + 6 + 3) #total entries = 2(initial) + 6(test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update sigLatestRows here
self._db.run_dbjobs() | ||
self._db._cursor.execute(f'SELECT `issue` FROM {Database.latest_table} ') | ||
record = self._db._cursor.fetchall() | ||
self.assertEqual(record[0][0], 20200417) #20200416 != 20200417 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.assertEqual(record[0][0], 20200417) #20200416 != 20200417 | |
# Make sure the 4/17 issue is listed even though 4/16 was imported after it | |
self.assertEqual(record[0][0], 20200417) |
8a39f36
to
18a2758
Compare
<<<<<<< HEAD | ||
res = [('src', 'sig'), ('new_src', 'sig'), ('src', 'new_sig')] | ||
self.assertEqual(res , (record)) | ||
======= | ||
self.sigDimRows = len(list(record)) | ||
|
||
res = set([('new_src', 'sig'), ('src', 'new_sig'), ('src', 'sig')]) | ||
self.assertEqual(res , set(record)) | ||
self.assertEqual(3, self.sigDimRows) | ||
>>>>>>> 8a39f36b (used set to remove ordering of elements in return list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the conflict edit didn't stick -- are you okay to fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a thorough pass through to:
- change object members to local variables (except where they reduce duplication, like in
self.viewSignalLatest
) - either strip out redundant line comments or add context that explains the reasoning behind the operation.
self._db.run_dbjobs() | ||
self._db._cursor.execute(self.viewGeoDim) | ||
record = self._db._cursor.fetchall() | ||
geoDimRows = len(list(record)) | ||
self.geoDimRows = len(list(record)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does geoDimRows need to be an object member, or can it be a local variable instead? Remember, test methods probbly shouldnt alter the TestCase object without a good reason
|
||
#sanity check for adding dummy data | ||
sql = f'SELECT `issue` FROM {Database.latest_table} where `time_value` = 20200414' | ||
self._db._cursor.execute(sql) | ||
record = self._db._cursor.fetchall() | ||
self.assertEqual(record[0][0], 20200414) | ||
self.assertEqual(len(record), 1) #check 1 entry | ||
self.assertEqual(len(record), 1) #check 1 entry present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line comments should add context for future readers, beyond just restating the code -- for example, this modification explains why we only expect 1 entry for this query:
self.assertEqual(len(record), 1) #check 1 entry present | |
self.assertEqual(len(record), 1) #placeholder data only has one issue for 20200414 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I edited them accordingly, let me know if they showed up!
Summary
This is a fix for issue #896.
The earlier integration tests did not consist of a connection to the database, instead using a MagicMock() object. This did not emulate the actual behaviour of the system. This PR includes a test file
is_latest_issue.py
where the database was actively used to query to ensure that the is_latest_issue flag is accurately reflected in signal_latest and signal_history. It also checks for the addition of newsrc
,sig
andgeo_value
,geo_type
into respective *_dim tables.It also includes an edge case that was identified with regards to the db pipeline.
Fixes:
insert_or_update_bulk
is a thin wrapper aroundinsert_or_update_batch
without a purpose? #926