tdb/elife_upload: Update assay_date parsing #146

Draft
wants to merge 5 commits into
base: master

Commits on Nov 15, 2023

  1. tdb/elife_upload: Refactor assay_date parsing

    Pull out assay_date parsing from filename into a separate method
    `parse_assay_date_from_filename`.
    
    This is done in preparation for updating the assay date parsing for
    VIDRL flat files.
    joverlee521 committed Nov 15, 2023
    23a3248
  2. tdb/elife_upload: Only parse assay_date once

    The assay date is the same for all measurement records from the same
    fstem, so only parse the assay date once.
    joverlee521 committed Nov 15, 2023
    37f9f3b
  3. tdb/elife_upload: Update parse_assay_date_from_filename

    Use regex to find all matches for the expected date format 'YYYYMMDD'.
    Then use the datetime module to validate each date string and check that
    the date is earlier than the date on which the file is being parsed.
    
    I chose to use the latest date if there are multiple matches, on the
    expectation that files are parsed not long after the assay date.
    joverlee521 committed Nov 15, 2023
    124508a
  4. tdb/elife_upload: Only use fstem assay date if record does not have one

    Only fill in the assay date parsed from the fstem when the
    measurement does not have an assay date. This is done in preparation for
    parsing the flat VIDRL files, which will include assay date as a column.¹
    
    This ensures that the fstem date is only a backup to the date that is
    set within `format_date`.²
    
    ¹ https://bedfordlab.slack.com/archives/C03KWDET9/p1699914235686809
    ² https://github.com/nextstrain/fauna/blob/8088646ce0ba438310cdc9f919080950d0767c46/tdb/upload.py#L328
    joverlee521 committed Nov 15, 2023
    e510854
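
For readers without the diff open, the behavior described by the Nov 15 commits can be sketched roughly as follows. This is an illustration, not the code in this PR: only the name `parse_assay_date_from_filename` comes from the commit messages. The `today` parameter, the `fill_assay_dates` helper, the `assay_date` record key, the `YYYY-MM-DD` return format, the digit-boundary lookarounds in the regex, and the choice to accept a date equal to the parse date are all assumptions made for the example.

```python
import re
from datetime import date, datetime


def parse_assay_date_from_filename(fstem, today=None):
    """Return the latest valid YYYYMMDD date found in fstem, or None.

    The ISO 'YYYY-MM-DD' return format and the `today` override are
    assumptions for this sketch, not taken from the PR.
    """
    today = today or date.today()
    valid_dates = []
    # Match runs of exactly eight digits (assumed boundary handling).
    for candidate in re.findall(r"(?<!\d)\d{8}(?!\d)", fstem):
        try:
            parsed = datetime.strptime(candidate, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real calendar date
        # Assumption: a date equal to the parse date is accepted.
        if parsed <= today:
            valid_dates.append(parsed)
    # Latest date wins when several candidates are valid.
    return max(valid_dates).isoformat() if valid_dates else None


def fill_assay_dates(measurements, fstem, today=None):
    """Hypothetical helper: use the fstem date only as a backup."""
    # Parse once per fstem; the date is shared by all of its records.
    fstem_date = parse_assay_date_from_filename(fstem, today)
    for meas in measurements:
        if not meas.get("assay_date"):
            meas["assay_date"] = fstem_date
    return measurements
```

Records that already carry an assay date, as the flat VIDRL files are expected to, are left untouched; only records without one receive the date parsed from the filename.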

Commits on Nov 28, 2023

  1. tdb/elife_upload: Raise Exception if filename contains multiple dates

    It's unclear if we'll ever run into the case where there are multiple
    assay dates in the filename, but if we do, raise an exception to alert
    the user to manually fix the filename before upload.
    joverlee521 committed Nov 28, 2023
    c3cbfed
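
The Nov 28 commit replaces the "use the latest date" choice with a hard failure. A minimal sketch of that behavior, under stated assumptions, might look like the following. The exception type (`ValueError`), the message text, the `today` parameter, the `YYYY-MM-DD` return format, and the decision to count only distinct valid dates as "multiple" are illustrative guesses rather than details confirmed by the commit message.

```python
import re
from datetime import date, datetime


def parse_assay_date_from_filename(fstem, today=None):
    """Return the single valid YYYYMMDD date in fstem, or None.

    Raises ValueError (an assumed exception type) when the filename
    contains more than one distinct valid date.
    """
    today = today or date.today()
    valid_dates = set()
    # Match runs of exactly eight digits (assumed boundary handling).
    for candidate in re.findall(r"(?<!\d)\d{8}(?!\d)", fstem):
        try:
            parsed = datetime.strptime(candidate, "%Y%m%d").date()
        except ValueError:
            continue  # not a real calendar date
        if parsed <= today:
            valid_dates.add(parsed)
    if len(valid_dates) > 1:
        found = ", ".join(sorted(d.isoformat() for d in valid_dates))
        raise ValueError(
            f"Multiple possible assay dates in {fstem!r}: {found}. "
            "Fix the filename before upload."
        )
    return valid_dates.pop().isoformat() if valid_dates else None
```

Failing loudly here trades convenience for safety: an ambiguous filename stops the upload instead of silently recording whichever date happens to be latest.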