Merge pull request #26 from dholab/proposals
adding a workflow that will splice a table into the readme
ajlail98 authored May 22, 2024
2 parents 551fadf + 0edd2d8 commit e5bce7f
Showing 7 changed files with 357 additions and 14 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/normalize.yml
@@ -1,4 +1,4 @@
# .github/workflows/normalize-data.yml
# .github/workflows/normalize.yml

name: Normalize Data

60 changes: 60 additions & 0 deletions .github/workflows/render_readme.yml
@@ -0,0 +1,60 @@
# .github/workflows/render_readme.yml

name: Render new README based on tally updates

on:
  pull_request:
    branches:
      - main
    paths:
      - assets/positivity_tally.tsv
      - .github/workflows/render_readme.yml
      - scripts/splice_readme.py

jobs:
  render-readme:
    name: Render new README
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install Dependencies
        run: |
          pip install uv
          uv venv
          source .venv/bin/activate
          uv pip install -r requirements.txt

      - name: Generate Markdown Table
        run: |
          source .venv/bin/activate
          python3 scripts/tsv_to_md.py \
            assets/positivity_tally.tsv \
            > assets/positivity_tally.md

      - name: Splice Table into README
        run: |
          source .venv/bin/activate
          python3 scripts/splice_readme.py

      - name: Replace previous README
        run: |
          rm README.md && \
          mv new_readme.md README.md

      - name: Commit Updated README
        if: success()
        run: |
          git config --global user.name 'GitHub Actions Bot'
          git config --global user.email '[email protected]'
          git add README.md
          git fetch origin proposals
          git commit -m "Updated the README file"
          git push --force-with-lease origin HEAD:proposals
17 changes: 17 additions & 0 deletions README.md
@@ -4,6 +4,23 @@

HPAI (highly pathogenic avian influenza) RNA has been detected in consumer dairy products in the United States. Many labs around the country have been using these products to monitor the extent of the ongoing H5N1 outbreak in dairy cattle. Because this outbreak poses a dual threat to public health and industry, it is imperative that HPAI positivity data for dairy products be shared in a transparent and responsible manner. The purpose of this repository is to gather and make available dairy product HPAI PCR and sequencing data, coordinating monitoring efforts while also setting a standard for sensitive metadata stewardship.

## Positivity Tally by State

Processing Plant State | Negative Cartons | Positive Cartons | Total Cartons | As of
------------------------|--------------------|--------------------|-----------------|------------
CO | 1 | 2 | 3 | 2024-05-02
IA | 5 | 0 | 5 | 2024-04-24
IN | 1 | 0 | 1 | 2024-05-02
KY | 0 | 1 | 1 | 2024-05-02
MI | 0 | 1 | 1 | 2024-05-02
MN | 2 | 0 | 2 | 2024-04-24
NY | 1 | 0 | 1 | 2024-05-02
OH | 1 | 0 | 1 | 2024-05-02
TX | 0 | 1 | 1 | 2024-05-02
UT | 1 | 0 | 1 | 2024-05-02
VA | 1 | 0 | 1 | 2024-05-02
WI | 6 | 0 | 6 | 2024-04-26

## Sampling Dairy Products for HPAI RNA

Dairy products can be easily obtained from grocery stores and other vendors. All dairy products registered on the [FDA Interstate Milk Shippers List](https://www.fda.gov/food/federalstate-food-programs/interstate-milk-shippers-list#rules) have an Interstate Milk Shippers (IMS) code that can be used to trace each unit back to the dairy plant where it was processed. IMS codes consist of a two-letter state code and a four-letter plant code separated by a hyphen. This code, used in tandem with the website [whereismymilkfrom.com](https://www.whereismymilkfrom.com), can help inform sampling strategy. While IMS codes identify the exact locations of specific dairy processing plants, it is mandatory that only state-level information be shared on this repository (see Metadata Stewardship).
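The IMS code format described above lends itself to a small parsing helper. This is a hypothetical sketch, not part of the repository: the `parse_ims_code` name and the regex (two letters, a hyphen, then one to five alphanumerics) are assumptions based on the description, and real plant codes may differ in length or alphabet.

```python
import re

# Hypothetical helper: split an IMS code of the form "XX-YYYY" into its
# state and plant components. The plant-code alphabet and length here are
# assumptions; adjust the pattern to match real IMS listings.
IMS_PATTERN = re.compile(r"^([A-Z]{2})-([A-Z0-9]{1,5})$")

def parse_ims_code(code: str):
    match = IMS_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable IMS code: {code!r}")
    return match.group(1), match.group(2)

print(parse_ims_code("wi-55"))  # → ('WI', '55')
```

Only the state component would be shared per the repository's metadata stewardship policy; the plant component stays private.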
24 changes: 12 additions & 12 deletions assets/positivity_tally.tsv
@@ -1,13 +1,13 @@
Processing Plant State Negative Cartons Positive Cartons Total Cartons As of
CO 1.0 2.0 3.0 2024-05-02
IA 5.0 0.0 5.0 2024-04-24
IN 1.0 0.0 1.0 2024-05-02
KY 0.0 1.0 1.0 2024-05-02
MI 0.0 1.0 1.0 2024-05-02
MN 2.0 0.0 2.0 2024-04-24
NY 1.0 0.0 1.0 2024-05-02
OH 1.0 0.0 1.0 2024-05-02
TX 0.0 1.0 1.0 2024-05-02
UT 1.0 0.0 1.0 2024-05-02
VA 1.0 0.0 1.0 2024-05-02
WI 6.0 0.0 6.0 2024-04-26
CO 1 2 3 2024-05-02
IA 5 0 5 2024-04-24
IN 1 0 1 2024-05-02
KY 0 1 1 2024-05-02
MI 0 1 1 2024-05-02
MN 2 0 2 2024-04-24
NY 1 0 1 2024-05-02
OH 1 0 1 2024-05-02
TX 0 1 1 2024-05-02
UT 1 0 1 2024-05-02
VA 1 0 1 2024-05-02
WI 6 0 6 2024-04-26
2 changes: 1 addition & 1 deletion scripts/positivity_tally.py
@@ -89,7 +89,7 @@ def main() -> None:
    sorted_results_df = results_df.sort_values(by="Processing Plant State")

    ## Save sorted_results_df as a tsv file
    sorted_results_df.to_csv(output_path, sep="\t", index=False)
    sorted_results_df.to_csv(output_path, sep="\t", index=False, float_format="%.01g")


if __name__ == "__main__":
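The `float_format="%.01g"` argument added to `to_csv` is what turns whole-number float counts (`1.0`) into bare integers (`1`) in the regenerated TSV. A minimal sketch of the formatting behavior, with one caveat worth noting:

```python
# "%.01g" is equivalent to "%.1g": one significant digit. This renders
# whole-number floats like 6.0 as "6", which suits the single-digit
# carton counts in this tally, but double-digit counts would switch to
# scientific notation.
print("%.01g" % 6.0)   # "6"
print("%.01g" % 0.0)   # "0"
print("%.01g" % 12.0)  # "1e+01" -- the caveat for larger counts
```

If counts are expected to reach 10 or more, a cast to integer dtype (or a format like `"%d"`) would be the safer choice.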
93 changes: 93 additions & 0 deletions scripts/splice_readme.py
@@ -0,0 +1,93 @@
#!/usr/bin/env python3

"""
usage: splice_readme.py [-h] [-r README] [-f TALLY_FILE]

Splice a table from one input markdown file into the README.

options:
  -h, --help            show this help message and exit
  -r README, --readme README
                        The readme to be updated.
  -f TALLY_FILE, --tally_file TALLY_FILE
                        The file to be spliced into the readme.
"""

import argparse
from io import TextIOWrapper
from pathlib import Path
from typing import List


def parse_command_line_args() -> argparse.Namespace:
    """
    Parse a couple named arguments from the command line
    """
    parser = argparse.ArgumentParser(
        description="Splice a table from one input markdown file into the README."
    )
    parser.add_argument(
        "-r",
        "--readme",
        type=Path,
        default=Path("README.md"),
        required=False,
        help="The readme to be updated.",
    )
    parser.add_argument(
        "-f",
        "--tally_file",
        type=Path,
        default=Path("assets/positivity_tally.md"),
        required=False,
        help="The file to be spliced into the readme.",
    )

    return parser.parse_args()


def splice_readme_lines(
    readme_lines: List[str], tally_lines: List[str], new_readme: TextIOWrapper
) -> None:
    """
    Test a few conditions on each line to make sure the tally table is properly
    spliced into the readme.
    """
    ignore = False
    for line in readme_lines:
        if line.startswith("## Positivity Tally by State"):
            new_readme.write("## Positivity Tally by State\n\n")
            for tally_line in tally_lines:
                new_readme.write(tally_line)
            ignore = True

        if line.startswith("## Sampling Dairy Products for HPAI RNA"):
            new_readme.write("\n")
            ignore = False

        if not ignore:
            new_readme.write(line)


def main() -> None:
    """
    Script entrypoint
    """

    # parse out command line args
    args = parse_command_line_args()

    # open the input readme and tally md file and collect the lines from each
    with open(args.readme, "r", encoding="utf8") as readme_handle:
        readme_lines = readme_handle.readlines()
    with open(args.tally_file, "r", encoding="utf8") as tally_handle:
        tally_lines = tally_handle.readlines()

    # open the new readme and handle splicing the table into the new readme
    with open("new_readme.md", "w", encoding="utf8") as new_readme:
        splice_readme_lines(readme_lines, tally_lines, new_readme)


if __name__ == "__main__":
    main()
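The splicing pass in `splice_readme_lines` can be sanity-checked in isolation. A minimal sketch, with the loop body inlined so it runs without the script, using the same section headings as this repository's README:

```python
from io import StringIO

def splice(readme_lines, tally_lines, out):
    # Mirror of splice_readme_lines: skip the old table between the two
    # headings and write the fresh tally lines in its place.
    ignore = False
    for line in readme_lines:
        if line.startswith("## Positivity Tally by State"):
            out.write("## Positivity Tally by State\n\n")
            for tally_line in tally_lines:
                out.write(tally_line)
            ignore = True
        if line.startswith("## Sampling Dairy Products for HPAI RNA"):
            out.write("\n")
            ignore = False
        if not ignore:
            out.write(line)

readme = [
    "# Title\n",
    "## Positivity Tally by State\n",
    "old | table\n",
    "## Sampling Dairy Products for HPAI RNA\n",
    "Body text.\n",
]
out = StringIO()
splice(readme, ["new | table\n"], out)
print(out.getvalue())
```

The old table rows are dropped and the new tally appears under the heading, while everything outside the two markers passes through unchanged.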
173 changes: 173 additions & 0 deletions scripts/tsv_to_md.py
@@ -0,0 +1,173 @@
#!/usr/bin/env python3

"""
Based on csvtomd, adapted to convert TSV files into Markdown tables.
More info: http://github.com/mplewis/csvtomd
"""

import argparse
import csv
import sys

DEFAULT_PADDING = 2


def check_negative(value):
    try:
        ivalue = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError('"%s" must be an integer' % value)
    if ivalue < 0:
        raise argparse.ArgumentTypeError('"%s" must not be a negative value' % value)
    return ivalue


def pad_to(unpadded, target_len):
    """
    Pad a string to the target length in characters, or return the original
    string if it's longer than the target length.
    """
    under = target_len - len(unpadded)
    if under <= 0:
        return unpadded
    return unpadded + (" " * under)


def normalize_cols(table):
    """
    Pad short rows to the length of the longest row to help render "jagged"
    CSV files
    """
    longest_row_len = max([len(row) for row in table])
    for row in table:
        while len(row) < longest_row_len:
            row.append("")
    return table


def pad_cells(table):
    """Pad each cell to the size of the largest cell in its column."""
    col_sizes = [max(map(len, col)) for col in zip(*table)]
    for row in table:
        for cell_num, cell in enumerate(row):
            row[cell_num] = pad_to(cell, col_sizes[cell_num])
    return table


def horiz_div(col_widths, horiz, vert, padding):
    """
    Create the column dividers for a table with given column widths.

    col_widths: list of column widths
    horiz: the character to use for a horizontal divider
    vert: the character to use for a vertical divider
    padding: amount of padding to add to each side of a column
    """
    horizs = [horiz * w for w in col_widths]
    div = "".join([padding * horiz, vert, padding * horiz])
    return div.join(horizs)


def add_dividers(row, divider, padding):
    """Add dividers and padding to a row of cells and return a string."""
    div = "".join([padding * " ", divider, padding * " "])
    return div.join(row)


def md_table(table, *, padding=DEFAULT_PADDING, divider="|", header_div="-"):
    """
    Convert a 2D array of items into a Markdown table.

    padding: the number of padding spaces on either side of each divider
    divider: the vertical divider to place between columns
    header_div: the horizontal divider to place between the header row and
                body cells
    """
    table = normalize_cols(table)
    table = pad_cells(table)
    header = table[0]
    body = table[1:]

    col_widths = [len(cell) for cell in header]
    horiz = horiz_div(col_widths, header_div, divider, padding)

    header = add_dividers(header, divider, padding)
    body = [add_dividers(row, divider, padding) for row in body]

    table = [header, horiz]
    table.extend(body)
    table = [row.rstrip() for row in table]
    return "\n".join(table)


def csv_to_table(file, delimiter):
    return list(csv.reader(file, delimiter=delimiter))


def main():
    parser = argparse.ArgumentParser(
        description="Read one or more CSV files and output their contents in "
        "the form of Markdown tables."
    )
    parser.add_argument(
        "files",
        metavar="csv_file",
        type=str,
        nargs="*",
        default=["-"],
        help="One or more CSV files to be converted. Use - for stdin.",
    )
    parser.add_argument(
        "-n",
        "--no-filenames",
        action="store_false",
        dest="show_filenames",
        help="Don't display filenames when outputting multiple Markdown tables.",
    )
    parser.add_argument(
        "-p",
        "--padding",
        type=check_negative,
        default=DEFAULT_PADDING,
        help="The number of spaces to add between table cells "
        "and column dividers. Default is 2 spaces.",
    )
    parser.add_argument(
        "-d",
        "--delimiter",
        default="\t",
        help='The delimiter to use when parsing CSV data. Default is "%(default)s"',
    )

    args = parser.parse_args()
    first = True

    if "-" in args.files and len(args.files) > 1:
        print("Standard input can only be used alone.", file=sys.stderr)
        sys.exit(1)

    for filename in args.files:
        # Print space between consecutive tables
        if not first:
            print("")
        else:
            first = False

        # Read the CSV files
        if filename == "-":
            table = csv_to_table(sys.stdin, args.delimiter)
        else:
            with open(filename, "r") as f:
                table = csv_to_table(f, args.delimiter)

        # Print filename for each table if --no-filenames wasn't passed and
        # more than one CSV was provided
        if args.show_filenames and len(args.files) > 1:
            print(filename + "\n")

        # Generate and print Markdown table
        print(md_table(table, padding=args.padding))


if __name__ == "__main__":
    main()
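For a feel of the output these helpers produce, here is a condensed, self-contained sketch of the padding-and-divider logic (local names only; it does not import the script above):

```python
# Condensed sketch of the script's table rendering: left-justify cells to
# their column width and join them with two-space-padded pipe dividers.
def tsv_row_to_md(cells, widths, padding=2):
    pad = " " * padding
    return (pad + "|" + pad).join(
        cell.ljust(width) for cell, width in zip(cells, widths)
    ).rstrip()

rows = [["State", "Total"], ["CO", "3"], ["IA", "5"]]
widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
lines = [tsv_row_to_md(rows[0], widths)]
# Header divider: dashes across each column joined by the divider cell "--|--".
lines.append("--|--".join("-" * w for w in widths))
lines += [tsv_row_to_md(row, widths) for row in rows[1:]]
print("\n".join(lines))
```

This prints a two-column Markdown table with aligned pipes, matching the style of the tally table spliced into the README.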
