Merge pull request #26 from dholab/proposals
adding a workflow that will splice a table into the readme
ajlail98 authored May 22, 2024
2 parents 551fadf + 0edd2d8 commit e5bce7f
Showing 7 changed files with 357 additions and 14 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/normalize.yml
@@ -1,4 +1,4 @@
# .github/workflows/normalize-data.yml
# .github/workflows/normalize.yml

name: Normalize Data

60 changes: 60 additions & 0 deletions .github/workflows/render_readme.yml
@@ -0,0 +1,60 @@
# .github/workflows/render_readme.yml

name: Render new README based on tally updates

on:
  pull_request:
    branches:
      - main
    paths:
      - assets/positivity_tally.tsv
      - .github/workflows/render_readme.yml
      - scripts/splice_readme.py

jobs:
  render-readme:
    name: Render new README
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install Dependencies
        run: |
          pip install uv
          uv venv
          source .venv/bin/activate
          uv pip install -r requirements.txt

      - name: Generate Markdown Table
        run: |
          source .venv/bin/activate
          python3 scripts/tsv_to_md.py \
            assets/positivity_tally.tsv \
            > assets/positivity_tally.md

      - name: Splice Table into README
        run: |
          source .venv/bin/activate
          python3 scripts/splice_readme.py

      - name: Replace previous README
        run: |
          rm README.md && \
          mv new_readme.md README.md

      - name: Commit Updated README
        if: success()
        run: |
          git config --global user.name 'GitHub Actions Bot'
          git config --global user.email '[email protected]'
          git add README.md
          git fetch origin proposals
          git commit -m "Updated the README file"
          git push --force-with-lease origin HEAD:proposals
17 changes: 17 additions & 0 deletions README.md
@@ -4,6 +4,23 @@

HPAI (highly pathogenic avian influenza) RNA has been detected in consumer dairy products in the United States. Many labs around the country have been using these products to monitor the extent of the ongoing H5N1 outbreak in dairy cattle. Because this outbreak poses a dual threat to public health and industry, it is imperative that HPAI positivity data for dairy products be shared in a transparent and responsible manner. The purpose of this repository is to gather and make available dairy product HPAI PCR and sequencing data, coordinating monitoring efforts while also setting a standard for sensitive metadata stewardship.

## Positivity Tally by State

Processing Plant State | Negative Cartons | Positive Cartons | Total Cartons | As of
------------------------|--------------------|--------------------|-----------------|------------
CO | 1 | 2 | 3 | 2024-05-02
IA | 5 | 0 | 5 | 2024-04-24
IN | 1 | 0 | 1 | 2024-05-02
KY | 0 | 1 | 1 | 2024-05-02
MI | 0 | 1 | 1 | 2024-05-02
MN | 2 | 0 | 2 | 2024-04-24
NY | 1 | 0 | 1 | 2024-05-02
OH | 1 | 0 | 1 | 2024-05-02
TX | 0 | 1 | 1 | 2024-05-02
UT | 1 | 0 | 1 | 2024-05-02
VA | 1 | 0 | 1 | 2024-05-02
WI | 6 | 0 | 6 | 2024-04-26

## Sampling Dairy Products for HPAI RNA

Dairy products can be easily obtained from grocery stores and other vendors. All dairy products registered on the [FDA Interstate Milk Shippers List](https://www.fda.gov/food/federalstate-food-programs/interstate-milk-shippers-list#rules) have an Interstate Milk Shippers (IMS) code that can be used to trace each unit back to the dairy plant where it was processed. IMS codes consist of a two-letter state code and a four-letter plant code separated by a hyphen. This code, used in tandem with the website [whereismymilkfrom.com](https://www.whereismymilkfrom.com), can help inform sampling strategy. While IMS codes identify the exact locations of specific dairy processing plants, it is mandatory that only state-level information be shared on this repository (see Metadata Stewardship).
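The IMS code format described above lends itself to a small parsing helper. This is a hypothetical sketch, not part of the repository: the `parse_ims_code` name and the regex (two letters, a hyphen, then one to five alphanumerics) are assumptions based on the description, and real plant codes may differ in length or alphabet.

```python
import re

# Hypothetical helper: split an IMS code of the form "XX-YYYY" into its
# state and plant components. The plant-code alphabet and length here are
# assumptions; adjust the pattern to match real IMS listings.
IMS_PATTERN = re.compile(r"^([A-Z]{2})-([A-Z0-9]{1,5})$")

def parse_ims_code(code: str):
    match = IMS_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable IMS code: {code!r}")
    return match.group(1), match.group(2)

print(parse_ims_code("wi-55"))  # → ('WI', '55')
```

Only the state component would be shared per the repository's metadata stewardship policy; the plant component stays private.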
24 changes: 12 additions & 12 deletions assets/positivity_tally.tsv
@@ -1,13 +1,13 @@
Processing Plant State Negative Cartons Positive Cartons Total Cartons As of
CO 1.0 2.0 3.0 2024-05-02
IA 5.0 0.0 5.0 2024-04-24
IN 1.0 0.0 1.0 2024-05-02
KY 0.0 1.0 1.0 2024-05-02
MI 0.0 1.0 1.0 2024-05-02
MN 2.0 0.0 2.0 2024-04-24
NY 1.0 0.0 1.0 2024-05-02
OH 1.0 0.0 1.0 2024-05-02
TX 0.0 1.0 1.0 2024-05-02
UT 1.0 0.0 1.0 2024-05-02
VA 1.0 0.0 1.0 2024-05-02
WI 6.0 0.0 6.0 2024-04-26
CO 1 2 3 2024-05-02
IA 5 0 5 2024-04-24
IN 1 0 1 2024-05-02
KY 0 1 1 2024-05-02
MI 0 1 1 2024-05-02
MN 2 0 2 2024-04-24
NY 1 0 1 2024-05-02
OH 1 0 1 2024-05-02
TX 0 1 1 2024-05-02
UT 1 0 1 2024-05-02
VA 1 0 1 2024-05-02
WI 6 0 6 2024-04-26
2 changes: 1 addition & 1 deletion scripts/positivity_tally.py
@@ -89,7 +89,7 @@ def main() -> None:
    sorted_results_df = results_df.sort_values(by="Processing Plant State")

    ## Save sorted_results_df as a tsv file
    sorted_results_df.to_csv(output_path, sep="\t", index=False)
    sorted_results_df.to_csv(output_path, sep="\t", index=False, float_format="%.01g")


if __name__ == "__main__":
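The `float_format="%.01g"` argument added to `to_csv` is what turns whole-number float counts (`1.0`) into bare integers (`1`) in the regenerated TSV. A minimal sketch of the formatting behavior, with one caveat worth noting:

```python
# "%.01g" is equivalent to "%.1g": one significant digit. This renders
# whole-number floats like 6.0 as "6", which suits the single-digit
# carton counts in this tally, but double-digit counts would switch to
# scientific notation.
print("%.01g" % 6.0)   # "6"
print("%.01g" % 0.0)   # "0"
print("%.01g" % 12.0)  # "1e+01" -- the caveat for larger counts
```

If counts are expected to reach 10 or more, a cast to integer dtype (or a format like `"%d"`) would be the safer choice.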
93 changes: 93 additions & 0 deletions scripts/splice_readme.py
@@ -0,0 +1,93 @@
#!/usr/bin/env python3

"""
usage: splice_readme.py [-h] [-r README] [-f TALLY_FILE]

Splice a table from one input markdown file into the README.

options:
  -h, --help            show this help message and exit
  -r README, --readme README
                        The readme to be updated.
  -f TALLY_FILE, --tally_file TALLY_FILE
                        The file to be spliced into the readme.
"""

import argparse
from io import TextIOWrapper
from pathlib import Path
from typing import List


def parse_command_line_args() -> argparse.Namespace:
    """
    Parse a couple named arguments from the command line
    """
    parser = argparse.ArgumentParser(
        description="Splice a table from one input markdown file into the README."
    )
    parser.add_argument(
        "-r",
        "--readme",
        type=Path,
        default=Path("README.md"),
        required=False,
        help="The readme to be updated.",
    )
    parser.add_argument(
        "-f",
        "--tally_file",
        type=Path,
        default=Path("assets/positivity_tally.md"),
        required=False,
        help="The file to be spliced into the readme.",
    )

    return parser.parse_args()


def splice_readme_lines(
    readme_lines: List[str], tally_lines: List[str], new_readme: TextIOWrapper
) -> None:
    """
    Test a few conditions on each line to make sure the tally table is properly
    spliced into the readme.
    """
    ignore = False
    for line in readme_lines:
        if line.startswith("## Positivity Tally by State"):
            new_readme.write("## Positivity Tally by State\n\n")
            for tally_line in tally_lines:
                new_readme.write(tally_line)
            ignore = True

        if line.startswith("## Sampling Dairy Products for HPAI RNA"):
            new_readme.write("\n")
            ignore = False

        if not ignore:
            new_readme.write(line)


def main() -> None:
    """
    Script entrypoint
    """

    # parse out command line args
    args = parse_command_line_args()

    # open the input readme and tally md file and collect the lines from each
    with open(args.readme, "r", encoding="utf8") as readme_handle:
        readme_lines = readme_handle.readlines()
    with open(args.tally_file, "r", encoding="utf8") as tally_handle:
        tally_lines = tally_handle.readlines()

    # open the new readme and handle splicing the table into the new readme
    with open("new_readme.md", "w", encoding="utf8") as new_readme:
        splice_readme_lines(readme_lines, tally_lines, new_readme)


if __name__ == "__main__":
    main()
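The splicing pass in `splice_readme_lines` can be sanity-checked in isolation. A minimal sketch, with the loop body inlined so it runs without the script, using the same section headings as this repository's README:

```python
from io import StringIO

def splice(readme_lines, tally_lines, out):
    # Mirror of splice_readme_lines: skip the old table between the two
    # headings and write the fresh tally lines in its place.
    ignore = False
    for line in readme_lines:
        if line.startswith("## Positivity Tally by State"):
            out.write("## Positivity Tally by State\n\n")
            for tally_line in tally_lines:
                out.write(tally_line)
            ignore = True
        if line.startswith("## Sampling Dairy Products for HPAI RNA"):
            out.write("\n")
            ignore = False
        if not ignore:
            out.write(line)

readme = [
    "# Title\n",
    "## Positivity Tally by State\n",
    "old | table\n",
    "## Sampling Dairy Products for HPAI RNA\n",
    "Body text.\n",
]
out = StringIO()
splice(readme, ["new | table\n"], out)
print(out.getvalue())
```

The old table rows are dropped and the new tally appears under the heading, while everything outside the two markers passes through unchanged.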
173 changes: 173 additions & 0 deletions scripts/tsv_to_md.py
@@ -0,0 +1,173 @@
#!/usr/bin/env python3

"""
Based on csvtomd, adapted to convert TSV files into Markdown tables.
More info: http://github.com/mplewis/csvtomd
"""

import argparse
import csv
import sys

DEFAULT_PADDING = 2


def check_negative(value):
    try:
        ivalue = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError('"%s" must be an integer' % value)
    if ivalue < 0:
        raise argparse.ArgumentTypeError('"%s" must not be a negative value' % value)
    return ivalue


def pad_to(unpadded, target_len):
    """
    Pad a string to the target length in characters, or return the original
    string if it's longer than the target length.
    """
    under = target_len - len(unpadded)
    if under <= 0:
        return unpadded
    return unpadded + (" " * under)


def normalize_cols(table):
    """
    Pad short rows to the length of the longest row to help render "jagged"
    CSV files
    """
    longest_row_len = max([len(row) for row in table])
    for row in table:
        while len(row) < longest_row_len:
            row.append("")
    return table


def pad_cells(table):
    """Pad each cell to the size of the largest cell in its column."""
    col_sizes = [max(map(len, col)) for col in zip(*table)]
    for row in table:
        for cell_num, cell in enumerate(row):
            row[cell_num] = pad_to(cell, col_sizes[cell_num])
    return table


def horiz_div(col_widths, horiz, vert, padding):
    """
    Create the column dividers for a table with given column widths.

    col_widths: list of column widths
    horiz: the character to use for a horizontal divider
    vert: the character to use for a vertical divider
    padding: amount of padding to add to each side of a column
    """
    horizs = [horiz * w for w in col_widths]
    div = "".join([padding * horiz, vert, padding * horiz])
    return div.join(horizs)


def add_dividers(row, divider, padding):
    """Add dividers and padding to a row of cells and return a string."""
    div = "".join([padding * " ", divider, padding * " "])
    return div.join(row)


def md_table(table, *, padding=DEFAULT_PADDING, divider="|", header_div="-"):
    """
    Convert a 2D array of items into a Markdown table.

    padding: the number of padding spaces on either side of each divider
    divider: the vertical divider to place between columns
    header_div: the horizontal divider to place between the header row and
                body cells
    """
    table = normalize_cols(table)
    table = pad_cells(table)
    header = table[0]
    body = table[1:]

    col_widths = [len(cell) for cell in header]
    horiz = horiz_div(col_widths, header_div, divider, padding)

    header = add_dividers(header, divider, padding)
    body = [add_dividers(row, divider, padding) for row in body]

    table = [header, horiz]
    table.extend(body)
    table = [row.rstrip() for row in table]
    return "\n".join(table)


def csv_to_table(file, delimiter):
    return list(csv.reader(file, delimiter=delimiter))


def main():
    parser = argparse.ArgumentParser(
        description="Read one or more CSV files and output their contents in "
        "the form of Markdown tables."
    )
    parser.add_argument(
        "files",
        metavar="csv_file",
        type=str,
        nargs="*",
        default=["-"],
        help="One or more CSV files to be converted. Use - for stdin.",
    )
    parser.add_argument(
        "-n",
        "--no-filenames",
        action="store_false",
        dest="show_filenames",
        help="Don't display filenames when outputting multiple Markdown tables.",
    )
    parser.add_argument(
        "-p",
        "--padding",
        type=check_negative,
        default=DEFAULT_PADDING,
        help="The number of spaces to add between table cells "
        "and column dividers. Default is 2 spaces.",
    )
    parser.add_argument(
        "-d",
        "--delimiter",
        default="\t",
        help='The delimiter to use when parsing CSV data. Default is "%(default)s"',
    )

    args = parser.parse_args()
    first = True

    if "-" in args.files and len(args.files) > 1:
        print("Standard input can only be used alone.", file=sys.stderr)
        sys.exit(1)

    for filename in args.files:
        # Print space between consecutive tables
        if not first:
            print("")
        else:
            first = False

        # Read the CSV files
        if filename == "-":
            table = csv_to_table(sys.stdin, args.delimiter)
        else:
            with open(filename, "r") as f:
                table = csv_to_table(f, args.delimiter)

        # Print filename for each table if --no-filenames wasn't passed and
        # more than one CSV was provided
        if args.show_filenames and len(args.files) > 1:
            print(filename + "\n")

        # Generate and print Markdown table
        print(md_table(table, padding=args.padding))


if __name__ == "__main__":
    main()
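For a feel of the output these helpers produce, here is a condensed, self-contained sketch of the padding-and-divider logic (local names only; it does not import the script above):

```python
# Condensed sketch of the script's table rendering: left-justify cells to
# their column width and join them with two-space-padded pipe dividers.
def tsv_row_to_md(cells, widths, padding=2):
    pad = " " * padding
    return (pad + "|" + pad).join(
        cell.ljust(width) for cell, width in zip(cells, widths)
    ).rstrip()

rows = [["State", "Total"], ["CO", "3"], ["IA", "5"]]
widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
lines = [tsv_row_to_md(rows[0], widths)]
# Header divider: dashes across each column joined by the divider cell "--|--".
lines.append("--|--".join("-" * w for w in widths))
lines += [tsv_row_to_md(row, widths) for row in rows[1:]]
print("\n".join(lines))
```

This prints a two-column Markdown table with aligned pipes, matching the style of the tally table spliced into the README.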
