Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added more processing logic to find institution name, applying credits, and to export HU and BU invoices #12

Merged
merged 1 commit into from
Apr 11, 2024

Conversation

QuanMPhm
Copy link
Contributor

@QuanMPhm QuanMPhm commented Apr 3, 2024

Closes #6 This is a draft PR that implemented most of the functionalities @joachimweyl asked for. These include:

  1. add_institution() to get the institute name for all PIs in the invoice
  2. apply_credits_new_pi() to apply the New PI credit to all projects in the invoice
  3. export_HU_only() and export_HU_BU() to export HU and BU specific invoices

A feature worth considering is to export the CSVs to a storage service like Google Drive, will per @knikolla's suggestion, should be put in a separate issue.

Aside from the implementation of the processing functions to apply credits and institution, as well as the export functions, some structural changes are added:

  1. The invoice fields names have been placed in top-level constants to provide a single convenient place to edit these field names, should they change in the future
  2. The mapping between institution email domains and institution name is moved to the configuration file institute_map.json

@QuanMPhm QuanMPhm force-pushed the 6/more_processing branch 2 times, most recently from 9bc423c to f23dd4e Compare April 4, 2024 14:21
@QuanMPhm QuanMPhm marked this pull request as ready for review April 4, 2024 18:17
@joachimweyl
Copy link
Contributor

I don't know Python well enough to check your code but the logic is sound.

@QuanMPhm QuanMPhm force-pushed the 6/more_processing branch 2 times, most recently from 22170f8 to 3f135bd Compare April 4, 2024 21:03
Copy link
Contributor

@knikolla knikolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for proposing this PR and the good work.

Some suggestions, besides the comments inline. This PR implements at least 3 new features to this codebase.

  • Deriving institution name from a PIs username.
  • Exporting two additional versions of the output for only BU, and HU/BU.
  • The credits system.

There's not a lot of overlap between them and in situations like this in the future I would encourage you to break them up into several PRs. In this case I would have gone with 3.

This makes the code easier to review. You don't need to break this one up, just a suggestion for the future.

process_report/process_report.py Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
process_report/tests/unit_tests.py Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
process_report/tests/unit_tests.py Outdated Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
@QuanMPhm QuanMPhm force-pushed the 6/more_processing branch 3 times, most recently from 2f53b15 to 916549d Compare April 9, 2024 20:21
@QuanMPhm
Copy link
Contributor Author

QuanMPhm commented Apr 9, 2024

The ordering of institute_map.json no longer matter, as we are using exact matches instead of substrings. Per @naved001's suggestion.

process_report/process_report.py Outdated Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
process_report/process_report.py Outdated Show resolved Hide resolved
@QuanMPhm QuanMPhm force-pushed the 6/more_processing branch 2 times, most recently from af098d6 to e233a3e Compare April 10, 2024 15:47
process_report/process_report.py Show resolved Hide resolved

def get_institution_from_pi(pi_uname):

dir_path = os.path.dirname(__file__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When loading a file from a program, it is good practice to have the path be relative to the current working directory, not the location of the script.

###


def get_institution_from_pi(pi_uname):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function will read and load this file at every function call. Split the function into two, load_institution_map() -> dict() and get_institution_for_pi(pi_name) -> str.

filtered_dataframe.to_csv(output_file, index=False)


def validate_billables(dataframe):
# Validate PI name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment would be unnecessary if that was the function name instead. I know that you are writing it in preparation of validating more things, but for now it's the only thing you are validating.

for i, row in dataframe.iterrows():
pi_name = row[PI_FIELD]
if pandas.isna(pi_name):
print(f"Project {row[PROJECT_FIELD]} has no PI") # Nan check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is unnecessary because the function name describes the same thing due to its name isna.

with open(f'{dir_path}/institute_map.json', 'r') as f:
institute_map = json.load(f)

if '@' in pi_uname:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be rewritten as the following, due to how split will still work regardless of whether there is @ in the string, and how we always want the last entry in the list, accessible by -1 index.

institution_key = pi_uname.split('@')[-1]
institution = institute_map.get(institution_key, '')

@QuanMPhm QuanMPhm force-pushed the 6/more_processing branch 2 times, most recently from 37e6a45 to 49400f9 Compare April 11, 2024 14:56


def is_old_pi(old_pi_dict, pi, invoice_month):
if pi in old_pi_dict and old_pi_dict[pi] != invoice_month:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work if you are processing old invoices. Say you want to process invoices for 2024-02 that is 2 months ago. A PI with first invoice of 2024-03 is going to appear as an old PI.

I'm okay with fixing this in a follow-up PR while introducing a mechanism to update the PI file.

Copy link
Contributor

@knikolla knikolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks okay, though I'd like @naved001 to review as well before merging in case I missed something.

self.dataframe = pandas.DataFrame(data)
old_pi = ['PI2,2023-09', 'PI3,2024-02', 'PI4,2024-03'] # Case with old and new pi in pi file
old_pi_file = tempfile.NamedTemporaryFile(delete=False, mode='w', suffix='.csv')
for pi in old_pi: old_pi_file.write(pi + "\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please write this in 2 lines

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuanMPhm after this is addressed you can merge this PR!

… for each PI, and exporting HU and BU invoices
@QuanMPhm QuanMPhm merged commit 7ba0ee0 into CCI-MOC:main Apr 11, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automate More of the Invoice post processing steps
4 participants