
Conversation

@Hellio1006

No description provided.

@AmandaBirmingham (Collaborator) left a comment:

Thanks for putting this work into a PR! I have provided comments on some misunderstandings and asked for clarification on a few points. Also, I have allowed the continuous integration to run, which identified a number of lint issues in the code (click on the red-icon item under "Some checks were not successful" on the PR's page to see the details); these are easy-to-fix formatting issues.

@@ -0,0 +1,38 @@
[Header],,,,,,,,

This file was not previously in source control. Where does it come from and why do you feel it should be added?

"outputs": [],
"source": [
"## INPUT\n",
"# when paramaters does not exist use the variable value, and if exhists, use the key in the test dic.\n",

Typos in the comment :)

"paramaters" -> "parameters"
"does" -> "do
"exhists" -> "exist"

"metadata": {},
"outputs": [],
"outputs": [

Unfortunately, GitHub is not very good at diff-ing Jupyter notebook files. To accommodate this limitation, please open this notebook, clear all outputs, stop the kernel, save the file, and then re-commit it. This will remove a lot of the contents that are confusing the diff (but will leave any code changes you made intact, so that they can be viewed by a reviewer).
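
For reference, the same cleanup can also be done from the command line with "jupyter nbconvert --clear-output --inplace <notebook>", or with a small script. Below is a minimal sketch using the nbformat library, where the notebook path is a placeholder:

import nbformat

nb_path = "notebook_C.ipynb"  # placeholder path to the notebook being cleaned

# Load the notebook, blank out every code cell's outputs and execution count,
# and write it back in place; markdown and code sources are left untouched.
nb = nbformat.read(nb_path, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, nb_path)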

Investigator Name,Enter the investigator name (optional),,,,,,,,,,
Project Name,TestProjA_10002_1_2_3_4_10to1dilution,,,,,,,,,,
Date,2025-04-01,,,,,,,,,,
Date,2025-09-04,,,,,,,,,,

This change, while small, points to a drawback of the current testing method. Some of the files that are produced by the notebook (the kind called "samplesheet" files) include the date on which they were produced--it will be different every day that they are run! This means that our method of just checking line-by-line for perfect matches will always fail with this kind of file. It wasn't an issue for notebook D, because notebook D doesn't produce that kind of output file ... but notebook C produces three of them! This suggests that the testing approach used by notebook D is not robust enough to work for all the other notebooks.
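
One way to keep the line-by-line comparison but make it robust to the run date is to normalize the date rows on both sides before comparing. A rough sketch, assuming the samplesheet date rows always look like "Date,2025-04-01,,,..." (the helper names here are hypothetical):

import re
from pathlib import Path

# Matches the leading "Date," field followed by an ISO-formatted date.
DATE_ROW = re.compile(r"^(Date,)\d{4}-\d{2}-\d{2}")

def normalized_lines(path):
    # Replace the concrete date with a placeholder so files produced on
    # different days still compare equal line by line.
    return [DATE_ROW.sub(r"\1<DATE>", line)
            for line in Path(path).read_text().splitlines()]

def assert_samplesheets_equal(produced_fp, expected_fp):
    assert normalized_lines(produced_fp) == normalized_lines(expected_fp)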

2.Comparing the contents of that file to a known "expected" version of the file.
'''

# imports are executed

This comment is not useful since a programmer reading the code will know this already from seeing the import statements. The same is true for many of the comments below ("define constants variable", "class definition is executed", etc.). If you add comments like this to help yourself as you are becoming familiar with new code, you will want to take them out before committing to the repository.

os.chdir(prev_cwd)

# collect produced data files under tmp/test_output (skip .ipynb)
produced_files = [

This is an interesting elaboration that I hadn't thought of--checking whether the code emitted any unexpected files as well as checking that it created the expected ones. If we were going in this direction, though, I think it would not make sense to look only for unexpected .txt/.tsv/.csv files. I think instead one would want to exclude all expected files (including the ipynb) and all hidden files and then report anything ELSE, regardless of extension!
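
A rough sketch of that idea, where EXPECTED_FILENAMES stands in for the set of golden-file names the test already knows about:

from pathlib import Path

EXPECTED_FILENAMES = {"samplesheet_1.csv", "report.tsv"}  # hypothetical expected outputs

def find_unexpected_files(output_dir):
    # Report every regular file that is not an expected output, not the
    # executed notebook copy, and not a hidden file, regardless of extension.
    unexpected = []
    for p in Path(output_dir).iterdir():
        if not p.is_file():
            continue
        if p.name.startswith("."):
            continue
        if p.suffix == ".ipynb":
            continue
        if p.name in EXPECTED_FILENAMES:
            continue
        unexpected.append(p)
    return sorted(unexpected)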

}

# helpful once: see what was actually produced
print("Produced files:", sorted(p.name for p in produced_files))

This sort of statement is useful while debugging, but you would want to remove it for production code (even production unit tests :D ).

expected_fp = self._find_expected_path(name)

# guard: prevent self-compare if paths somehow coincide
if Path(expected_fp).resolve() == produced_fp.resolve():

Interesting ... I think it would never have occurred to me to test for this :) While I still think it is easier to just pass in the directories explicitly, I congratulate you on identifying an unusual edge case in the code ecosystem you have set up!
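
For comparison, a minimal sketch of the "pass the directories in explicitly" alternative (helper and argument names are hypothetical); because the produced and expected locations are supplied separately by the caller, the two paths cannot coincide by accident and the guard becomes unnecessary:

from pathlib import Path

def compare_outputs(produced_dir, expected_dir, filenames):
    # Compare each produced file against its golden counterpart, with both
    # directories chosen by the caller rather than discovered at runtime.
    for name in filenames:
        produced_fp = Path(produced_dir) / name
        expected_fp = Path(expected_dir) / name
        assert produced_fp.read_text() == expected_fp.read_text(), (
            f"Mismatch in {name}"
        )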

"Ensure notebook wrote into tmp and goldens remain in repo."
)

self._assert_text_files_equal(produced_fp, Path(expected_fp))

Note that, currently, the test will ALWAYS fail due to the date situation in the samplesheet files.

)


run_params = {
@AmandaBirmingham (Collaborator) commented on Oct 31, 2025:

Although you have provided run parameters here and are passing them into the notebook in the execute_notebook command, these parameters will never be used. The reason is the lines throughout notebook C that set these parameters directly; as a first example, the cell that reads:

## INPUT
current_set_id = "col19to24"

Your code will set all parameters in the top cell, but then they will be OVERWRITTEN by the individual statements throughout the code that currently set inputs. As we discussed during our meeting, in order for this test approach to work, it will be necessary to comment out the code lines in notebook C that set the inputs. This will be fine--the users can comment them back in when they go to run the code (at which point they will need to enter their own particular input values anyway).
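
For context on why the later assignments win: papermill (assuming that is where execute_notebook comes from here) injects the supplied values as a new cell placed immediately after the cell tagged "parameters", so any cell further down that re-assigns the same variable silently overrides the injected value unless it is commented out. A minimal sketch of the caller side, with the notebook paths as placeholders and the parameter value taken from the example above:

import papermill as pm

run_params = {"current_set_id": "col19to24"}  # illustrative value from the quoted cell

# Papermill writes the executed copy of the notebook (with injected parameters)
# to the output path; the data files land wherever the notebook itself writes them.
pm.execute_notebook(
    "notebook_C.ipynb",                   # placeholder: notebook under test
    "tmp/test_output/notebook_C.ipynb",   # placeholder: executed copy next to the outputs
    parameters=run_params,
)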
