
Conversation

@Hellio1006

No description provided.

@AmandaBirmingham (Collaborator) left a comment:

Thanks for putting this work into a PR! I have provided comments on some misunderstandings and asked for clarification on a few points. Also, I have allowed the continuous integration to run, which identified a number of lint issues in the code (click on the red-icon item under "Some checks were not successful" on the PR's page to see the details); these are easy-to-fix formatting issues.

@@ -0,0 +1,38 @@
[Header],,,,,,,,

This file was not previously in source control. Where does it come from and why do you feel it should be added?

"outputs": [],
"source": [
"## INPUT\n",
"# when paramaters does not exist use the variable value, and if exhists, use the key in the test dic.\n",

Typos in the comment :)

"paramaters" -> "parameters"
"does" -> "do
"exhists" -> "exist"

"metadata": {},
"outputs": [],
"outputs": [

Unfortunately, GitHub is not very good at diff-ing Jupyter notebook files. To accommodate this limitation, please open this notebook, clear all outputs, stop the kernel, save the file, and then re-commit it. This will remove a lot of the contents that are confusing the diff (but will leave any code changes you made intact, so that they can be viewed by a reviewer).
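
For reference, the same cleanup can also be done from the command line with "jupyter nbconvert --clear-output --inplace <notebook>", or with a small script. Below is a minimal sketch using the nbformat library, where the notebook path is a placeholder:

import nbformat

nb_path = "notebook_C.ipynb"  # placeholder path to the notebook being cleaned

# Load the notebook, blank out every code cell's outputs and execution count,
# and write it back in place; markdown and code sources are left untouched.
nb = nbformat.read(nb_path, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, nb_path)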

Investigator Name,Enter the investigator name (optional),,,,,,,,,,
Project Name,TestProjA_10002_1_2_3_4_10to1dilution,,,,,,,,,,
Date,2025-04-01,,,,,,,,,,
Date,2025-09-04,,,,,,,,,,

This change, while small, points to a drawback of the current testing method. Some of the files that are produced by the notebook (the kind called "samplesheet" files) include the date on which they were produced--it will be different every day that they are run! This means that our method of just checking line-by-line for perfect matches will always fail with this kind of file. It wasn't an issue for notebook D, because notebook D doesn't produce that kind of output file ... but notebook C produces three of them! This suggests that the testing approach used by notebook D is not robust enough to work for all the other notebooks.
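
One way to keep the line-by-line comparison but make it robust to the run date is to normalize the date rows on both sides before comparing. A rough sketch, assuming the samplesheet date rows always look like "Date,2025-04-01,,,..." (the helper names here are hypothetical):

import re
from pathlib import Path

# Matches the leading "Date," field followed by an ISO-formatted date.
DATE_ROW = re.compile(r"^(Date,)\d{4}-\d{2}-\d{2}")

def normalized_lines(path):
    # Replace the concrete date with a placeholder so files produced on
    # different days still compare equal line by line.
    return [DATE_ROW.sub(r"\1<DATE>", line)
            for line in Path(path).read_text().splitlines()]

def assert_samplesheets_equal(produced_fp, expected_fp):
    assert normalized_lines(produced_fp) == normalized_lines(expected_fp)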

2.Comparing the contents of that file to a known "expected" version of the file.
'''

# imports are executed

This comment is not useful since a programmer reading the code will know this already from seeing the import statements. The same is true for many of the comments below ("define constants variable", "class definition is executed", etc.). If you add comments like this to help yourself as you are becoming familiar with new code, you will want to take them out before committing to the repository.

os.chdir(prev_cwd)

# collect produced data files under tmp/test_output (skip .ipynb)
produced_files = [

This is an interesting elaboration that I hadn't thought of--checking whether the code emitted any unexpected files as well as checking that it created the expected ones. If we were going in this direction, though, I think it would not make sense to look only for unexpected .txt/.tsv/.csv files. I think instead one would want to exclude all expected files (including the ipynb) and all hidden files and then report anything ELSE, regardless of extension!
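
A rough sketch of that idea, where EXPECTED_FILENAMES stands in for the set of golden-file names the test already knows about:

from pathlib import Path

EXPECTED_FILENAMES = {"samplesheet_1.csv", "report.tsv"}  # hypothetical expected outputs

def find_unexpected_files(output_dir):
    # Report every regular file that is not an expected output, not the
    # executed notebook copy, and not a hidden file, regardless of extension.
    unexpected = []
    for p in Path(output_dir).iterdir():
        if not p.is_file():
            continue
        if p.name.startswith("."):
            continue
        if p.suffix == ".ipynb":
            continue
        if p.name in EXPECTED_FILENAMES:
            continue
        unexpected.append(p)
    return sorted(unexpected)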

}

# helpful once: see what was actually produced
print("Produced files:", sorted(p.name for p in produced_files))

This sort of statement is useful while debugging, but you would want to remove it for production code (even production unit tests :D ).

expected_fp = self._find_expected_path(name)

# guard: prevent self-compare if paths somehow coincide
if Path(expected_fp).resolve() == produced_fp.resolve():

Interesting ... I think it would never have occurred to me to test for this :) While I still think it is easier to just pass in the directories explicitly, I congratulate you on identifying an unusual edge case in the code ecosystem you have set up!
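
For comparison, a minimal sketch of the "pass the directories in explicitly" alternative (helper and argument names are hypothetical); because the produced and expected locations are supplied separately by the caller, the two paths cannot coincide by accident and the guard becomes unnecessary:

from pathlib import Path

def compare_outputs(produced_dir, expected_dir, filenames):
    # Compare each produced file against its golden counterpart, with both
    # directories chosen by the caller rather than discovered at runtime.
    for name in filenames:
        produced_fp = Path(produced_dir) / name
        expected_fp = Path(expected_dir) / name
        assert produced_fp.read_text() == expected_fp.read_text(), (
            f"Mismatch in {name}"
        )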

"Ensure notebook wrote into tmp and goldens remain in repo."
)

self._assert_text_files_equal(produced_fp, Path(expected_fp))

Note that, currently, the test will ALWAYS fail due to the date situation in the samplesheet files.

)


run_params = {
@AmandaBirmingham (Collaborator) commented on Oct 31, 2025:

Although you have provided run parameters here and are passing them into the notebook in the execute_notebook command, these parameters will never be used. The reason is the lines throughout notebook C that set these parameters directly; as a first example, the cell that reads:

## INPUT
current_set_id = "col19to24"

Your code will set all parameters in the top cell, but then they will be OVERWRITTEN by the individual statements throughout the code that currently set inputs. As we discussed during our meeting, in order for this test approach to work, it will be necessary to comment out the code lines in notebook C that set the inputs. This will be fine--the users can comment them back in when they go to run the code (at which point they will need to enter their own particular input values anyway).
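
For context on why the later assignments win: papermill (assuming that is where execute_notebook comes from here) injects the supplied values as a new cell placed immediately after the cell tagged "parameters", so any cell further down that re-assigns the same variable silently overrides the injected value unless it is commented out. A minimal sketch of the caller side, with the notebook paths as placeholders and the parameter value taken from the example above:

import papermill as pm

run_params = {"current_set_id": "col19to24"}  # illustrative value from the quoted cell

# Papermill writes the executed copy of the notebook (with injected parameters)
# to the output path; the data files land wherever the notebook itself writes them.
pm.execute_notebook(
    "notebook_C.ipynb",                   # placeholder: notebook under test
    "tmp/test_output/notebook_C.ipynb",   # placeholder: executed copy next to the outputs
    parameters=run_params,
)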
