Skip to content

Conversation

@antgonza
Copy link
Member

@antgonza antgonza commented Mar 6, 2025

No description provided.

@antgonza antgonza changed the title fixes after deploy 03.25 WIP: fixes after deploy 03.25 Mar 6, 2025
qp_klp/Assays.py Outdated
# (run-id, project_qid, other) and cleave
# the qiita-id from the project_name.
qid = _file.split('.')[1].split('_')[-1]
if not qid.isnumeric():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know the structure of the filename sufficiently to express this as a regex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about regex and if adding that will actually help. Anyway, for clarity, as the comment says the file name is: run-id, project_qid, other - for example: 20231009_FS10001773_73_BTR67718-2515.U19_ADRC_14736.1.tsv so we could do something like '{self.pipeline.run_id}.{project_qid}.{extra}' where we are looking for the digits after '_', in this case '14736'. Does this help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be this, right? Which would allow for matching 5 or 6 digit study IDs to future proof for a while?

>>> x = "20231009_FS10001773_73_BTR67718-2515.U19_ADRC_14736.1.tsv"
>>> re.compile(r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)(?P<qid>[0-9]{5,6})\..\.tsv").match(x).groups()
('20231009_FS10001773_73_BTR67718-2515', 'U19_ADRC_', '14736')
>>>

If we know the pattern, then it seems reasonable to formally describe the pattern as presumably anything which does not fit is erroneous. If we do not know the pattern, then it suggests we should enforce control upstream to minimize inference and favor explicit behavior

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds reasonable but before making the change, how do you think we should handle errors, like in this example:

>>> x = 'XX'
>>> re.compile(r"(?P<runid>[a-zA-Z0-9_-]+)\.(?P<qname>[a-zA-Z0-9_]+)(?P<qid>[0-9]{5,6})\..\.tsv").match(x).groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'

just a try/except-pass?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just store the result of match in a variable and test whether it is None?

@antgonza antgonza changed the title WIP: fixes after deploy 03.25 fixes after deploy 03.25 Mar 10, 2025
@antgonza
Copy link
Member Author

Thank you for the review @wasade !

@antgonza antgonza merged commit 2d0f041 into qiita-spots:main Mar 10, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants