-
Notifications
You must be signed in to change notification settings - Fork 0
feat: support union and optional arguments #83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Looks really useful, thanks. @jacanchaplais are you taking this one? |
|
Thanks both @dionhaefner and @jpbrodrick89. Yep, will review this today or tomorrow. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jpbrodrick89. Thanks so much for the PR, these features are obviously useful!
I like using a UI fallback of a textbox for union types. However, since the key changes in what's rendered in the web UI is to replace specialised inputs with a textbox, and an optional type is to add a checkbox to the input container, I think we can probably avoid type checking the unions, and tracking if it could be a number, etc.
Tesseract-Core's reliance on Pydantic gives us type checking for free, and also we can just use a few simple if statements on the strings obtained from the textbox inputs. Python strings have a tonne of useful methods for checking the type of what they contain, if you tab complete str.is you'll get
isalnum, isalpha, isascii, isdecimal, isdigit, isidentifier, islower, isnumeric, isprintable, isspace, istitle, isupper
To make sure we can manage code complexity, I'd be keen to implement this feature with a smaller diff, just focusing on detecting if a union type has turned up, and adding a new "type" expected by the Jinja template of "union". Then in the Jinja template, you'd just need to add one extra if statement to create a text_input field if it's a union type, and you could inject a function into the template source that looks something like this (more explicit compared with attempting to parse as JSON and catching exceptions):
def _autocast(data: str) -> int | float | bool | str:
if data.isdigit():
return int(data)
if data.replace(".", "", 1).isdigit():
return float(data)
if data.lower() == "false":
return False
if data.lower() == "true":
return True
return dataAs for the optional type, rather than adding more UI, I think it might simpler to append "(optional)" to the title of the input field. Then if they leave the field blank, it'll be assumed that the value should be None.
As for the collections of composite types, I think your JSON parsing idea is reasonable, but I'd rather make that its own PR. I documented it as a limitation of Tesseract Streamlit at the bottom of the README, and I'd like to think about how the UX will be impacted on its own terms.
I hope that makes sense! Thanks again for the contribution, I think if we can add this feature with a smaller footprint on the codebase, it'll add a lot of value and will open us up for some great extensibility moving forward! 💙
Description of changes
Support union and optional types (e.g.
float | str,str | None,Hobby | list[Hobby]:Resolution rules:
1. If null is in union, remove it and set is_optional=True
2. If a single type remains, return that with is_optional
3. If only int/float types remain, resolve to "number"
4. Otherwise resolve to "union" which is parsed by _autocast (tries to convert to int -> float -> string)
Added checkboxes for optional arguments if unselected then None is passed. Unfortunately, removing the input field conditionally is not possible in a jinja template. Any reason why we're using jinja templates and not pure streamlit here?
Errors now raised for unsupported types
Testing done
dummywhich is not in the OutputSchema and changedint.hobby.nametostr.hobby.name. I have no idea how tests passed previously...Added test cases to mock schemas and goodbyeworld.
Added unit tests for parsing functions.
Check errors raise for unsupported types
Manual testing.