Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Get Field Descriptions from OBBject #6117

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

deeleeramone
Copy link
Contributor

@deeleeramone deeleeramone commented Feb 23, 2024

  1. Why? (1-3 sentences or a bullet point list):

The purpose is to have a way to easily extract the field descriptions (and their keys) from the Python interface. This method can be used to generate tool tips and read the metadata associated with each field returned within the results.

  1. What? (1-3 sentences or a bullet point list):

It's a dictionary where the keys are the fields returned, and the values are descriptions from the provider data model.

Fields automatically created by the alias generator will not have a description.

data = obb.some.function("DOGE")

data.get_field_descriptions()
  1. Impact (1-2 sentences or a bullet point list):
  • Not a breaking change.
  • Relies on the model definitions for output.
  1. Testing Done:
  • Added integration and unit tests.
  • A bunch of functions with a variety of output, obb.some.function().get_field_descriptions()
In [10]: obb.equity.discovery.filings().get_field_descriptions()
Out[10]: 
{'symbol': 'Symbol representing the entity requested in the data.',
 'cik': 'Central Index Key (CIK) for the requested entity.',
 'title': 'Title of the filing.',
 'date': 'The date of the data.',
 'form_type': 'The form type of the filing',
 'link': 'URL to the filing page on the SEC site.'}

@deeleeramone deeleeramone added enhancement Enhancement platform OpenBB Platform v4 PRs for v4 labels Feb 23, 2024
Copy link
Contributor

@IgorWounds IgorWounds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do the charting fix in another PR as it is not related to this one. Just to keep that in mind moving forward.

Aside from that, an idea might be to grab the descriptions from our maps like the ProviderInterface or the like which might make it easier than dumping the model and pulling things out from it? Just an idea as we already have done the work from which you can grab what you need.

Copy link
Contributor

@hjoaquim hjoaquim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to like the idea but, isn't it a lot of overhead to have that method in the OBBject when one can simply do something like this:

image

Or, alternatively:

image

@montezdesousa
Copy link
Contributor

Agree with @hjoaquim, model_fields yields the pydantic fields or model_json_schema() gets you that in a json format.

Having it as a method here is not reliable because we don't impose any restriction on the shape of results, it can be a list, dict or any nested serializable object.

If we really need it, it will be safer to inject the schema into OBBject itself, like in the .extra during execution.

o = obb.equity.price.historical("AAPL")
o.results[0].model_fields

o.results[0].model_json_schema()

image

@deeleeramone
Copy link
Contributor Author

Aside from that, an idea might be to grab the descriptions from our maps like the ProviderInterface or the like which might make it easier than dumping the model and pulling things out from it?

The purpose of this is to get them post-request, so you only get descriptions for what is returned and is specific to that particular object.

@deeleeramone
Copy link
Contributor Author

I tend to like the idea but, isn't it a lot of overhead to have that method in the OBBject when one can simply do something like this:

Yes, that's too much work and setup. None of those outputs are a clean dictionary of key:value that are ready-to-go out of the box.

@deeleeramone
Copy link
Contributor Author

Agree with @hjoaquim, model_fields yields the pydantic fields or model_json_schema() gets you that in a json format.

Having it as a method here is not reliable because we don't impose any restriction on the shape of results, it can be a list, dict or any nested serializable object.

If it is a model, then it will work. If it is not a model, then there are no descriptions. The advantage of doing a to_df() move is that it looks at the fields across all the models in a list of models and accounts for instances where the fields are not None in every model within the list.

If you rely on the first item in results only it will not be an accurate representation of the shape cast by to_df().

@deeleeramone
Copy link
Contributor Author

deeleeramone commented Mar 1, 2024

If we really need it, it will be safer to inject the schema into OBBject itself, like in the .extra during execution.

This would duplicate schemas, because they already exist under the results object, and add unnecessary bloat to the OBBject by serializing a whole bunch of text. We simply need a method for retrieving the information that already exists within the object. Please keep in mind that this is specifically for the Python interface, and not for the API. API users do not get the benefits of the OBBject methods.

@montezdesousa
Copy link
Contributor

If we really need it, it will be safer to inject the schema into OBBject itself, like in the .extra during execution.

This would duplicate schemas, because they already exist under the results object, and add unnecessary bloat to the OBBject by serializing a whole bunch of text. We simply need a method for retrieving the information that already exists within the object. Please keep in mind that this is specifically for the Python interface, and not for the API. API users do not get the benefits of the OBBject methods.

True, nvm that idea.

Anyway the schema is easily accessible o.results[0].model_json_schema() and I don't see why we need a method just to get field descriptions specifically after the command is executed (otherwise obb.coverage provides that info pre-execution). Seems a very specific use case that should be handled by the user. That's my opinion but whatever team decides.

@deeleeramone
Copy link
Contributor Author

deeleeramone commented Mar 2, 2024

Anyway the schema is easily accessible o.results[0].model_json_schema()

if you rely on the first entry in results only, you will not get an accurate representation of the shape cast by to_df().

Wading through obb.coverage to get the specific information that's relevant to the object returned is not convenient.

obb.coverage.command_schemas(filter_by_provider="yfinance")

filter_by_provider doesn't seem to work. Even then, keying through that to get relevant information is not at all simple or intuitive.

The point is to very quickly and reliably extract information, not the entire schema. Only the returned field's descriptions - if they exist - as a dictionary. Nothing more, nothing less.

Frontend components interacting with the Python interface will find this useful.

Getting that same information (what's only relevant to the actual object returned) from obb.coverage would be quite a struggle, and would require stitching together multiple nested items.

Screenshot 2024-03-02 at 3 07 14 PM

@IgorWounds
Copy link
Contributor

Hi @deeleeramone ,

I think that this might be a custom OBBject extension and that we shouldn't have it there by default as we don't really see any use-case or need from users for this. There are coverage endpoints, maps, and documentation that all facilitate this. Also, @hjoaquim has shown how to easily get what is needed.

"filter_by_provider doesn't seem to work. Even then, keying through that to get relevant information is not at all simple or intuitive." -> For me, this works without issues.

obb.coverage.command_schemas(filter_by_provider="yfinance")[".equity.price.historical"]["input"].model_fields

At this stage, we should prioritise more what the users need and our OKRs which means adding less features and focusing more at tasks at hand.

@deeleeramone
Copy link
Contributor Author

deeleeramone commented Mar 6, 2024

"filter_by_provider doesn't seem to work. Even then, keying through that to get relevant information is not at all simple or intuitive." -> For me, this works without issues.

obb.coverage.command_schemas(filter_by_provider="yfinance")[".equity.price.historical"]["input"].model_fields

I'm looking for the output, but even then it does not fulfill the purpose of this method. The "coverage" endpoints are not relevant to the actual returned object.

This compiled description is not relevant to the actual returned object. The purpose is to get only relevant information, not all the extra noise and things that you have to wade through and filter out.

description='The adjusted close price. (provider: alpha_vantage, fmp);\n    Adjusted closing price during the period. (provider: intrinio);\n    Adjusted closing price during the period. (provider: tiingo)')

There is definitely a use-case, "tooltips", and @hjoaquim's solution cannot be applied universally. I'm struggling to understand what the harm of cleanly extracting field descriptions is.

By this same logic, there is no purpose of "to_dict()" because you could just do, "data.model_dump().get("results")"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement platform OpenBB Platform v4 PRs for v4
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants