Conversation

@gbakthavatchalam
This PR adds a data recipe that lets users augment a dataset with new features by using the Augment service.

https://github.com/h2oai/h2oai/issues/20586

@surenH2oai left a comment

LGTM

data/augment.py Outdated
@@ -0,0 +1,758 @@
"""

This data recipe lets the user augment the dataset with new features using the Augment Cloud Service.

I would add more description of this recipe from the perspective of DAI. For example, start with the requirements: Snowflake, the dataset, and DAI.

Then describe the augmentation itself, where the output of the augmentation is persisted, and how DAI consumes it.

Author

Sure, will do it

data/augment.py Outdated
6. The recipe polls the API until the table creation completes
7. The recipe exports the dataset back to the user's Snowflake account
8. The recipe downloads the dataset from Snowflake, saves it to the driverlessai instance, and returns the file path
9. A new dataset is created in DAI with the augmented columns

Since this is a customer-facing recipe, I would use the full product name.

Author

@surenH2oai I have updated it now :)
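
For reference, the polling step in the docstring (step 6, waiting for the table creation to complete) could look roughly like the sketch below. This is only an illustration, not the code in `data/augment.py`: the function name `poll_until_done`, the status strings, and the interval and timeout defaults are all assumptions, and the API call is replaced by a fake status source.

```python
import time
from typing import Callable

# Hypothetical terminal status names; the real service's values are not shown in this PR.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Call get_status repeatedly until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"table creation still {status!r} after {timeout_s}s")
        time.sleep(interval_s)


# Usage with a fake status source standing in for the real API call.
_fake_statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
result = poll_until_done(lambda: next(_fake_statuses), interval_s=0.0, timeout_s=5.0)
```

Passing the status check in as a callable keeps the loop independent of whichever HTTP client the recipe uses, and the explicit deadline avoids polling forever if the table is never created.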

return "", str(e)


class AugmentDataset(CustomData):

Where is `CustomData` used? I guess it is needed because this is a data recipe?

Author

@surenH2oai `CustomData` is the base class, and we override its `create_data` method in the subclass `AugmentDataset`. DAI finds the subclass that derives from `CustomData`, which in this case is `AugmentDataset`, and invokes its `create_data` method to get the updated dataframe with the original columns plus the augmented columns.
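
To make the mechanism concrete, here is a minimal, self-contained sketch of the subclass-discovery pattern described above. It does not use the real Driverless AI classes: `CustomData` here is a stand-in defined locally, `find_recipe_classes` is a hypothetical helper, the `augmented_feature` column is invented for illustration, and a list of dicts stands in for the dataframe.

```python
from typing import Dict, List, Optional, Type

Row = Dict[str, object]


class CustomData:
    """Local stand-in for the recipe base class (assumption, not the real DAI class)."""

    @staticmethod
    def create_data(X: Optional[List[Row]] = None) -> List[Row]:
        raise NotImplementedError


class AugmentDataset(CustomData):
    """Overrides create_data to return the original columns plus augmented ones."""

    @staticmethod
    def create_data(X: Optional[List[Row]] = None) -> List[Row]:
        rows = X or []
        # Hypothetical augmented column: the number of original columns in each row.
        return [{**row, "augmented_feature": len(row)} for row in rows]


def find_recipe_classes(base: Type[CustomData]) -> List[Type[CustomData]]:
    """Return the direct subclasses of the base class, as a host application might."""
    return base.__subclasses__()


# The host discovers the subclass and calls its create_data method.
recipes = find_recipe_classes(CustomData)
augmented = AugmentDataset.create_data([{"a": 1, "b": 2}])
```

The host never refers to `AugmentDataset` by name; it only needs the base class, which is why the recipe file has to derive from `CustomData`.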

2 participants