
Data Ingestion Component #1

Open
vaasu2002 opened this issue Jan 13, 2023 · 0 comments

Data ingestion is the first step of the machine learning pipeline. It is responsible for acquiring and importing data from various sources into the pipeline. In this project, we use MongoDB as the data source. The schema.yaml file contains a list of column names that should be dropped from the data; we decided which columns to drop during the EDA we did before building the pipeline. Once the data has been ingested, the component splits it into training and testing sets. The training and testing sets are stored as artifacts so that they are available to the other components of the pipeline.
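
The flow described above (drop columns, split, store artifacts) could look roughly like the sketch below. It is not the project's actual code: the function names, the `train.csv`/`test.csv` file names, the split ratio, and the seed are all assumptions made for illustration. To stay self-contained it takes the records as a list of dicts and the drop list as a plain Python list; in the project those would come from the MongoDB collection and from schema.yaml respectively.

```python
import csv
import random
from pathlib import Path


def drop_columns(records, columns):
    """Return copies of the records without the listed columns."""
    unwanted = set(columns)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in records]


def split_train_test(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def write_csv(records, path):
    """Write records to a CSV file, creating parent directories as needed.

    Assumes every record has the same keys as the first one.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(records[0].keys()) if records else []
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def ingest(records, columns_to_drop, artifact_dir, test_ratio=0.2, seed=42):
    """Hypothetical ingestion step: drop columns, split, store artifacts.

    `records` is a list of dicts, e.g. documents already fetched from MongoDB.
    `columns_to_drop` stands in for the list kept in schema.yaml.
    Returns the paths of the training and testing artifacts.
    """
    cleaned = drop_columns(records, columns_to_drop)
    train, test = split_train_test(cleaned, test_ratio, seed)
    train_path = Path(artifact_dir) / "train.csv"
    test_path = Path(artifact_dir) / "test.csv"
    write_csv(train, train_path)
    write_csv(test, test_path)
    return train_path, test_path
```

Returning the two artifact paths lets the next component pick up the files without knowing how the ingestion step laid out its directory.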
