Data Ingestion Component #1

vaasu2002 · 2023-01-13T03:14:23Z

Data ingestion is the first step of the machine learning pipeline. It is responsible for acquiring data from a source and importing it into the pipeline; in this project, the data source is MongoDB. The schema.yaml file contains the list of column names to drop from the data. We decided which columns to drop during the EDA we did before building the pipeline. Once the data has been ingested, the component splits it into training and testing sets. Both sets are stored as artifacts so that they are available to the other components of the pipeline.
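The steps described above (drop columns, split, store artifacts) could look roughly like the sketch below. It is only an illustration, not the project's actual code: the function names, the `train.csv` / `test.csv` file names, the 80/20 split ratio, and the `_id` column in the usage example are all assumptions. To stay self-contained, it takes the records as a list of dicts (standing in for documents read from MongoDB) and the drop list as a plain Python list (standing in for the list loaded from schema.yaml).

```python
import csv
import random
from pathlib import Path


def drop_columns(records, drop_cols):
    """Return copies of the records without the listed columns."""
    drop = set(drop_cols)
    return [{k: v for k, v in rec.items() if k not in drop} for rec in records]


def split_train_test(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def write_csv(records, path):
    """Write the records to a CSV file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(records[0].keys()) if records else []
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def ingest(records, drop_cols, artifact_dir, test_ratio=0.2, seed=42):
    """Drop columns, split the data, and store both sets as artifacts.

    `records` stands in for documents already fetched from MongoDB and
    `drop_cols` for the list read from schema.yaml (both hypothetical here).
    Returns the paths of the train and test artifact files.
    """
    cleaned = drop_columns(records, drop_cols)
    train, test = split_train_test(cleaned, test_ratio, seed)
    train_path = Path(artifact_dir) / "train.csv"  # assumed file name
    test_path = Path(artifact_dir) / "test.csv"    # assumed file name
    write_csv(train, train_path)
    write_csv(test, test_path)
    return train_path, test_path
```

For example, `ingest(records, ["_id"], "artifacts/data_ingestion")` would write the two CSV files under that folder and return their paths, which a later component could then read. The fixed seed keeps the split reproducible between runs.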

vaasu2002 assigned vaasu2002 and Vanshdugar Jan 13, 2023
