AI & ML Internship – Task 1: Understanding Dataset & Data Types

This repository contains my submission for Task 1 of the AI & ML Internship. The task focuses on understanding the dataset structure, identifying different data types, and analyzing whether the dataset is suitable for machine learning.

Task Objective

The main objective of this task is to:

Understand the dataset and its structure

Identify different types of data such as numerical, categorical, ordinal, and binary

Analyze data quality issues like missing values and imbalance

Identify the target variable and input features

Check if the dataset is suitable for machine learning

Datasets Used

For this task, I worked on:

Titanic Dataset

Students Performance Dataset

These datasets are commonly used for practicing machine learning concepts.

Tools Used

Python

Pandas

NumPy

Google Colab Notebook

Work Done

Loaded the dataset in Google Colab using Pandas.

Displayed the first and last few rows to understand the structure.

Used df.info() to check column data types and non-null counts, which reveal missing values.

Used df.describe() to view statistical summaries of numerical columns.

Identified different feature types:

Numerical

Categorical

Ordinal

Binary

Checked unique values in categorical columns to understand data distribution.

Identified the target variable and input features.

Analyzed the dataset size and discussed its suitability for machine learning.

Wrote observations about missing values and possible data imbalance.
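
The exploration steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the inline CSV is a hypothetical stand-in for a Titanic-style file, the helper classify_features and its ordinal_cols parameter are names made up for this example, and treating Survived as the target and Pclass as ordinal are assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for a Titanic-style CSV; a real notebook would
# pass the path of the actual dataset file to pd.read_csv instead.
CSV_TEXT = """Survived,Pclass,Sex,Age,Fare,Embarked
0,3,male,22,7.25,S
1,1,female,38,71.28,C
1,3,female,,7.92,S
1,1,female,35,53.10,S
0,3,male,,8.05,Q
0,2,male,54,13.00,S
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# First and last rows give a quick view of the structure.
print(df.head(3))
print(df.tail(3))

# Column dtypes and non-null counts, then summary statistics.
df.info()
print(df.describe())


def classify_features(frame, ordinal_cols=()):
    """Roughly group columns into binary, ordinal, numerical, categorical.

    Illustrative heuristic only: ordinal columns must be named by hand,
    because order cannot be inferred from the dtype alone.
    """
    groups = {"binary": [], "ordinal": [], "numerical": [], "categorical": []}
    for col in frame.columns:
        if col in ordinal_cols:
            groups["ordinal"].append(col)
        elif frame[col].nunique(dropna=True) == 2:
            groups["binary"].append(col)
        elif pd.api.types.is_numeric_dtype(frame[col]):
            groups["numerical"].append(col)
        else:
            groups["categorical"].append(col)
    return groups


feature_types = classify_features(df, ordinal_cols=("Pclass",))
print(feature_types)

# Unique values of the non-numeric columns.
for col in df.select_dtypes(exclude="number").columns:
    print(col, df[col].unique())

# Target variable and input features (assumption: Survived is the target).
y = df["Survived"]
X = df.drop(columns=["Survived"])
```

The heuristic is deliberately simple: any column with exactly two distinct values is treated as binary, so the grouping should be reviewed by hand on a real dataset.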

Observations

The dataset contains a mix of numerical and categorical features.

Some columns have missing values that need preprocessing.

Certain categories appear imbalanced, which may affect model performance.

After cleaning and preprocessing, the dataset is suitable for machine learning tasks.
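
As a rough sketch of how the missing-value and imbalance observations might be checked, the example below uses a small made-up DataFrame. The column names and values are hypothetical and do not come from the actual datasets, and the ratio of largest to smallest class share is only one possible way to summarize imbalance.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are illustrative, not real records.
df = pd.DataFrame({
    "Survived": [0, 0, 0, 0, 1, 0, 1, 0],
    "Age": [22.0, np.nan, 26.0, np.nan, 35.0, 54.0, np.nan, 2.0],
    "Embarked": ["S", "S", "C", "S", None, "S", "Q", "S"],
})

# Missing values per column, as a count and as a percentage of rows.
missing_count = df.isna().sum()
missing_pct = (df.isna().mean() * 100).round(1)
print(pd.DataFrame({"missing": missing_count, "percent": missing_pct}))

# Share of each class in the assumed target column.
class_share = df["Survived"].value_counts(normalize=True)
print(class_share)

# Ratio of the largest class share to the smallest; 1.0 means balanced.
imbalance_ratio = class_share.max() / class_share.min()
print(imbalance_ratio)
```

The same value_counts(normalize=True) call can be applied to any categorical column to see whether some categories dominate.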

Deliverables

Google Colab Notebook with complete data analysis

One-page dataset analysis report

Learning Outcome

Through this task, I learned:

How to explore and understand a dataset before modeling

The importance of identifying feature types

How to detect missing values and imbalance

Why data understanding is a critical step in machine learning

How to Use

Clone this repository from GitHub using Git.

Open the notebook in Google Colab or Jupyter Notebook.

Run the cells to view the dataset analysis.

Author

Muskan Pandey

AI & ML Internship
