This repository contains various datasets for data analysis, machine learning, and educational purposes

lovnishverma/datasets

My Datasets Repository

This repository contains various datasets for data analysis, machine learning, and educational purposes. Below is a brief description of each dataset available in this repository.

Want to download a CSV file for local use? Follow the steps below: 👇

  1. Go to a CSV file in a repository of your choice.
  2. In the toolbar at the top right, just above the file contents, click the "Raw" button.
  3. A page opens showing the comma-separated data with no styling.
  4. Copy the page URL.
  5. Make a folder on your desktop.
  6. Open that folder in your favourite code editor and create a Python file inside it. Name it as you please.
  7. Copy this code [from the section below].
  8. Run the Python file.
  9. The CSV file will download shortly, depending on the file size.
  10. Now you are ready to use it locally!
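import requests
import pandas as pd

# Paste the raw URL you copied in step 4
url = '{(copied url here)}'
res = requests.get(url, allow_redirects=True, timeout=30)
res.raise_for_status()  # stop here if the download failed

# Save the raw bytes to a local CSV file
with open('download_file_name.csv', 'wb') as file:
    file.write(res.content)

# Load the saved file into a DataFrame
download_file_name = pd.read_csv('download_file_name.csv')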
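
Steps 1 to 4 can also be done without the browser. The sketch below assumes the usual GitHub layout, where a file page at `github.com/<owner>/<repo>/blob/<branch>/<path>` has its raw counterpart at `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>`. The helper name `github_blob_to_raw` and the branch name `main` in the example are illustrative assumptions, not something this repository defines.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(blob_url: str) -> str:
    """Convert a github.com file-page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")

    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<branch>/<path...>
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected /<owner>/<repo>/blob/<branch>/<path>")

    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


if __name__ == "__main__":
    # Hypothetical example; the branch name "main" is an assumption.
    print(github_blob_to_raw(
        "https://github.com/lovnishverma/datasets/blob/main/iris.csv"
    ))
```

The returned string can be used as the `url` value in the download script. `pandas.read_csv` also accepts a URL directly, which may be enough if you do not need a local copy of the file.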

Available Datasets

1. BMI_Data.csv

  • Contains Body Mass Index (BMI) data.
  • Useful for health and fitness analysis.

2. departments.csv

  • Contains department-related information.
  • Useful for organizational data processing.

3. employees.csv

  • Contains employee details.
  • Can be used for HR analytics and workforce management.

4. iris.csv

  • Classic Iris dataset for machine learning.
  • Contains different species of iris flowers with their measurements.

5. item_similarity_df.csv

  • Contains item similarity data.
  • Useful for recommendation system development.

6. movies.csv

  • Dataset containing information about movies.
  • Useful for movie recommendation models.

7. music_genre.csv

  • Contains music genre classification data.
  • Can be used for genre prediction models.

8. nielit.patt

  • Not a dataset; it is a pattern file for an AVR custom marker.

9. pandas.csv

  • Sample dataset for practicing pandas library operations.
  • Useful for learning data manipulation.

10. pandas_tutorial1.csv

  • Another dataset for pandas tutorials.
  • Contains structured data for training purposes.

11. ratings.csv

  • Contains user ratings for various items.
  • Useful for collaborative filtering and recommendation systems.
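
As a rough illustration of how a ratings file can feed an item-based recommender, here is a minimal sketch of item-to-item cosine similarity. The column names `userId`, `itemId`, and `rating` and the sample rows are made up for the example; check the actual header of ratings.csv before adapting it. This is not necessarily how item_similarity_df.csv was produced.

```python
import csv
import io
import math
from collections import defaultdict

# Made-up sample rows. The column names are assumptions, not the real schema.
SAMPLE = """userId,itemId,rating
1,A,5
1,B,3
2,A,4
2,B,2
2,C,5
3,B,4
3,C,1
"""


def load_item_vectors(handle):
    """Read rating rows and return {item: {user: rating}}."""
    vectors = defaultdict(dict)
    for row in csv.DictReader(handle):
        vectors[row["itemId"]][row["userId"]] = float(row["rating"])
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; a missing rating counts as 0."""
    dot = sum(a[user] * b[user] for user in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def item_similarity(vectors):
    """Return a nested dict with the similarity of every item pair."""
    items = sorted(vectors)
    return {i: {j: cosine(vectors[i], vectors[j]) for j in items} for i in items}


if __name__ == "__main__":
    similarity = item_similarity(load_item_vectors(io.StringIO(SAMPLE)))
    for item, row in similarity.items():
        print(item, {other: round(score, 3) for other, score in row.items()})
```

To run it on a real file, replace `io.StringIO(SAMPLE)` with an open file handle and adjust the column names to match.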

12. sample.csv

  • A sample dataset.
  • Can be used for testing and learning purposes.

13. test.csv

  • A test dataset.
  • Used for validation and experimentation.

Explore More Datasets on my Kaggle

Usage

These datasets can be used for:

  • Machine learning projects
  • Data analysis and visualization
  • Educational and tutorial purposes

How to Contribute

If you have additional datasets to contribute, feel free to upload them and update this README with the necessary descriptions.

License

These datasets are provided for educational and research purposes. Please check individual datasets for any specific license information.


For any questions or suggestions, feel free to raise an issue or contact Lovnish Verma.

📊 Machine Learning Dataset Sources

A list of public datasets for machine learning, AI, data science, and analytics projects.


🔹 General-Purpose ML Repositories


🔹 Government & Open Data Portals


🔹 Domain-Specific Datasets

🖼️ Computer Vision

🌐 Web & NLP

🧬 Bio, Medical & Health

🗣️ Speech & Audio

  • OpenSLR – Speech recognition datasets.
  • LibriSpeech ASR – Audiobook dataset for speech recognition.

🗺️ Maps & Geospatial


✅ Quick Access Table

| Name | Domain | Link |
| --- | --- | --- |
| UCI ML Repo | General | Link |
| Kaggle | General | Link |
| IndiaAI | Govt (India) | Link |
| Data.gov.in | Govt (India) | Link |
| Data.gov | Govt (USA) | Link |
| Data World | General | Link |
| Hugging Face | NLP/ML | Link |
| Papers with Code | Benchmarks | Link |
| Zenodo | Research | Link |

📌 Tip

For code integration and automatic downloads, you can often use Python libraries such as:

from datasets import load_dataset

dataset = load_dataset("imdb")  # Hugging Face example

You can also automate downloads from Kaggle via API:

kaggle datasets download -d username/dataset-name
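
If you would rather trigger the Kaggle download from a Python script, one option is to wrap the same CLI call with `subprocess`. This is only a sketch: it assumes the `kaggle` command-line tool is installed and an API token is already configured, and the function names are hypothetical. The `-p` (target folder) and `--unzip` options belong to the Kaggle CLI.

```python
import subprocess


def kaggle_download_command(dataset, dest="."):
    """Build the argument list for `kaggle datasets download`."""
    owner, sep, name = dataset.partition("/")
    if not (owner and sep and name):
        raise ValueError("dataset must look like 'username/dataset-name'")
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", dest, "--unzip"]


def download_kaggle_dataset(dataset, dest="."):
    """Run the Kaggle CLI; raises CalledProcessError if the command fails."""
    subprocess.run(kaggle_download_command(dataset, dest), check=True)


if __name__ == "__main__":
    # Placeholder slug from the command above; replace it with a real dataset.
    print(" ".join(kaggle_download_command("username/dataset-name", "data")))
```

Passing the arguments as a list avoids shell quoting problems when the destination path contains spaces.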

Feel free to contribute more sources via pull request!
