jonathanwvd/awesome-industrial-datasets

Awesome Industrial Datasets

🔗 Check the HTML version for better navigation.

Welcome to the Awesome Industrial Datasets repository! This project aims to simplify access to high-quality industrial datasets across sectors such as chemical, mechanical, and oil and gas. These datasets are valuable for researchers, engineers, and data scientists working on machine learning models and other analytical tasks that require real-world industrial data.

If you find this repository useful, please consider giving it a ⭐ to show your support!

🤝 If you're interested in contributing, please refer to the Contribution Guidelines.

📊 Dataset Statistics

Total Datasets: 155

Datasets Table

| Dataset Name | Labeled | Dataset Characteristics | Data Source | Additional Tags |
| --- | --- | --- | --- | --- |
| 3D Printer | Likely | Multivariate | Real | 3D Printing; Mechanical Engineering; Material Strength; Print Quality; Ultimaker S5; Regression; Classification |
| 3W | Likely | Multivariate, Time-Series | Both | Oil and Gas; Real events; Fault detection; Multivariate data; Sensor data; Time-series analysis; Oil wells |
| AI4I 2020 Predictive Maintenance Dataset | Yes | Multivariate, Time-Series | Synthetic | Predictive maintenance; Synthetic data; Machine failure; Time-series data; Manufacturing process; Sensor data; Multivariate |
| AITEX Fabric Image Database | Yes | Image, Multiclass | Real | Textile fabrics; Defect detection; Image dataset; Fabric defect masks; Multiclass classification; Real images; Pattern recognition |
| APS Failure at Scania Trucks | Yes | Multivariate | Real | Heavy trucks; Air Pressure System; Component failure; Fault detection; Anonymized features; Industrial challenge dataset; Scania trucks |
| Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation | Yes | Time-Series, Multivariate | Synthetic | Tennessee Eastman Process; Anomaly detection; Industrial process simulation; Fault detection; Time-series data; Chemical process control; Machine learning benchmark |
| Air Quality | Yes | Multivariate, Time-Series | Real | Air quality monitoring; Chemical sensors; Time series data; Metal oxide sensors; Pollution measurement; Italian city data; Gas concentration |
| Anemometer Fault Detection | Likely | Multivariate, Time-Series | Real | Wind power industry; Anemometer data; Fault detection; Time-series data; Multivariate data; Sensor arrays; Energy forecasting |
| Appliances Energy Prediction | No | Multivariate, Time-Series | Real | Indoor environment monitoring; ZigBee wireless sensor network; Energy consumption data; Multivariate time series; Temperature and humidity; Weather data integration; M-bus energy meters |
| Asset Failure and Replacement | Yes | Multivariate, Time-Series | Real | Asset health; Failure prediction; Part replacement; Time-series data; Industrial monitoring; Health score; Prognostics and health management |
| AssetOpsBench | No | Multivariate, Time-Series | Synthetic | Asset operations; Maintenance engineering; IoT data; Anomaly detection; Time-series forecasting; AI agents; Multi-agent orchestration |
| Athabasca Oil Sands Dataset (McMurray Formation) | Information not available | Multivariate | Real | Athabasca Oil Sands; McMurray Formation; Wabiskaw Member; Geology; Well log data; Core analyses; Oil and Gas |
| BATADAL Water SCADA Cyber-Attack Dataset | Partially | Multivariate, Time-Series | Synthetic | Water distribution network; SCADA data; Cyber attack detection; Anomaly detection; Simulation data; EPANET; Time-series data |
| BSData | Likely | Multivariate | Real | Surface defects; Ball screw drives; Defect classification; Prognostics; Detection; Condition monitoring; Mechanical components |
| Bearing | Likely | Information not available | Real | NASA; Prognostics; Health Management; Aerospace; Time-Series Data; Sensor Data; Diagnostics |
| Beijing PM2.5 Data | Yes | Multivariate, Time-Series | Real | Air pollution; PM2.5; Time-series data; Meteorological data; Beijing; Climate and Environment; Pollution monitoring |
| Bosch Production Line Performance | Yes | Multivariate, Tabular | Real | Manufacturing; Production line data; Sensor measurements; Binary classification; Quality control; Assembly line monitoring; Matthews correlation coefficient |
| Brent Oil Prices | Yes | Time-Series, Univariate | Real | Brent crude oil; Time-series data; Oil price forecasting; Energy economics; Daily historical prices; U.S. Energy Information Administration; Financial data |
| Bridge Crack Dataset | Likely | Image, Multivariate | Real | Bridge cracks; Surface defects; Image dataset; Defect detection; Structural health monitoring; Computer vision; Machine learning |
| Business and Industry Reports | Likely | Multivariate, Time-Series | Real | US Census Bureau; Economic reports; Time series data; Business and industry; Multivariate data; Monthly and quarterly data; Economic indicators |
| C-MAPSS Aircraft Engine Simulator Data | Yes | Time-Series, Multivariate | Synthetic | Aircraft engine; Simulator data; Engine performance; Sensor data; Prognostics |
| CMAPSS Jet Engine Simulated Data | Information not available | Information not available | Information not available | Information not available |
| CNC Mill Tool Wear | Yes | Multivariate, Time-Series | Real | CNC machining; Tool wear detection; Classification; Time series data; Manufacturing; Sensor data; Industrial process monitoring |
| CWRU Bearing Data | Information not available | Information not available | Information not available | Information not available |
| Car Evaluation | Yes | Multivariate | Real | Hierarchical decision model; Automobile evaluation; Categorical features; Car acceptability; Classification dataset; Constructive induction; Attribute structure |
| Casting product image data for quality inspection | Yes | Image, Multiclass, Classification | Real | Casting manufacturing; Quality inspection; Industrial defect detection; Grayscale images; Image classification; Binary classification; Deep learning dataset |
| Chemical Composition of Ceramic Samples | Yes | Multivariate | Real | Energy Dispersive X-ray Fluorescence; Ceramic samples; Chemical composition; Multivariate dataset; Classification; Clustering; Physics and Chemistry |
| Chemical Production India 2013 to 2020 | No | Multivariate, Time-Series | Real | Chemical Production; India; Time-Series Data; Industrial Manufacturing; Chemical Industry; Metric Tonnes; Department of Chemicals and Petrochemicals |
| Chinese Power Line Insulator Dataset | Likely | Image, Multivariate | Real | Power line insulator; UAV images; Defect detection; Synthetic data; Image classification; Electrical equipment monitoring; Computer vision |
| Civil Engineering: Cement Manufacturing Dataset | Yes | Multivariate | Real | Civil Engineering; Cement Manufacturing; Concrete Compressive Strength; Regression Dataset; Material Science; Multivariate Data; Real-world Data |
| Combined Cycle Power Plant | Yes | Multivariate | Real | Combined Cycle Power Plant; Electrical Energy Output; Ambient Variables; Gas Turbine; Steam Turbine; Regression; Real-valued features |
| Concrete Compressive Strength | Yes | Multivariate | Real | Civil Engineering; Concrete Strength; Regression; Real-valued Features; No Missing Values; Material Science; Quantitative Data |
| Concrete Crack Images for Classification | Yes | Image, Multiclass | Real | Concrete; Crack detection; Image classification; RGB images; High-resolution images; Machine learning; Structural health monitoring |
| Condition Based Maintenance of Naval Propulsion Plants | Likely | Multivariate | Synthetic | Gas Turbine; Naval propulsion; Simulator data; Performance decay; Multivariate data; Regression task; Synthetic data |
| Condition Based Maintenance of Naval Propulsion Systems | Yes | Multivariate | Synthetic | Gas Turbine Simulator; Naval Propulsion System; Condition Based Maintenance; Performance Decay; Multivariate Data; Regression Task; Synthetic Data |
| Condition monitoring of hydraulic systems | Yes | Multivariate, Time-Series | Real | Hydraulic systems; Condition monitoring; Multivariate time-series; Sensor data; Fault diagnosis; Real-world data; Industrial equipment |
| Control loop datasets | Information not available | Multivariate, Time-Series | Real | Industrial datasets; Control loops; Oil and gas data; Time-series; Oscillation detection; Machine learning; Process control |
| Data-driven prediction of battery cycle life before capacity degradation | Yes | Multivariate, Time-Series | Real | Lithium-ion batteries; Battery life prediction; Fast charging; Time-series data; Battery degradation; Multivariate data; Thermocouple temperature measurement |
| Deep PCB | Likely | Image, Multiclass | Real | Surface defect detection; Printed circuit boards; Industrial inspection; Image classification; Real-world images; Defect localization; Machine learning dataset |
| Defective Solar Cells Dataset | Likely | Images, Classification | Real | Solar cells; Defect detection; Electroluminescence images; Photovoltaic; Computer vision; Anomaly detection; Machine learning |
| Degradation Measurement of Robot Arm Position Accuracy | Likely | Multivariate, Time-Series | Real | Robot arm position accuracy; Universal Robot UR5; Prognostics and health management; Positional degradation; Controller level sensing data; Multivariate time-series; Robot system health assessment |
| Detecting Anomalies in Wafer Manufacturing | Yes | Imbalanced, High Dimensionality, Classification | Real | Wafer manufacturing; Anomaly detection; Imbalanced dataset; Industrial IoT; High dimensionality; Semiconductor; Machine learning classification |
| Diesel Engine Faults Features | Yes | Multivariate, Time-Series | Synthetic | Diesel engine; Fault diagnosis; Predictive maintenance; Pressure curves; Torsional vibration; Thermodynamic model; Synthetic data |
| ECO dataset | Yes | Multivariate, Time-Series | Real | Electricity consumption; Occupancy detection; Non-intrusive load monitoring; Time-series data; Smart meter; Swiss households; Energy disaggregation |
| Electrical Grid Stability Simulated Data | Yes | Multivariate | Synthetic | Electrical grid stability; Decentralized control; 4-node star system; Synthetic data; Power grid simulation; Physics and Chemistry; Classification and Regression tasks |
| Electricity Load Diagrams 2011-2014 | Likely | Time-Series | Real | Electricity consumption; Time-series data; Energy data; Portuguese local time; Smart grid; Client consumption profiles; Daylight saving time adjustments |
| Energy Efficiency | Yes | Multivariate | Synthetic | Building energy efficiency; Heating load prediction; Cooling load prediction; Multivariate dataset; Synthetic building data; Regression tasks; Classification tasks |
| FEMTO (PRONOSTIA) Bearing Dataset | Yes | Time-Series, Multivariate | Real | bearings; run-to-failure; prognostics; remaining useful life; time-series analysis; accelerometer data; IEEE Prognostics Challenge |
| FailureSensorIQ | Yes | Multiple Choice Question Answering | ISO standards and generated QA pairs | predictive maintenance; industrial sensors; multi-choice QA; fault diagnosis |
| GC10-DET | Yes | Image, Multiclass | Real | Metal surface defects; Industrial dataset; Image classification; Object detection; Grayscale images; Manufacturing quality control; Steel sheet defects |
| GREEND | Likely | Multivariate, Time-Series | Real | Energy consumption; Household power measurements; Austria; Italy; Time-series data; Per device energy profiles; High frequency sampling |
| Gas Sensor Array Drift at Different Concentrations | Yes | Multivariate, Time-Series | Real | Chemical sensors; Sensor drift; Gas concentration; Time series data; Multivariate data; Environmental sensing; Pattern recognition |
| Gas sensor array temperature modulation | Yes | Multivariate, Time-Series | Real | Gas sensors; Temperature modulation; Metal oxide semiconductor sensors; Carbon monoxide; Humidity control; Time series data; Multivariate sensor data |
| Gas sensor array under dynamic gas mixtures | Yes | Multivariate, Time-Series | Real | Chemical sensors; Gas mixture analysis; Time series data; Multivariate sensor data; Continuous acquisition; Sensor array; Artificial intelligence research |
| Gearbox Fault Detection | Likely | Multivariate | Real | Gearbox fault detection; Accelerometer data; Bearing geometry; PHM Data Challenge 2009; Fault magnitude estimation; Prognostics and health management; Machine learning benchmark |
| Genesis Pick-and-Place Demonstrator Dataset | Yes | Multivariate, Time-Series | Real | Pick-and-place; Industrial automation; Pneumatic linear drive; Anomaly detection; Time-series sensor data; Predictive maintenance; Labeled anomalies |
| Global Power Plant Database | No | Multivariate, Global, Geospatial | Real | Power plants; Global dataset; Energy production; Primary fuel type; Yearly generation data; Open source; Geospatial data |
| Green House Gas Produce by Different Industry | Likely | Multivariate, Time-Series, Environmental Data | Real | Greenhouse Gas Emissions; Environmental Data; Industry Emissions; Multivariate; Time-Series; Carbon Footprint; ISO Measurement |
| HCI Industrial Optical Inspection | Information not available | Information not available | Information not available | |
| High Storage System Anomaly Detection | Yes | Multivariate, Time-Series, Anomaly Detection | Real | High Storage System; Energy Optimization; Anomaly Detection; Industry 4.0; Timed Automata; Conveyor Belt Sensors; Real-world Industrial Data |
| High Storage System Data for Energy Optimization | Yes | Multivariate, Time-Series | Real | High storage system; Conveyor belts; Energy optimization; Anomaly detection; Time-series data; Industrial IoT; Sensor data |
| Hill-Valley | Yes | Sequential | Information not available | Terrain data; Hill and valley classification; No noise and noise variations; Sequential data; Binary classification; Real-valued features; Creative Commons licensed |
| ISDB - International Stiction Data Base | Likely | Multivariate, Time-Series, Control Systems | Real | Control loops; Valve stiction; Nonlinearities in control systems; Fault diagnosis; Oscillation detection; MATLAB software; Process industries |
| IV2V and iV2I+ Industrial Datasets | Information not available | Multivariate | Real | Industrial wireless datasets; Vehicle-to-vehicle communication; Vehicle-to-infrastructure communication; AI4Mobile project; Industrial communication systems; Wireless sensor data; Machine learning support data |
| IndPenSim – Industrial Penicillin Fermentation Simulator Dataset | Yes | Multivariate, Time-series | Simulated (mechanistic model, validated with industrial data) | Biopharmaceutical; Fermentation; Process Control; Simulation; Raman Spectroscopy; Fault Detection; PAT; Batch Processes |
| Individual Household Electric Power Consumption | Likely Yes | Multivariate, Time-Series | Real | Electric power consumption; Time series; Household energy data; Multivariate data; Missing values present; Minute sampling rate; Sub-metering |
| Industrial Fault Detection Dataset | Yes | Multivariate | Real | IoT; Fault Detection; Industry 4.0; Sensor Data; Temperature Sensors; Vibration Sensors; Classification |
| Industrial IoT Dataset (Synthetic) | Yes | Time-Series, Multivariate | Synthetic | Predictive maintenance; Anomaly detection; Industry 5.0; Sensor data; Synthetic data; Machine operations; Failure forecasting |
| Industrial IoT Fault Detection Dataset | Yes | Multivariate, Time-Series | Real | Industrial IoT; Predictive Maintenance; Vibration Analysis; Temperature Monitoring; Pressure Sensing; Fault Detection; Time-Series Data |
| Industrial Safety and Health Analytics Database | Likely | Multivariate | Real | Industrial accidents; Workplace safety; Manufacturing plants; Occupational health; Accident severity levels; Multicountry data; Real-world data |
| Intelligent Manufacturing Dataset | Yes | Multivariate, Time-Series | Synthetic | Industrial IoT; 6G network slicing; Predictive maintenance; Anomaly detection; Manufacturing efficiency; Time-series sensor data; Deep learning benchmark |
| IoT-Integrated Predictive Maintenance Dataset | Yes | Multivariate, Time-Series | Synthetic | Predictive maintenance; Industrial sensors; IoT; Time-series data; Fault detection; Equipment health; Signal decomposition |
| Kolektor Surface-Defect Dataset (KolektorSDD) | Yes | Image data, Defect detection | Real | Surface-defect detection; Industrial images; Defect annotations; Machine vision; Deep learning; Image segmentation; Controlled environment |
| Kylberg Texture Dataset | Yes | Multiclass, Image, Texture | Real | Texture classification; Image patches; Multiclass dataset; Normalized images; Texture analysis; Computer vision; Image dataset |
| Large Scale Image Dataset of Wood Surface Defects | Yes | Image, Multiclass Classification | Real | Wood surface; Image dataset; Object detection; Defect detection; YOLO annotations; Automated quality control; Industrial inspection |
| Laser Welding | Yes | Multivariate | Real | Laser beam welding; Steel-copper lap joints; Welding parameters; Cracking detection; Screening design; Weld depth analysis; Material thickness |
| Li-ion Battery Aging Datasets | Likely | Time-Series, Multivariate, Run-to-Failure | Real | Li-ion batteries; Battery aging; Prognostics testbed; Electrochemical Impedance Spectroscopy; Run-to-Failure data; Remaining Useful Life prediction; NASA PCoE |
| MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation for Domain Generalization Task | Information not available | Time-Series, Audio | Real | anomalous sound detection; domain generalization; industrial machinery; acoustic condition monitoring; unsupervised learning; audio signals; DCASE 2022 |
| MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection | Yes | Multivariate, Time-Series | Real | anomaly detection; machine fault diagnosis; acoustic condition monitoring; unsupervised learning; microphone array; audio data; industrial machine |
| MVTec Anomaly Detection (MVTec AD) | Yes | Image data, Anomaly detection, High-resolution images | Real | Industrial inspection; High-resolution images; Anomaly detection; Pixel-precise annotations; Image dataset; Unsupervised learning; Defect detection |
| Magnetic Tile Defect | Information not available | Information not available | Information not available | |
| Maintenance Action Recommendation | Yes | Multivariate, Time-Series, Classification | Real | Maintenance recommendation; Industrial equipment; Event codes; Parameter data; Remote monitoring; Diagnostics; PHM Society data challenge |
| Maintenance of Naval Propulsion Plants Data Set | Yes | Multivariate | Synthetic | Naval propulsion; Gas Turbine; Simulator data; Performance decay; Regression task; Multivariate data; Synthetic data |
| Manufacturing Defects | Yes | Univariate, Time-Series | Real | Manufacturing; Defect detection; Quality control; Time series analysis; Industrial data; Minor defects; Inspection data |
| Manufacturing cost | Likely Yes | Multivariate | Real | Manufacturing; Cost estimation; Economies of scale; Production volume; Regression; Real-world data |
| ManyWells | Yes | Multivariate, Time-Series, Large-Scale | Synthetic | multiphase flow; simulation; oil and gas; time-series; large-scale; machine learning; Hugging Face |
| Mechanic Component Images ( Normal / Defected) | Yes | Image, Multiclass Classification | Real | Mechanical components; Image data; Defect detection; Piston quality recognition; Computer vision; Multiclass classification; Manufacturing quality control |
| Mechanical Analysis | Yes | Multivariate | Real | Fault diagnosis; Electromechanical devices; Multivariate data; Classification; Machine components; Real-world data; Mechanical measurements |
| Mercedes-Benz Greener Manufacturing | Yes | Multivariate | Real | Automobiles; Manufacturing; Regression; Categorical data; Feature permutations; Test bench optimization; Carbon dioxide emissions |
| Milling | Information not available | Information not available | Information not available | NASA; Intelligent Systems; Autonomous Systems; Prognostics; Robust Software Engineering; Collaborative Systems; Ames Research Center |
| Milling Wear | Information not available | Information not available | Information not available | Information not available |
| Multi-stage continuous-flow manufacturing process | Likely | Multivariate, Time-Series | Real | Manufacturing Process; Continuous Flow; Multistage; Time-Series Data; Real Production Data; Regression Task; Process Control |
| N-CMAPSS_DL | Information not available | Information not available | Information not available | machine-learning; deep-learning; predictive-maintenance; remaining-useful-life; turbofan-engine; prognostics |
| NASA Bearing Dataset | Yes | Multivariate, Time-Series | Real | Bearing vibration data; Predictive maintenance; Prognostics; Test-to-failure experiments; Time-series sensor data; Accelerometer signals; Rotating machinery |
| NEU Surface Defect Dataset | Information not available | Image, Multiclass | Real | Surface defect detection; Steel strip images; Industrial defect classification; Image dataset; Hot-rolled steel; Machine learning; Computer vision |
| OECD Data - Crude Oil Production | Likely No | Multivariate, Time-Series | Real | Oil and Gas; Energy; Time-Series; Crude Oil Production; OECD Data; Regression; Country-level Data |
| Oil Storage Tanks | Yes | Image data, Multivariate | Real | Oil storage tanks; Satellite imagery; Object detection; Floating head tanks; Bounding box annotations; Google Earth images; Energy sector |
| Oil and Gas | Yes | Multivariate, Time-Series | Real | Oil production; Natural gas production; Energy prices; Exports data; Historical data; Time-series; Economic indicators |
| Oil well | Yes | Multivariate, Time-Series | Real | Oil well operation; Time-series data; Reservoir pressure; Oil and gas production; Water cut percentage; Dynamic level; Field development monitoring |
| One Year Industrial Component Degradation | No | Multivariate, Time-Series | Real | Industrial component degradation; Cutting blade; Predictive maintenance; Multivariate time series; Manufacturing; Sensor data; Remaining useful life prediction |
| Open Industrial Data Project (Oil & Gas) | No | Multivariate, Time-Series, Industrial | Real | Oil and Gas; Industrial Data; Time-Series Data; Predictive Maintenance; Condition Monitoring; Cognite Data Fusion; Aker BP; Open Data |
| Open Reaction Database | Information not available | Multivariate | Real | Chemical reactions; Machine learning; Reaction prediction; Synthesis planning; Experiment design; Open access; Chemical database |
| Oscillation detection artificial dataset | Information not available | Information not available | Likely Synthetic | Oscillation detection; Artificial dataset; Machine learning; Time series data; Signal analysis; Control systems; Industrial process monitoring |
| PHM 2008 Challenge | Information not available | Information not available | Information not available | PHM challenge; Predictive maintenance; Prognostics; NASA dataset; Machine learning dataset |
| PHM Data Challenge | Yes | Multivariate, Time-Series, Fault Detection | Real | Fault detection; Prognostics; Industrial plant monitoring; Time-series sensor data; Multivariate data; Predictive maintenance; Open competition |
| Panasonic 18650PF Li-ion Battery Data | Likely | Multivariate, Time-Series | Real | Lithium Ion Battery; State of Charge Estimation; Kalman Filtering; Neural Networks; Energy Storage; Electric Vehicles; Battery Testing |
| Parts Manufacturing | Yes | Multivariate | Real | Manufacturing; Parts dimensions; Operator performance; Multivariate data; Industrial dataset; Classification; Real-world data |
| Plant Fault Detection | Likely | Information not available | Information not available | PHM Society; Prognostics and Health Management; Fault detection; Industrial plant data; Maintenance data challenge; Predictive maintenance; Time-series likely |
| Plastic Extrusion Defects | Likely Yes | Multivariate, Time-Series, Tabular | Real | Plastic extrusion; Manufacturing defect detection; Time-series sensor data; Process parameters; Visual defect inspection; Film breakage; Multivariate data |
| Power Consumption of Tetouan City | Likely | Multivariate, Time-Series | Real | Power consumption; Time-series; Energy distribution networks; Morocco; Weather data integration; Multivariate dataset; Regression task |
| Predicting Manufacturing Defects Dataset | Yes | Tabular, Multivariate | Synthetic | Manufacturing; Defect Prediction; Synthetic Data; Quality Control; Production Metrics; Supply Chain; Classification Dataset |
| Production Plant Data for Condition Monitoring | Yes | Multivariate | Real | Condition monitoring; Predictive maintenance; Run-to-failure; Production plant; Self-Organizing Map; Degradation prediction; Industrial process data |
| Productivity Prediction of Garment Employees | Yes | Multivariate, Time-Series | Real | Garment manufacturing; Employee productivity; Labour-intensive industry; Time-series data; Regression task; Classification task; Industry expert validated |
| Prognostics Data Repository | Likely | Time-Series, Multivariate | Both | Prognostics; Time-Series Data; Run-to-Failure Data; NASA Ames Research Center; Battery Data; Engine Degradation; Industrial Equipment Monitoring |
| Pump sensor data | Yes | Multivariate, Time-Series | Real | Predictive maintenance; Water pump monitoring; Sensor data; Multivariate time series; Classification task; Industrial equipment; Real-world data |
| Quality Prediction in a Mining Process | Yes | Multivariate, Time-Series | Real | Mining process; Flotation plant; Iron ore; Silica impurity; Time series; Industrial data; Regression task |
| Quality Prediction in a Mining Process | Yes | Multivariate, Time-Series | Real | Mining process; Flotation plant; Iron ore quality; Silica impurity prediction; Process engineering; Time series data; Industrial manufacturing |
| RUL Datasets | Yes | Time-Series, Multivariate | Both | remaining useful life; PyTorch Lightning; turbofan engine; bearing dataset; time-series data; transfer learning; domain adaptation |
| Railway Surface Defect Detection Dataset | Yes | Image, Multivariate | Real | Railway inspection; Surface defect detection; Image dataset; Deep learning; Computer vision; Image defects; Industrial maintenance |
| Real-Time IoT-Driven Production System Dataset | Yes | Multivariate, Time-Series | Real | IoT; Smart Manufacturing; Digital Twin; Time-Series; Sensor Data; Predictive Maintenance; Production Efficiency |
| Renewable power plants | Likely No | Multivariate, Time-Series | Real | Renewable Energy; Power Plants; Energy Capacity; Time Series; European Countries; Multivariate Data; Energy Infrastructure |
| Road Surface Cracks Dataset | Information not available | Image data | Real | Road surface cracks; Image data; Crack detection; Material defects; Infrastructure monitoring; GitHub dataset; Computer vision |
| Robot Execution Failures | Yes | Multivariate, Time-Series | Real | Force and torque measurements; Robot failure detection; Multivariate time-series; Classification tasks; Physical sensor data; Integer features; Short time window data |
| SACAC | Information not available | Multivariate, Industrial Process Data | Real | Industrial process data; Control loop performance; PID control loops; Process industries; Multivariate data; Fault diagnosis; Process control monitoring |
| SECOM | Yes | Multivariate | Real | Semi-conductor manufacturing; Feature selection; Sensor data; Yield prediction; Multivariate data; Missing values; Classification |
| SML2010 | Yes | Multivariate, Sequential, Time-Series, Text | Real | Smart home monitoring; Domotic house; Environmental sensors; Time-series data; Multivariate data; Indoor temperature; Carbon dioxide levels |
| Secure Water Treatment (SWaT) Dataset | Yes | Multivariate, Time-Series, Cyber-Physical Systems | Real | Cyber-Physical Systems; Industrial Control Systems; Water Treatment; Anomaly Detection; Time-Series Data; Sensor Data; Cybersecurity |
| Severstal Steel Defect Detection | Yes | Multivariate, Image data | Real | Steel manufacturing; Defect detection; Image segmentation; Image classification; Manufacturing quality control; Multiclass defects; Kaggle competition |
| Smart Manufacturing IoT-Cloud Monitoring Dataset | Yes | Multivariate, Time-Series | Synthetic | IoT; Smart Manufacturing; Predictive Maintenance; Anomaly Detection; Sensor Data; Time-Series; Industrial Analytics |
| Solar Power Generation Data | Likely | Multivariate, Time-Series | Real | Solar power; Renewable energy; Time-Series; Multivariate; Sensor data; Power generation; India |
| Steel Dataset | Yes | Multivariate, Time-Series | Real | Steel industry; Energy consumption; Time-series data; Electricity usage; Reactive power; Environmental data; Manufacturing data |
| Steel Industry datasets | Likely | Multivariate, Time-Series | Real | Energy consumption; Steel production; Reactive power; CO2 emissions; Time-Series Data; Power Factor; Renewable energy integration |
| Steel Plates Faults | Yes | Multivariate | Real | Steel plates; Fault classification; Multivariate dataset; Integer features; Real features; Pattern recognition; No missing values |
| Superconductivty Data | Yes | Multivariate | Real | Superconductors; Physics and Chemistry; Critical temperature prediction; Multivariate data; Real-valued features; Chemical formula data; No missing values |
| TIG Welding | Yes | Image, Multivariate | Real | TIG welding; Aluminium 5083; HDR camera; Weld defect classification; Neural networks; Image data; Non-destructive testing |
| Tennessee Eastman Process Simulation Dataset | Yes | Multivariate, Time-Series, Synthetic | Synthetic | Chemical process simulation; Fault detection; Anomaly detection; Process monitoring; Time-series data; Multivariate data; Synthetic data |
| Textures Classification Dataset | Likely | Image data, Surface defect inspection | Information not available | Texture classification; Surface defect inspection; Image dataset; Convolutional neural network; Machine learning; Defect detection; Computer vision |
| Textures under varying Illumination | Information not available | Image, Multivariate, Variations in scale, pose, and illumination | Real | Texture images; Illumination variation; Pose variation; Scale variation; Material recognition; Image database; Computer vision |
| The Reference Energy Disaggregation Data Set (REDD) | Yes | Multivariate, Time-Series | Real | Energy disaggregation; Residential buildings; Power consumption; Multivariate time series; High frequency measurements; Circuit level data; Massachusetts |
| Tool Path Generation | Likely | Multivariate | Real | 5-axis machining; Tool path optimization; Shape deviation; Cutting conditions; Manufacturing; Regression; Multivariate data |
| Top Defense Manufacturers | Yes | Tabular, Multivariate | Real | Defense contractors; Military industry; Company revenue data; Business data; International companies; Defense sector; Revenue ranking |
| ToyADMOS dataset | Yes | Multivariate, Time-Series | Real | Machine operating sounds; Anomalous sound detection; Audio dataset; Condition monitoring; Toy car; Toy conveyor; Toy train |
| ToyADMOS2 dataset: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions | Yes | Time-Series, Multivariate | Real | Anomaly detection; Machine operating sounds; Domain shift; Audio data; Condition monitoring; MPEG-4 ALS; Benchmark dataset |
| Turbofan engine degradation simulation data set | Yes | Multivariate, Time-Series | Synthetic | Turbofan engine; Degradation simulation; C-MAPSS; Run-to-failure; Prognostics; Sensor data; Fault modes |
| Turning Dataset for Chatter Diagnosis | Yes | Multivariate, Time-Series | Real | Chatter diagnosis; Machining; Turning; Accelerometer data; Microphone data; Laser tachometer; Time-series sensor data |
| U.S. Crude Oil Imports | Yes | Multivariate, Time-Series | Real | Crude oil; Energy imports; Time-series data; United States; Oil and Gas; Economic analysis; Supply chain |
| UK-DALE dataset | Yes | Multivariate, Time-Series | Real | Energy consumption; Domestic electricity usage; Appliance-level data; High-frequency power data; Time-series data; Multivariate; United Kingdom |
| Urban Land Cover | Yes | Multivariate | Real | Urban land cover; High resolution aerial imagery; Multivariate data; Classification; Feature selection; Spectral data; Texture analysis |
| Vehicle Manufacturing Dataset | Information not available | Information not available | Information not available | |
| Versatile Production System | Likely | Multivariate, Time-Series, Condition Monitoring | Real | Industrial production; Condition monitoring; Predictive maintenance; Manufacturing data; Sensor data; Anomaly detection; Time-series data |
| WM811K Wafer Maps | Likely | Classification | Real | Wafer map; Semiconductor manufacturing; Defect classification; Pattern recognition; Large-scale dataset; Machine learning; Failure analysis |
| Water Distribution (WADI) Dataset | Yes | Multivariate, Time-Series | Real | Cyber-Physical Systems; Water distribution; Critical infrastructure; Attack scenarios; Sensor data; Time-series data; Anomaly detection |
| Wind Turbine Scada Dataset | Yes | Multivariate, Time-Series | Real | Wind turbine; SCADA data; Renewable energy; Time-series; Multivariate; Power generation; Wind speed and direction |
| Wine Quality | Yes | Multivariate | Real | Wine samples; Physicochemical tests; Wine quality; Red and white wine; Portuguese Vinho Verde; Classification; Regression |
| XJTU-SY Bearing Datasets | Information not available | Multivariate, Time-Series, Run-to-Failure | Real | Rolling element bearings; Run-to-failure; Vibration signals; Accelerated degradation; Prognostics; Remaining useful life; Time-series data |

Acknowledgements

This repository was partly inspired by and developed using ideas from the following repositories:

Their work provided useful reference points during the development of this project.

LLM Assistance for JSON Generation

To save time and keep the entries consistent, I used a large language model (LLM) to build the JSON files automatically. The LLM reads datasets.csv, loads each dataset's webpage, and fills in a predefined JSON template. This approach also makes the JSON files easier to regenerate whenever changes are required.

Contribution Guidelines

Thank you for considering contributing to our repository.

How You Can Contribute

You can contribute in several ways:

  • Suggest a New Dataset: Propose a new dataset by creating an issue under the "Enhancement" label in the Issues tab.
  • Add a Dataset: Create a JSON file describing a dataset and submit a pull request to add it to the repository.
  • Suggest Changes: You can suggest improvements through the Issues tab or directly edit the JSON files and submit your changes via a pull request.

Adding a Dataset

Before adding a new dataset, please ensure that it is unique and not already included in the repository.
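One way to run this uniqueness check before opening a pull request is sketched below. It is only an illustration: the `"Dataset Name"` key and the idea of scanning every file under the json folder are assumptions, so adjust them to match the actual template. Names are compared after lowercasing and stripping punctuation and spaces.

```python
import json
import re
from pathlib import Path


def normalize(name: str) -> str:
    """Lowercase a dataset name and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_duplicates(candidate: str, existing_names: list[str]) -> list[str]:
    """Return the existing names that normalize to the same string as the candidate."""
    key = normalize(candidate)
    return [name for name in existing_names if normalize(name) == key]


def load_existing_names(json_root: Path, name_key: str = "Dataset Name") -> list[str]:
    """Collect names from all JSON files under json_root.

    Both the folder layout and name_key are assumptions about the repository.
    """
    names = []
    for path in sorted(json_root.rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, dict) and name_key in data:
            names.append(data[name_key])
    return names


if __name__ == "__main__":
    existing = ["C-MAPSS Aircraft Engine Simulator Data", "Wine Quality", "SECOM"]
    print(find_duplicates("wine-quality", existing))
```

This only catches names that differ in case, spacing, or punctuation. Entries that describe the same data under different wording (for example, the two C-MAPSS rows in the table) still need a manual look at the dataset's source URL.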

To add a dataset:

  1. Create a JSON file that accurately describes the dataset, following the same template as the existing datasets in the json folder.
  2. Place this JSON file in the json/manual folder.
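The two steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the key names simply mirror the columns of the datasets table, the file-naming rule is invented for illustration, and the example entry is hypothetical. Copy the real keys from an existing file in the json folder before submitting.

```python
import json
import re
import tempfile
from pathlib import Path

# Assumed keys, taken from the README table columns; the real template may differ.
ASSUMED_KEYS = [
    "Dataset Name",
    "Labeled",
    "Dataset Characteristics",
    "Data Source",
    "Additional Tags",
]


def slugify(name: str) -> str:
    """Turn a dataset name into a lowercase, underscore-separated file stem."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def write_dataset_json(entry: dict, repo_root: Path) -> Path:
    """Validate the entry against the assumed keys and write it to json/manual."""
    missing = [key for key in ASSUMED_KEYS if key not in entry]
    if missing:
        raise ValueError(f"entry is missing keys: {missing}")
    target_dir = repo_root / "json" / "manual"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{slugify(entry['Dataset Name'])}.json"
    target.write_text(
        json.dumps(entry, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return target


if __name__ == "__main__":
    # Hypothetical entry, written to a temporary directory instead of a real checkout.
    example = {
        "Dataset Name": "Example Pump Vibration Dataset",
        "Labeled": "Yes",
        "Dataset Characteristics": "Multivariate, Time-Series",
        "Data Source": "Real",
        "Additional Tags": ["Pump", "Vibration"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        print(write_dataset_json(example, Path(tmp)).name)
```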

Updating Documentation

To update the documentation (Markdown and HTML files) and refresh the README:

  1. Run the generate_documentation.py script located in the root of the repository. This script will:
    • Generate Markdown files in the markdown folder.
    • Generate HTML files in the html folder.
    • Update the README.md file with the latest datasets table.
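To give a feel for the last step, here is a minimal sketch of how a datasets table could be rendered from JSON entries. It is not the code in generate_documentation.py: the column keys, the "Information not available" fallback, and the plain case-sensitive sort (which appears to match the ordering of the table above) are all assumptions.

```python
# Assumed column keys, mirroring the README table header.
COLUMNS = [
    "Dataset Name",
    "Labeled",
    "Dataset Characteristics",
    "Data Source",
    "Additional Tags",
]
MISSING = "Information not available"


def format_cell(value) -> str:
    """Render one cell: join lists with '; ' and escape characters that break tables."""
    if isinstance(value, (list, tuple)):
        value = "; ".join(str(item) for item in value)
    return str(value).replace("|", "\\|").replace("\n", " ")


def render_table(entries: list[dict]) -> str:
    """Build a Markdown table with one row per dataset entry, sorted by name."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    rows = []
    for entry in sorted(entries, key=lambda e: str(e.get("Dataset Name", ""))):
        cells = [format_cell(entry.get(column, MISSING)) for column in COLUMNS]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    demo = [
        {
            "Dataset Name": "Wine Quality",
            "Labeled": "Yes",
            "Dataset Characteristics": "Multivariate",
            "Data Source": "Real",
            "Additional Tags": ["Wine samples", "Regression"],
        },
        {"Dataset Name": "SECOM", "Labeled": "Yes"},
    ]
    print(render_table(demo))
```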

Making a Pull Request

Please adhere to these guidelines when submitting a pull request:

  • Check for Duplicates: Ensure your contribution is unique and not already included.
  • Submit Separate Pull Requests: Submit individual pull requests for each suggestion or dataset.
  • Follow the Format: Use our JSON template for datasets and maintain readability and structure in documentation.

About

A curated collection of public industrial datasets.
