
aliallam123/IOT607U-Data-Mining-2024-25

IOT607U - Data Mining 2024/25

Introduction

Technological advances have substantially enhanced our capabilities for generating and collecting data from diverse sources. This rapid growth in stored and transient data has created the need for new techniques and automated tools that can assist in transforming vast amounts of data into useful information and knowledge. Drawing on knowledge from the disciplines of statistics, machine learning, and artificial intelligence, the field of data mining focuses on the extraction of patterns representing knowledge stored in large databases, data warehouses, the Web, or data streams.

Learning Aims

This module provides a comprehensive introduction to the field of data mining, focusing on fundamental concepts and practical techniques for extracting information and knowledge from data sets. The module will cover all the major steps in a data mining pipeline, from data acquisition and storage to analysis and interpretation.

Key topics include:

1) Data representation, acquisition, and storage

2) Visualization techniques and exploratory data analysis

3) Machine learning techniques (classification, regression, clustering)

4) Frequent pattern mining and anomaly detection

5) Real-world applications of data mining

6) Ethical and policy issues related to data use

Learning Outcomes

By the end of this module, you will be able to:

1) Select appropriate data representations for specific problems.

2) Apply data pre-processing and cleaning methods for both numerical and categorical data.

3) Use data summarization and visualization techniques to gain insights from datasets.

4) Understand the distinctions between various data mining tasks (classification, regression, clustering, association rules, outlier detection) and select appropriate methods.

5) Use performance metrics and validation techniques and interpret the results.

6) Solve practical data mining problems using Python and common data mining libraries.

7) Understand the ethical considerations involved in data mining.

Study Topics

This module will cover the following topics:

1) Data: Attributes, types of data, data quality, similarity and distance measures.

2) Data Preprocessing & Cleaning: Methods to prepare data for analysis.

3) Data Visualization & Analysis: Techniques for summarizing and gaining insights from data.

4) Classification & Regression: Supervised learning methods for predictive tasks.

5) Clustering: Unsupervised learning techniques for grouping similar data.

6) Association Analysis: Techniques for discovering patterns and associations in data.

7) Anomaly Detection: Identifying unusual data points or outliers.

8) Data Mining Applications: Practical examples of data mining in real-world scenarios.

9) Data Warehousing: Concepts and techniques for storing and managing large datasets.

10) Data Ethics & Policy: Ethical and policy-related issues in data mining.

Tools

We will primarily use Python along with popular data mining libraries like scikit-learn. The lab environment will be Google Colaboratory (Colab), which is a free, cloud-based Jupyter Notebook environment that allows users to write and execute Python code in a web browser. Colab includes many relevant data-analytics packages pre-installed, making it an ideal platform for this module.

Python Libraries:

scikit-learn, pandas, matplotlib, seaborn, PyTorch, Keras

Google Colab: This platform provides a seamless environment for writing and executing code, data visualization, and integrating lab instructions.
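As a rough illustration of how some of these libraries fit together, the sketch below loads a small built-in dataset with scikit-learn, summarizes it with pandas, and trains and evaluates a simple classifier. It is not taken from the module's lab material: the choice of the Iris dataset, the decision tree model, the 70/30 split, and all variable names are assumptions made for the example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset as pandas objects (no download needed).
# Assumption: Iris is used here only as a convenient toy dataset.
iris = load_iris(as_frame=True)
X: pd.DataFrame = iris.data    # four numerical attributes
y: pd.Series = iris.target     # class label encoded as 0, 1, 2

# Exploratory step: per-attribute summary statistics.
summary = X.describe()

# Hold out 30% of the rows for testing, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit a shallow decision tree (an arbitrary choice of classifier).
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Evaluate on the held-out rows only.
accuracy = accuracy_score(y_test, clf.predict(X_test))

print(summary.loc[["mean", "std"]])
print(f"Held-out accuracy: {accuracy:.2f}")
```

The same code should run unchanged in a Colab notebook cell, since Colab ships with pandas and scikit-learn pre-installed.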

Assessment

Your performance in this module will be assessed through a combination of assignments and a final exam:

Final Exam: 60% of the final grade.

Assignments: 40% of the final grade (20% for each of two assignments).
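The weighting above can be read as a simple weighted sum. The sketch below assumes every component is marked out of 100 and that no rounding, capping, or pass-threshold rules apply; none of this is stated in the module description. The function name and the sample marks are hypothetical.

```python
def final_grade(exam: float, assignment_1: float, assignment_2: float) -> float:
    """Weighted module mark, assuming every component is marked out of 100."""
    # Weights in percent, from the assessment breakdown: 60 + 20 + 20 = 100.
    return (60 * exam + 20 * assignment_1 + 20 * assignment_2) / 100


# Hypothetical marks, chosen only to illustrate the arithmetic.
print(final_grade(exam=70, assignment_1=80, assignment_2=60))  # 70.0
```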
