This project involves developing a machine learning model to analyze second-hand car listings gathered from Divar.ir, a popular Iranian online marketplace.
- Web scraping is done using Selenium to extract data, including car models, prices, and additional features.
- The data is stored in MongoDB and processed using Pandas and NumPy for further analysis.
- The machine learning model focuses on price prediction, demand analysis, and market trends.
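The price-prediction step above can be sketched as follows. This is a minimal illustration, not the project's actual model: the rows are synthetic stand-ins for cleaned listings, the column names are assumed from the attributes the dataset is said to contain, and `LinearRegression` is just one plausible Scikit-learn estimator, since the source does not name the one used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic rows standing in for cleaned listings (NOT real Divar.ir data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "brand": rng.choice(["brand_a", "brand_b", "brand_c"], size=n),
    "year": rng.integers(1385, 1403, size=n),
    "mileage": rng.integers(0, 300_000, size=n),
})
# Made-up price rule: newer cars cost more, higher mileage costs less, plus noise.
df["price"] = (df["year"] - 1380) * 50 - df["mileage"] / 2000 + rng.normal(0, 10, size=n)

# One-hot encode the categorical column; numeric columns pass through unchanged.
X = pd.get_dummies(df[["brand", "year", "mileage"]], columns=["brand"])
y = df["price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)
mae = mean_absolute_error(y_test, preds)
print(f"Mean absolute error on held-out rows: {mae:.2f}")
```

Holding out a test split keeps the error estimate honest; on real listings, a tree-based model or a log-transformed price target may well fit better than a plain linear model.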
This project aims to provide actionable insights for car buyers and dealerships by predicting car prices, analyzing model trends, and offering a data-driven approach to understanding the second-hand car market.
The objective is to develop a machine learning model on web-scraped data from Divar.ir that analyzes second-hand car listings and yields actionable insights such as price predictions and market trends.
- Python (for web scraping and machine learning)
- Selenium (for web scraping automation)
- MongoDB (for storing scraped data)
- Pandas & NumPy (for data preprocessing and analysis)
- Scikit-learn (for building machine learning models)
- Matplotlib & Seaborn (for data visualization)
- HTML/CSS/JavaScript Knowledge (for handling dynamic web content)
- 🔍 Problem-Solving Skills
- 🎯 Attention to Detail
- ⏱️ Time Management & Project Organization
- 💬 Effective Communication of Findings
- 🖥️ Web Scraper: A functional scraper to collect second-hand car listings from Divar.ir.
- 💾 MongoDB Database: All extracted listings (car model, price, features, etc.) stored for further analysis.
- 🧹 Cleaned and Preprocessed Dataset: Ready for machine learning tasks.
- 🤖 Predictive Machine Learning Models: Price estimation and market trend analysis.
- 📈 Visualizations: Insights like price distribution and model trends visualized.
- 📝 Final Report: Summarizing findings and actionable insights from the data.
- Selenium for web scraping.
- MongoDB for data storage.
- Pandas & NumPy for data cleaning and preprocessing.
- Scikit-learn for machine learning.
- Matplotlib & Seaborn for data visualization.
Divar.ir loads its content dynamically (e.g., pagination, AJAX loading), so the scraper drives a real browser through Selenium and applies stealth techniques to avoid being blocked or throttled.
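A scrolling scraper of this kind might look roughly like the sketch below. The source does not say which stealth techniques were used, so the Chrome flag and the randomized pauses are assumptions, and `make_driver`, `collect_listing_urls`, the start URL, and the CSS selector are hypothetical names chosen for illustration.

```python
import random
import time


def make_driver():
    """Build a Chrome driver. Selenium is imported lazily so the
    scrolling helper below can be exercised without a browser."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # A commonly used (not guaranteed) way to look less like automation.
    options.add_argument("--disable-blink-features=AutomationControlled")
    return webdriver.Chrome(options=options)


def collect_listing_urls(driver, start_url, link_selector,
                         max_scrolls=5, pause=(1.0, 3.0)):
    """Scroll a lazily loaded results page and return the listing
    URLs found, de-duplicated and in first-seen order."""
    driver.get(start_url)
    seen = {}
    for _ in range(max_scrolls):
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        for element in driver.find_elements("css selector", link_selector):
            href = element.get_attribute("href")
            if href:
                seen.setdefault(href, None)
        # Scroll to the bottom to trigger the next AJAX batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Random delay between scrolls to avoid a fixed request rhythm.
        time.sleep(random.uniform(*pause))
    return list(seen)
```

In use, `collect_listing_urls(make_driver(), start_url, selector)` would be called with the real results-page URL and a selector matching the listing links, both of which have to be read off the live page.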
The dataset includes attributes such as car brand, model, year, price, mileage, and condition to provide a comprehensive analysis of the second-hand car market.
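One way to turn raw scraped fields into typed records with these attributes is sketched below. The `Listing` class, the `parse_listing` helper, and the raw key names are hypothetical, and the sample record is made up; the sketch also assumes that numeric fields may arrive as strings with Persian digits, thousands separators, or a currency word attached.

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Listing:
    brand: str
    model: str
    year: Optional[int]
    price: Optional[int]
    mileage: Optional[int]
    condition: str


def to_int(value) -> Optional[int]:
    """Keep only digit characters and convert them to an int.
    Python's int() accepts Unicode decimal digits, including Persian ones."""
    text = "" if value is None else str(value)
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def parse_listing(raw: dict) -> Listing:
    """Normalize one raw scraped record into a typed Listing."""
    return Listing(
        brand=str(raw.get("brand", "")).strip(),
        model=str(raw.get("model", "")).strip(),
        year=to_int(raw.get("year")),
        price=to_int(raw.get("price")),
        mileage=to_int(raw.get("mileage")),
        condition=str(raw.get("condition", "")).strip(),
    )


# Made-up record for illustration only.
sample = parse_listing({
    "brand": " Peugeot ",
    "model": "206",
    "year": "۱۳۹۵",
    "price": "۴۵۰٬۰۰۰٬۰۰۰ تومان",
    "mileage": "120,000",
    "condition": "used",
})
record = asdict(sample)  # plain dict, suitable for a document store
```

A dict produced by `asdict` can be passed to a pymongo collection's `insert_one`, and a list of such dicts loads directly into `pandas.DataFrame` for preprocessing.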