| title | emoji | colorFrom | colorTo | sdk | sdk_version | app_file | pinned | short_description |
|---|---|---|---|---|---|---|---|---|
| Analytics-Vidhya Free-Courses | 👁 | pink | green | gradio | 5.5.0 | app.py | false | This is assignment for Analytics Vidhya Gen AI Intern |
This project is a Gradio-based web application that allows users to search for relevant courses on Analytics Vidhya. It uses a pre-trained sentence-transformer model to calculate the similarity between a user query and course titles, descriptions, and lesson content, presenting the most relevant courses based on the query.
- Course Search: Enter a query to find relevant courses from the dataset based on title, description, and lesson content.
- Similarity-Based Ranking: Courses are ranked by their similarity score, determined using cosine similarity.
- Interactive UI: Built with Gradio, making it easy to interact and retrieve search results.
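The similarity-based ranking can be illustrated with a minimal sketch. It uses plain Python in place of scikit-learn's `cosine_similarity` so that it runs without dependencies. The course titles, the tiny 3-dimensional vectors, and the `rank_courses` helper are made-up stand-ins for real sentence-transformer embeddings, not the code in `app.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_courses(query_vec, course_vecs, top_k=3):
    """Return (title, score) pairs sorted by descending similarity."""
    scored = [(title, cosine_similarity(query_vec, vec))
              for title, vec in course_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values, for illustration only).
courses = {
    "Intro to Python": [0.9, 0.1, 0.0],
    "Deep Learning Basics": [0.1, 0.9, 0.2],
    "SQL for Beginners": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_courses(query, courses)
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

Because cosine similarity depends only on the angle between vectors, a course whose embedding points in nearly the same direction as the query ranks first regardless of vector length.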
- Python 3.7+
- Git LFS for handling large files if the dataset file (`courses_with_granular_embeddings.json`) is over 10 MB.
- Clone the Repository: `git clone https://huggingface.co/spaces/<username>/Analytics-Vidhya_Free-Courses.git`, then `cd Analytics-Vidhya_Free-Courses`.
- Install Dependencies: install the necessary packages specified in `requirements.txt` with `pip install -r requirements.txt`.
- Set Up Git LFS (if required): `git lfs install`, then `git lfs pull`.
- Run the Application: `python app.py`.
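If the Git LFS step is skipped, the dataset file in the clone is only a small text pointer rather than the real JSON, and loading it will fail. The following sketch shows one way to detect that before running the app. The `check_dataset` and `is_lfs_pointer` helpers are hypothetical and not part of this repository; the prefix string is the first line that Git LFS pointer files start with.

```python
from pathlib import Path

# First line of a Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer rather than real content."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def check_dataset(path="courses_with_granular_embeddings.json"):
    """Describe the state of the dataset file as a short status string."""
    p = Path(path)
    if not p.exists():
        return "missing"
    if is_lfs_pointer(p):
        return "lfs-pointer"  # run `git lfs pull` to fetch the real file
    return "ok"


print(check_dataset())
```

A result of `lfs-pointer` means `git lfs pull` still needs to be run in the repository.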
- Open the Gradio app on `localhost` or the designated Hugging Face Space link.
- Enter a search query in the text box and click "Submit" to find relevant courses.
- Results will be displayed in a table format, showing course titles, descriptions, and links.
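The flow above (text box in, table out) could be wired up roughly as follows. This is only a sketch under assumptions: the `search` function returns a hard-coded placeholder instead of real similarity results, and the column names, the `url` key, and the example link are invented for illustration rather than taken from `app.py`.

```python
HEADERS = ["Title", "Description", "Link"]


def to_table_rows(results):
    """Convert result dicts into rows matching HEADERS."""
    return [[r["title"], r["description"], r["url"]] for r in results]


def search(query):
    """Placeholder search: a real app would rank courses by similarity."""
    # Hypothetical sample result; not taken from the actual dataset.
    sample = [{
        "title": "Example Course",
        "description": f"Placeholder match for: {query}",
        "url": "https://example.com/course",
    }]
    return to_table_rows(sample)


def build_demo():
    """Wire the search function to a text box and a results table."""
    import gradio as gr  # imported lazily so the helpers work without Gradio

    return gr.Interface(
        fn=search,
        inputs=gr.Textbox(label="Search query"),
        outputs=gr.Dataframe(headers=HEADERS),
        title="Course Search",
    )


# To start the app locally: build_demo().launch()
```

Keeping the row-building logic separate from the Gradio wiring makes it possible to check the search output without starting the web server.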
Analytics-Vidhya_Free-Courses/
├── app.py # Main application code
├── requirements.txt # Dependencies required to run the app
├── courses_with_granular_embeddings.json # Dataset with course embeddings
├── .gitattributes # File to specify Git LFS-tracked files
├── README.md # Project README
└── preprocessing-files # Folder containing scripts and files used for preprocessing course data
    ├── get_url.py # Script to scrape URLs of free courses from Analytics Vidhya
    ├── scraper.py # Script to extract detailed course data from each URL
    ├── free_course_links.json # JSON file containing URLs of all free courses
    ├── free_courses_content.json # JSON file containing scraped details of each free course (title, description, curriculum)
    ├── create_embedding.py # Script to generate embeddings for course titles, descriptions, and lessons
    ├── find_matching.py # Script to find relevant courses based on user queries using similarity matching
    └── README.md # Detailed documentation explaining the preprocessing steps and scripts
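Since embeddings are generated separately for titles, descriptions, and lessons, a query can be compared against each field of a course. A rough sketch of how such a record could be scored follows. The field names (`title_embedding`, `description_embedding`, `lesson_embeddings`), the toy 2-dimensional vectors, and the choice of taking the best-matching field are all assumptions for illustration, not the actual schema or logic of `create_embedding.py` or `find_matching.py`.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def course_score(query_vec, course):
    """Score a course by its best-matching field (an assumed strategy)."""
    candidates = [course["title_embedding"], course["description_embedding"]]
    candidates.extend(course["lesson_embeddings"])
    return max(cosine(query_vec, vec) for vec in candidates)


# Hypothetical record, serialized and parsed the way a JSON dataset would be.
raw = json.dumps([{
    "title": "Example Course",
    "title_embedding": [1.0, 0.0],
    "description_embedding": [0.6, 0.8],
    "lesson_embeddings": [[0.0, 1.0], [0.8, 0.6]],
}])
courses = json.loads(raw)

query = [0.0, 1.0]
score = course_score(query, courses[0])
print(f"{score:.2f}")
```

Taking the maximum lets a course surface when a single lesson matches the query well, even if its title does not; averaging the field scores would be an equally plausible alternative.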
- Gradio: For creating the interactive web interface.
- Sentence Transformers: For generating text embeddings to compare queries and course content.
- Scikit-Learn: For cosine similarity calculations.
- Git LFS: To manage large files like `courses_with_granular_embeddings.json`.
This project is not licensed.