From e2e7de1e3112531b9ed08158d61d4d92fae9d9ad Mon Sep 17 00:00:00 2001
From: Robin
Date: Sun, 25 Aug 2024 20:31:55 +0530
Subject: [PATCH] blog links added, other suggestions incorporated

---
 .../2024/2024-08-21-final-report-robin.rst | 101 +++++++++---------
 1 file changed, 52 insertions(+), 49 deletions(-)

diff --git a/docs/source/posts/2024/2024-08-21-final-report-robin.rst b/docs/source/posts/2024/2024-08-21-final-report-robin.rst
index f299351b2..5593e2a12 100644
--- a/docs/source/posts/2024/2024-08-21-final-report-robin.rst
+++ b/docs/source/posts/2024/2024-08-21-final-report-robin.rst
@@ -20,16 +20,16 @@ Google Summer of Code Final Work Product
    :tags: google
    :category: gsoc

-- **Name:** `Robin Roy `_
-- **Organization:** Python Software Foundation
-- **Sub-Organization:** FURY
-- **Project:** `Improving Community Engagement: AI communication automation using LLM `_
+- **Name:** `Robin Roy `__
+- **Organization:** `Python Software Foundation `__
+- **Sub-Organization:** `FURY `__
+- **Project:** `Improving Community Engagement: AI communication automation using LLM `__


 Abstract
 --------

-The goal of this project was to implement a `Large Language Model(LLM) `_ chatbot that understands the FURY repository. The purpose of the project is to reduce the barrier of entry to scientific visualization. `Retrieval Augmented Generation(RAG) `_ was used to get the necessary context for every user query. Multiple variations were explored, including Fine-Tuning models, mixing Fine-Tuning and RAG and RAG alone. Multiple `chunking strategies `_ were also explored for data collection and storage. The models are served to the user through a Discord Bot and a GitHub App. All the API endpoints are hosted using `HuggingFace Spaces `_. `Pinecone `_ was used as the database for storing embeddings. Benchmarking, data collection, and testing were done on `another repository `_.
+The goal of this project was to implement a `Large Language Model (LLM) `__ chatbot that understands the FURY repository. The purpose of the project is to reduce the barrier of entry to scientific visualization. `Retrieval Augmented Generation (RAG) `__ was used to get the necessary context for every user query. Multiple variations were explored, including fine-tuned models, mixing Fine-Tuning with RAG, and RAG alone. Multiple `chunking strategies `__ were also explored for data collection and storage. The models are served to the user through a Discord Bot and a GitHub App. All the API endpoints are hosted using `HuggingFace Spaces `__. `Pinecone `__ was used as the database for storing embeddings. Benchmarking, data collection, and testing were done on `another repository `__.


 Proposed Objectives
 -------------------

 The objectives of the GSoC project could be broadly classified as:

 - **Figuring out hosting.** We had a constraint on hosting to try and minimize the cost. We managed to complete the whole project with 100% free hosting. Work here included:

-  * Experiments with `Google Colab `_ notebook hosting.
-  * Experiments with `Kaggle `_ notebook hosting.
-  * Experiments with `HuggingFace `_ spaces hosting.
+  * Experiments with `Google Colab `__ notebook hosting.
+  * Experiments with `Kaggle `__ notebook hosting.
+  * Experiments with `HuggingFace `__ Spaces hosting.

 - **Choosing the technologies to use.** Work here included:

-  * Experiments with local `GGUF (GPT-Generated Unified Format) `_ models.
+  * Experiments with local `GGUF (GPT-Generated Unified Format) `__ models (a minimal sketch follows this list).
   * Experiments with different quantizations.
-  * Experiments with `Ollama `_.
-  * Experiments with `LlamaCPP. `_
-  * Experiments with `Groq `_.
-  * Experiments with `Google Gemini `_.
+  * Experiments with `Ollama `__.
+  * Experiments with `LlamaCPP `__.
+  * Experiments with `Groq `__.
+  * Experiments with `Google Gemini `__.
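+
+  For illustration, here is a minimal sketch of the kind of local GGUF experiment listed above, using ``llama-cpp-python``. The model file, context size, and prompt are placeholders, not our exact setup:
+
+  .. code-block:: python
+
+     from llama_cpp import Llama
+
+     # Load a quantized GGUF model from disk (the path is a placeholder).
+     llm = Llama(model_path="phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096)
+
+     # Ask a FURY-related question and print the completion.
+     out = llm.create_chat_completion(
+         messages=[{"role": "user", "content": "How do I render a sphere in FURY?"}],
+         max_tokens=256,
+     )
+     print(out["choices"][0]["message"]["content"])
+
+  Trying a different quantization is just a matter of pointing ``model_path`` at a different GGUF file, which is how these experiments were compared.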
 - **Work on the backend architecture.** Backend architecture was heavily influenced by HuggingFace and its limitations. Work here included:

@@ -90,35 +90,38 @@ Objectives Completed
 --------------------

 - **Figuring out hosting.**

-   As mentioned, we had a constraint on the cost. So we searched for how to host the application for free. This took us to explore interesting directions like Google Colab and Kaggle Notebooks. In the end, HuggingFace was decided to be the best place. Everything is containerized and currently hosted on HuggingFace.
+   As mentioned, we had a constraint on the cost, so we explored different options for free hosting. This took us in interesting directions like Google Colab and Kaggle Notebooks. In the end, HuggingFace proved to be the best fit. Everything is containerized and currently hosted on HuggingFace.

-   This also meant that all the upcoming design/architectural choices would have to be based on HuggingFace. This will have some weird quirks at some places but overall HuggingFace was an excellent choice for the job.
+   This also meant that all the upcoming design/architectural choices would have to be built around HuggingFace. This caused some challenges with the Discord bot hosting, but overall HuggingFace was a solid choice.

-   A very detailed blog on hosting is available `here `_.
+   A very detailed blog on hosting is available `here `__.

    The plan is to move all the HuggingFace repositories from my account to FURY's account. But here, I'll link to all my repositories which are currently active as I'm writing this report.

-   * `Embeddings Endpoint `_
+   * `Embeddings Endpoint `__
      This endpoint converts natural language to embeddings. The model is loaded using HuggingFace SentenceTransformer.

-   * `Ollama Endpoint `_
+   * `Ollama Endpoint `__
      This endpoint could be used to communicate with the Ollama models. The perk of using this is it is more convenient and generally faster. A separate repository was required because a single free HuggingFace Space cannot allocate more than 16 GB RAM and 2vCPUs. Token generation speed will be hit if it's not a separate repository.

-   * `Database Endpoint `_
+   * `Database Endpoint `__
      This endpoint was used to get the K-Nearest (or Approximate) embeddings based on cosine similarity. The parameter K could be passed to adjust it. We used Pinecone as the database.

-   * `FURY Discord Bot `_
-     The repository for the Discord bot. It was required to use threading here which is a quirk of HuggingFace. HuggingFace server only activates once there is an active live endpoint. Discord did not need an endpoint, but we had to make one to get the server activated. The Discord bot ran on a separate thread while a server ran on the main thread.
+   * `FURY Discord Bot `__
+     The repository for the Discord bot. We had to use threading here, a side effect of HuggingFace Spaces: a Space only activates once there is a live endpoint being served. Discord did not need an endpoint, but we had to make one to get the server activated. The bot ran on a separate thread while a web server ran on the main thread (a minimal sketch of this setup follows the list below).

-   * `FURY external cloud endpoints `_
+   * `FURY external cloud endpoints `__
      This repository orchestrated external APIs from 3rd party providers like Groq and Gemini. We made it a separate repo to abstract the logic and simplify calling different endpoints as required. You can hot-swap multiple LLM models by changing the REST API parameters.

-   * `GitHub App `_
+   * `GitHub App `__
      Repository for the GitHub application. Receives webhooks from GitHub and acts upon them using GraphQL queries.

-   * `FURY Engine `_
+   * `FURY Engine `__
      This is the main endpoint both Discord and GitHub frontend applications hit. It orchestrates all the other endpoints. The architecture of how it works is detailed later below.

+   * `FURY Data Parsing/Benchmarking/Testing Repo (GitHub) `__
+     This GitHub repository contains all the parsing, benchmarking, and testing scripts.
+
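+   As promised above, here is a minimal sketch of the Discord bot threading workaround, assuming Flask and ``discord.py``. This is not the production bot (that lives in the linked Space); it only shows the shape of the trick:
+
+   .. code-block:: python
+
+      import os
+      import threading
+
+      import discord
+      from flask import Flask
+
+      app = Flask(__name__)
+
+      @app.route("/")
+      def health():
+          # HuggingFace keeps the Space alive because this endpoint is served.
+          return "Bot is running."
+
+      def run_bot():
+          intents = discord.Intents.default()
+          client = discord.Client(intents=intents)
+
+          @client.event
+          async def on_ready():
+              print(f"Logged in as {client.user}")
+
+          client.run(os.environ["DISCORD_TOKEN"])
+
+      if __name__ == "__main__":
+          # Bot on a side thread, dummy web server on the main thread.
+          threading.Thread(target=run_bot, daemon=True).start()
+          app.run(host="0.0.0.0", port=7860)  # default HuggingFace Spaces port
+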
 - **Choosing the technologies to use**

   Choosing the technology depended largely on HuggingFace hardware support. We experimented with inferencing LlamaCPP directly, inferencing Ollama, tested different quantizations and so on. Phi-3-mini-4k-instruct was chosen initially as the LLM. We rolled with it using Ollama for a few weeks. But as luck has it, I ended up discovering Groq is a cloud provider that provides free LLM endpoints. We used Groq from then on, and later also integrated Gemini since they also have a free tier.

@@ -161,7 +164,7 @@ Objectives Completed

    You'll get a response from ``llama3-70b-8192`` using ``Groq``. If you do ``https://robinroy03-fury-engine.hf.space/api/google/generate`` you can call any Google Gemini models like ``gemini-1.5-pro`` or ``gemini-1.5-flash``. Same for ``Ollama``.

-   A detailed blog on architecture is available `here. `_
+   A detailed blog on architecture is available `here `__.

 - **Work on improving model accuracy**

   The Initial version used a naive parser to parse code, later my mentors told me to use an AST parser. I chunked the entire repo using this and it performed relatively better. For model benchmarking, we had 2 tests, one QnA testing and one code testing. If the code compiles, the model gets one point.

-   All the benchmarking, data parsing, and database upsertion scripts are `here. `_
+   All the benchmarking, data parsing, and database upsert scripts are `here `__.

    We used an image model called ``moondream2`` to validate the output generated by the model. Since FURY is a graphics library, we need to judge the image to see whether it is correct or not.

-   Fine-tuning was done on Google AI Studio. We Fine-Tuned using question/answer pairs from Discord and GitHub discussions. We later tried mixing RAG + Fine-Tuning. A detailed blog on Fine-Tuning is available `here `_.
+   Fine-tuning was done on Google AI Studio. We fine-tuned using question/answer pairs from Discord and GitHub discussions. We later tried mixing RAG + Fine-Tuning. A detailed blog on Fine-Tuning is available `here `__.

-   A detailed blog on benchmarking is available `here `_.
+   A detailed blog on benchmarking is available `here `__.

-   A detailed blog on chunking is available `here `_.
+   A detailed blog on chunking is available `here `__.
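+
+   For illustration, a simplified sketch of the AST-based chunking idea: one chunk per top-level function or class. The actual parsing scripts live in the benchmarking repository linked above, and the file path here is only an example:
+
+   .. code-block:: python
+
+      import ast
+      from pathlib import Path
+
+      def chunk_python_file(path):
+          """Split a Python file into one chunk per top-level function or class."""
+          source = Path(path).read_text()
+          chunks = []
+          for node in ast.parse(source).body:
+              if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
+                  # get_source_segment recovers the exact source text of a node.
+                  segment = ast.get_source_segment(source, node)
+                  if segment:
+                      chunks.append(segment)
+          return chunks
+
+      # Each chunk is then embedded and upserted into Pinecone.
+      for chunk in chunk_python_file("fury/actor.py"):
+          print(chunk.splitlines()[0])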
 - **Discord Bot integration**

@@ -201,7 +204,7 @@ Objectives Completed

 - **GitHub App integration**

-   This included building the GitHub app and figuring out how to setup the UX for it. GitHub used GraphQL, but we didn't use a separate GraphQL library for this. We used a custom setup to query GraphQL endpoints. For us who only work with 1 or 2 commands, it works well. The code is `here `_.
+   This included building the GitHub app and figuring out how to set up the UX for it. GitHub uses GraphQL, but we didn't use a separate GraphQL client library; we query the GraphQL endpoint with a custom setup, sketched below. Since we only work with one or two commands, it works well. The code is `here `__.
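+
+   A minimal sketch of that custom-setup idea: a plain HTTP POST to GitHub's GraphQL endpoint with ``requests``. The query and the token handling are illustrative only; the real app authenticates as a GitHub App and builds its queries from webhook payloads:
+
+   .. code-block:: python
+
+      import os
+
+      import requests
+
+      # An illustrative query; the app's real queries act on webhook payloads.
+      QUERY = """
+      query($owner: String!, $name: String!) {
+        repository(owner: $owner, name: $name) {
+          discussions(first: 5) {
+            nodes { title }
+          }
+        }
+      }
+      """
+
+      response = requests.post(
+          "https://api.github.com/graphql",
+          json={"query": QUERY, "variables": {"owner": "fury-gl", "name": "fury"}},
+          headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
+          timeout=30,
+      )
+      response.raise_for_status()
+      print(response.json()["data"]["repository"]["discussions"]["nodes"])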
    GitHub App UI looks like this:

@@ -231,7 +234,7 @@ Other Objectives

 Other Open Source tasks
 -----------------------

-GSoC isn't all about what I do with my project. It exists along with the 3 other cool projects my peers - `Wachiou `_, `Iñigo `_ and `Kaustav `_ did. I learnt a lot through them reviewing my PRs and me reviewing their PRs. I attended all the weekly meetings of Wachiou to learn about his progress and to learn new stuff. He attended all my meetings too, which was awesome :)
+GSoC isn't all about what I do with my project. It exists alongside the three other cool projects that my peers `Wachiou `__, `Iñigo `__ and `Kaustav `__ worked on. I learnt a lot from them reviewing my PRs and from reviewing theirs. I attended all of Wachiou's weekly meetings to follow his progress and learn new things. He attended all my meetings too, which was awesome :)

 Contributions to FURY apart from the ones directly part of GSoC:

 * https://github.com/fury-gl/fury/pull/862 - Rendering videos on a cube

@@ -250,7 +253,7 @@ Contributions to other repositories during this time, due to GSoC work:

 Acknowledgement
 ---------------

-I am very thankful to my mentors `Serge Koudoro `_ and `Mohamed Abouagour `_. They were awesome and provided me with a comfortable environment to work in. Also got to thank `Beleswar Prasad Padhi `_ who gave me a very good introduction to opensource. The good thing about open source is I can still work on this (and other FURY projects) till I'm satisfied. I'm excited to continue contributing to the open source community.
+I am very thankful to my mentors `Serge Koudoro `__ and `Mohamed Abouagour `__. They were awesome and provided me with a comfortable environment to work in. I also want to thank `Beleswar Prasad Padhi `__, who gave me a very good introduction to open source. The good thing about open source is that I can keep working on this (and other FURY projects) until I'm satisfied. I'm excited to continue contributing to the open source community.


 Timeline
 --------

     - Blog Post Link
   * - Week 0
     - Community Bonding!
-     - https://fury.gl/latest/posts/2024/2024-05-28-week-0-robin.html
+     - `Blog 0 `__
   * - Week 1
     - It officially begins…
-     - https://fury.gl/latest/posts/2024/2024-06-06-week-1-robin.html
+     - `Blog 1 `__
   * - Week 2
     - The first iteration!
-     - https://fury.gl/latest/posts/2024/2024-06-16-week2-robin.html
+     - `Blog 2 `__
   * - Week 3
     - Data Data Data!
-     - https://fury.gl/latest/posts/2024/2024-06-16-week3-robin.html
+     - `Blog 3 `__
   * - Week 4
     - Pipeline Improvements and Taking The Bot Public!
-     - https://fury.gl/latest/posts/2024/2024-07-01-week-4-robin.html
+     - `Blog 4 `__
   * - Week 5
     - LLM Benchmarking & Architecture Modifications
-     - https://fury.gl/latest/posts/2024/2024-07-01-week-5-robin.html
+     - `Blog 5 `__
   * - Week 6
     - UI Improvements and RAG performance evaluation
-     - https://fury.gl/latest/posts/2024/2024-07-27-week6-robin.html
+     - `Blog 6 `__
   * - Week 7
     - Surviving final examinations
-     - https://fury.gl/latest/posts/2024/2024-07-27-week7-robin.html
+     - `Blog 7 `__
   * - Week 8
     - Gemini Finetuning
-     - https://fury.gl/latest/posts/2024/2024-07-27-week8-robin.html
+     - `Blog 8 `__
   * - Week 9
-     - None
-     - None
+     - Hosting FineTuned Models
+     - `Blog 9 `__
   * - Week 10
-     - None
-     - None
+     - Learning GraphQL
+     - `Blog 10 `__
   * - Week 11
-     - None
-     - None
+     - Getting the App Live
+     - `Blog 11 `__
   * - Week 12
-     - None
-     - None
+     - Wrapping things up
+     - `Blog 12 `__