
[LFX Fall '24 Mentorship]: Enhancing Kai with Data Querying for Fine-Tuning and Potential InstructLab Integration #191

Open
JonahSussman opened this issue Jul 30, 2024 · 7 comments
Labels
needs-kind Indicates an issue or PR lacks a `kind/foo` label and requires one. needs-priority Indicates an issue or PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@JonahSussman

JonahSussman commented Jul 30, 2024

This is an LFX mentorship project intended to run in the Fall of 2024.

This is related to cncf/mentoring#1287

Description

Kai is a tool designed to leverage AI for application modernization by analyzing code, identifying issues, and suggesting fixes. We aim to enhance Kai by developing a robust data querying mechanism to facilitate fine-tuning processes. This enhancement will lay the groundwork for potential future integration with InstructLab, an open-source AI project enabling community contributions to Large Language Models (LLMs) by adding new skills or knowledge. The primary focus will be on creating mechanisms to query and utilize data effectively, with a stretch goal of integrating static analysis tools and implementing an agent-based workflow.

This project will significantly enhance Kai’s backend, making it more scalable and capable of providing deeper code insights while also contributing to the enrichment of LLMs through InstructLab. It will offer a rich learning experience for participating students, covering backend development, workflow management, and contributing to open-source AI projects.

In this project, you will

  • Learn how the existing Kai backend functions.
  • Understand the requirements for querying data from Kai to facilitate fine-tuning processes.
  • Develop and implement mechanisms for querying and extracting relevant data from Kai.
  • Explore the potential for future integration with InstructLab for synthetic data generation.
  • Participate in biweekly Kai community calls, sharing milestones of progress.
  • Create a comprehensive blog post at the end of the project to document the process, outcomes, and future directions.

Current Situation

Kai currently processes code analyses through a centralized workflow. While effective, there is a need for enhanced data querying capabilities to support fine-tuning and optimization. Additionally, while the potential for integration with InstructLab is promising, the primary focus is on developing a robust data querying mechanism first. There is also an opportunity to explore agent-based workflows and static analysis tools as stretch goals.
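As a rough illustration of what such a querying mechanism might look like, here is a minimal sketch that pulls previously solved incidents out of a store and turns them into prompt/completion records in JSONL form. Everything in it is an assumption made for illustration: the `solved_incidents` table, its columns, the function names, and the demo rows are hypothetical and do not describe Kai's actual data model or API. SQLite from the Python standard library stands in for whatever storage Kai uses.

```python
import json
import sqlite3

# Hypothetical schema: NOT Kai's actual data model, only an illustration.
SCHEMA = """
CREATE TABLE solved_incidents (
    id INTEGER PRIMARY KEY,
    ruleset TEXT NOT NULL,
    violation TEXT NOT NULL,
    before_code TEXT NOT NULL,
    after_code TEXT NOT NULL
)
"""


def build_demo_connection():
    """Create an in-memory database holding two made-up solved incidents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO solved_incidents "
        "(ruleset, violation, before_code, after_code) VALUES (?, ?, ?, ?)",
        [
            ("example-ruleset-a", "Replace the javax.ejb import",
             "import javax.ejb.Stateless;", "import jakarta.ejb.Stateless;"),
            ("example-ruleset-b", "Replace the javax.inject import",
             "import javax.inject.Inject;", "import jakarta.inject.Inject;"),
        ],
    )
    return conn


def query_finetuning_records(conn, ruleset=None):
    """Return prompt/completion pairs, optionally filtered by ruleset."""
    sql = "SELECT violation, before_code, after_code FROM solved_incidents"
    params = ()
    if ruleset is not None:
        # Parameterized query: the filter value is never spliced into the SQL.
        sql += " WHERE ruleset = ?"
        params = (ruleset,)
    sql += " ORDER BY id"
    records = []
    for violation, before_code, after_code in conn.execute(sql, params):
        records.append({
            "prompt": f"Fix the following issue: {violation}\n\n{before_code}",
            "completion": after_code,
        })
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    connection = build_demo_connection()
    print(to_jsonl(query_finetuning_records(connection, ruleset="example-ruleset-a")))
```

Whether the real mechanism ends up SQL-based, pandas-based, or something else is an open design question for the project; the sketch only shows the general shape of filtering stored fixes and exporting them in a format that fine-tuning tooling commonly accepts.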

Expected Outcome:

  • Successfully develop and implement a data querying mechanism in Kai to facilitate fine-tuning processes.
  • Demonstrate the enhanced backend with improved data querying capabilities.
  • Participate actively in community meetings, presenting progress and insights.
  • Create a detailed blog post documenting the project, its outcomes, and potential future directions, including the possible integration with InstructLab.
  • (Stretch Goal) Develop a reusable module for agent-based workflows and integrate it into Kai, enhancing its ability to detect and report code issues.

Recommended Skills:

  • Python
  • Podman / Docker
  • Basic Software Development skills (command line, git)

Mentor(s):

Links:

@konveyor-ci-bot konveyor-ci-bot bot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 30, 2024
@konveyor-ci-bot

This issue is currently awaiting triage.
If contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members.

@konveyor-ci-bot konveyor-ci-bot bot added needs-kind Indicates an issue or PR lacks a `kind/foo` label and requires one. needs-priority Indicates an issue or PR lacks a `priority/foo` label and requires one. labels Jul 30, 2024
@angad-singhh

Hey @JonahSussman, this project looks very interesting to me, and the project explanation is very clear. Could you briefly explain the application process? Do we only have to apply on the LFX portal, or is there a pretest for this project? Is there anything specific I can work on to improve my application?

Thanks

@Ytemiloluwa

Ytemiloluwa commented Aug 1, 2024

Hello @JonahSussman and @fabianvf, my name is Temi. I am a software developer with experience in open source and enterprise environments. I am a two-time Summer of Bitcoin intern with recent experience working on backend methods for pyasic (a Bitcoin mining library). I have extensive knowledge of data queries and databases, together with AI and ML models. My tech stack includes SQL, Python, Swift, Rust, Bash scripting, Git, Docker, and AWS.

I would like to ask whether there is a requirement to submit a pull request or a proposal on how I plan to enhance this project.

Looking forward to your feedback, and I hope to work with you both this fall.

Thanks!

@sarthakg004

sarthakg004 commented Aug 2, 2024

Hello @JonahSussman and @fabianvf, I am Saarthak, an aspiring data scientist with a keen interest in generative AI. I am interested in this project and believe I can make meaningful contributions to it.
I have previously worked on a few projects around LLMs, one of them being building a chatbot for SQL querying.
I have good knowledge of vector databases, the RAG framework, SQL, Python, Git, quantization, and fine-tuning LLMs.

I would like to ask how I can submit my proposal for this project.

Looking forward to working with you.

@Clemo97

Clemo97 commented Aug 10, 2024

Hello @JonahSussman and @fabianvf. My name is Clement Lumumba. I have submitted my application and would like to start contributing. I am proficient in Python and SQL and have experience in data engineering.

Looking forward to working with you.

@omkar-334

Hello @JonahSussman and @fabianvf. I am Omkar, a CS undergraduate from India.
I am interested in this project, and I have experience with Python, SQL, LLMs, backend development, retrieval-augmented generation, and function calling.
I have applied on the LFX website with my cover letter and resume.
Are there any pretasks or prerequisites to complete?

Looking forward to contributing to the Kai project.
Thank you.

@debrupf2946

debrupf2946 commented Aug 19, 2024

@fabianvf @JonahSussman
Hello, I am Debrup Paul. I like the project and have been diving into Kai, Kantra, and InstructLab. I have set the project up locally and run the analysis on the sample app (on macOS); it is running fine.
I worked on a GSoC project where I built a knowledge graph from a code repository and its docs and ran a retriever over it for better, more accurate responses and text generation.
I think Kai and Kantra will help a lot. I want to spend time on this project and contribute to it; I feel the idea is very interesting!

We could also implement GraphRAG in the step that extracts contextual information from solved examples for code suggestions.
It could help avoid hallucinations, and we could use smaller LLMs, which would save cost!

I have one question: what exactly is meant by data querying capabilities (pandas-based or SQL-based)?

Thanks
Linkedin: https://www.linkedin.com/in/debrup-paul-599158227/

Projects
Status: 🆕 New
7 participants