This project demonstrates how to optimize and run language models with TensorRT-LLM on NVIDIA GPUs. It focuses on leveraging TensorRT to accelerate large language models such as LLaMA or Mixtral, with a particular emphasis on models of 7B parameters or larger.
Conduct your own professional mock interviews for various development and programming positions, receive feedback and a rating based on your responses, and find out whether you're ready for a real interview! All powered by AI, with the Mixtral model running on your local computer.
This section showcases the system's ability to visualize the outputs generated by the model. Through these examples, users can see how the system not only generates questions based on the selected parameters but also provides valuable feedback on user responses, enhancing the learning and interaction experience.
This screenshot illustrates the system's interface in action, where users can specify their parameters to generate targeted questions. It highlights the system's adaptability and user-centric design in facilitating personalized interactions.
Following the interaction, the system provides users with personalized feedback based on their responses. This feature underscores the system's capacity to support learning and improvement through direct and actionable insights.
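As a rough illustration of the flow just described, the sketch below builds one prompt to generate an interview question from the selected parameters and another to grade the user's answer. The function names and prompt wording are hypothetical, chosen only to show the pattern; they are not taken from this repository's code.

```python
# Hypothetical sketch of the interview prompt flow; function names and
# prompt wording are illustrative, not this repository's implementation.

def build_question_prompt(position: str, difficulty: str, topic: str) -> str:
    """Compose a prompt asking the model for a single interview question."""
    return (
        f"You are a technical interviewer for a {position} position.\n"
        f"Ask one {difficulty} interview question about {topic}.\n"
        "Return only the question."
    )


def build_feedback_prompt(question: str, answer: str) -> str:
    """Compose a prompt asking the model to rate and critique an answer."""
    return (
        "Evaluate the candidate's answer to the interview question below.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Give concise feedback and a rating from 1 to 10."
    )


# Example: build the prompt that would be sent to the model.
q_prompt = build_question_prompt("Python developer", "intermediate", "data structures")
```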
The following hardware specifications are the minimum required to run this project; higher specifications are also compatible and may yield better performance:
- GPU: NVIDIA RTX 3060 12GB equivalent or higher
- RAM: 32 GB
- Disk Space: 200 GB
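If you are unsure whether your machine meets these requirements, the quick check below reports the detected GPU and its VRAM. It assumes a CUDA-enabled PyTorch build, which the TensorRT-LLM Python runtime depends on anyway.

```python
import torch  # requires a CUDA-enabled PyTorch build

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # An RTX 3060 12GB reports roughly 12 GiB of total memory here.
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA-capable GPU detected.")
```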
To run this project successfully, you will need:
- Windows 11: The operating system compatible with our setup and instructions.
- TensorRT-LLM Installation: Follow NVIDIA's installation guide for TensorRT-LLM, specifically from the `/rel` branch tailored for Windows. In our tests, we used version 9.2.0.post12.dev5 (see the TensorRT-LLM Installation Guide).
- Model Download: Download a base model, either LLaMA or Mixtral. In our tests, we used Mixtral-7B as the baseline model before optimization (Model Base Mixtral 7B).
- Model Optimization: Use TensorRT-LLM to build an optimized engine from the base model. For detailed steps, refer to the Model Optimization Guide (a sketch of loading the resulting engine from Python follows this list).
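As promised above, here is an outline of loading and querying an optimized engine from Python using TensorRT-LLM's `ModelRunner`. The engine and tokenizer paths are placeholders, and the runtime API has shifted between TensorRT-LLM releases, so treat this as a sketch under those assumptions rather than this repository's exact code.

```python
# Sketch of loading a TensorRT-LLM engine for inference. Paths are
# placeholders; the exact API may differ between TensorRT-LLM versions.
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

ENGINE_DIR = "path/to/optimized_engine"   # placeholder: your built engine
TOKENIZER_DIR = "path/to/base_model"      # placeholder: the base model's tokenizer

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_DIR)
runner = ModelRunner.from_dir(engine_dir=ENGINE_DIR)

input_ids = tokenizer("Ask me one Python interview question.",
                      return_tensors="pt").input_ids
outputs = runner.generate(
    batch_input_ids=[input_ids[0]],       # list of 1-D token tensors
    max_new_tokens=200,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
# outputs has shape (batch, beams, tokens); decode the first beam.
print(tokenizer.decode(outputs[0][0], skip_special_tokens=True))
```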
To set up and run this project, follow these steps:
- Clone the Project:
We recommend cloning it into the `llama` folder of the TensorRT-LLM repository. If you choose another path, you will need to adjust the paths within the code accordingly. Navigate to the directory where you want to store the project and clone the repository:
git clone https://github.com/Jamil-Palma/technical-interview-with-tensorrt-llm-rtx-mixtral.git
cd technical-interview-with-tensorrt-llm-rtx-mixtral
- Install Dependencies: Install the necessary Python libraries.
pip install -r requirements.txt
- Launch the Application: Start the Streamlit application on port 5000.
streamlit run main.py --server.port 5000
Note: If you choose to store the optimized models and tokenizer in a different location, be sure to adjust the paths in the code and the run files accordingly.
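In practice, the change usually amounts to editing a few directory constants. The variable names below are purely illustrative and may not match those used in this repository:

```python
# Illustrative only: point these at wherever you stored the engine and
# tokenizer. The actual variable names in the code may differ.
ENGINE_DIR = r"C:\TensorRT-LLM\examples\llama\optimized_engine"  # hypothetical path
TOKENIZER_DIR = r"C:\models\Mixtral-7B"                          # hypothetical path
```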
After installation, the project can be accessed through the web interface provided by Streamlit. Navigate to http://localhost:5000 in your web browser to interact with the optimized language models.
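For orientation, a minimal Streamlit page with the same overall shape as this interface (parameter selection, a generated question, and a feedback step) looks roughly like the sketch below. The widget labels and placeholder logic are hypothetical; the repository's actual main.py wires these steps to the optimized model instead.

```python
# Minimal Streamlit sketch with hypothetical labels; not this repository's
# actual main.py. Run with: streamlit run sketch.py --server.port 5000
import streamlit as st

st.title("Technical Interview Practice")

position = st.selectbox("Position", ["Python developer", "Frontend developer"])
difficulty = st.select_slider("Difficulty", ["easy", "intermediate", "hard"])

if st.button("Generate question"):
    # The real app would query the optimized Mixtral engine here.
    st.session_state["question"] = f"({difficulty}) Sample question for a {position}."

if "question" in st.session_state:
    st.write(st.session_state["question"])
    answer = st.text_area("Your answer")
    if st.button("Get feedback") and answer:
        # The real app would ask the model to rate the answer here.
        st.write("Feedback and a rating would appear here.")
```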
Contributions to this project are welcome. Please consider the following types of contributions:
- Feature enhancements
- Performance improvements
- Bug fixes
Before contributing, please open an issue to discuss your idea or confirm the bug. Pull requests are welcome once the issue has been confirmed.
For support or collaboration, please contact us at:
- Jamil Palma - @JamilPalma
- Email - [email protected]