GAISSA-UPC/ML-GAISSALabel

GAISSA Tools

GAISSA Tools is an open-source platform for assessing the efficiency of machine learning models and analyzing their return on investment (ROI). The platform includes two main tools:

  • GAISSALabel: Allows data scientists, software engineers, and end-users of AI models to analyze, understand, and contribute to improving the energy efficiency of artificial intelligence models during both training and inference stages.
  • GAISSA ROI Analyzer: Provides return on investment analysis for machine learning projects. It calculates the ROI of the machine learning tactics used to optimize a given model by analyzing their associated costs and benefits, helping organizations make technically and environmentally informed decisions about their AI investments.
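
As a rough illustration of the cost-benefit idea behind the ROI Analyzer, the sketch below applies the generic ROI formula, (benefits - costs) / costs, to a hypothetical optimization tactic. The function name and the figures are assumptions made for this example; the metrics and formulas the tool actually uses are not reproduced here.

```python
def simple_roi(benefits: float, costs: float) -> float:
    """Return ROI as a fraction: (benefits - costs) / costs."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs


# Hypothetical tactic: applying it costs 200 (e.g. engineering effort)
# and it saves 500 in energy costs over the period considered.
roi = simple_roi(benefits=500.0, costs=200.0)
print(f"ROI: {roi:.0%}")  # prints "ROI: 150%"
```

A positive value means the tactic pays for itself over the period considered; a negative value means it does not.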

GAISSALabel Features

Training

During the training phase, data scientists can record a series of results for various metrics, such as CO2 emissions or model size, which the tool uses to generate energy efficiency results. These results include an energy efficiency label and a breakdown of the label to explain implications and potential improvements. With these results, data scientists can make decisions for future training phases, whether to retrain the same model or explore other models.
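
To make the idea of turning a metric into a label concrete, here is a minimal sketch that maps CO2 emissions to a letter grade. Everything in it is an assumption for illustration: the five grades, the threshold values, and the use of a single metric. GAISSALabel's real boundaries and the way it combines several metrics are not shown here.

```python
from bisect import bisect_left

# Illustrative assumptions: five grades and made-up CO2 thresholds (kg CO2e).
# These are NOT GAISSALabel's real boundaries.
GRADES = ["A", "B", "C", "D", "E"]
CO2_UPPER_BOUNDS_KG = [1.0, 10.0, 100.0, 1000.0]  # upper bounds for A, B, C, D


def grade_for_emissions(co2_kg: float) -> str:
    """Map a training run's CO2 emissions to a letter grade (lower is better)."""
    if co2_kg < 0:
        raise ValueError("emissions cannot be negative")
    # bisect_left finds the first bound that is >= co2_kg, so bounds are inclusive.
    return GRADES[bisect_left(CO2_UPPER_BOUNDS_KG, co2_kg)]


print(grade_for_emissions(0.4))     # A
print(grade_for_emissions(55.0))    # C
print(grade_for_emissions(5000.0))  # E
```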

Inference

For the inference phase, GAISSALabel can analyze various results, such as the time it takes to receive a response or the complexity of the model's computational operations, among others. Using these results, GAISSALabel also generates energy efficiency labels, benefiting two target audiences.

First, software engineers can use the information provided by the label to make decisions about the inference phase, such as altering the deployment architecture. Second, end users of the model who are not experts in the domain can study the generated label and use it to make decisions, such as opting for a model with a lower environmental impact.
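
Response time is one of the inference results mentioned above. The following sketch shows one simple way such a value could be collected, by timing repeated calls with Python's standard library. The helper name `measure_latency` and the stand-in model are hypothetical; this is not how GAISSALabel itself gathers the measurement.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency(predict: Callable[[str], object],
                    inputs: Sequence[str]) -> dict:
    """Call `predict` once per input and summarize wall-clock latency in seconds."""
    timings = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }


# Stand-in for a deployed model; a real check would send a request instead.
def fake_model(text: str) -> str:
    time.sleep(0.01)
    return text.upper()


print(measure_latency(fake_model, ["hello", "green ai"]))
```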

Results Generation

The tool offers users various avenues to evaluate the efficiency of their machine learning models during both training and inference phases.

Firstly, for both training and inference, a form is provided to collect results on metrics and other information. Clear instructions guide the user on how to calculate or obtain the necessary inputs. Alternatively, users can upload configuration files to the application for both training and inference. Moreover, for the study of the inference phase, a one-click result acquisition process is available. This allows users to input the deployment location of the model to be studied and provide some example inputs.

Regardless of the chosen method, users have the ability to retrieve previous evaluations of training and inference. This includes all energy efficiency results calculated from the data collected from various artificial intelligence repositories.

Management

GAISSALabel also has sections restricted to the tool's QA Managers. These managers have access to three additional sections.

The first covers the management of synchronizations with external machine learning model providers, such as Hugging Face. In the second, managers can manage the metrics and additional information used to evaluate training and inference. Finally, a management screen is provided for the calculation tools offered to users as options for obtaining energy efficiency results from files.

Application Architecture and Repository Structure

The architecture of the application follows a three-layer structure: the presentation layer, the domain layer, and the persistence layer. Additionally, it is physically distributed into a client and a server.

[Figure: Application architecture]

Starting with the client (frontend), on the right side of the figure, there are a series of views and controllers that respond to the various functionalities described earlier. These components can be grouped into those displaying energy efficiency results, those collecting user data, and, finally, those related to management.

The server (backend) is a web server. It exposes an Application Programming Interface (API) through which the client accesses all the functionalities offered. Each of these functionalities is handled by a domain controller, which interacts with the other components of the server's domain layer.

The domain layer has three kinds of components. First, the efficiency calculator is responsible for evaluating the efficiency of the studied trainings and inferences: it receives the corresponding data and generates results. Second, adapters (connectors) with model providers allow connection to external repositories to update system data. Finally, data interfaces belong to the logical domain layer of the application and let the controllers of this layer interact with the database of the logical persistence layer through persistence controllers.
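
The relationship between a domain controller, the efficiency calculator, and a data interface can be sketched in a few lines of plain Python. All class and method names below are hypothetical and chosen only to mirror the roles described above; the toy calculator uses a single made-up cutoff and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass
from typing import Protocol


class EfficiencyCalculator(Protocol):
    def evaluate(self, metrics: dict[str, float]) -> str: ...


class Repository(Protocol):
    def save(self, model_name: str, label: str) -> None: ...


@dataclass
class ThresholdCalculator:
    """Toy calculator: grades on a single metric with one made-up cutoff."""
    cutoff_kg: float = 10.0

    def evaluate(self, metrics: dict[str, float]) -> str:
        return "A" if metrics["co2_kg"] <= self.cutoff_kg else "B"


class InMemoryRepository:
    """Stands in for the data interface and the database behind it."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def save(self, model_name: str, label: str) -> None:
        self.rows[model_name] = label


@dataclass
class TrainingController:
    """Domain controller: coordinates the calculator and the data interface."""
    calculator: EfficiencyCalculator
    repository: Repository

    def register_training(self, model_name: str, metrics: dict[str, float]) -> str:
        label = self.calculator.evaluate(metrics)
        self.repository.save(model_name, label)
        return label


repo = InMemoryRepository()
controller = TrainingController(ThresholdCalculator(), repo)
print(controller.register_training("demo-model", {"co2_kg": 3.2}))  # A
```

Because the controller depends only on the two protocols, the calculator or the storage can be swapped without touching the controller, which is the main benefit of the layered split.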

This repository encompasses both the client (frontend) of the application and the server (backend). These two components give their names to the two main directories of the repository.

Frontend

The frontend is developed using the Vue.js JavaScript framework. This technology allows the definition of components, each with its own controller (programmed in JavaScript) and corresponding views (designed using HTML and CSS).

Inside the "src" folder, the following components stand out:

  • App.vue: Definition of the client application and all its components.
  • router.js: Defines the routes through which the application can be navigated.
  • utils.js: Utility functions used throughout the code.
  • views: Main views of all screens in the application.
  • components: Views included in the previous views. These are elements reused in different main views.
  • controllers: Controllers of the presentation layer.

Backend

The server implements a REST API to respond to client requests and provide the different functionalities. It uses the Django REST Framework library, which is built on the Django framework and written in Python.

Django facilitates the definition and implementation of the domain controllers needed for domain logic, and it helps define the business concepts of the domain model. Django REST Framework, in turn, allows the REST API to be defined from a route controller. Thus, when the server receives a request, it is routed to the corresponding domain controller.

Regarding the persistence layer, Django itself implements a data interface that adapts to the defined database type. In this case, the database is PostgreSQL, one of the databases compatible with Django.
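
For orientation, a Django settings fragment that selects PostgreSQL typically looks like the sketch below. The `ENGINE` value is Django's standard PostgreSQL backend; the environment variable names and default values are assumptions for this example and may differ from the ones expected by the project's .env file.

```python
import os

# Sketch of a Django settings fragment for PostgreSQL.
# The environment variable names and defaults are assumptions,
# not necessarily the ones this project uses.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "gaissalabel"),
        "USER": os.environ.get("DB_USER", "postgres"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}
```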

The backend folder contains:

  • requirements.txt: Dependencies necessary for the server.
  • gaissalabel: Definition of the server application.
  • api: Files that define and implement the application's API.
  • controllers: Domain controllers of the application.
  • efficiency_calculators: Components responsible for the calculation and generation of energy efficiency results.
  • connectors: Adapters with external AI model providers, such as Hugging Face.

Deployment

GAISSALabel is deployed on a single virtual machine provided by UPC, running Ubuntu Linux. The machine hosts both physical components of the application (client and server). The two still communicate via the Internet, because they are served on different ports of the machine (port 80 for the client and port 81 for the server). The client is served with Nginx, while the server uses Gunicorn in addition to Nginx.
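
A quick way to confirm that both components answer on their ports is a small reachability check. The sketch below uses only the Python standard library; the host name in the usage line is a placeholder, not the real machine, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Ports taken from the deployment description: 80 (client), 81 (server).
PORTS = {"client": 80, "server": 81}


def service_urls(host: str) -> dict[str, str]:
    """Build the base URL of each component for a given host name."""
    return {name: f"http://{host}:{port}/" for name, port in PORTS.items()}


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, just not with a success status
    except (urllib.error.URLError, OSError):
        return False


# "gaissalabel.example.org" is a placeholder host used only for illustration.
print(service_urls("gaissalabel.example.org"))
```

Calling `is_reachable(url)` on each returned URL after a deployment gives a basic smoke test for the client and the server.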

The virtual machine is accessed via SSH. This repository is cloned inside the main folder of the machine, and the machine puts the "deploy" branch of this repository into production. Therefore, when there are changes in the main branch of the repository, the following steps are performed:

  1. Access the machine, download the changes, and merge into the "deploy" branch.

    git checkout main
    git pull
    git checkout deploy
    git merge main
    
  2. Next, update changes to the database structure:

    cd backend
    source env/bin/activate
    python manage.py migrate
    

    If this is the first time you are running the project, or if you have not yet imported the carbon intensity data, import it with the following command:

    python manage.py import_carbon_intensity
    

    Alternatively, if using Poetry for dependency management, activate the Poetry virtual environment (instead of the above source env/bin/activate command):

    source /var/www/gaissalabel/venvs/ml-gaissalabel-py3.11/bin/activate
    
  3. Install any new backend requirements (with the virtual environment from step 2 still active):

    pip install -r requirements.txt
    
  4. Set up environment variables (FIRST TIME ONLY): Create .env file in backend directory from the provided template and fill in the required values.

    cp .env.example .env
    nano .env
    
  5. Restart the backend:

    sudo systemctl daemon-reload
    sudo systemctl restart gunicorn
    sudo systemctl restart nginx
    
  6. Set up Node.js (FIRST TIME ONLY - if nvm not installed):

    # Install nvm (Node Version Manager)
    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
    
    # Load nvm
    export NVM_DIR="$HOME/.nvm"
    source "$NVM_DIR/nvm.sh"
    
    # Install Node.js LTS version (v20 or later required)
    nvm install --lts
    nvm use --lts
    
  7. Set up environment variables (FIRST TIME ONLY): Create .env.production file in frontend directory from the provided template and fill in the required values.

    cd frontend
    cp .env.production.example .env.production
    nano .env.production
    
  8. Build and deploy the frontend:

    # Install dependencies and build
    npm install
    npm run build
    
    # Restart nginx
    sudo systemctl restart nginx
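
The recurring steps above could be collected into a small script. The sketch below is one possible arrangement, not the project's actual tooling, and it makes several assumptions: it calls the virtual environment's executables directly (env/bin/pip, env/bin/python) instead of sourcing the activate script, it installs requirements before running migrations, it restarts Nginx once at the end, and it skips the first-time-only steps. It also assumes it is run from the repository root with npm already on the PATH.

```python
import shlex
import subprocess

# Recurring steps only; first-time-only steps are left out.
# Each entry is (working directory relative to the repository root, command).
STEPS = [
    (".", "git checkout main"),
    (".", "git pull"),
    (".", "git checkout deploy"),
    (".", "git merge main"),
    ("backend", "env/bin/pip install -r requirements.txt"),
    ("backend", "env/bin/python manage.py migrate"),
    (".", "sudo systemctl daemon-reload"),
    (".", "sudo systemctl restart gunicorn"),
    ("frontend", "npm install"),
    ("frontend", "npm run build"),
    (".", "sudo systemctl restart nginx"),
]


def deploy(dry_run: bool = True) -> list[str]:
    """Go through the steps in order, stopping at the first failure.

    With dry_run=True nothing is executed; the planned commands are returned.
    """
    planned = []
    for cwd, command in STEPS:
        planned.append(f"[{cwd}] {command}")
        if not dry_run:
            subprocess.run(shlex.split(command), cwd=cwd, check=True)
    return planned


for line in deploy(dry_run=True):
    print(line)
```

Running it with the default `dry_run=True` only prints the plan, which makes it safe to review before executing anything on the server.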
    

Get Started

Would you like to download the repository locally and run it? Follow these steps:

  1. Clone the repository using the command git clone.

  2. Access the cloned folder.

  3. You will need to start the backend and frontend separately. Starting with the backend:

    • Navigate to the backend directory, then create and activate a virtual environment:

      cd backend
      python3 -m venv env
      source env/bin/activate
      
    • Install dependencies:

      pip install -r requirements.txt
      
    • Set up environment variables: Create a .env file in the backend directory from the provided template and fill in the required values.

      cp .env.example .env
      nano .env
      
    • Apply database migrations:

      python manage.py migrate
      
    • Import carbon intensity data (first time only):

      python manage.py import_carbon_intensity
      

      For more information about this command, see Carbon Intensity Import Documentation.

    • Run the development server:

      python manage.py runserver
      
    • For subsequent runs (after initial setup):

      source env/bin/activate
      python manage.py runserver
      
    • For migrations during development:

      python manage.py makemigrations
      python manage.py migrate
      
  4. Set up and run the frontend:

    • Navigate to the frontend directory:

      cd frontend
      
    • Install Node.js (if not already installed):

      You'll need Node.js LTS version (v20 or later). If you don't have nvm (Node Version Manager) installed:

      # Install nvm
      curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
      
      # Load nvm
      export NVM_DIR="$HOME/.nvm"
      source "$NVM_DIR/nvm.sh"
      
      # Install Node.js LTS version
      nvm install --lts
      nvm use --lts
      
    • Set up environment variables: Create a .env.development file in the frontend directory from the provided template and fill in the required values.

      cp .env.production.example .env.development
      nano .env.development
      
    • Install dependencies:

      npm install
      
    • Run the development server:

      npm run dev
      
    • For production build: Create a .env.production file in the frontend directory from the provided template and fill in the required values.

      cp .env.production.example .env.production
      nano .env.production
      npm run build
      

Testing

To ensure the functionality of the GAISSA tools, we have implemented a series of tests. These tests are designed to verify the correct operation of both the frontend and backend components of the application. To run the tests, follow these steps:

Frontend Tests

Navigate to the frontend directory and run the tests using the following commands:

cd frontend
npm install
npm run test:unit

Backend Tests

To run the backend tests, navigate to the backend directory and execute the following commands:

cd backend

python manage.py test apps.gaissa_roi_analyzer.tests.unit_tests.test_models
python manage.py test apps.gaissa_roi_analyzer.tests.unit_tests.test_serializers

python manage.py test apps.gaissa_roi_analyzer.tests.integration_tests.test_api

Acknowledgements

We acknowledge the raphischer/strep repository for its closely related work, particularly in the design of our energy efficiency labels. For transparency, the MIT license has been applied in line with this collaboration on the affected files. Explore raphischer/strep on GitHub for further details. Our approach to indexing and classification also draws upon industry standards, streamlining comparisons within our tool. We're thankful for these contributions that have shaped our project.

About Us and How to Contribute

GAISSALabel is a tool developed within the GAISSA project at the GESSI research group. It serves as a valuable resource for evaluating both the training and inference phases of machine learning models.

By using our platform, you become part of a community of researchers and developers dedicated to addressing the environmental impact of artificial intelligence. Your involvement can contribute to the advancement of sustainable practices in machine learning, fostering a greener future. Let's embark on this journey towards Green AI together!

Please don't hesitate to contact us to participate, collaborate, and contribute to the growth of this tool. Our location is at Campus Diagonal Nord, Building Ω (Omega), C. Jordi Girona, 1-3, 08034 Barcelona. You can reach us at 93 401 74 86. Thank you!
