WasteAnnotator: Automated Component Annotation Pipeline

About the Project

WasteAnnotator is an automated pipeline designed to extract and annotate components from abandoned GitHub projects. By analyzing the project's dependency graph, the tool identifies components and labels them based on the files they contain. The final output is a structured file detailing the components and their associated files, providing a comprehensive view of the project's architecture.

Architecture Overview

The WasteAnnotator tool consists of several key modules:

- Finder: Retrieves projects from a repository service (currently GitHub only) based on specified criteria.
- GraphExtractor: Parses the project's dependency graph to identify potential components (currently uses Arcan).
- Annotator: Uses semantic techniques to label and annotate components based on their file contents (currently uses AutoFL).
- CommunityExtractor: Identifies communities within the component structure for further insights (via customizable algorithms from cdlib).
- Exporter: Outputs the processed information in configurable formats (e.g., JSON).
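To make the data flow between these modules concrete, here is a minimal sketch that chains five toy stages in the order listed above (the execution order itself is an assumption). Every function name, signature, and data shape is a hypothetical stand-in, not WasteAnnotator's actual API; the real modules delegate to GitHub, Arcan, AutoFL, and cdlib.

```python
import json


def find_projects(criteria: dict) -> list[str]:
    """Finder stand-in: return repository identifiers (placeholder data)."""
    return ["example-org/example-repo"]


def extract_graph(project: str) -> dict[str, list[str]]:
    """GraphExtractor stand-in: map each file to the files it depends on."""
    return {
        "src/Parser.java": ["src/Lexer.java"],
        "src/Lexer.java": [],
        "src/HttpClient.java": [],
    }


def annotate(graph: dict[str, list[str]]) -> dict[str, str]:
    """Annotator stand-in: a toy keyword rule instead of a semantic model."""
    return {path: "networking" if "Http" in path else "parsing" for path in graph}


def detect_communities(graph: dict[str, list[str]]) -> list[list[str]]:
    """CommunityExtractor stand-in: connected components of the undirected graph."""
    neighbours = {node: set(deps) for node, deps in graph.items()}
    for node, deps in graph.items():
        for dep in deps:
            neighbours.setdefault(dep, set()).add(node)
    seen: set[str] = set()
    communities = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(neighbours[node] - seen)
        communities.append(sorted(group))
    return communities


def export(project: str, communities: list[list[str]], labels: dict[str, str]) -> str:
    """Exporter stand-in: serialize components and their files as JSON."""
    components = [
        {"files": files, "labels": sorted({labels[f] for f in files})}
        for files in communities
    ]
    return json.dumps({"project": project, "components": components}, indent=2)


def run_pipeline(criteria: dict) -> list[str]:
    """Run all stages for every project the finder returns."""
    outputs = []
    for project in find_projects(criteria):
        graph = extract_graph(project)
        labels = annotate(graph)
        communities = detect_communities(graph)
        outputs.append(export(project, communities, labels))
    return outputs
```

Calling `run_pipeline({"language": "java"})` returns one JSON document per project, each listing the detected components with their files and labels.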

Configurations for each module are defined in YAML files in the `config` folder, which makes behavior and parameters easy to customize. To add a new implementation for a module, extend that module's base class.
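As an illustration of this extension mechanism, the sketch below defines a hypothetical abstract base class and a custom subclass. `BaseExporter`, `CsvExporter`, and the `export` signature are assumptions made for this example; check the actual base classes in the repository before subclassing.

```python
import csv
import io
from abc import ABC, abstractmethod


class BaseExporter(ABC):
    """Hypothetical base class; the project's real interface may differ."""

    @abstractmethod
    def export(self, components: dict[str, list[str]]) -> str:
        """Serialize a mapping of component name -> file paths."""


class CsvExporter(BaseExporter):
    """Hypothetical custom exporter: one CSV row per (component, file) pair."""

    def export(self, components: dict[str, list[str]]) -> str:
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["component", "file"])
        for name, files in sorted(components.items()):
            for path in files:
                writer.writerow([name, path])
        return buffer.getvalue()
```

A class like this would then need to be selected through the module's YAML configuration; how it is referenced there depends on the project's config schema.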

Getting Started

Prerequisites

- Docker v4.25 or higher for containerization.
- Git for repository cloning.
- (Optional) Python 3.10, if running the application outside of Docker.
- (Optional) A Gurobi license, if using the Bayan community detection algorithm.

Installation

Clone the Repository

git clone https://github.com/SasCezar/WasteAnnotator.git
cd WasteAnnotator

Set Up Environment Variables
- (Optional) Create a .env file in the project root to define any necessary environment variables (e.g., GitHub tokens, paths).
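
For illustration, the sketch below shows one way a `.env` file of `KEY=VALUE` lines could be read into the process environment using only the standard library. The variable name `GITHUB_TOKEN` in the usage note is an assumption; use whatever names the pipeline and `docker-compose.yaml` actually expect. Docker Compose already reads a `.env` file in the project directory for variable substitution in the compose file, so a loader like this is relevant mainly when running outside Docker.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values  # the .env file is optional
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values: dict[str, str]) -> None:
    """Export parsed values without overriding variables that are already set."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

With a `.env` file containing a line such as `GITHUB_TOKEN=...`, calling `apply_env(load_env_file())` makes the value available through `os.environ` for the rest of the process.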
Build and Start Services with Docker
```
docker compose up --build
```

This will initialize all required services and set up the necessary environment for running the WasteAnnotator pipeline.

Usage

Running the Pipeline

The main entry point for the WasteAnnotator pipeline is `src/main.py`. The pipeline can be run either with Docker or directly with Python.

Using Docker

Ensure Services are Running
```
docker compose up
```
Run the Main Pipeline
- The default service automatically runs main.py within the Docker container, which initiates the component extraction and annotation process.

Running Locally (Without Docker)

Install Dependencies with Poetry
```
poetry install
```
Activate the Poetry Environment
```
poetry shell
```
Execute the Script
```
python src/main.py
```

Configuration

Configuration files are located in the `config` directory, which contains settings for the different modules (e.g., annotator, community, exporter). Each YAML file can be customized to alter the behavior of the corresponding pipeline component:

- `config/main.yaml`: The primary configuration file, referencing all module-specific settings.
- Module-specific YAML files: Adjust parameters for finer control, such as `finder/github_archived_java.yaml` to change GitHub project retrieval criteria.

The pipeline uses Hydra for configuration management, allowing runtime configuration overrides. For example:

python src/main.py finder=custom_finder.yaml graphextractor=arcan.yaml
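
To make the override syntax concrete, the sketch below maps each `group=option` argument to the file it would select under the `config` directory. This is a simplified, standard-library illustration of the idea, not Hydra's actual resolution logic. `resolve_overrides` is a hypothetical helper, the group names are taken from the example command, and only group selection is covered (not overrides of individual values).

```python
from pathlib import PurePosixPath


def resolve_overrides(args: list[str], config_dir: str = "config") -> dict[str, str]:
    """Map 'group=option' arguments to config file paths (illustrative only)."""
    selected: dict[str, str] = {}
    for arg in args:
        group, sep, option = arg.partition("=")
        if not sep or not group or not option:
            raise ValueError(f"expected group=option, got {arg!r}")
        # Accept the option with or without a .yaml suffix.
        filename = option if option.endswith(".yaml") else f"{option}.yaml"
        selected[group] = str(PurePosixPath(config_dir) / group / filename)
    return selected
```

Under these assumptions, `finder=custom_finder.yaml` points at `config/finder/custom_finder.yaml`, which is where a custom finder configuration would be placed.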

Contributing

Contributions are welcome! Please fork the repository and use a feature branch to work on your changes. When ready, submit a pull request for review.

License

This repository was previously licensed under the MIT License. Because it includes code licensed under the GNU General Public License (GPL), the entire project is now licensed under the GPL version 3. All previous and future versions must comply with this license.

Thank you for using and contributing to WasteAnnotator! If you have any questions or need support, please open an issue or contact the maintainers.
