yashwanthjack/log-classifier

Log Classification With Hybrid Classification Framework

This project implements a hybrid log classification system that combines three complementary approaches to handle log patterns of varying complexity. Together, the methods cover predictable patterns, complex patterns, and patterns for which labeled data is scarce or poor.


Classification Approaches

  1. Regular Expression (Regex):

    • Handles the simplest, most predictable patterns.
    • Useful for patterns that are easily captured by predefined rules.
  2. Sentence Transformer + Logistic Regression:

    • Handles complex patterns when sufficient labeled training data is available.
    • Uses embeddings generated by Sentence Transformers, with Logistic Regression as the classification layer.
  3. LLM (Large Language Model):

    • Handles complex patterns when sufficient labeled training data is not available.
    • Serves as a fallback or complement to the other methods.
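
As a rough illustration of how the three approaches could fit together, the sketch below tries them in order of cost: regex first, then a trained model, then an LLM. The ordering, the rule patterns, the labels, and the names `REGEX_RULES`, `classify_with_regex`, `classify_hybrid`, `model_stage`, and `llm_stage` are all assumptions made for this example and are not taken from the project's code. The second and third stages are passed in as plain callables so the sketch runs without any external libraries.

```python
import re
from typing import Callable, Optional

# Hypothetical rules: patterns and labels are illustrative only,
# not the ones used by this project.
REGEX_RULES = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|completed|ended)", re.IGNORECASE), "System Notification"),
]

# A stage takes a log message and returns a label, or None if it cannot decide.
Stage = Callable[[str], Optional[str]]


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if no rule matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(log_message):
            return label
    return None


def classify_hybrid(log_message: str, model_stage: Stage, llm_stage: Stage) -> Optional[str]:
    """Try regex, then the trained model, then the LLM (assumed ordering)."""
    label = classify_with_regex(log_message)
    if label is not None:
        return label
    # Assumption: the embedding + Logistic Regression stage returns None
    # when it is not confident enough to assign a label.
    label = model_stage(log_message)
    if label is not None:
        return label
    return llm_stage(log_message)
```

In a real pipeline, `model_stage` would wrap the Sentence Transformer encoder and the Logistic Regression classifier, and `llm_stage` would wrap a call to whichever LLM is available. Keeping each stage behind the same callable shape makes it easy to test or swap them independently.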

[Figure: architecture diagram]


Folder Structure

  1. datasets/:
    • This folder contains resource files such as test CSV files, output files, etc.

Setup Instructions

  1. Install Dependencies: Make sure you have Python installed on your system. Install the required Python libraries by running the following command:

    pip install -r requirements.txt

  2. Set Up Google Colab: If running the notebook in Google Colab, mount your Google Drive to access the dataset:

    from google.colab import drive
    drive.mount('/content/drive')

  3. Prepare Dataset: Place your synthetic_logs(2).csv dataset in the appropriate directory. Update file paths in the notebook or scripts if necessary.


Usage

Upload a CSV file containing logs for classification. Ensure the file has the following columns:

  • source
  • log_message

The output is a CSV file with an additional column, target_label, containing the predicted label for each log entry.
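
The input and output format described above can be sketched with Python's standard `csv` module. This is a minimal example, not the project's actual script: the function name `classify_csv` and the `classify` callable are hypothetical, and only the column names `source`, `log_message`, and `target_label` come from this README.

```python
import csv
import io
from typing import Callable, TextIO

REQUIRED_COLUMNS = ("source", "log_message")


def classify_csv(infile: TextIO, outfile: TextIO,
                 classify: Callable[[str, str], str]) -> int:
    """Copy rows from infile to outfile, adding a target_label column.

    `classify` is a hypothetical callable taking (source, log_message)
    and returning a label. Returns the number of rows written.
    """
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"input CSV is missing columns: {missing}")

    # Keep the original columns and append the new one at the end.
    # extrasaction="ignore" skips stray cells beyond the header width.
    writer = csv.DictWriter(outfile, fieldnames=fieldnames + ["target_label"],
                            extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in reader:
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
        count += 1
    return count
```

When working with files on disk rather than in-memory buffers, open both with `newline=""`, as the `csv` module documentation recommends, so that line endings are handled consistently.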

