logchimera

logchimera was born out of a research initiative (Log Parsing Evaluation in the Era of Modern Software Systems), motivated by the general lack of access to the heterogeneous log data typically found in industry. With logchimera you can generate heterogeneous, industry-like data from publicly available logs and evaluate log parsing on it. The name is inspired by the chimera, the mythological creature that symbolizes a fusion of different elements; here it reflects heterogeneity, since the tool brings together diverse formats from various logs to resemble industry-like contexts.

Usage

The sections below show how to use logchimera. Currently, logchimera can do the following:

Estimate heterogeneity for a log dataset
Increase the heterogeneity for a log dataset
Transform industry data into publicly available data with equivalent properties

To use logchimera, first follow the set-up section for your system (currently available for Linux and macOS).

1. Estimate heterogeneity

To estimate log heterogeneity, provide the path to your file. logchimera can estimate heterogeneity for a file of arbitrary size. The only requirement is that the file contains one log entry per line, with entries separated by a newline character. The sample below shows the kind of input file logchimera expects.

Sample input file:

workerEnv.init() ok /etc/httpd/conf/workers2.properties
mod_jk child init 1 -2
jk2_init() Found child 5785 in scoreboard slot 6
...
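If your logs are not yet in this one-entry-per-line layout, a minimal sketch like the following can produce such a file. The helper name `write_log_file` is hypothetical and not part of logchimera; the sketch only assumes that each log entry fits on a single line.

```python
import tempfile
from pathlib import Path


def write_log_file(log_lines, path):
    """Write log entries to `path`, one per line.

    Hypothetical helper for illustration; it is not part of logchimera.
    """
    # Skip blank entries and strip trailing line breaks so that every
    # entry occupies exactly one line in the output file.
    cleaned = [line.rstrip("\r\n") for line in log_lines if line.strip()]
    Path(path).write_text("".join(line + "\n" for line in cleaned), encoding="utf-8")
    return len(cleaned)


if __name__ == "__main__":
    sample = [
        "workerEnv.init() ok /etc/httpd/conf/workers2.properties",
        "mod_jk child init 1 -2",
        "jk2_init() Found child 5785 in scoreboard slot 6",
    ]
    # Write into a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = Path(tmp_dir) / "sample_logs.log"
        written = write_log_file(sample, target)
        print(f"wrote {written} log lines to {target}")
```

The resulting file path is what you would then pass to logchimera.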

Example 1 (estimate heterogeneity)

# example estimating heterogeneity in python shell
$ python
Python 3.9.17 (main, Jul  5 2023, 20:41:20) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from logchimera.logchimera import estimate_heterogeneity
>>> from logchimera.datasets import get_example_data_for_estimating_heterogeneity
>>> example_file_path = get_example_data_for_estimating_heterogeneity()
>>> estimate_heterogeneity(example_file_path)
# Returns a 3-decimal floating-point value in the range [0, 1], e.g., 0.222; higher means more heterogeneous.
...

Example 2 (estimate heterogeneity)

# example estimating heterogeneity in python script
from logchimera.logchimera import estimate_heterogeneity
from logchimera.datasets import get_example_data_for_estimating_heterogeneity
example_file_path = get_example_data_for_estimating_heterogeneity()
h_level = estimate_heterogeneity(example_file_path) # Returns a 3-decimal floating-point value in the range [0, 1], e.g., 0.222; higher means more heterogeneous.
print(h_level)
...
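Because the result is a single number in the range [0, 1], it can be used to compare several log files. The sketch below ranks files from most to least heterogeneous. `rank_by_heterogeneity` is a hypothetical helper, not part of logchimera, and the estimator is passed in as a parameter; the assumption is that, with logchimera installed, `estimate_heterogeneity` could be passed here because it takes a file path and returns a float, as in the examples above.

```python
def rank_by_heterogeneity(file_paths, estimator):
    """Return (path, score) pairs sorted from most to least heterogeneous.

    Hypothetical helper, not part of logchimera. `estimator` is any
    callable that maps a file path to a score in the range [0, 1].
    """
    scored = [(path, estimator(path)) for path in file_paths]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are NOT real
    # logchimera results. A dict lookup stands in for the estimator.
    placeholder_scores = {"app_a.log": 0.1, "app_b.log": 0.5}
    ranking = rank_by_heterogeneity(list(placeholder_scores), placeholder_scores.get)
    for path, score in ranking:
        print(f"{score:.3f}  {path}")
```

Passing the estimator as an argument keeps the helper independent of logchimera, so it can be tried out with a stand-in before the package is installed.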

2. Increase heterogeneity

To increase log heterogeneity, provide the path to your file. logchimera can increase heterogeneity for a file of arbitrary size. The only requirement is that the file contains one log entry per line, with entries separated by a newline character.

3. Transform industry data into publicly available logs

To transform your industry log dataset, provide a list of one or more files containing logs. Currently, logchimera assumes that every input file represents a different application. For example, if you provide 10 input files, logchimera assumes that they contain logs from 10 different applications, one application per file. In turn, logchimera returns 10 files with equivalent heterogeneity, built from publicly available data.
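The one-file-per-application convention can be checked before running a transformation. The sketch below is only an illustration of that one-to-one correspondence between inputs and outputs: `plan_output_paths` is a hypothetical helper, and the `_public` suffix is an arbitrary naming choice, not logchimera's actual output naming.

```python
from pathlib import Path


def plan_output_paths(input_paths, output_dir):
    """Map each input log file to exactly one output path.

    Hypothetical helper, not part of logchimera. It only illustrates the
    convention that N input files (N applications) yield N output files.
    """
    if not input_paths:
        raise ValueError("provide at least one input file")
    names = [Path(p).name for p in input_paths]
    if len(set(names)) != len(names):
        raise ValueError("input file names must be unique (one file per application)")
    # The "_public" suffix is an illustrative assumption, not logchimera behavior.
    return {
        str(p): str(Path(output_dir) / f"{Path(p).stem}_public{Path(p).suffix}")
        for p in input_paths
    }


if __name__ == "__main__":
    plan = plan_output_paths(["logs/billing.log", "logs/auth.log"], "out")
    for source, target in plan.items():
        print(f"{source} -> {target}")
```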

Set-up MacOS

Install miniconda

$ mkdir -p ~/miniconda3
$ curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
$ rm -rf ~/miniconda3/miniconda.sh

Initialize miniconda for bash / zsh shells

$ ~/miniconda3/bin/conda init bash
$ ~/miniconda3/bin/conda init zsh

Create logchimera virtual environment and activate it

$ conda create --name logchimera python=3.9 -y
$ conda activate logchimera
$ pip install poetry

Install package

$ git clone https://github.com/spetrescu/logchimera.git
$ cd logchimera
$ poetry install

Check if installation was successful

$ python
Python 3.9.16 (main, Mar  8 2023, 04:29:44) 
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from logchimera.logchimera import function_test
>>>

Set-up Linux

Install miniconda

$ mkdir -p ~/miniconda3
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
$ rm -rf ~/miniconda3/miniconda.sh

Initialize miniconda for bash / zsh shells

$ ~/miniconda3/bin/conda init bash
$ ~/miniconda3/bin/conda init zsh

Create logchimera virtual environment and activate it

$ conda create --name logchimera python=3.9 -y
$ conda activate logchimera
$ pip install poetry

Install package

$ git clone https://github.com/spetrescu/logchimera.git
$ cd logchimera
$ poetry install

Check if installation was successful

$ python
Python 3.9.17 (main, Jul  5 2023, 20:41:20) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from logchimera.logchimera import estimate_heterogeneity
>>>

Installation

$ pip install logchimera
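After installing, you may want to confirm from a script that the package is visible to the active Python environment. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and the check only confirms that the top-level package can be located, not that it works correctly.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate `package_name`.

    Hypothetical helper for illustration. Intended for top-level package
    names such as "logchimera"; it does not import the package.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("logchimera installed:", is_installed("logchimera"))
```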

Artifact: Reproduce Experiments from Original Paper

To reproduce the experiments conducted during the research initiative that led to the creation of logchimera, please refer to the ARTIFACT.md file.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

logchimera was created by Stefan Petrescu. It is licensed under the terms of the MIT license.

Credits for Initial Package Structure

The initial package structure of logchimera was created with cookiecutter and the py-pkgs-cookiecutter template.

Citation

To cite this package, you can use the following BibTeX entry:

@INPROCEEDINGS{petrescu2023issre,
author={Petrescu, Stefan and den Hengst, Floris and Uta, Alexandru and Rellermeyer, Jan S.},
booktitle={34th IEEE International Symposium on Software Reliability Engineering (ISSRE)},
title={Log Parsing Evaluation in the Era of Modern Software Systems},
year={2023}}
