logchimera was born out of a research initiative (Log Parsing Evaluation in the Era of Modern Software Systems), motivated by a general lack of access to the heterogeneous log data typically found in industry. With logchimera you can generate heterogeneous, industry-like data from publicly available logs and evaluate log parsing on it. The tool is named after the chimera, a mythological creature that symbolizes a fusion of different elements; here, the name reflects heterogeneity, since logchimera brings together diverse formats from various logs to resemble industry-like contexts.
The sections below show how to use logchimera. Currently, logchimera can do the following:
- Estimate heterogeneity for a log dataset
- Increase the heterogeneity for a log dataset
- Transform industry data into publicly available data with equivalent properties
To use logchimera, first follow the set-up section corresponding to your system (currently available for Linux and Mac).
To estimate log heterogeneity, simply provide the path to your file. Currently, logchimera can estimate heterogeneity for a file of arbitrary size. The only requirement is that the file contains the log lines separated by a newline character. Below is a sample of the input file that logchimera expects.
workerEnv.init() ok /etc/httpd/conf/workers2.properties
mod_jk child init 1 -2
jk2_init() Found child 5785 in scoreboard slot 6
...
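The input requirement above (log lines separated by a newline character) can be checked before handing a file to logchimera. The following is a minimal sketch, not part of logchimera: the helper name `read_log_lines` and the file name `sample.log` are hypothetical, and the sample content is the three lines shown above.

```python
import tempfile
from pathlib import Path


def read_log_lines(path):
    """Return the non-empty lines of a newline-separated log file.

    Hypothetical helper for illustration only; it is not part of logchimera.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


# Write the sample lines shown above to a temporary file, one log line per line.
sample = "\n".join([
    "workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "mod_jk child init 1 -2",
    "jk2_init() Found child 5785 in scoreboard slot 6",
]) + "\n"

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.log"
    sample_path.write_text(sample, encoding="utf-8")
    lines = read_log_lines(sample_path)

# A file in the expected format yields one entry per log line.
print(f"{len(lines)} log lines found")
```

If the count is far lower than expected, the file probably uses a different record separator and may need converting first.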
# example estimating heterogeneity in python shell
$ python
Python 3.9.17 (main, Jul 5 2023, 20:41:20)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from logchimera.logchimera import estimate_heterogeneity
>>> from logchimera.datasets import get_example_data_for_estimating_heterogeneity
>>> example_file_path = get_example_data_for_estimating_heterogeneity()
>>> estimate_heterogeneity(example_file_path)
# Returns a 3-decimal floating-point value in the range [0, 1], e.g., 0.222; higher means more heterogeneous.
...
# example estimating heterogeneity in python script
from logchimera.logchimera import estimate_heterogeneity
from logchimera.datasets import get_example_data_for_estimating_heterogeneity
example_file_path = get_example_data_for_estimating_heterogeneity()
h_level = estimate_heterogeneity(example_file_path) # Returns a 3-decimal floating-point value in the range [0, 1], e.g., 0.222; higher means more heterogeneous.
print(h_level)
...
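When several log files are available, it can help to compare their heterogeneity side by side. The sketch below assumes only what the examples above show, namely that `estimate_heterogeneity` takes a file path and returns a value in the range [0, 1]. The helper `rank_by_heterogeneity` is hypothetical, and the estimator is passed in as an argument so the sketch runs even without logchimera installed. The file names and scores in the demo are placeholders, not real results.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def rank_by_heterogeneity(
    paths: Iterable[str],
    estimator: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score every file with `estimator` and sort from most to least heterogeneous.

    Hypothetical helper for illustration only; it is not part of logchimera.
    """
    scores: Dict[str, float] = {path: estimator(path) for path in paths}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Stand-in estimator with placeholder scores so the sketch runs on its own.
    # In practice, pass logchimera's estimate_heterogeneity and real file paths.
    placeholder_scores = {"app_a.log": 0.222, "app_b.log": 0.640}
    ranked = rank_by_heterogeneity(placeholder_scores, placeholder_scores.get)
    for path, score in ranked:
        print(f"{path}: {score:.3f}")
```

With logchimera installed, the call would look like `rank_by_heterogeneity(my_paths, estimate_heterogeneity)`, where `my_paths` is your own list of log files.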
To increase log heterogeneity, simply provide the path to your file. Currently, logchimera can increase heterogeneity for a file of arbitrary size. The only requirement is that the file contains the log lines separated by a newline character.
To transform your industry log dataset, provide a list of one or more log files. Currently, logchimera assumes that every input file represents a different application. For example, if you provide 10 input files, logchimera assumes that they contain logs from 10 different applications, one per file. In turn, logchimera returns 10 files with equivalent heterogeneity, built from publicly available data.
- Install miniconda
$ mkdir -p ~/miniconda3
$ curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
$ rm -rf ~/miniconda3/miniconda.sh
- Initialize miniconda for bash / zsh shells
$ ~/miniconda3/bin/conda init bash
$ ~/miniconda3/bin/conda init zsh
- Create logchimera virtual environment and activate it
$ conda create --name logchimera python=3.9 -y
$ conda activate logchimera
$ pip install poetry
- Install package
$ git clone https://github.com/spetrescu/logchimera.git
$ cd logchimera
$ poetry install
- Check if installation was successful
$ python
Python 3.9.16 (main, Mar 8 2023, 04:29:44)
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from logchimera.logchimera import function_test
>>>
- Install miniconda
$ mkdir -p ~/miniconda3
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
$ rm -rf ~/miniconda3/miniconda.sh
- Initialize miniconda for bash / zsh shells
$ ~/miniconda3/bin/conda init bash
$ ~/miniconda3/bin/conda init zsh
- Create logchimera virtual environment and activate it
$ conda create --name logchimera python=3.9 -y
$ conda activate logchimera
$ pip install poetry
- Install package
$ git clone https://github.com/spetrescu/logchimera.git
$ cd logchimera
$ poetry install
- Check if installation was successful
$ python
Python 3.9.17 (main, Jul 5 2023, 20:41:20)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from logchimera.logchimera import estimate_heterogeneity
>>>
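The import check above can also be scripted, for example as part of an automated set-up. The sketch below uses only the Python standard library. The helper name `is_installed` is hypothetical, and the check confirms only that the package can be found in the active environment, not that every function imports correctly.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located in the active environment.

    Hypothetical helper for illustration only; it is not part of logchimera.
    """
    return importlib.util.find_spec(package_name) is not None


if is_installed("logchimera"):
    print("logchimera is available in this environment")
else:
    print("logchimera was not found; activate the logchimera environment and retry")
```

Run the script with the logchimera conda environment activated, since the result depends on which interpreter executes it.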
$ pip install logchimera
To reproduce the experiments conducted during the research initiative that led to the creation of logchimera, please refer to the ARTIFACT.md file.
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
logchimera was created by Stefan Petrescu. It is licensed under the terms of the MIT license.
The initial package structure of logchimera was created with cookiecutter and the py-pkgs-cookiecutter template.
To cite this package, you can use the following BibTeX entry:
@INPROCEEDINGS{petrescu2023issre,
author={Petrescu, Stefan and den Hengst, Floris and Uta, Alexandru and Rellermeyer, Jan S.},
booktitle={34th IEEE International Symposium on Software Reliability Engineering (ISSRE)},
title={Log Parsing Evaluation in the Era of Modern Software Systems},
year={2023}}