Welcome to the Win-Tie-Loss Performance Analysis, an easy-to-use R package for comparing multiple methods across different datasets. This tool calculates and visualizes the win-tie-loss outcomes between methods using a single command. Whether you're working on machine learning, bioinformatics, or any other domain where method comparison is crucial, this tool has you covered.
- Automatic Win-Tie-Loss Calculation: Quickly compute win-tie-loss comparisons across multiple methods.
- Flexible Measure Types: This type supports both maximization (e.g., precision) and minimization (e.g., hamming loss) objectives.
- Customizable Visualizations: Generate professional bar plots to visualize your results with complete control over layout and design.
- Easy-to-Use: Provide your data in CSV format, call the functions, and get your results.
@misc{WTL2024,
author = {Elaine Cecília Gatto},
title = {WinTieLoss: An R Package for Comparative Analysis of Machine Learning Methods},
year = {2024},
doi = {10.13140/RG.2.2.17131.35366/1},
note = {R package version 0.1.0. Licensed under CC BY-NC-SA 4.0},
url = {https://github.com/cissagatto/WinTieLoss}
}
A Win-Tie-Loss chart is a visual tool for comparing the performance of different algorithms or methods across multiple tasks or datasets. This type of chart summarizes how often a method "wins," "ties," or "loses" compared to other methods based on a specific performance metric.
In the context of Machine Learning, models are frequently compared to determine which offers the best performance in accuracy, recall, F1-score, or any other metric of interest. The Win-Tie-Loss chart is handy when dealing with multiple methods and datasets, as it provides a clear and aggregated view of how each method performs relative to others.
To understand the mathematical concept behind a Win-Tie-Loss chart, consider a scenario where you have
-
Method Combinations: For each pair of methods
$(m_{i},m_{k})$ where$i \neq k $ , you compare the results$P_{i,j}$ and$P_{k,j}$ on each dataset$D_{j}$ . -
Counting Wins, Ties, and Losses:
-
Win: Method
$m_{i}$ wins against method$m_{k}$ on dataset$D_{j}$ if$P_{i,j} > P_{k,j}$ . -
Tie:
$m_{i}$ ties with$m_{k}$ on dataset$D_{j}$ if$P_{i,j} = P_{k,j}$ . -
Loss:
$m_{i}$ loses to$m_{k}$ on dataset$D_{j}$ if$P_{i,j} < P_{k,j}$ .
-
Win: Method
-
Aggregating Results: After comparing all method pairs and datasets, you count the total number of wins, ties, and losses for each method relative to the others.
Mathematically, we can define the counts
$W_{i} = \sum_{k \neq i} \sum_{j=1}^{N} \text{I}(P_{i,j} > P_{k,j})$ $T_{i} = \sum_{k \neq i} \sum_{j=1}^{N} \text{I}(P_{i,j} = P_{k,j})$ $L_{i} = \sum_{k \neq i} \sum_{j=1}^{N} \text{I}(P_{i,j} < P_{k,j})$
where
In the context of Machine Learning, the Win-Tie-Loss chart helps answer important questions like:
- Which method is consistently better?: A method with more "wins" across different datasets can be considered more robust.
- Are there comparable methods?: Many "ties" might indicate that some methods perform similarly.
- Which methods are consistently worse?: A method with more "losses" might be inferior or unsuitable for the task at hand.
The chart provides a quick visualization of these aspects, making it easier to decide which model or algorithm to use in future tasks or experiments.
Suppose you have three classification models
# install.packages("devtools")
library("devtools")
devtools::install_github("https://github.com/cissagatto/WinTieLoss")
library(WinTieLoss)
Prepare your dataset in a CSV format. The CSV should have the following structure:
datasets | method 1 | method 2 | method ... | method M |
---|---|---|---|---|
dataset 1 | 0.85 | 0.80 | ... | 0.90 |
dataset 2 | 0.88 | 0.82 | ... | 0.89 |
... | ... | ... | ... | ... |
dataset D | 0.90 | 0.85 | ... | 0.92 |
Save your CSV file in the Data
folder or specify a custom path when calling the functions.
To compute the win-tie-loss, load your data and call the function.
FolderRoot = "~/pathToYourRootFolder"
FolderData = "~/pathToYourDataFolder"
FolderResults = "~/pathToYourResultsFolder"
library(WinTieLoss)
name.file = "~/WinTieLoss/Data/clp.csv"
data = data.frame(read.csv(name.file))
data = data[,-1]
methods.names = colnames(data)
df_res.mes <- wtl.measures()
filtered_res.mes <- filter(df_res.mes, names == "clp")
measure.type = as.numeric(filtered_res.mes$type)
res = win.tie.loss.compute(data = data, measure.type)
res$method <- factor(res$method, levels = methods.names)
res <- res[order(res$method), ]
save = paste(FolderResults, "/clp.csv", sep="")
write.csv(res, save, row.names = FALSE)
data
: Your dataset in CSV format, read into a DataFrame.measure.type
:1
if a higher value indicates better performance (e.g., precision).0
if a lower value indicates better performance (e.g., hamming loss).
Generate a bar plot to visualize your win-tie-loss comparison:
# Calculate the sum of win, tie, and loss for each method
soma = apply(data[,-1], 1, sum)
max.value = soma[1]
half.value = soma[1] / 2
wtl = c("win", "tie", "loss")
colnames(res) = wtl
save = paste(FolderResults, "/clp.pdf", sep="")
win.tie.loss.plot(data = res,
names.methods = methods.names,
name.file = save,
max.value = max.value,
width = 18,
height = 10,
bottom = 2,
left = 11,
top = 0,
right = 1,
size.font = 2.0,
wtl = wtl)
data
: The result fromwin.tie.loss.compute
.names.methods
: A vector of method names to label your plot.name.file
: The path and file name to save the plot as a PDF.max.value
: Pay attention to MAX.VALUE. The legend may be on top of the graph if max.value is exact.- Therefore, pass a slightly higher value, so the legend will be a little further in front of the graph.
width
,height
: Dimensions of the PDF.bottom
,left
,top
,right
: Margins for the plot.size.font
: Font size for the plot labels.wtl
: A vector with labels for "Win", "Tie", and "Loss" (you can change to your language).
If you wish, there are also functions to help you analyze the results. For example, if you compared 30 models using 50 datasets, it may be difficult to interpret the results. See the functions in the analyis.R script, they were written to perform some calculations. There are examples of how to use them in the example.R file, in the example folder.
For more detailed documentation on each function, check out the ~/WinTieLoss/docs
folder.
A complete example is available in ~/WinTieLoss/example
folder.
Ensure the following folder structure is set up:
FolderRoot
: Root directory of the project.FolderData
: Directory where CSV data files are stored.FolderResults
: Directory where results and plots are saved.
We welcome contributions from the community! If you have suggestions, improvements, or bug fixes, please submit a pull request or open an issue in the GitHub repository.
- This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
- This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.
- The authors also thank the Brazilian research agencies FAPESP financial support.
For any questions or support, please contact:
- Prof. Elaine Cecilia Gatto ([email protected])
| Site | Post-Graduate Program in Computer Science | Computer Department | Biomal | CNPQ | Ku Leuven | Embarcados | Read Prensa | Linkedin Company | Linkedin Profile | Instagram | Facebook | Twitter | Twitch | Youtube |
Start making performance analysis with the Win-Tie-Loss Tool today! 🚀