MONK's problems categorical features are wrongly represented as continuous/Integer #87

Open
phoeinx opened this issue Nov 24, 2024 · 0 comments

Describe the bug
According to the paper that introduced this synthetic dataset, all features of the "MONK's problems" dataset are categorical.
The UCI ML repository records all of them as Integer. This can disadvantage models trained on the data, because integer-typed features may be treated as continuous or ordinal.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://archive.ics.uci.edu/dataset/70/monk+s+problems and look at the variables table, which shows Integer as the type for all features.
  2. Open the MONK's problems competition paper, freely available here: https://www.researchgate.net/publication/2293492_The_MONK's_Problems_A_Performance_Comparison_of_Different_Learning_Algorithms
  3. Look at page 2, section 1.1 "The problem", to see that all features are actually categorical.

Expected behavior
Features a1, a2, a3, a4, a5, and a6 are represented as categorical.
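
Until the metadata is corrected, a possible user-side workaround is to cast the six feature columns to a categorical dtype before training. The following is only a minimal sketch using pandas: the rows in `features` are made-up illustrative values rather than actual dataset records, and the variable names are hypothetical. It assumes the features arrive as integer columns named a1 to a6.

```python
import pandas as pd

# Hypothetical sample rows shaped like the MONK's features.
# The values are illustrative only, not taken from the dataset.
features = pd.DataFrame({
    "a1": [1, 1, 2, 3],
    "a2": [1, 2, 3, 1],
    "a3": [1, 2, 1, 2],
    "a4": [1, 3, 2, 1],
    "a5": [3, 1, 4, 2],
    "a6": [1, 2, 2, 1],
})

feature_names = ["a1", "a2", "a3", "a4", "a5", "a6"]

# Cast the integer columns to pandas' categorical dtype so that
# downstream code no longer sees them as numeric.
categorical = features[feature_names].astype("category")

# One-hot encode for models that require numeric input without
# implying any ordering between category values.
one_hot = pd.get_dummies(categorical)

print(categorical.dtypes)
print(one_hot.columns.tolist())
```

Whether one-hot encoding or a native categorical dtype is preferable depends on the model being trained; the sketch shows both only to illustrate the difference from the Integer representation.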

Thank you!
