Take cost metrics into account #532

Open
mfeurer opened this issue Sep 19, 2018 · 4 comments
Labels
serverside These issues are present in the rest API and not fixable by the Python package.

Comments

@mfeurer
Collaborator

mfeurer commented Sep 19, 2018

when running scikit-learn benchmarks.

@mfeurer mfeurer added the Good First Issue Issues suitable for people new to contributing to openml-python! label Jan 15, 2019
@mfeurer mfeurer added the Run OpenML concept label Feb 20, 2023
@v-parmar
Contributor

Hey @mfeurer, could you give me some details about this issue so that I can work on it?

@mfeurer
Collaborator Author

mfeurer commented Apr 17, 2023

Hey @v-parmar, I just had a look at what I could suggest, and found that the following would be a good way to get started:

  1. Tackle issue “Allow loading tasks without downloading split files” (#1245) to allow going through all tasks on OpenML without having to download them.
  2. Find a task with an associated cost matrix, for example via:
    import openml

    # Assumes list_tasks returns a dict keyed by task ID, as it did when this was written.
    for task_id in openml.tasks.list_tasks(task_type=openml.tasks.TaskType.SUPERVISED_CLASSIFICATION):
        try:
            task = openml.tasks.get_task(task_id, download_data=False, download_qualities=False)
        except Exception:
            # Skip tasks that fail to load.
            continue
        if task.cost_matrix is not None:
            print(task_id, task.cost_matrix)
    
  3. Integrate the cost matrix into the calculation of the metric, where such weights are often called sample_weight, as in this metric.
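
As a rough illustration of step 3, here is a minimal sketch of a cost-aware metric. It rests on assumptions that this thread does not confirm: that the cost matrix is a square numeric matrix, that cost_matrix[i][j] is the cost of predicting class j when the true class is i, and that labels are integer-encoded. The function name mean_misclassification_cost is hypothetical and not part of openml-python or scikit-learn.

```python
from typing import Sequence


def mean_misclassification_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_matrix: Sequence[Sequence[float]],
) -> float:
    """Return the average cost per sample.

    Assumption (not defined by OpenML): cost_matrix[i][j] is the cost of
    predicting class j when the true class is i, with integer-encoded labels.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    # Look up the cost of every (true label, predicted label) pair and average.
    total = sum(cost_matrix[t][p] for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# Hypothetical 2-class cost matrix: a false negative (true 1, predicted 0)
# costs five times as much as a false positive (true 0, predicted 1).
cost_matrix = [[0.0, 1.0],
               [5.0, 0.0]]
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 0, 1]
print(mean_misclassification_cost(y_true, y_pred, cost_matrix))  # 1.5
```

The per-sample lookup cost_matrix[t][p] is the piece that would play the role of a per-sample weight; how it should plug into existing metrics depends on the cost matrix format that OpenML ends up specifying.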

@PGijsbers
Collaborator

I don't think there is a standard right now for how the cost matrix should be formatted. Some bad examples include “yes”, “adam”, or “1”. Perhaps we should hold off on tackling this until we have a specified format?

@mfeurer
Collaborator Author

mfeurer commented Apr 25, 2023

Could you please put this on the roadmap then? This seems like a basic thing that OpenML should handle.

@mfeurer mfeurer added serverside These issues are present in the rest API and not fixable by the Python package. and removed Good First Issue Issues suitable for people new to contributing to openml-python! Run OpenML concept labels Apr 25, 2023