-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SemanticSegmentationTask: add class-wise metrics #2130
base: main
Are you sure you want to change the base?
Conversation
Given that most metrics of interest are broken (e.g., all of them when I'm saying this because these are only the issues I've found so far, but I've also noticed other suspicious things like the fact that my classwise recall values are not the same as those in the confusion matrix when you normalize it with respect to ground truth (I haven't checked if this is also the case with precision, so when the matrix is normalized column-wise). I'm also pretty confident that if all of this is wrong then micro averaging is also probably wrong. I should be pretty easy to compute all these metrics straight from the confusion matrix (assuming it at least is correct) and I've actually tried to reimplent them this way but it hasn't really been a priority because I’ve found that all these wrong (?) values are basically a lower bound of the actual ones. If you look at the official implementations, this is actually what they are doing, and my guess is that they have a bug in their logic later on. But indeed all these metrics inherit from I’m actually pretty dumbfounded these issues are not a top priority for the TorchMetrics team and instead they focus on adding to their docs but to each their own… |
@DimitrisMantas good call on my ignoring the |
Sure, that makes sense; please excuse the rant haha. |
Applied
Note that Val is unaffected:
For a task with 2 classes there are a grand total of |
I just set to be explicit but I think that pytorch lightning or torchmetrics auto sets on_epoch to be False for training and True for all else. |
You need to set both |
@DimitrisMantas now just performing |
Not sure about this failing test |
Must be an issue with on of the minimum versions of the package since it's passing for the other tests. |
We can definitely increase the min version of torchmetrics if we need to. |
@adamjstewart I'm inclined to close this PR as I don't feel confident I understand the behaviour of torchmetrics in this implementation. Elsewhere I am using the on_stage_epoch_end hooks and feel confident I do understand the behaviour with that approach. Overall I think this should be a change we make from a place of understanding, and in smaller steps than this PR takes |
torchmetrics=1.1.0 test errors here
See this was added in 1.1.1 |
After discussion with torchmetrics devs, created Lightning-AI/torchmetrics#2683 |
That's such a complicated minimal reproducible example lol. |
I tried making a self-contained minimal reproducible example but couldn't get one working and gave up. |
It just hit me that we should be a bit careful with which metrics we add to avoid unnecessary computation; class-wise accuracy and recall are the same thing and so are micro-averaged accuracy, precision, and recall. |
Any sense of how much these metrics actually add to processing time? If it isn't noticeable by a human, I don't particularly care about the overhead. |
Haven't measured it but doubt it's much. |
I believe Lightning offers tools for profiling |
@adamjstewart @DimitrisMantas per this comment we should be using the |
I see the issue, but I must be missing something because my own code uses the standard logging tools and metric collections work just fine. Altough by "work", I mean I don't get an error. Other than that, I found out a couple of days ago that the diagonal of my confusion matrix doesn't match the class accuracies (which it should), so I'm obviously not using the API correctly... Edit: I have at least one mistake where I do Edit 2: Aaaaand I finally got your error... |
Ok, so basically what the torchmetrics guys are saying is that automatic logging is not supported for metric collections? |
@robmarkcole I can confirm the recommended approach yields consistent results. |
Sorry it's taken me so long to review. I was originally hung up on the hack required to support Only remaining concern is that the code required to loop over all metrics and averages actually makes the code more complicated and difficult to read than avoiding loops entirely. If we want to add new metrics in the future, it looks non-straightforward. I wonder if we can loop over averages only and still keep things simple. I would also really like to see this done for |
@adamjstewart please see above comment on using |
Addresses #2121 for segmentation. Mostly copied from @isaaccorley as here - he is additionally passing
on_epoch=True
which is NOT adopted hereOutput metrics for ChaBud binary task with
labels=['background', 'burned_area']
This dataset nicely illustrates why class labels are required - burned_area is minority class and is not learnt