Plot results for neural network experiments, across all genes #85
Conversation
Is it expected that there's a big difference in the number of genes compared to the previous run? (51480 now, 35520 before)
The "shape" of this dataframe is the total number of models, i.e. one for each combination of gene/cancer type/LASSO parameter/seed/CV fold. In the most recent run we covered a larger range of LASSO parameters, so it makes sense that the dataframe is considerably larger.
There are also two genes that are included now that weren't before, ALK and CIC. I slightly changed how the filtering for hypermutated samples works which results in these genes being included, so this is expected.
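As a quick sanity check, the row count should match the design exactly, one row per configuration. Something like this sketch would confirm it (the path and column names here are hypothetical, not the notebook's actual ones):

```python
import pandas as pd

# Load the per-model results table (placeholder path and column names)
results_df = pd.read_csv("neural_network_results.tsv", sep="\t")

# One row per combination of gene/cancer type/LASSO parameter/seed/CV fold
factors = ["gene", "cancer_type", "lasso_param", "seed", "fold"]
n_configs = results_df[factors].drop_duplicates().shape[0]
assert n_configs == results_df.shape[0], "expected one row per model configuration"
```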
Is it also expected that there's a shift in the distribution, from most being near 0 to now most being near 15000?
That's the result of the larger range of LASSO parameters I described before. It is expected, and I would argue that it makes more sense for the model comparisons since we're covering a more comprehensive range of "overfit" or non-sparse models now.
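To make that concrete, here's a toy sketch (synthetic data, not our actual pipeline) of how widening the LASSO penalty range pulls many more dense models into the grid, shifting the distribution of nonzero-coefficient counts upward:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for an expression-based mutation prediction task
X, y = make_classification(n_samples=500, n_features=1000, n_informative=20,
                           random_state=0)

# Sweeping a wider range of C (inverse LASSO penalty strength) adds many
# more dense, potentially overfit models at the high-C end of the grid.
for C in np.logspace(-3, 3, 7):
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    print(f"C={C:g}: {np.count_nonzero(model.coef_)} nonzero coefficients")
```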
It's something we've observed before for some genes/cancer types. We think it's because some mutations aren't all that correlated with gene expression in some cancer types (i.e., they're not drivers in that cancer type, or they don't occur frequently enough to train a good model, etc.), so the models end up overfitting, which is what we see here: good CV performance and bad performance on held-out UCEC samples.
Currently, we're just including/excluding cancer types based on mutation frequency/proportion. We could do something smarter, but for a study like this with hundreds of genes/cancer types, it's probably fine for the filter to be pretty coarse-grained. A while back we tried filtering for cancer types where the model outperforms a baseline with permuted gene expression data, and it didn't affect the overall model selection results much.
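For illustration, the coarse-grained filter amounts to something like this (made-up thresholds, column names, and toy data, not our actual code):

```python
import pandas as pd

# Toy long-format mutation table: one row per (sample, gene) with a 0/1 flag
mutations_df = pd.DataFrame({
    "gene":        ["TP53"] * 6 + ["ALK"] * 6,
    "cancer_type": ["LUAD", "LUAD", "LUAD", "UCEC", "UCEC", "UCEC"] * 2,
    "mutated":     [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
})

MIN_MUTATED = 2       # minimum mutated samples per gene/cancer type
MIN_PROPORTION = 0.1  # minimum mutated fraction per gene/cancer type

counts = (mutations_df
          .groupby(["gene", "cancer_type"])["mutated"]
          .agg(n_mutated="sum", proportion="mean")
          .reset_index())

# Keep only gene/cancer type pairs with enough mutated samples to train on
keep = counts[(counts["n_mutated"] >= MIN_MUTATED) &
              (counts["proportion"] >= MIN_PROPORTION)]
print(keep)
```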
nrosed commented on 2023-08-04T20:24:36Z
The legend is covering the bars; maybe you could put it horizontally below the plot, unless this is a "just for you" plot that isn't needed for the paper.
jjc2718 commented on 2023-08-15T18:12:44Z
Yeah, this is a "just for me" plot. The ones that show the train/valid/test breakdown are what we'll show in the paper, so I spent more time making them pretty.
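For reference, the suggested layout is a one-liner in matplotlib; a minimal sketch with toy data (not the notebook's actual plotting code):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(3)
fig, ax = plt.subplots()
ax.bar(x - 0.2, [3, 5, 2], width=0.4, label="train")
ax.bar(x + 0.2, [2, 3, 1], width=0.4, label="test")

# Anchor the legend's upper center just below the axes and lay it out
# horizontally, so it can't overlap the bars.
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.08), ncol=2)
fig.tight_layout()
plt.show()
```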
It's very interesting that the stratified holdout and the test peak very close to each other.
It was interesting to us too! I think it's pretty typical though, at least for this mutation prediction problem across genes and cancer types. That's one of the main takeaways from our paper, that cross-validation performance is generally pretty predictive of generalization performance, and model simplicity may not be as important.
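A minimal sketch of the comparison behind that takeaway, with made-up per-gene numbers just to show the shape of the analysis:

```python
import pandas as pd
from scipy.stats import spearmanr

# Toy per-gene summary; the real analysis aggregates AUPR over seeds/folds
summary_df = pd.DataFrame({
    "gene":      ["TP53", "KRAS", "EGFR", "PIK3CA", "BRAF"],
    "cv_aupr":   [0.85, 0.72, 0.66, 0.58, 0.77],
    "test_aupr": [0.80, 0.70, 0.61, 0.55, 0.74],
})

# High rank correlation means CV performance is predictive of generalization
rho, p = spearmanr(summary_df["cv_aupr"], summary_df["test_aupr"])
print(f"Spearman rho = {rho:.2f} (p = {p:.2g})")
```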
nrosed commented on 2023-08-04T20:24:39Z
It's really interesting how ARID2 and VHL perform so much better on the test (CCLE) dataset. Could this be caused by a cell-line-specific effect, like a mismatch between cancer type proportions and cell line proportions? Not sure if it's important for your paper; it's just very striking.
jjc2718 commented on 2023-08-15T18:14:39Z
Yeah, I'm not sure why there are so many genes that perform better on CCLE. It could be something technical or dataset-specific like what you're describing, or it could be something biological, like the cell line data just being cleaner/more well-behaved in these cases than the tumor samples from TCGA. If I had more time maybe I'd try to disentangle the two, but I have to start writing my thesis at some point, haha.
nrosed commented on 2023-08-04T20:24:40Z
I don't fully understand this plot. Why is the test error always at 0.6 even though the CV and train errors are going down?
nrosed commented on 2023-08-04T20:24:41Z
Yeah, I'm confused about how the training error is far below the test error, and why the test error is constant. To me this seems like a bias in the test set?
nrosed commented on 2023-08-04T20:24:43Z
Ignore this if it doesn't make sense, but maybe adding another comparator (like a dummy classifier that predicts the most common label) would convince readers that your model is actually learning something. I think I'm just hung up on the test AUPR never changing over the epochs. Maybe it is changing, but it happens in the first few epochs and I can't see it in your plot? I might be completely missing the point of these plots, too. It might also be that your batch size is small enough that there isn't much change per epoch?
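For concreteness, the suggested baseline is only a few lines with scikit-learn's DummyClassifier (toy data here, not the actual experiment):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Imbalanced toy problem standing in for mutation prediction
X, y = make_classification(n_samples=500, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Always predict the most common label
dummy = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
scores = dummy.predict_proba(X_te)[:, 1]  # constant scores

# With constant scores, AUPR collapses to the positive-class prevalence,
# so any real model should clear this bar comfortably.
print(f"majority-class baseline AUPR: {average_precision_score(y_te, scores):.3f}")
```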
nrosed commented on 2023-08-04T20:24:43Z
I guess if there is a difference in mean AUPR based on layer size, that proves it's learning a better model; I think I just couldn't really see it from the learning curves.
nrosed commented on 2023-08-04T20:31:09Z
A main header and section headers would help readers understand what this notebook is doing and how to interpret the plots.
jjc2718 commented on 2023-08-15T18:22:59Z
I'll add them! These scripts started as just a model diagnostic thing, but we are using a few of these figures in the supplement, so I'll add some documentation.
Looks good to me; the main thing that would help future readers is some main- and sub-section headers for some of the scripts. My other comments were just conceptual questions about some of your plots. I might have just missed the intention of the plots, so ignore them if they don't make sense.
For whatever reason, these models seem to saturate really fast rather than improving slowly across epochs. Here's a comparison of learning rates I did a while ago for KRAS mutation prediction: for the lower learning rates, you can see that CV/test performance improves a bit more gradually, but the resulting CV and test performance after 200 epochs ends up being about the same as it is for the higher learning rates (other than the obviously bad ones like 0.01).

In the plots you're looking at for hidden layer size, I was doing a grid search (I think over the same range shown here) and choosing the best learning rate. So what that ends up picking happens to be one of the models that saturates really fast, at least for KRAS; I haven't looked too much at other genes.

We could add some kind of baseline, but since what we ultimately care about is the "best vs. smallest good" model selection comparison, I don't think it's that important in this case. Like you mentioned, because of the considerable variability between hidden layer sizes, I'm fairly confident that the model is learning something, and our goal isn't really to find the absolute best-performing NN model here, just a reasonable one that lets us think about model complexity by comparing models with different performances across many genes.
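If anyone wants to poke at this kind of comparison, here's a minimal sketch (synthetic data, with scikit-learn's MLPClassifier standing in for our actual training code):

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# max_iter=1 triggers a ConvergenceWarning on every fit() call; expected here
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = make_classification(n_samples=1000, n_features=100, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for lr in [1e-4, 1e-3, 1e-2]:
    # max_iter=1 + warm_start=True trains one epoch per fit() call,
    # which lets us record a validation AUPR learning curve by epoch
    model = MLPClassifier(hidden_layer_sizes=(64,), learning_rate_init=lr,
                          max_iter=1, warm_start=True, random_state=0)
    curve = []
    for _ in range(200):
        model.fit(X_tr, y_tr)
        curve.append(average_precision_score(
            y_va, model.predict_proba(X_va)[:, 1]))
    print(f"lr={lr:g}: valid AUPR after 200 epochs = {curve[-1]:.3f}")
```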
Also cleaning up some other figures for the paper draft. This PR touches a lot of files, but the changes aren't that substantial; most of them are just cosmetic.