Plot results for neural network experiments, across all genes #85

Merged
jjc2718 merged 16 commits into greenelab:master from nn_hsize_all on Aug 15, 2023

Conversation

jjc2718 (Member) commented Aug 3, 2023

Also cleaning up some other figures for the paper draft. This PR touches a lot of files but the changes aren't that substantial, most of them are just cosmetic.


jjc2718 requested a review from nrosed on August 3, 2023 at 19:12
@@ -17,6 +17,7 @@
"source": [
nrosed commented Aug 4, 2023

Is it expected that there's a big difference in the number of genes in comparison to the previous run? (51480 now, 35520 before)



jjc2718 (Member, Author) commented Aug 15, 2023

The "shape" of this dataframe is the total number of models, i.e. one for each combination of gene/cancer type/LASSO parameter/seed/CV fold. In the most recent run we covered a larger range of LASSO parameters, so it makes sense that the dataframe is considerably larger.

There are also two genes that are included now that weren't before, ALK and CIC. I slightly changed how the filtering for hypermutated samples works which results in these genes being included, so this is expected.
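(A rough sketch of that sanity check, with hypothetical column and file names rather than the exact ones in the results files:)

```python
import pandas as pd

# one row per trained model: gene x cancer type x LASSO parameter x seed x CV fold
results_df = pd.read_csv("results.tsv", sep="\t")

# for a full grid, the product of unique levels should match
# (or upper-bound) the number of rows in the dataframe
n_expected = (
    results_df["gene"].nunique()
    * results_df["cancer_type"].nunique()
    * results_df["lasso_param"].nunique()
    * results_df["seed"].nunique()
    * results_df["cv_fold"].nunique()
)
print(results_df.shape[0], n_expected)
```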

@@ -17,6 +17,7 @@
"source": [
nrosed commented Aug 4, 2023

Is it also expected that there's a shift in the distribution, from most being near 0 before to most now being near 15000?



jjc2718 (Member, Author)

That's the result of the larger range of LASSO parameters I described before. It is expected, and I would argue that it makes more sense for the model comparisons since we're covering a more comprehensive range of "overfit" or non-sparse models now.
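(For illustration only, not the exact grid used in the scripts, a wider log-spaced LASSO parameter range looks something like this:)

```python
import numpy as np

# earlier, narrower regularization grid
narrow_lasso_grid = np.logspace(-3, 1, num=9)

# wider log-spaced grid; with a regularization-strength parameter, the small end
# adds the less sparse / more "overfit" models mentioned above
wide_lasso_grid = np.logspace(-5, 3, num=17)

print(len(narrow_lasso_grid), len(wide_lasso_grid))
```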

@@ -17,6 +17,7 @@
"source": [
nrosed commented Aug 4, 2023

Is it strange that the test_auroc is so low for CIC-UCEC?



jjc2718 (Member, Author) commented Aug 15, 2023

It's something we've observed before for some genes/cancer types. We think it's because some mutations aren't all that correlated with gene expression in some cancer types (i.e. they're not drivers in that cancer type, or they don't occur frequently enough to train a good model, etc) so they end up overfitting, which is what we see here with good CV performance and bad performance on held-out UCEC samples.

Currently, we're just including/excluding cancer types based on mutation frequency/proportion. We could do something smarter but I think for this study where we have hundreds of genes/cancer types, it's probably fine to have the filter be pretty coarse-grained. A while back we tried filtering for cancer types where the model outperforms a baseline where the gene expression data is permuted, and it didn't affect the overall model selection results that much.
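(A minimal sketch of that kind of coarse-grained filter; the thresholds and column names below are made up for illustration, not taken from the actual scripts:)

```python
import pandas as pd

def filter_cancer_types(mutation_counts: pd.DataFrame,
                        min_mutated: int = 10,
                        min_proportion: float = 0.05) -> pd.DataFrame:
    """Keep (gene, cancer type) pairs with enough mutated samples.

    Assumes columns: gene, cancer_type, n_mutated, n_total.
    """
    proportion = mutation_counts["n_mutated"] / mutation_counts["n_total"]
    keep = (
        (mutation_counts["n_mutated"] >= min_mutated)
        & (proportion >= min_proportion)
    )
    return mutation_counts[keep]
```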


review-notebook-app bot commented Aug 4, 2023


nrosed commented on 2023-08-04T20:24:36Z
----------------------------------------------------------------

The legend is covering the bars; maybe you could put it horizontally below the plot, unless this is a just-for-you plot that isn't needed for the paper.


jjc2718 commented on 2023-08-15T18:12:44Z
----------------------------------------------------------------

Yeah, this is a "just for me" plot. The ones that show the train/valid/test breakdown are what we'll show in the paper, so I spent more time making them pretty.

@@ -0,0 +1,752 @@
{
nrosed commented Aug 4, 2023

It's very interesting that the stratified holdout and the test peak very close to each other.



jjc2718 (Member, Author)

It was interesting to us too! I think it's pretty typical though, at least for this mutation prediction problem across genes and cancer types. That's one of the main takeaways from our paper, that cross-validation performance is generally pretty predictive of generalization performance, and model simplicity may not be as important.


review-notebook-app bot commented Aug 4, 2023


nrosed commented on 2023-08-04T20:24:39Z
----------------------------------------------------------------

It's really interesting how ARID2 and VHL perform so much better on the test (CCLE) dataset. Could this be caused by a cell-line-specific effect, like a mismatch between cancer type proportions and cell line proportions? Not sure if it's important for your paper; it's just very striking.


jjc2718 commented on 2023-08-15T18:14:39Z
----------------------------------------------------------------

Yeah, I'm not sure why there are so many genes that perform better on CCLE. It could be something technical or dataset-specific like what you're describing, or it could be something biological like the cell line data is just cleaner/more well-behaved in these cases than the tumor samples from TCGA. If I had more time maybe I'd try to detangle the two, but I have to start writing my thesis at some point, haha.


review-notebook-app bot commented Aug 4, 2023


nrosed commented on 2023-08-04T20:24:40Z
----------------------------------------------------------------

I don't fully understand this plot. Why is test error always at 0.6 even though the cv and train errors are going down?



review-notebook-app bot commented Aug 4, 2023


nrosed commented on 2023-08-04T20:24:41Z
----------------------------------------------------------------

Yeah, I'm confused about how the training error is far below the test error and why the test error is constant. To me this seems like a bias in the test set?



review-notebook-app bot commented Aug 4, 2023


nrosed commented on 2023-08-04T20:24:43Z
----------------------------------------------------------------

Ignore this if it doesn't make sense, but maybe adding another comparator (like a dummy comparator that predicts the most common label) would convince readers that your model is actually learning something. I think I'm just hung up on the test AUPR never changing over the epochs. Maybe it is changing, but it happens in the first few epochs and I can't see it in your plot? I might also be completely missing the point of these plots. It might also be that your batch size is small enough that there isn't much change per epoch?
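(For reference, a most-frequent-label baseline like the one suggested here can be sketched with scikit-learn's DummyClassifier; the data below is a synthetic placeholder, not the actual expression/mutation data.)

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import average_precision_score

# placeholder splits standing in for the real train/test data
rng = np.random.default_rng(42)
X_train, X_test = rng.normal(size=(200, 50)), rng.normal(size=(100, 50))
y_train, y_test = rng.integers(0, 2, size=200), rng.integers(0, 2, size=100)

# baseline that always predicts the most common training label
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# with constant scores, AUPR collapses to the positive-class prevalence,
# which gives a floor to compare the NN's test AUPR against
baseline_aupr = average_precision_score(y_test, dummy.predict_proba(X_test)[:, 1])
print(baseline_aupr)
```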



review-notebook-app bot commented Aug 4, 2023


nrosed commented on 2023-08-04T20:24:43Z
----------------------------------------------------------------

I guess if there is a difference in the mean AUPR based on layer size, that shows it is learning a better model; I think I just couldn't really see it from the learning curves.



review-notebook-app bot commented Aug 4, 2023


nrosed commented on 2023-08-04T20:31:09Z
----------------------------------------------------------------

A main header and section headers would help readers understand what this notebook is doing and how to interpret the plots.


jjc2718 commented on 2023-08-15T18:22:59Z
----------------------------------------------------------------

I'll add them! These scripts started as just a model diagnostic thing, but we are using a few of these figures in the supplement so I'll add some documentation.



nrosed commented Aug 4, 2023

Looks good to me; the main thing that would help future readers is some main and sub-section headers for some of the scripts.

Other comments were just conceptual questions I had about some of your plots. I might have just missed the intention of the plots, so ignore them if they don't make any sense.



jjc2718 commented Aug 15, 2023


nrosed commented on 2023-08-04T20:24:40Z
----------------------------------------------------------------

I don't fully understand this plot. Why is test error always at 0.6 even though the cv and train errors are going down?

For whatever reason, these models seem to saturate really fast rather than improving slowly across epochs. Here's a comparison of learning rates I did a while ago for KRAS mutation prediction:

[figure: learning curves comparing learning rates for KRAS mutation prediction]

For the lower learning rates, you can see that CV/test performance improves a bit more gradually, but the resulting CV and test performance after 200 epochs ends up being about the same as it is for the higher learning rates (other than the obviously bad ones like 0.01).

In the plots you're looking at for hidden layer size, I was doing a grid search (I think over the same range shown here) and choosing the best learning rate. So what that ends up picking happens to be one of the models that saturates really fast, at least for KRAS; I haven't looked too much at other genes.

We could add some kind of baseline, but since what we ultimately care about is the "best vs. smallest good" model selection comparison I don't think it's that important in this case. Like you mentioned, because of the considerable variability between hidden layer sizes I'm fairly confident that the model is learning something, and our goal isn't really to find the absolute best performing NN model here, just a reasonable one that allows us to think about model complexity by comparing models with different performances across many genes.
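(For context, a minimal sketch of the kind of grid search described above, using scikit-learn's MLPClassifier and synthetic data as stand-ins for the actual model code and datasets; the parameter values are illustrative, not the exact grid.)

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# synthetic stand-in for the expression matrix and mutation labels
X, y = make_classification(n_samples=500, n_features=100, random_state=0)

# grid over learning rate and hidden layer size, scored by AUPR across CV folds
param_grid = {
    "learning_rate_init": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    "hidden_layer_sizes": [(100,), (500,), (1000,)],
}
search = GridSearchCV(
    MLPClassifier(max_iter=200, random_state=0),
    param_grid,
    scoring="average_precision",
    cv=4,
)
search.fit(X, y)
print(search.best_params_)
```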

jjc2718 merged commit 7650b0f into greenelab:master on Aug 15, 2023
1 check passed
jjc2718 deleted the nn_hsize_all branch on August 15, 2023 at 18:52