
Commit

Merge pull request #21 from jjc2718/l1_note
Note about L1 regularization
cgreene committed Jul 23, 2024
2 parents 456ec96 + 0d00e24 commit ff622b0
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions content/02.main-text.md
@@ -115,6 +115,10 @@ Our models were trained for 100 epochs of mini-batch stochastic gradient descent
To select the remaining hyperparameters for each hidden layer size, we performed a random search over 10 combinations, with a single train/test split stratified by cancer type, using the following hyperparameter ranges: learning rate {0.1, 0.01, 0.001, 5e-4, 1e-4}, dropout proportion {0.1, 0.5, 0.75}, weight decay (L2 penalty) {0, 0.1, 1, 10, 100}.
We used the same train/cross-validation split strategy described above for one random seed and 4 cross-validation splits, generating 4 different performance measurements for each gene and hidden layer size.
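As an illustrative sketch (not the code used in this study), a random search over these ranges with a single stratified train/test split could look like the following.
Here `X`, `y`, `cancer_types`, `hidden_layer_size`, `build_model`, and `evaluate` are hypothetical placeholders for the expression data, mutation labels, cancer type annotations, and the training/evaluation helpers, and the test fraction is arbitrary.

```python
import random

from sklearn.model_selection import train_test_split

# Hyperparameter ranges quoted above.
learning_rates = [0.1, 0.01, 0.001, 5e-4, 1e-4]
dropout_proportions = [0.1, 0.5, 0.75]
weight_decays = [0, 0.1, 1, 10, 100]

# Single train/test split stratified by cancer type (placeholder inputs).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=cancer_types, random_state=42
)

random.seed(42)
results = []
for _ in range(10):  # 10 random hyperparameter combinations
    params = {
        "lr": random.choice(learning_rates),
        "dropout": random.choice(dropout_proportions),
        "weight_decay": random.choice(weight_decays),
    }
    # build_model / evaluate are hypothetical helpers standing in for the
    # actual model construction, training, and performance evaluation code.
    model = build_model(hidden_layer_size, **params)
    model.fit(X_train, y_train)
    results.append((params, evaluate(model, X_test, y_test)))

best_params, best_score = max(results, key=lambda r: r[1])
```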

Although L1 regularization can induce model sparsity more directly in convex settings, we note that using L1 regularization to control model complexity in neural networks is considerably more difficult.
Simply adding an L1 penalty term to the loss is not enough to guarantee convergence to a sparse solution; the problem requires specialized optimizers and is the subject of ongoing research (see, e.g., [@url:https://dl.acm.org/doi/abs/10.5555/3540261.3542126]).
For this reason, we focused on controlling NN model complexity via the size and number of hidden layers, as well as the other approaches described above.
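To illustrate the distinction (a generic sketch, not code from this study), adding an L1 term to a PyTorch loss shrinks weights toward zero but rarely makes them exactly zero under plain gradient descent, whereas exact sparsity typically relies on a proximal update such as soft-thresholding, the building block that specialized sparsity-inducing optimizers elaborate on.
The penalty strength below is arbitrary.

```python
import torch

l1_lambda = 1e-4  # illustrative penalty strength, not a value from this study

# Naive approach: add an L1 term to the loss.  Stochastic gradient descent on
# this objective shrinks weights toward zero but almost never zeroes them out,
# so the trained model is not actually sparse.
def loss_with_l1(base_loss, model):
    l1_term = sum(p.abs().sum() for p in model.parameters())
    return base_loss + l1_lambda * l1_term

# Proximal alternative: take an ordinary SGD step on the unpenalized loss,
# then soft-threshold the weights, which sets small entries exactly to zero.
@torch.no_grad()
def soft_threshold_(model, lr):
    for p in model.parameters():
        p.copy_(torch.sign(p) * torch.clamp(p.abs() - lr * l1_lambda, min=0.0))
```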

For the _EGFR_ gene, we also ran experiments where we varied the dropout proportion and the weight decay hyperparameter as the regularization axis, and selected the remaining hyperparameters (including the hidden layer size) using a random search.
In these cases, we used a fixed range for dropout of {0.0, 0.05, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 0.95}, and a fixed range for weight decay of {0.0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 10.0}.
All neural network analyses were performed on an Ubuntu 18.04 machine with an NVIDIA RTX 2060 GPU.
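For context, in a typical PyTorch setup the two regularization hyperparameters varied above enter in different places: dropout is a layer within the model, while weight decay is an argument to the optimizer.
A minimal sketch follows; the layer sizes, learning rate, and specific values shown are placeholders rather than the settings used here.

```python
import torch
from torch import nn

dropout_p = 0.5       # one value from the fixed dropout range above
weight_decay = 0.01   # one value from the fixed weight decay range above

# Placeholder architecture: input/hidden sizes are illustrative only.
model = nn.Sequential(
    nn.Linear(1000, 100),
    nn.ReLU(),
    nn.Dropout(p=dropout_p),
    nn.Linear(100, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=weight_decay)
```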