Questions about train.py #54
Comments
Great question. The dataset iterator is defined in CellBox/cellbox/cellbox/dataset.py, lines 105 to 106 (commit 3bc687b).
So here in …
So the way we designed …
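For illustration, here is a minimal sketch (not the actual CellBox iterator code) of the distinction the question is about: a per-step validation loss computed on a single randomly drawn batch versus an evaluation loss averaged over `n_batches_eval` batches. The pool array, `loss_fn`, and the batch size are placeholders, not names from the repository.

```python
# Illustrative sketch only -- not the CellBox dataset iterator.
# It contrasts a loss on one randomly drawn batch with a loss averaged
# over several batches, which approximates the full validation set.
import numpy as np

def draw_batch(pool, batch_size, rng):
    """Draw batch_size distinct rows from the validation pool."""
    idx = rng.choice(len(pool), size=batch_size, replace=False)
    return pool[idx]

def eval_loss(pool, batch_size, n_batches_eval, loss_fn, rng):
    """Average the loss over n_batches_eval randomly drawn batches."""
    losses = [loss_fn(draw_batch(pool, batch_size, rng))
              for _ in range(n_batches_eval)]
    return float(np.mean(losses))

# Usage with a hypothetical loss_fn and validation pool:
# rng = np.random.default_rng(0)
# per_step_loss = loss_fn(draw_batch(valid_pool, 8, rng))                # one batch
# averaged_loss = eval_loss(valid_pool, 8, n_batches_eval, loss_fn, rng) # many batches
```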
Yes, that was intentional. The rationale was to only test the model as the final step, although it's sometimes not practical to do so. Here, for example, we might want to bootstrap random-partition sampling a handful of times and then pick the best model based on the validation set; we can then test the picked model and report that performance. Skipping the test stage during training/development also saves computational resources. The final test evaluation step is implemented in CellBox/cellbox/cellbox/train.py, lines 106 to 112 (commit 3bc687b).
As a result, the last line of the record should have only the test MSE, while the other columns are None.
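As an illustration of this logging pattern, here is a hedged sketch (not the CellBox implementation; the column names are modeled on record_eval.csv, but the helper names are made up): training/validation metrics are logged throughout training, and a single final row carries the test MSE while its other columns stay empty.

```python
# Minimal sketch of "test only at the final step" logging, under the
# assumption that records are collected as dicts and dumped to a CSV.
import pandas as pd

records = []

def log_step(step, train_mse, valid_mse):
    # During training/development no test evaluation is run, so test_mse stays None.
    records.append({"step": step, "train_mse": train_mse,
                    "valid_mse": valid_mse, "test_mse": None})

def log_final_test(test_mse):
    # Only this last row carries a test MSE; the other columns are left empty.
    records.append({"step": None, "train_mse": None,
                    "valid_mse": None, "test_mse": test_mse})

# log_step(0, 0.52, 0.61)
# log_step(1, 0.40, 0.55)
# log_final_test(0.58)
# pd.DataFrame(records).to_csv("record_eval.csv", index=False)
```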
We did not document this well. This step is handled in CellBox/cellbox/cellbox/dataset.py, lines 197 to 214 (commit 3bc687b), so in the default …
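As a rough illustration of a random partition of this kind, here is a hedged sketch under the assumption that random_pos.csv simply stores the shuffled condition indices and that the train/validation/test subsets are consecutive slices of that order. The number of conditions and the split fractions are placeholders, not the values used by CellBox.

```python
# Illustrative sketch of a random partition -- not the CellBox dataset code.
import numpy as np

n_conditions = 89                      # hypothetical number of perturbation conditions
rng = np.random.default_rng(0)

pos = rng.permutation(n_conditions)    # shuffled condition indices
np.savetxt("random_pos.csv", pos, fmt="%d")

# Assumed split fractions for illustration only.
n_train = int(0.7 * n_conditions)
n_valid = int(0.2 * n_conditions)
train_idx = pos[:n_train]
valid_idx = pos[n_train:n_train + n_valid]
test_idx  = pos[n_train + n_valid:]    # the remaining conditions form the test set
```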
This is related to my comment above. They should be on the test subset.
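If that is the case, the prediction rows can in principle be mapped back to perturbation conditions via the saved index file. The sketch below is an assumption-laden illustration, not documented CellBox behaviour: it assumes one prediction row per test condition, in the same order as the test slice of random_pos.csv, and reuses the placeholder split fractions from the sketch above.

```python
# Hedged sketch: relate rows of the prediction file to condition indices,
# assuming row order matches the test slice of random_pos.csv.
import numpy as np
import pandas as pd

pos = np.loadtxt("random_pos.csv", dtype=int)
n_train = int(0.7 * len(pos))          # assumed fractions, for illustration only
n_valid = int(0.2 * len(pos))
test_idx = pos[n_train + n_valid:]     # condition indices assigned to the test set

y_hat = pd.read_csv("6_best.y_hat.loss.csv", header=None)
assert len(y_hat) == len(test_idx)     # one prediction row per test condition
y_hat.index = test_idx                 # label each prediction row by its condition index
```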
Issue type
Need help

Summary
Some functions in /cellbox/train.py are ambiguous about what task they perform. Understanding them is crucial for reproducing similar results in the PyTorch version of CellBox, so this issue is for resolving that ambiguity.

Details
1. In train.py, are loss_valid_i and loss_valid_mse_i evaluated on one random batch fetched from args.feed_dicts['valid_set'], or are these losses evaluated on the whole validation set?
2. The eval_model function returns different values with different calls. At lines 101 to 103, it returns both the total and MSE loss for args.n_batches_eval batches on the validation set. At lines 109 to 111, it returns only the MSE loss for args.n_batches_eval batches on the test set. And at line 262, it returns the expression predictions y_hat for the whole test set. Are all of these statements correct? (See the sketch after this list.)
3. The record_eval.csv file generated after training with the default training arguments and config file specified in the README (python scripts/main.py -config=configs/Example.random_partition.json) has the test_mse column set to None. Is this the expected behaviour of the code?
4. random_pos.csv, generated after training, stores the indices of the perturbation conditions. Does it indicate how the conditions for training, validation, and testing are split?
5. 6_best.y_hat.loss.csv contains the expression predictions for the perturbation conditions in the test set for all nodes, but it does not indicate which row in this file corresponds to which perturbation condition. How are this file and random_pos.csv related?
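For question 2, the following hedged sketch only illustrates the kind of behaviour described there; it is not the actual eval_model from cellbox/train.py. It shows one evaluation helper that can return the total and MSE loss averaged over n_batches_eval batches, the MSE alone, or the raw predictions for a whole dataset, depending on how it is called. All function and argument names other than n_batches_eval and y_hat are made up.

```python
# Hedged sketch of an evaluation helper with call-dependent return values.
import numpy as np

def eval_model_sketch(batches, loss_fn, mse_fn, predict_fn,
                      n_batches_eval=None, return_mse_only=False,
                      return_predictions=False):
    """Evaluate on up to n_batches_eval batches drawn from `batches`."""
    if return_predictions:
        # Concatenate predictions over every batch (e.g. the whole test set).
        y_hat = np.concatenate([predict_fn(b) for b in batches], axis=0)
        return y_hat

    selected = batches if n_batches_eval is None else batches[:n_batches_eval]
    mse = float(np.mean([mse_fn(b) for b in selected]))
    if return_mse_only:
        return mse                      # e.g. MSE on the test batches only
    total = float(np.mean([loss_fn(b) for b in selected]))
    return total, mse                   # e.g. total and MSE loss on validation batches
```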