Commit

Merge pull request #42 from psteinb/torch-fix-tb-instructions
psteinb authored Aug 29, 2023
2 parents 3bd0c4f + 4ddcc46 commit 67f5e48
Showing 1 changed file with 26 additions and 24 deletions.
50 changes: 26 additions & 24 deletions docs/60_Pytorch/06_model_training_with_logging.ipynb
@@ -2,7 +2,6 @@
"cells": [
{
"cell_type": "markdown",
"id": "2e63f9e0",
"metadata": {},
"source": [
"# Training with logging\n",
@@ -15,7 +14,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "25c03ff5",
"metadata": {},
"outputs": [],
"source": [
@@ -25,7 +23,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "451e615a",
"metadata": {},
"outputs": [],
"source": [
@@ -35,7 +32,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ebefbc37",
"metadata": {},
"outputs": [],
"source": [
@@ -49,7 +45,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c078117b",
"metadata": {},
"outputs": [],
"source": [
@@ -60,7 +55,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "59962964",
"metadata": {},
"outputs": [],
"source": [
@@ -70,7 +64,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "4d50ef20",
"metadata": {},
"outputs": [],
"source": [
@@ -90,7 +83,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "eba26347",
"metadata": {},
"outputs": [],
"source": [
@@ -110,7 +102,6 @@
},
{
"cell_type": "markdown",
"id": "bd86c000",
"metadata": {},
"source": [
"Training of a neural network means updating its parameters (weights) using a strategy that involves the gradients of a loss function with respect to the model parameters in order to adjust model weights to minimize this loss."
@@ -119,7 +110,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "48472592",
"metadata": {},
"outputs": [],
"source": [
@@ -129,47 +119,62 @@
},
{
"cell_type": "markdown",
"id": "bfecf4c4",
"metadata": {},
"source": [
"Such a training is performed by iterating over the batches of the training dataset multiple times. Each full iteration over the dataset is termed an epoch."
]
},
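The epoch and batch terminology in the cell above can be illustrated with a minimal, framework-free sketch. The helper names `make_batches` and `run_epochs` are hypothetical and not part of the notebook; `max_nepochs` and `log_interval` mirror the variable names used in the training cell further down. The actual training step is only marked by a comment, so this shows the loop structure and logging cadence, not the notebook's training code.

```python
def make_batches(samples, batch_size):
    """Split a list of samples into consecutive batches (the last may be shorter)."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def run_epochs(samples, batch_size, max_nepochs, log_interval):
    """Visit every batch once per epoch; return the (epoch, step) pairs that would be logged."""
    logged = []
    step = 0  # global batch counter across all epochs
    for epoch in range(max_nepochs):
        for batch in make_batches(samples, batch_size):
            # a real training step (forward pass, loss, backward pass,
            # optimizer update) would go here
            if step % log_interval == 0:
                logged.append((epoch, step))
            step += 1
    return logged


# 10 samples with batch size 4 give 3 batches per epoch (hypothetical numbers)
logged = run_epochs(list(range(10)), batch_size=4, max_nepochs=2, log_interval=2)
print(logged)
```

With a `SummaryWriter`, the place where the pair is appended is where a call such as `writer.add_scalar(...)` would typically go, using `step` as the global step.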
{
"cell_type": "markdown",
"id": "c4864307",
"metadata": {},
"source": [
"During or after training the tensorboard logs can be visualized as follows: in a terminal, type\n",
"**During or after training**, the tensorboard logs (which have been collected with the `SummaryWriter` object) can be visualized. Would you be on your laptop or workstation at home, you could do:\n",
"\n",
"```shell\n",
"tensorboard --logdir \"path/to/logs\",\n",
"```\n",
"\n",
"then open a browser on `localhost:6006` (or whichever port the tensorboard server outputted as running on).\n",
"then open a browser using the URL `localhost:6006` (or whichever port the tensorboard server outputted as running on).\n",
"Alternatively, tensorboard can be accessed from jupyter as well:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On Taurus, some special steps need to be taken to visualize the tensorboard logs.\n",
"\n",
"If not done already, spawn a notebook BUT this time make sure to choose `production` under software environment in the advanced spawn configuration. Then wait until the notebooks open. Run this notebook.\n",
"\n",
"In order to be able to view the tensorboard logs, the tensorboard jupyter lab extension always checks the same location on the computer it is running on. Hence, you need to move your logs in the right location. To do so, run the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94718c05",
"metadata": {},
"outputs": [],
"source": [
"%load_ext tensorboard\n",
"%tensorboard --port 6006 --logdir ./logs"
"!mkdir -p /tmp/$USER/tf-logs \n",
"!ln -s $PWD/logs /tmp/$USER/tf-logs #might fail if the destination already exists"
]
},
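The inline comment in the cell above notes that the `ln -s` call might fail if the destination already exists. A minimal sketch of a re-runnable alternative in Python is given here, assuming only the standard library; the function name `link_logs` is hypothetical and not part of the notebook. It mimics `mkdir -p` followed by `ln -s SRC DIR`, which places the link inside the existing directory, and it leaves an existing entry untouched instead of raising an error.

```python
import os
import tempfile
from pathlib import Path


def link_logs(log_dir, target_dir):
    """Create target_dir and place a symlink to log_dir inside it, skipping existing links."""
    log_dir = Path(log_dir).resolve()
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    link = target_dir / log_dir.name  # like `ln -s SRC DIR`, which links inside DIR
    if link.is_symlink() or link.exists():
        return link  # leave an existing entry untouched instead of failing
    os.symlink(log_dir, link, target_is_directory=True)
    return link


# demonstration in a throwaway directory rather than /tmp/$USER/tf-logs
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp) / "logs"
    logs.mkdir()
    first = link_logs(logs, Path(tmp) / "tf-logs")
    second = link_logs(logs, Path(tmp) / "tf-logs")  # second call is a no-op
    same_link = first == second
    points_to_logs = first.resolve() == logs.resolve()
print(same_link, points_to_logs)
```

In the notebook's setting, the call would presumably take the local `logs` folder and the `/tmp/$USER/tf-logs` directory as arguments. Whether the extension accepts a link created this way is an assumption carried over from the shell commands above.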
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now run the following cell which performs the model training. While the training runs, you can open the Tensorboad tab from the jupyter lab main page."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "967a8988",
"metadata": {},
"outputs": [],
"source": [
"max_nepochs = 1\n",
"log_interval = 1\n",
"\n",
"writer = SummaryWriter(log_dir=\"logs\", comment=\"this is the test of SummaryWriter\")\n",
"\n",
"model.train(True)\n",
@@ -211,15 +216,13 @@
},
{
"cell_type": "markdown",
"id": "7e04facc-5117-44ae-b779-0257f7456bb4",
"metadata": {},
"source": [
"When you executed the cell above, you should see a new folder appear in the current directory. This folder is called `logs`. This is where tensorboard stores all run information."
]
},
{
"cell_type": "markdown",
"id": "d73b530f-cace-424c-b993-c2162547cd83",
"metadata": {},
"source": [
"## Exercise: Let's do this locally\n",
@@ -240,17 +243,16 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b5eae7b2-8d38-4c4f-9c51-42742bb6e01d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "torch_intro_env",
"display_name": "devbio-napari_pol-course-pytorch",
"language": "python",
"name": "torch_intro_env"
"name": "devbio-napari_pol-course-pytorch"
},
"language_info": {
"codemirror_mode": {
@@ -262,7 +264,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.8"
}
},
"nbformat": 4,
