Call wandb.init() each time you start a new run, before the training loop where you log model metrics

wandb.init()

Overview

Calling wandb.init() returns a run object. You can also access the run object by calling wandb.run.

You should generally call wandb.init() once at the start of your training script. This will create a new run and launch a single background process to sync the data to our cloud. If you want your machine to run offline and upload data later, use offline mode.

wandb.init() accepts a few keyword arguments:

  • name — A display name for this run
  • notes — A multiline string description associated with the run
  • config — a dictionary-like object to set as initial config
  • project — the name of the project to which this run will belong
  • tags — a list of strings to associate with this run as tags
  • dir — the path to a directory where artifacts will be written (default: ./wandb)
  • entity — the team posting this run (default: your username or your default team)
  • job_type — the type of job you are logging, e.g. eval, worker, ps (default: training)
  • save_code — save the main python or notebook file to wandb to enable diffing (default: editable from your settings page)
  • group — a string by which to group other runs; see Grouping
  • reinit — whether to allow multiple calls to wandb.init in the same process (default: False)
  • id — A unique ID for this run, primarily used for resuming; see Resuming. It must be globally unique within a project and must not contain special characters. If you have a descriptive name for your run, we suggest you use the name field instead.
  • resume — if set to True, the run auto resumes; can also be a unique string for manual resuming; see Resuming (default: False)
  • anonymous — can be "allow", "never", or "must". This enables or explicitly disables anonymous logging. (default: never)
  • force — whether to force a user to be logged into wandb when running a script (default: False)
  • magic — (bool, dict, or str, optional): magic configuration as bool, dict, json string, yaml filename. If set to True will attempt to auto-instrument your script. (default: None)
  • sync_tensorboard — A boolean indicating whether to copy all TensorBoard logs to wandb; see Tensorboard (default: False)
  • monitor_gym — A boolean indicating whether or not to log videos generated by OpenAI Gym; see Ray Tune (default: False)
  • allow_val_change — whether to allow wandb.config values to change after they are set; by default we throw an exception if config values are overwritten (default: False)
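
As a minimal sketch of how several of these keyword arguments fit together, the helper below assembles them into a dictionary and passes them to wandb.init(). The project name, tags, notes, and hyperparameters are placeholder values, and build_init_kwargs and start_run are hypothetical helper names, not part of wandb.

```python
def build_init_kwargs(learning_rate, batch_size):
    """Assemble keyword arguments for wandb.init() (all values are placeholders)."""
    return {
        "project": "my-project",  # hypothetical project name
        "name": f"lr{learning_rate}-bs{batch_size}",  # display name for the run
        "notes": "Baseline run with default settings.",
        "tags": ["baseline", "cnn"],
        "job_type": "training",
        "config": {"learning_rate": learning_rate, "batch_size": batch_size},
    }


def start_run(learning_rate=0.01, batch_size=128):
    """Start a run; wandb is imported here so the helper above works without it."""
    import wandb

    return wandb.init(**build_init_kwargs(learning_rate, batch_size))
```

Calling start_run() once at the top of a training script would return the run object described in the overview.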

Most of these settings can also be controlled via Environment Variables. This is often useful when you're running jobs on a cluster.

We automatically save a copy of the script where you run wandb.init(). Learn more about the code comparison feature here: Code Comparer. To disable this feature, set the environment variable WANDB_DISABLE_CODE=true.
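
The environment-variable approach can be sketched as follows. WANDB_DISABLE_CODE comes from the paragraph above; the other variable names (WANDB_PROJECT, WANDB_ENTITY, WANDB_RUN_GROUP, WANDB_JOB_TYPE) are assumed to be the counterparts of the project, entity, group, and job_type arguments, so check the Environment Variables page before relying on them. The values and the apply_wandb_env helper are hypothetical.

```python
import os

# Placeholder values; replace them with your own project and team names.
SETTINGS = {
    "WANDB_PROJECT": "my-project",
    "WANDB_ENTITY": "my-team",
    "WANDB_RUN_GROUP": "experiment-1",
    "WANDB_JOB_TYPE": "training",
    "WANDB_DISABLE_CODE": "true",  # turn off automatic code saving
}


def apply_wandb_env(settings, environ=os.environ):
    """Set each variable unless it was already exported, e.g. by a job scheduler."""
    for key, value in settings.items():
        environ.setdefault(key, value)
    return environ
```

Call apply_wandb_env(SETTINGS) before wandb.init() so the settings are in place when the run starts. Using setdefault means values exported by the cluster take precedence over the ones in the script.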

Common Questions

How do I launch multiple runs from one script?

If you're trying to start multiple runs from one script, add two things to your code:

  1. wandb.init(reinit=True): Use this setting to allow reinitializing runs
  2. wandb.join(): Use this at the end of your run to finish logging for that run

import wandb

for x in range(10):
    wandb.init(project="runs-from-for-loop", reinit=True)
    for y in range(100):
        wandb.log({"metric": x + y})
    wandb.join()

Alternatively you can use a python context manager which will automatically finish logging:

import wandb
for x in range(10):
    run = wandb.init(reinit=True)
    with run:
        for y in range(100):
            run.log({"metric": x+y})

LaunchError: Permission denied

If you're getting a LaunchError: Launch exception: Permission denied error, you don't have permissions to log to the project you're trying to send runs to. This might be for a few different reasons.

  1. You aren't logged in on this machine. Run wandb login on the command line.
  2. You've set an entity that doesn't exist. "Entity" should be your username or the name of an existing team. If you need to create a team, go to our Subscriptions page.
  3. You don't have project permissions. Ask the creator of the project to set the privacy to Open so you can log runs to this project.
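
To narrow down the first cause, a rough check like the one below can tell you whether any credentials are visible on the machine. It rests on two assumptions that are not stated on this page: that wandb reads the API key from WANDB_API_KEY when it is set, and that wandb login stores the key in ~/.netrc under the host api.wandb.ai. The find_wandb_credentials helper is hypothetical.

```python
import netrc
import os
from pathlib import Path


def find_wandb_credentials(environ=None, netrc_path=None, host="api.wandb.ai"):
    """Return where credentials were found: "environment", "netrc", or None.

    Assumes (unverified here) that `wandb login` writes to ~/.netrc under `host`.
    """
    environ = os.environ if environ is None else environ
    if environ.get("WANDB_API_KEY"):
        return "environment"

    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.is_file():
        return None
    try:
        entry = netrc.netrc(str(path)).authenticators(host)
    except netrc.NetrcParseError:
        return None  # unreadable netrc file: treat as not logged in
    return "netrc" if entry else None
```

If this returns None, running wandb login is the likely fix. It does not check whether the key is valid or whether the entity and project permissions are correct.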

Get the readable run name

Get the nice, readable name for your run.

import wandb

wandb.init()
wandb.run.save()
run_name = wandb.run.name

Set the run name to the generated run ID

If you'd like to overwrite the run name (like snowy-owl-10) with the run ID (like qvlp96vk) you can use this snippet:

import wandb
wandb.init()
wandb.run.name = wandb.run.id
wandb.run.save()

Save the git commit

When wandb.init() is called in your script, we automatically look for git information to save a link to your repo and the SHA of the latest commit. The git information should show up on your run page. If you aren't seeing it appear there, make sure that the script where you call wandb.init() is located in a folder that has git information.
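
If the git information is missing, one quick check is whether the script's folder sits inside a repository at all. The sketch below walks up the directory tree looking for a .git entry. This is only an approximation of "a folder that has git information", wandb's own detection may differ, and find_git_root is a hypothetical helper.

```python
from pathlib import Path


def find_git_root(start):
    """Return the nearest ancestor of `start` (inclusive) with a .git entry, else None."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None
```

For example, find_git_root(Path(__file__).parent) inside your training script returns the repository root, or None if the script lives outside any repository.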

The git commit and the command used to run the experiment are visible to you but hidden from external users, so if you have a public project, these details will remain private.

Save logs offline

By default, wandb.init() starts a process that syncs metrics in real time to our cloud-hosted app. If your machine is offline or you don't have internet access, here's how to run wandb in offline mode and sync later.

Set two environment variables:

  1. WANDB_API_KEY: Set this to your account's API key, on your settings page
  2. WANDB_MODE: dryrun

Here's a sample of what this would look like in your script:

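import os

import wandb

# Replace the placeholder with the API key from your settings page.
os.environ["WANDB_API_KEY"] = "YOUR_KEY_HERE"
# dryrun mode keeps run data on the local machine instead of syncing it.
os.environ["WANDB_MODE"] = "dryrun"

config = {
    "dataset": "CIFAR10",
    "machine": "offline cluster",
    "model": "CNN",
    "learning_rate": 0.01,
    "batch_size": 128,
}

# Pass the config so it is recorded with the run.
wandb.init(project="offline-demo", config=config)

for i in range(100):
    wandb.log({"accuracy": i})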

Once the machine has internet access again, run a sync command to send that folder to the cloud.

wandb sync wandb/dryrun-folder-name
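
If several offline runs have accumulated, the sync step can be scripted. The sketch below assumes run folders live under ./wandb and start with the dryrun- prefix shown in the command above; the helper names are hypothetical, and by default it only builds the commands without running them.

```python
import subprocess
from pathlib import Path


def find_offline_runs(wandb_dir="wandb"):
    """List run folders whose names start with "dryrun-" (assumed naming)."""
    root = Path(wandb_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("dryrun-*") if p.is_dir())


def sync_offline_runs(wandb_dir="wandb", dry_run=True):
    """Build one `wandb sync` command per folder; run them only if dry_run is False."""
    commands = [["wandb", "sync", str(run_dir)] for run_dir in find_offline_runs(wandb_dir)]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Call sync_offline_runs(dry_run=False) once the machine is back online. Keeping dry_run=True first lets you inspect which folders would be uploaded.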