The --git-hash option in wandb job create is not working. #132

Open
zfhxi opened this issue Nov 13, 2023 · 5 comments

zfhxi commented Nov 13, 2023

I created a job using wandb local:

wandb job create git  https://xxx.git --project="TEST"  \
    --entity="username" --entry-point="main.py" --name="test1" \
    --git-hash="b7baca74dd034cb900ea0e3f48c397ea51c4c481"

wandb local created the job in the TEST project, with the following wandb-job.json:

{
    "_version": "v0",
    "source_type": "repo",
    "runtime": "3.7",
    "source": {
        "git": {
            "remote": "https://xxx.git",
            "commit": "b7baca74dd034cb900ea0e3f48c397ea51c4c481"
        },
        "entrypoint": [
            "python3.7",
            "main.py"
        ],
        "notebook": false
    },
    // ...
}

After that, I modified my code and pushed it to the remote repository. The commits are as follows:

$  git log --pretty=oneline -10
8a6b803c530e800cdf3304d12c6467dcfd655bf5 (HEAD -> main, origin/main) now1001
49ae7364f743c1b699d7a00f51e9805030c38c18 now1000
b7baca74dd034cb900ea0e3f48c397ea51c4c481 now1002
# ...

Then I launched the job by pushing it to the existing queue (screenshot in the original issue).

After the run completed, I located the code that the wandb local server had cloned from the remote repository and checked its commit:

$ cd "/tmp/tmpavc8q10w" 
$ git log --pretty=oneline -10
8a6b803c530e800cdf3304d12c6467dcfd655bf5 (grafted, HEAD -> main, origin/main) now1001

The expected commit, as specified by --git-hash, should be b7baca74dd034cb900ea0e3f48c397ea51c4c481 rather than the HEAD commit!

The above indicates that:

  1. The wandb local server clones the latest version of the remote repository when launching the job.
  2. The --git-hash option of wandb job create does not seem to take effect.

Can anyone help solve this?
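
For anyone who wants to reproduce the check above without reading git log by hand, here is a minimal sketch. It assumes only the wandb-job.json layout shown earlier (the commit under source.git.commit) and plain git commands; the helper names pinned_commit, run_git and diagnose_checkout are made up for illustration and are not part of wandb. It also reports whether the checkout is a shallow clone, which is what the "grafted" marker in the log output suggests. git rev-parse --is-shallow-repository needs Git 2.15 or newer.

```python
import subprocess
from typing import Callable, List, Mapping


def pinned_commit(job_spec: Mapping) -> str:
    """Return the commit recorded under source.git.commit in a wandb-job.json dict."""
    return job_spec["source"]["git"]["commit"]


def run_git(args: List[str], cwd: str) -> str:
    """Run a git subcommand in cwd and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def diagnose_checkout(
    job_spec: Mapping,
    checkout_dir: str,
    git: Callable[[List[str], str], str] = run_git,
) -> dict:
    """Compare the commit pinned in the job spec with HEAD of a checkout."""
    expected = pinned_commit(job_spec)
    head = git(["rev-parse", "HEAD"], checkout_dir)
    # Prints "true" for shallow clones (hypothesis: the launch clone is shallow).
    shallow = git(["rev-parse", "--is-shallow-repository"], checkout_dir) == "true"
    return {
        "expected": expected,
        "head": head,
        "matches": head == expected,
        "shallow": shallow,
    }
```

Calling diagnose_checkout(job_spec, "/tmp/tmpavc8q10w") with the parsed job file should report "matches": False if the checkout is at HEAD of main rather than the pinned commit.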

@rsanandres-wandb

Hello! Thank you for sending this information! Could you send a link to your workspace so we can look at it? Only wandb employees will be able to view your project if this is a private project.

Also, could you verify that the launch job you created corresponds to the run id avc8q10w? Just to make sure that we are looking at the same run as the one created.

zfhxi commented Nov 15, 2023

> Hello! Thank you for sending this information! Could you send a link to your workspace so we can look at it? Only wandb employees will be able to view your project if this is a private project.
>
> Also, could you verify that the launch job you created corresponds to the run id avc8q10w? Just to make sure that we are looking at the same run as the one created.

Thank you for your response. I've created a demo at https://github.com/zfhxi/test_wandb_launch_job

zfhxi commented Nov 15, 2023

After hours of work, I've found this solution:

import os
import argparse
import subprocess
import sys
from git import Repo  # GitPython


def restart_program():
    # Re-run this script with the same arguments so it starts from the reset working tree.
    p = subprocess.Popen([sys.executable] + sys.argv)
    p.wait()
    print("Finished the sub program!")
    # Propagate the child's exit status instead of always reporting success.
    sys.exit(p.returncode)


def reset_commit(repo, commit_id, workspace):
    # Hard-reset HEAD, the index and the working tree to the requested commit.
    commit = repo.commit(commit_id)
    repo.head.reset(commit=commit, index=True, working_tree=True)
    print(f"Workspace {workspace} has been reset to {commit_id}")


def prerun(args):
    # Check whether the current HEAD matches the commit the job was created with.
    if args.wandb_job_commit:
        repo = Repo(args.workspace)
        current_commit = repo.head.commit.hexsha
        if current_commit != args.wandb_job_commit:
            print(f"Current commit {current_commit} does not match the job commit {args.wandb_job_commit}!")  # fmt: skip
            try:
                reset_commit(repo, args.wandb_job_commit, args.workspace)
            except Exception as e:
                # The commit is probably missing from the shallow clone: deepen it and retry.
                print(e)
                print("Trying to fetch the latest 20 commits ...")
                origin = repo.remotes.origin
                repo.git.fetch(origin.name, "--depth=20")
                reset_commit(repo, args.wandb_job_commit, args.workspace)
            restart_program()
        else:
            print(f"Current commit {current_commit} == job commit {args.wandb_job_commit}!")  # fmt: skip


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--wandb-job-commit", type=str, default=None, help="commit hexsha the run must be checked out at")  # fmt: skip
    args = parser.parse_args()
    args.workspace = os.path.dirname(os.path.abspath(__file__))

    prerun(args)
    # main code goes here

The code performs the following actions:

  1. Check the current workspace's commit against the commit passed via --wandb-job-commit.
  2. If the target commit is not available locally, fetch the latest 20 commits from the remote repository.
  3. Reset the workspace to the target commit.
  4. Restart the current script.

I'm hoping for a more elegant solution!
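
A possible variation on the steps above, as a rough sketch rather than a verified fix: instead of deepening the clone by a fixed 20 commits, ask the remote for the exact commit and check it out detached. This assumes the remote allows fetching a commit by its full 40-character hash (this depends on server configuration) and that the remote is named origin. The function names pin_commands and pin_checkout are hypothetical.

```python
import subprocess
from typing import List


def pin_commands(commit: str, remote: str = "origin") -> List[List[str]]:
    """Build the git commands that fetch one commit and check it out detached."""
    return [
        # Fetch only the requested commit (assumes the server permits fetch-by-hash).
        ["git", "fetch", "--depth", "1", remote, commit],
        # Detach HEAD at that commit so no branch is moved.
        ["git", "checkout", "--detach", commit],
    ]


def pin_checkout(workspace: str, commit: str, remote: str = "origin") -> None:
    """Run the commands from pin_commands inside workspace, failing on any error."""
    for cmd in pin_commands(commit, remote):
        subprocess.run(cmd, cwd=workspace, check=True)
```

Unlike a fixed --depth=20, this does not depend on how many commits separate the pinned commit from the branch tip, but it fails outright if the server refuses fetch-by-hash, so the depth-based fallback may still be worth keeping.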

sydholl commented Feb 15, 2024

WandB Internal User reposted zfhxi's two Nov 15, 2023 comments (shown above) verbatim.
