Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cant't start the sample. #222

Open
fanlai0990 opened this issue Apr 18, 2023 Discussed in #217 · 0 comments
Open

Cant't start the sample. #222

fanlai0990 opened this issue Apr 18, 2023 Discussed in #217 · 0 comments

Comments

@fanlai0990
Copy link
Member

Discussed in #217

Originally posted by li1553770945 March 13, 2023
I have successfully installed the fedscale framework and downloaded the femnist dataset. I am trying to follow the information on this page Deploment to complete my first sample. I entered the fedscale driver start conf.yml command and the contents of my conf.yml are as follows.

# Configuration file of FAR training experiment

# ========== Cluster configuration ========== 
# ip address of the parameter server (need 1 GPU process)
ps_ip: localhost

# ip address of each worker:# of available gpus process on each gpu in this node
# Note that if we collocate ps and worker on same GPU, then we need to decrease this number of available processes on that GPU by 1
# E.g., master node has 4 available processes, then 1 for the ps, and worker should be set to: worker:3
worker_ips:
    - localhost:[4]

exp_path: $FEDSCALE_HOME/fedscale/cloud

# Entry function of executor and aggregator under $exp_path
executor_entry: execution/executor.py

aggregator_entry: aggregation/aggregator.py

auth:
    ssh_user: ""
    ssh_private_key: ~/.ssh/id_rsa

# cmd to run before we can indeed run FAR (in order)
setup_commands:
    - source $HOME/anaconda3/bin/activate fedscale

# ========== Additional job configuration ========== 
# Default parameters are specified in config_parser.py, wherein more description of the parameter can be found

job_conf: 
    - job_name: femnist                   # Generate logs under this folder: log_path/job_name/time_stamp
    - log_path: $FEDSCALE_HOME/benchmark # Path of log files
    - num_participants: 50                  # Number of participants per round, we use K=100 in our paper, large K will be much slower
    - data_set: femnist                     # Dataset: openImg, google_speech, stackoverflow
    - data_dir: $FEDSCALE_HOME/benchmark/dataset/data/femnist    # Path of the dataset
    - data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv              # Allocation of data to each client, turn to iid setting if not provided
    - device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity     # Path of the client trace
    - device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
    - model: resnet18             # NOTE: Please refer to our model zoo README and use models for these small image (e.g., 32x32x3) inputs
#    - model_zoo: fedscale-torch-zoo
    - eval_interval: 10                     # How many rounds to run a testing on the testing set
    - rounds: 1000                          # Number of rounds to run this training. We use 1000 in our paper, while it may converge w/ ~400 rounds
    - filter_less: 21                       # Remove clients w/ less than 21 samples
    - num_loaders: 2
    - local_steps: 5
    - learning_rate: 0.05
    - batch_size: 20
    - test_bsz: 20
    - use_cuda: False
    - save_checkpoint: False

After that the message displayed is

starting aggregator on localhost...
Aggregator local PID 2767758. run kill -9 2767758 to kill the job.
Starting workers on localhost ...
Submitted job, please check your logs $FEDSCALE_HOME/benchmark/logs/femnist/0314_015101 for status

I checked the log file directory and didn't find the log file, and I checked the PID of his output and there is no process.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant