Unable to ray up (part 2) #79

Closed
tonychenxyz opened this issue Sep 17, 2024 · 4 comments

@tonychenxyz

Previously, in issue #69, I was able to run ray up with the following config YAML:

cluster_name: my-cluster
min_workers: 1
max_workers: 10
upscaling_speed: 1.0
docker:
  image: "rayproject/ray:latest"
  container_name: "ray_container"
  pull_before_run: True
setup_commands:
    - wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh -O miniconda.sh
    - bash ~/miniconda.sh -f -b -p /tmp/miniconda3/
    - echo 'export PATH="/tmp/miniconda3/bin/:$PATH"' >> ~/.bashrc
    - pip install --upgrade pip setuptools wheel
    - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
    - pip install boto3==1.26.90
    - pip install s3fs==2022.11.0
    - pip install psutil
    - pip install pyarrow
    - pip install 'pandas==2.1.4'
    - pip install git+https://github.com/mlfoundations/open_lm.git
    - git clone https://github.com/mlfoundations/dclm.git
provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: False

But now running ray up with the same config fails with the following error:

Collecting torchmetrics<0.10.0,>=0.7.0 (from mosaicml->open_lm==0.0.34)
  Using cached torchmetrics-0.9.3-py3-none-any.whl.metadata (17 kB)
Collecting mosaicml (from open_lm==0.0.34)
  Using cached mosaicml-0.12.0-py3-none-any.whl.metadata (27 kB)
WARNING: Ignoring version 0.12.0 of mosaicml since it has invalid metadata:
Requested mosaicml from https://files.pythonhosted.org/packages/dc/3a/a36f940ca092403079579726f2bc8df9c0969e0840472ec091b6b2999f32/mosaicml-0.12.0-py3-none-any.whl (from open_lm==0.0.34) has invalid metadata: .* suffix can only be used with `==` or `!=` operators
    mosaicml-streaming (<0.3.*) ; extra == 'all'
                        ~~~~~^
Please use pip<24.1 if you need to use this version.
  Using cached mosaicml-0.11.1-py3-none-any.whl.metadata (27 kB)
WARNING: Ignoring version 0.11.1 of mosaicml since it has invalid metadata:
Requested mosaicml from https://files.pythonhosted.org/packages/76/d8/c9a0fef6d3afd1ece0513b35eeb11af4a8fb546323f653a680c2bb886e95/mosaicml-0.11.1-py3-none-any.whl (from open_lm==0.0.34) has invalid metadata: .* suffix can only be used with `==` or `!=` operators
    mosaicml-streaming (<0.2.*) ; extra == 'all'
                        ~~~~~^
Please use pip<24.1 if you need to use this version.
  Using cached mosaicml-0.11.0-py3-none-any.whl.metadata (27 kB)
WARNING: Ignoring version 0.11.0 of mosaicml since it has invalid metadata:
Requested mosaicml from https://files.pythonhosted.org/packages/8c/12/391990a20e8eefa280a0692be7a7e4f3c281c9fe1ecf0c9566400db4af31/mosaicml-0.11.0-py3-none-any.whl (from open_lm==0.0.34) has invalid metadata: .* suffix can only be used with `==` or `!=` operators
    mosaicml-streaming (<0.2.*) ; extra == 'all'
                        ~~~~~^
Please use pip<24.1 if you need to use this version.
  Using cached mosaicml-0.10.1-py3-none-any.whl.metadata (27 kB)
Collecting torchmetrics<0.8,>=0.7.0 (from mosaicml->open_lm==0.0.34)
  Using cached torchmetrics-0.7.3-py3-none-any.whl.metadata (20 kB)
Collecting mosaicml (from open_lm==0.0.34)
  Using cached mosaicml-0.10.0-py3-none-any.whl.metadata (27 kB)
  Using cached mosaicml-0.9.0-py3-none-any.whl.metadata (27 kB)
Collecting torch-optimizer<0.2,>=0.1.0 (from mosaicml->open_lm==0.0.34)
  Using cached torch_optimizer-0.1.0-py3-none-any.whl.metadata (53 kB)
Collecting mosaicml (from open_lm==0.0.34)
  Using cached mosaicml-0.8.2-py3-none-any.whl.metadata (27 kB)
  Using cached mosaicml-0.8.1-py3-none-any.whl.metadata (27 kB)
  Using cached mosaicml-0.8.0-py3-none-any.whl.metadata (27 kB)
  Using cached mosaicml-0.7.1-py3-none-any.whl.metadata (26 kB)
Collecting pyyaml>=5.1 (from datasets->open_lm==0.0.34)
  Using cached PyYAML-5.4.1.tar.gz (175 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [48 lines of output]
      running egg_info
      writing lib3/PyYAML.egg-info/PKG-INFO
      writing dependency_links to lib3/PyYAML.egg-info/dependency_links.txt
      writing top-level names to lib3/PyYAML.egg-info/top_level.txt
      Traceback (most recent call last):
        File "/tmp/miniconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/tmp/miniconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/tmp/miniconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 332, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 302, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 318, in run_setup
          exec(code, locals())
        File "<string>", line 271, in <module>
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 117, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 183, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 199, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 954, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 311, in run
          self.find_sources()
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 319, in find_sources
          mm.run()
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 540, in run
          self.add_defaults()
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 578, in add_defaults
          sdist.add_defaults(self)
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/command/sdist.py", line 108, in add_defaults
          super().add_defaults()
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 238, in add_defaults
          self._add_defaults_ext()
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 323, in _add_defaults_ext
          self.filelist.extend(build_ext.get_source_files())
        File "<string>", line 201, in get_source_files
        File "/tmp/pip-build-env-8u2pfflm/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 107, in __getattr__
          raise AttributeError(attr)
      AttributeError: cython_sources
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Shared connection to 35.82.33.80 closed.
  New status: update-failed
  !!!
  Full traceback: Traceback (most recent call last):
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/updater.py", line 166, in run
    self.do_update()
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/updater.py", line 490, in do_update
    self.cmd_runner.run(cmd, run_env="auto")
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/command_runner.py", line 493, in run
    return self.ssh_command_runner.run(
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/command_runner.py", line 383, in run
    return self._run_helper(final_cmd, with_output, exit_on_fail, silent=silent)
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/command_runner.py", line 298, in _run_helper
    raise click.ClickException(fail_msg) from None
click.exceptions.ClickException: SSH command failed.

  Error message: SSH command failed.
  !!!
  
  Failed to setup head node.
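
Reading the log above, two distinct things show up: pip ignores several mosaicml versions because of invalid metadata (and itself suggests pip<24.1), and the source build of PyYAML-5.4.1 then fails with AttributeError: cython_sources. Below is a minimal sketch for pulling those signatures out of a saved ray up log. The function name summarize_pip_log and the shortened sample are hypothetical, and the patterns are only matched against the message formats visible in this log, not against pip output in general.

```python
import re

# Patterns copied from the message formats that appear in the log above.
INVALID_METADATA = re.compile(
    r"WARNING: Ignoring version (\S+) of (\S+) since it has invalid metadata"
)
SDIST_DOWNLOAD = re.compile(r"Using cached (\S+)\.tar\.gz")
PYTHON_ERROR = re.compile(r"^\s*(\w+Error): (.+)$")


def summarize_pip_log(log_text):
    """Collect rejected versions, the last source archive seen, and the first Python error."""
    rejected = []
    last_sdist = None
    build_error = None
    for line in log_text.splitlines():
        match = INVALID_METADATA.search(line)
        if match:
            # Store as (package, version).
            rejected.append((match.group(2), match.group(1)))
            continue
        match = SDIST_DOWNLOAD.search(line)
        if match:
            # A .tar.gz means pip will build from source instead of using a wheel.
            last_sdist = match.group(1)
            continue
        match = PYTHON_ERROR.match(line)
        if match and build_error is None:
            build_error = f"{match.group(1)}: {match.group(2)}"
    return {"rejected": rejected, "last_sdist": last_sdist, "build_error": build_error}


# Shortened sample built from lines in the log above.
sample = """\
WARNING: Ignoring version 0.12.0 of mosaicml since it has invalid metadata:
Please use pip<24.1 if you need to use this version.
WARNING: Ignoring version 0.11.1 of mosaicml since it has invalid metadata:
  Using cached PyYAML-5.4.1.tar.gz (175 kB)
      AttributeError: cython_sources
"""

summary = summarize_pip_log(sample)
print(summary)
```

A summary like this only separates the symptoms; it does not by itself show which of the two is the root cause of the failed setup.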
@andrewsiah

Hi there, I got the same error as above.

I think it can be traced to a PyYAML issue:
yaml/pyyaml#724

A related suggestion on Stack Overflow is to update awscli, since it is affected by the same PyYAML issue:
https://stackoverflow.com/questions/76868274/build-failed-with-aws-ebcli-on-python-3-11-4

aws/aws-cli#8036 (comment)

But that didn't fix things for me.

Here is my ray config file:

cluster_name: andrew2-cluster
min_workers: 1
max_workers: 10
upscaling_speed: 1.0
docker:
  image: "rayproject/ray:latest"
  container_name: "ray_container"
  pull_before_run: True
setup_commands:
  - sudo apt update
  - sudo apt install cmake build-essential
  - sudo apt install g++-9
  - sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 90
  - wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh -O miniconda.sh
  - bash ~/miniconda.sh -f -b -p /tmp/miniconda3/
  - echo 'export PATH="/tmp/miniconda3/bin/:$PATH"' >> ~/.bashrc
  - pip install --upgrade pip setuptools wheel
  - pip install --force-reinstall -v "PyYAML==6.0.1" --no-build-isolation
  - pip install awscli --no-build-isolation
  - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
  - pip install boto3==1.26.90
  - pip install s3fs==2022.11.0
  - pip install psutil
  - pip install pyarrow
  - pip install 'pandas==2.1.4'
  - pip install fasttext
  - pip install git+https://github.com/mlfoundations/open_lm.git
  - git clone https://github.com/mlfoundations/dclm.git
provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: False

@GeorgiosSmyrnis
Contributor

Hi @tonychenxyz, @andrewsiah,

I looked into this, made some modifications to the YAML file, and found a few variants in which the packages install properly. Here is the config that I used. Can you try it after making the account-specific edits that I marked in the comments?

cluster_name: test-processing
max_workers: 2
upscaling_speed: 1.0
available_node_types:
    ray.head.default:
        resources: {}
        node_config:
            ImageId: ami-0c5cce1d70efb41f5
            InstanceType: i4i.4xlarge
            IamInstanceProfile:
                # Replace 000000000000 with your IAM account 12-digit ID
                Arn: arn:aws:iam::000000000000:instance-profile/ray-autoscaler-v1
    ray.worker.default:
        min_workers: 2
        max_workers: 2
        node_config:
            ImageId: ami-0c5cce1d70efb41f5
            InstanceType: i4i.4xlarge
            IamInstanceProfile:
                # Replace 000000000000 with your IAM account 12-digit ID
                Arn: arn:aws:iam::000000000000:instance-profile/ray-autoscaler-v1

provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: False

setup_commands:
    - sudo mkfs -t xfs /dev/nvme1n1
    - sudo mount /dev/nvme1n1 /tmp
    - sudo chown -R $USER /tmp
    - sudo chmod -R 777 /tmp
    - wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh -O miniconda.sh
    - bash ~/miniconda.sh -f -b -p /tmp/miniconda3/
    - echo 'export PATH="/tmp/miniconda3/bin/:$PATH"' >> ~/.bashrc
    # Include your AWS CREDS here
    - echo 'export AWS_ACCESS_KEY_ID=' >> ~/.bashrc
    - echo 'export AWS_SECRET_ACCESS_KEY=' >> ~/.bashrc
    - pip install --upgrade pip setuptools wheel
    - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
    - pip install boto3==1.26.90
    - pip install s3fs==2022.11.0
    - pip install psutil
    - pip install pysimdjson
    - pip install pyarrow
    - git clone https://github.com/mlfoundations/dclm.git
    - pip install -r dclm/requirements.txt
    - cd dclm && python3 setup.py install
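
The config above asks for the 000000000000 placeholder in both instance-profile ARNs to be replaced with your own 12-digit account ID. A minimal sketch of doing that substitution on the config text is shown here. The helper name fill_account_id and the example ID 123456789012 are hypothetical and not part of the original config or of Ray.

```python
import re

# Placeholder prefix exactly as it appears in the config above.
PLACEHOLDER_ARN_PREFIX = "arn:aws:iam::000000000000:"


def fill_account_id(config_text, account_id):
    """Replace the placeholder account ID in every instance-profile ARN."""
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError("expected a 12-digit account ID")
    if PLACEHOLDER_ARN_PREFIX not in config_text:
        raise ValueError("placeholder ARN not found in config text")
    return config_text.replace(
        PLACEHOLDER_ARN_PREFIX, f"arn:aws:iam::{account_id}:"
    )


# Fragment of the config above, used only as sample input.
snippet = """\
IamInstanceProfile:
    Arn: arn:aws:iam::000000000000:instance-profile/ray-autoscaler-v1
"""

filled = fill_account_id(snippet, "123456789012")
print(filled)
```

Plain string replacement keeps the rest of the YAML, including the comments, byte-for-byte unchanged, which a parse-and-dump round trip would not guarantee.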

@Mivg assigned Mivg and GeorgiosSmyrnis and unassigned Mivg on Sep 19, 2024
@jeffreywpli
Contributor

Hi @tonychenxyz, @andrewsiah, just checking in: were you able to resolve your issue?

@andrewsiah

Hey, yep, thanks for the help!
