fasttext cannot be found #78

Closed
tonychenxyz opened this issue Sep 16, 2024 · 7 comments
@tonychenxyz

I successfully launched the Ray cluster.

Then I ran:

ray attach <your_cluster_config>
cd dcnlp
export PYTHONPATH=$(pwd)
screen -S processing
python3 ray_processing/process.py \
  --source_ref_paths exp_data/datasets/raw_sources/CC_WET_april_2019.json \
  --readable_name c4_v4 \
  --output_dir s3://dcnlp-west/cc_wet_2019_april_baselines/c4_v4/ \
  --config_path baselines/baselines_configs/c4.yaml \
  --source_name cc_april_2019

I was told that fasttext cannot be found. I tried to pip install fasttext both in the shell and by adding pip install fasttext to setup_commands, but I got the following error either way:

     File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 98, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-ic8dq85d/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "<string>", line 151, in build_extensions
        File "<string>", line 114, in cpp_flag
      RuntimeError: Unsupported compiler -- at least C++17 support is needed!
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for fasttext
Failed to build fasttext
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (fasttext)

My Ray config is:

cluster_name: my-cluster
min_workers: 1
max_workers: 10
upscaling_speed: 1.0
docker:
  image: "rayproject/ray:latest"
  container_name: "ray_container"
  pull_before_run: True
setup_commands:
    - wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh -O miniconda.sh
    - bash ~/miniconda.sh -f -b -p /tmp/miniconda3/
    - echo 'export PATH="/tmp/miniconda3/bin/:$PATH"' >> ~/.bashrc
    - pip install --upgrade pip setuptools wheel
    - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
    - pip install boto3==1.26.90
    - pip install s3fs==2022.11.0
    - pip install psutil
    - pip install pyarrow
    - pip install 'pandas==2.1.4'
    - pip install git+https://github.com/mlfoundations/open_lm.git
    - git clone https://github.com/mlfoundations/dclm.git
provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: False
@Mivg
Collaborator

Mivg commented Sep 16, 2024

Hi @tonychenxyz,
It seems that your compiler does not support C++17, which is required.
Have you tried updating your g++/gcc? For example, see the instructions here.
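One way to check this on the node before retrying the install is to probe the compiler directly. The following is a minimal sketch, not fasttext's actual build logic: the function name `compiler_accepts_flag` is hypothetical, and it assumes the compiler is reachable on PATH as `g++` and understands `-fsyntax-only` (true for gcc and clang). A missing compiler is reported the same way as one that rejects the flag.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def compiler_accepts_flag(compiler: str, flag: str) -> bool:
    """Return True if `compiler` can parse a trivial C++ file with `flag`."""
    if shutil.which(compiler) is None:
        # No such compiler on PATH; treat it like an unsupported compiler.
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "probe.cpp"
        src.write_text("int main() { return 0; }\n")
        result = subprocess.run(
            [compiler, flag, "-fsyntax-only", str(src)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return result.returncode == 0
```

Calling `compiler_accepts_flag("g++", "-std=c++17")` inside the container would show whether the compiler is missing or too old before pip tries to build the wheel.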

@Mivg Mivg self-assigned this Sep 16, 2024
@GeorgiosSmyrnis GeorgiosSmyrnis self-assigned this Sep 16, 2024
@tonychenxyz
Author

I added the g++/gcc update commands to the config yaml per the instructions, but it seems that on AWS I don't have root access to run them. Are there any update commands that you used with AWS?

The error I got:

Shared connection to 35.82.33.80 closed.
    (3/17) apt install cmake build-essent...
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?

My current Ray config:

cluster_name: my-cluster
min_workers: 1
max_workers: 10
upscaling_speed: 1.0
docker:
  image: "rayproject/ray:latest"
  container_name: "ray_container"
  pull_before_run: True
setup_commands:
    - wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh -O miniconda.sh
    - bash ~/miniconda.sh -f -b -p /tmp/miniconda3/
    - echo 'export PATH="/tmp/miniconda3/bin/:$PATH"' >> ~/.bashrc
    - apt install cmake build-essential
    - apt install g++-9
    - update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 90
    - pip install --upgrade pip setuptools wheel
    - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
    - pip install boto3==1.26.90
    - pip install s3fs==2022.11.0
    - pip install psutil
    - pip install pyarrow
    - pip install 'pandas==2.1.4'
    - pip install fasttext
    - pip install git+https://github.com/mlfoundations/open_lm.git
    - git clone https://github.com/mlfoundations/dclm.git
provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: False

@GeorgiosSmyrnis
Contributor

Hi @tonychenxyz,

Are you spinning up new instances (I assume you are)? If so, you should be able to use sudo when installing packages during setup.
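As an illustration of that suggestion, here is a small sketch that rewrites a list of setup commands so the ones needing root are prefixed with sudo. The helper `with_sudo` and the `ROOT_TOOLS` tuple are hypothetical names, not part of Ray or dclm. Adding `-y` to apt installs is my own assumption, on the basis that setup commands run without a terminal to answer apt's confirmation prompt.

```python
# Assumption: these are the only tools in the config that need root.
ROOT_TOOLS = ("apt", "apt-get", "update-alternatives")


def with_sudo(commands):
    """Prefix root-only commands with sudo; leave everything else untouched."""
    fixed = []
    for cmd in commands:
        words = cmd.split()
        if words and words[0] in ROOT_TOOLS:
            is_apt_install = words[0] in ("apt", "apt-get") and "install" in words
            if is_apt_install and "-y" not in words:
                # Assumption: answer apt's confirmation prompt automatically.
                words.insert(words.index("install") + 1, "-y")
            cmd = "sudo " + " ".join(words)
        fixed.append(cmd)
    return fixed
```

Commands that already start with sudo, and pip or git commands, pass through unchanged.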

@tonychenxyz
Author

Thanks for replying.

I did this:

cluster_name: my-cluster
min_workers: 1
max_workers: 10
upscaling_speed: 1.0
docker:
  image: "rayproject/ray:latest"
  container_name: "ray_container"
  pull_before_run: True
setup_commands:
  - sudo apt update
  - sudo apt install cmake build-essential
  - sudo apt install g++-9
  - sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 90
  - wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh -O miniconda.sh
  - bash ~/miniconda.sh -f -b -p /tmp/miniconda3/
  - echo 'export PATH="/tmp/miniconda3/bin/:$PATH"' >> ~/.bashrc
  - pip install --upgrade pip setuptools wheel
  - pip install -U "ray[default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
  - pip install boto3==1.26.90
  - pip install s3fs==2022.11.0
  - pip install psutil
  - pip install pyarrow
  - pip install 'pandas==2.1.4'
  - pip install fasttext
  - pip install git+https://github.com/mlfoundations/open_lm.git
  - git clone https://github.com/mlfoundations/dclm.git
provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: False

And I got this error:

Collecting mosaicml (from open_lm==0.0.34)
  Downloading mosaicml-0.10.0-py3-none-any.whl.metadata (27 kB)
  Downloading mosaicml-0.9.0-py3-none-any.whl.metadata (27 kB)
Collecting torch-optimizer<0.2,>=0.1.0 (from mosaicml->open_lm==0.0.34)
  Downloading torch_optimizer-0.1.0-py3-none-any.whl.metadata (53 kB)
Collecting mosaicml (from open_lm==0.0.34)
  Downloading mosaicml-0.8.2-py3-none-any.whl.metadata (27 kB)
  Downloading mosaicml-0.8.1-py3-none-any.whl.metadata (27 kB)
  Downloading mosaicml-0.8.0-py3-none-any.whl.metadata (27 kB)
  Downloading mosaicml-0.7.1-py3-none-any.whl.metadata (26 kB)
Collecting pyyaml>=5.1 (from datasets->open_lm==0.0.34)
  Downloading PyYAML-5.4.1.tar.gz (175 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [48 lines of output]
      running egg_info
      writing lib3/PyYAML.egg-info/PKG-INFO
      writing dependency_links to lib3/PyYAML.egg-info/dependency_links.txt
      writing top-level names to lib3/PyYAML.egg-info/top_level.txt
      Traceback (most recent call last):
        File "/tmp/miniconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/tmp/miniconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/tmp/miniconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 332, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 302, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 318, in run_setup
          exec(code, locals())
        File "<string>", line 271, in <module>
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 117, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 183, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 199, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 954, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 311, in run
          self.find_sources()
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 319, in find_sources
          mm.run()
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 540, in run
          self.add_defaults()
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 578, in add_defaults
          sdist.add_defaults(self)
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/command/sdist.py", line 108, in add_defaults
          super().add_defaults()
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 238, in add_defaults
          self._add_defaults_ext()
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 323, in _add_defaults_ext
          self.filelist.extend(build_ext.get_source_files())
        File "<string>", line 201, in get_source_files
        File "/tmp/pip-build-env-nsvpcp6i/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 107, in __getattr__
          raise AttributeError(attr)
      AttributeError: cython_sources
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Shared connection to 35.82.33.80 closed.
  New status: update-failed
  !!!
  Full traceback: Traceback (most recent call last):
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/updater.py", line 166, in run
    self.do_update()
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/updater.py", line 490, in do_update
    self.cmd_runner.run(cmd, run_env="auto")
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/command_runner.py", line 493, in run
    return self.ssh_command_runner.run(
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/command_runner.py", line 383, in run
    return self._run_helper(final_cmd, with_output, exit_on_fail, silent=silent)
  File "/shared/share_mala/conda_envs/dclm/lib/python3.10/site-packages/ray/autoscaler/_private/command_runner.py", line 298, in _run_helper
    raise click.ClickException(fail_msg) from None
click.exceptions.ClickException: SSH command failed.

  Error message: SSH command failed.
  !!!
  
  Failed to setup head node.
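Since three different failures show up in this thread, a quick way to triage a long setup log is to search it for the distinctive lines. This is only a sketch: `diagnose` and `KNOWN_SIGNATURES` are hypothetical names, the signatures are copied from the logs above, and the hints are my reading of the errors rather than anything the maintainers confirmed. In particular, the PyYAML hint reflects the commonly reported incompatibility between the PyYAML 5.4.1 source build and Cython 3.

```python
# Substrings taken from the logs in this thread, mapped to a possible cause.
# The hints are assumptions, not confirmed diagnoses.
KNOWN_SIGNATURES = {
    "at least C++17 support is needed":
        "C++ compiler missing or too old for the fasttext build",
    "are you root?":
        "apt was run without root; prefix the command with sudo",
    "AttributeError: cython_sources":
        "PyYAML 5.4.1 is being built from source, which commonly fails with Cython 3",
}


def diagnose(log_text):
    """Return the hints whose signature appears anywhere in the log text."""
    return [hint for sig, hint in KNOWN_SIGNATURES.items() if sig in log_text]
```

In the log above, pip backtracks through older mosaicml releases before it downloads PyYAML-5.4.1.tar.gz, so the last signature is the one that matches.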

@tonychenxyz
Author

After removing the open_lm installation per #79, I was able to run the new config that installs fasttext, but then I got an error that jsonlines was not found. Does the baseline yaml config provide the full specification of environment packages?
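To find out which packages are still missing before launching a long job, the environment can be checked up front. A minimal sketch, assuming the import names match the package names (true for fasttext and jsonlines, the two mentioned here); `missing_modules` is a hypothetical helper and expects top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `missing_modules(["fasttext", "jsonlines"])` on the head node lists whatever the setup commands failed to install, without importing the packages themselves.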

@GeorgiosSmyrnis
Contributor

GeorgiosSmyrnis commented Sep 17, 2024

Hi @tonychenxyz,

Unfortunately, pip installing open_lm (edit: or dclm) in the yaml is required, since this command also installs further packages.
I will look into this further and get back to you.

@GeorgiosSmyrnis
Contributor

Hi @tonychenxyz,

I provided a modified yaml file in #79 that installs both fasttext and jsonlines properly. Let's move the discussion to #79 so that it is more centralized.

One more thing I noticed, though: the input json and the output path that you use point to private buckets. You should create your own reference json as outlined in the README, and have the output point to a bucket where you have write access.
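A write-access check along these lines could be run before starting the processing job. This is a sketch under assumptions: `split_s3_uri` and `can_write` are hypothetical helpers, the probe key name is arbitrary, and the check assumes boto3 is installed with credentials configured. It also needs delete permission to clean up after itself, so a bucket that allows writes but not deletes would be reported as not writable and the probe object would be left behind.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def can_write(uri, probe_name="_write_probe"):
    """Try to put and then delete an empty object under `uri`."""
    # Imported lazily so that split_s3_uri works even without boto3 installed.
    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    bucket, prefix = split_s3_uri(uri)
    key = prefix.rstrip("/") + "/" + probe_name if prefix else probe_name
    s3 = boto3.client("s3")
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=b"")
        s3.delete_object(Bucket=bucket, Key=key)
    except (BotoCoreError, ClientError):
        # Covers access-denied responses as well as missing credentials.
        return False
    return True
```

For example, `can_write("s3://my-bucket/c4_v4/")`, with `my-bucket` standing in for a bucket you own, should return True before that path is passed as --output_dir.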
