@@ -20,7 +20,7 @@ NeMo Curator's language identification system works through a three-step process

1. **Text Preprocessing**: For FastText classification, normalize input text by stripping whitespace and converting newlines to spaces.

-2. **FastText Language Detection**: The pre-trained FastText language identification model ([`lid.176.bin`]((https://fasttext.cc/docs/en/language-identification.html))) analyzes the preprocessed text and returns:
+2. **FastText Language Detection**: The pre-trained FastText language identification model ([`lid.176.bin`](https://fasttext.cc/docs/en/language-identification.html)) analyzes the preprocessed text and returns:
- A confidence score (0.0 to 1.0) indicating certainty of the prediction
- A language code (for example, "EN", "ES", "FR") in FastText's two-letter uppercase format
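
The two steps shown in this hunk can be sketched in Python roughly as follows. This is a minimal illustration, not NeMo Curator's implementation: the helper names `normalize_for_fasttext`, `parse_fasttext_label`, and `detect_language` are hypothetical, the `__label__` prefix is the label format FastText's language-identification models emit, and the model path assumes a locally downloaded copy of `lid.176.bin`.

```python
LABEL_PREFIX = "__label__"


def normalize_for_fasttext(text: str) -> str:
    """Strip surrounding whitespace and convert newlines to spaces.

    FastText's predict() handles one line at a time, so embedded
    newlines have to be removed before classification.
    """
    return text.strip().replace("\n", " ")


def parse_fasttext_label(label: str) -> str:
    """Turn a FastText label such as '__label__en' into 'EN'."""
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    return label.upper()


def detect_language(text: str, model_path: str = "lid.176.bin"):
    """Return (confidence, language_code) for the given text.

    Hypothetical helper: needs the `fasttext` package and the model
    file on disk, so the import is deferred until the call.
    """
    import fasttext

    model = fasttext.load_model(model_path)
    labels, scores = model.predict(normalize_for_fasttext(text), k=1)
    return float(scores[0]), parse_fasttext_label(labels[0])
```

Keeping the normalization and label parsing in separate functions makes them checkable without loading the model.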

5 changes: 3 additions & 2 deletions docs/get-started/video.md
@@ -7,6 +7,7 @@ difficulty: "beginner"
content_type: "tutorial"
modality: "video-only"
only: not ga
orphan: true
---

(gs-video)=
@@ -48,7 +49,7 @@ docker tag nvcr.io/nvidia/nemo/nemo-curator-video:0.6.0 nemo_video_curator:1.0.0
```

```{seealso}
-For details on video container environments and configurations, see [Video Curator Environments](reference-infrastructure-container-environments-video).
+For details on video container environments and configurations, see Video Curator Environments.
```
:::

@@ -146,4 +147,4 @@ export PATH="$PATH:$HOME/.local/bin"

## Next Steps

-Explore the [Video Curation documentation](video-overview).
+Explore the Video Curation documentation.
12 changes: 12 additions & 0 deletions pyproject.toml
@@ -132,6 +132,18 @@ all_nightly = [
"nemo_curator[image_nightly]",
]

[dependency-groups]
docs = [
Contributor

This should be updated in ray-curator/pyproject.toml instead of the higher-level nemo_curator dependency list.

Contributor Author

ah, okay! thank you

Contributor Author

I get the same error. Does having it live there change the command at all?

Contributor Author

@thomasdhc any ideas?

Contributor Author

New error if I run it from the ray-curator/ directory:

Using CPython 3.11.11
Creating virtual environment at: .venv
  × Failed to build `cugraph-cu12==25.6.0`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `wheel_stub.buildapi.build_wheel` failed (exit status: 1)

      [stderr]
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl against tag cp310-cp310-manylinux_2_24_aarch64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl against tag cp310-cp310-manylinux_2_28_aarch64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl against tag cp310-cp310-manylinux_2_28_x86_64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl against tag cp310-cp310-manylinux_2_24_x86_64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl against tag cp311-cp311-manylinux_2_28_aarch64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl against tag cp311-cp311-manylinux_2_24_aarch64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl against tag cp311-cp311-manylinux_2_24_x86_64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl against tag cp311-cp311-manylinux_2_28_x86_64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl against tag cp312-cp312-manylinux_2_28_aarch64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl against tag cp312-cp312-manylinux_2_24_aarch64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl against tag cp312-cp312-manylinux_2_28_x86_64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl against tag cp312-cp312-manylinux_2_24_x86_64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl against tag cp313-cp313-manylinux_2_28_aarch64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl against tag cp313-cp313-manylinux_2_24_aarch64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl against tag cp313-cp313-manylinux_2_28_x86_64
      INFO:wheel-stub:Testing wheel cugraph_cu12-25.6.0-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl against tag cp313-cp313-manylinux_2_24_x86_64
        File "/Users/llane/Library/Caches/uv/builds-v0/.tmpj8fibk/lib/python3.11/site-packages/wheel_stub/wheel.py", line 249, in download_wheel
          return download_manual(wheel_directory, distribution, version, config)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/llane/Library/Caches/uv/builds-v0/.tmpj8fibk/lib/python3.11/site-packages/wheel_stub/wheel.py", line 185, in download_manual
          raise RuntimeError(f"Didn't find wheel for {distribution} {version}")
      Traceback (most recent call last):
        File "/Users/llane/Library/Caches/uv/builds-v0/.tmpj8fibk/lib/python3.11/site-packages/wheel_stub/wheel.py", line 249, in download_wheel
          return download_manual(wheel_directory, distribution, version, config)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/llane/Library/Caches/uv/builds-v0/.tmpj8fibk/lib/python3.11/site-packages/wheel_stub/wheel.py", line 185, in download_manual
          raise RuntimeError(f"Didn't find wheel for {distribution} {version}")
      RuntimeError: Didn't find wheel for cugraph-cu12 25.6.0

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "<string>", line 11, in <module>
        File "/Users/llane/Library/Caches/uv/builds-v0/.tmpj8fibk/lib/python3.11/site-packages/wheel_stub/buildapi.py", line 29, in build_wheel
          return download_wheel(pathlib.Path(wheel_directory), config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/llane/Library/Caches/uv/builds-v0/.tmpj8fibk/lib/python3.11/site-packages/wheel_stub/wheel.py", line 251, in download_wheel
          report_install_failure(distribution, version, config, exception_context)
        File "/Users/llane/Library/Caches/uv/builds-v0/.tmpj8fibk/lib/python3.11/site-packages/wheel_stub/error.py", line 67, in report_install_failure
          raise InstallFailedError(
      wheel_stub.error.InstallFailedError:
      *******************************************************************************

      The installation of cugraph-cu12 for version 25.6.0 failed.

      This is a special placeholder package which downloads a real wheel package
      from https://pypi.nvidia.com/. If https://pypi.nvidia.com/ is not reachable, we
      cannot download the real wheel file to install.

      You might try installing this package via
      ```
      $ pip install --extra-index-url https://pypi.nvidia.com/ cugraph-cu12
      ```

      Here is some debug information about your platform to include in any bug
      report:

      Python Version: CPython 3.11.11
      Operating System: Darwin 24.6.0
      CPU Architecture: arm64
      nvidia-smi command not found. Ensure NVIDIA drivers are installed.

      *******************************************************************************


      hint: This usually indicates a problem with the package or the build environment.
  help: `cugraph-cu12` (v25.6.0) was included because `ray-curator[all]` (v0.1.0) depends on `cugraph-cu12>=25.6.dev0, <25.7.dev0`
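
The log lists only `manylinux` wheels for `x86_64` and `aarch64`, while the reported platform is Darwin on arm64, so none of the candidates can match. A minimal sketch of that comparison, with the platform tags copied from the log; the helper `platform_matches` is hypothetical and deliberately simplified (it ignores the glibc version and Python tag), and it is not how `wheel_stub` or `uv` actually select wheels:

```python
import platform

# Platform tags of the cugraph-cu12 25.6.0 wheels listed in the log.
AVAILABLE_PLATFORM_TAGS = {
    "manylinux_2_24_aarch64",
    "manylinux_2_28_aarch64",
    "manylinux_2_24_x86_64",
    "manylinux_2_28_x86_64",
}


def platform_matches(system: str, machine: str) -> bool:
    """Return True if any listed wheel targets this OS and CPU architecture."""
    if system != "Linux":
        # Every listed wheel is a manylinux (Linux-only) build.
        return False
    return any(tag.endswith("_" + machine) for tag in AVAILABLE_PLATFORM_TAGS)


if __name__ == "__main__":
    # Check the machine this script runs on.
    print(platform_matches(platform.system(), platform.machine()))
```

Called with the values from the debug section of the log, `platform_matches("Darwin", "arm64")` returns `False`.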

"sphinx",
"myst-parser",
"sphinx-autodoc2",
"sphinx-copybutton",
"nvidia-sphinx-theme",
"sphinx-design",
"sphinxcontrib-mermaid",
"swagger-plugin-for-sphinx",
]

[project.scripts]
get_common_crawl_urls = "nemo_curator.scripts.get_common_crawl_urls:console_script"
get_wikipedia_urls = "nemo_curator.scripts.get_wikipedia_urls:console_script"
12 changes: 12 additions & 0 deletions ray-curator/pyproject.toml
@@ -85,6 +85,18 @@ all = [
"ray_curator[video]",
]

[dependency-groups]
docs = [
"sphinx",
"myst-parser",
"sphinx-autodoc2",
"sphinx-copybutton",
"nvidia-sphinx-theme",
"sphinx-design",
"sphinxcontrib-mermaid",
"swagger-plugin-for-sphinx",
]

[tool.pixi.workspace]
channels = ["conda-forge"]
platforms = ["linux-64", "linux-aarch64"]
3 changes: 1 addition & 2 deletions requirements-docs.txt
@@ -5,8 +5,7 @@ sphinx-copybutton
nvidia-sphinx-theme
sphinx-autobuild
sphinx-design
pinecone
openai
docutils
python-dotenv
sphinxcontrib-mermaid
swagger-plugin-for-sphinx