Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Categorify #1692

Merged
merged 72 commits into from
May 30, 2023
Merged

Conversation

rjzamora
Copy link
Collaborator

@rjzamora rjzamora commented Oct 11, 2022

Note: When ready, this PR should supersede #1687

Current Algorithm

The current Categorify implementation was not designed for extremely large datasets. It was also not designed to handle categorical columns containing too many unique values to fit in a single cudf.DataFrame. The problem with large-scale datasets is the fact that the fit code does not use a proper tree reduction. The problem with high-cardinality columns is the fact that all unique groups must be written to (and read from) disk using a single cudf.DataFrame.

In both cases, the fit stage of Categorify is what limits scalability the most. The existing fit algorithm is as follows:

Screen Shot 2022-10-11 at 11 53 15 AM

  1. For each Dask-DataFrame partition, we perform the "Level-1" groupby aggregation for each categorical column. The output of the "Level 1" task is a dictionary of cudf/pandas.DataFrame results. Each dictionary value corresponds to the local groupby result for a distinct categorical-column + column-hash combination. The number of hash-based "splits" to use for each categorical column depends on the tree_width argument. In the figure above, there is only one categorical column (for simplicity), and tree_width=3.
  2. All "Level 1" outputs are combined by a distinct "Level 2" task for each unique categorical-column + column-hash combination. For the case above, where there is a single categorical column, the number of "Level 2" tasks is determined by tree_width. (Potential for OOM: Large-scale datasets may require that a significant amount of data be aggregated in each of these tasks - Even with a large tree_width setting)
  3. All "Level 2" tasks for the same categorical column are combined by a distinct "Level 3" task. The number of "Level 3" tasks is equivalent to the number of categorical columns in the system (one in the case above). This task is responsible for writing out the final unique-value summary to disk for a specific categorical column. (Potential for OOM: High-cardinality columns will need to fit entirely in memory for these tasks to succeed).

New Algorithm

This PR modifies the existing Categorify algorithm in two important ways: First, we perform a "proper" tree reduction to avoid OOM errors for large-scale datasets. Second, we persist unique values as a directory of parquet files (rather than a single parquet file) for high-cardinality columns. The general algorithm for the new Categorify.fit is as follows:

Screen Shot 2022-10-11 at 11 53 25 AM

  1. Same as step 1 for the old algorithm.
  2. We perform a seperate tree reduction for each distinct categorical-column + column-hash combination. Note that "Level 2" tasks are split into "Level 2(a, b, c, ...)" tasks (depending on the necessary height of the tree reduction).
  3. Rather than combining all "Level 2" tasks into a single parquet file for each categorical column, we use the "Level 2" tasks to create a temporary Dask-DataFrame collection for each categorical column. Note that this is only done if the temporary collection is expected to contain more than one partition. If a temporary DataFrame collection is created, it is written to disk with ddf.to_parquet(..., compute=False), and the Delayed result is used in place of the "Level 2" keys as the "Level 3" task dependency.
  4. The "Level 3" task will now (optionally) read the temporary DataFrame collection for a specific categorical columns from disk and then re-write the unique values to disk with a proper index, proper ordering, and a null-value.

Important limitations: The new algorithm will currently "fall back" to writing a single Parquet file for "complex" Categorify options (e.g. multi-column joint/combo encoding). The number of supported cases can certaibly be improved. However, the primary target for this PR is single-column encoding.

@rjzamora rjzamora added enhancement New feature or request MultiGPU Multi GPU support ops labels Oct 11, 2022
@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #1692 of commit a79a37e2979f11f10a0ad02af9711b466974a00f, no merge conflicts.
Running as SYSTEM
Setting status of a79a37e2979f11f10a0ad02af9711b466974a00f to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4748/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1692/*:refs/remotes/origin/pr/1692/* # timeout=10
 > git rev-parse a79a37e2979f11f10a0ad02af9711b466974a00f^{commit} # timeout=10
Checking out Revision a79a37e2979f11f10a0ad02af9711b466974a00f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a79a37e2979f11f10a0ad02af9711b466974a00f # timeout=10
Commit message: "uuse np.arange(... like)"
 > git rev-list --no-walk e91cf945a5ce3f36729309b3ac333181c6604bcb # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6871348817336990919.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+21.ga79a37e29.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.90,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.89,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e git+https://github.com/NVIDIA-Merlin/NVTabular.git@a79a37e2979f11f10a0ad02af9711b466974a00f#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.3,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='156367538'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-2gphxscx
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-2gphxscx
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 14a18dc0de5d5fd7737ecbadf9f6d7fa5d801b67
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (21.3)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (1.3.5)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (3.19.5)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (4.64.1)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (1.2.5)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.5.0)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (7.0.0)
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.3.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (1.10.0)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.3.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (0.55.1)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (0.4.3)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (0.12.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (5.4.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.4.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (8.1.3)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (5.8.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.0.4)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (3.1.2)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.7.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (6.1)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.0.0)
Requirement already satisfied: numpy<1.22,>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (1.20.3)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (0.38.1)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (65.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+9.g14a18dc) (3.0.9)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (2.8.2)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.52.0)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.0.1)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.1.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (4.0.0)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (6.0.1)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.7.0+9.g14a18dc-py3-none-any.whl size=118253 sha256=50deeab28814e390243922ff18c7e10f9245f8fc8253231cf8b31d6a35b46721
  Stored in directory: /tmp/pip-ephem-wheel-cache-i6179l5j/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
  Attempting uninstall: merlin-core
    Found existing installation: merlin-core 0.3.0+12.g78ecddd
    Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
    Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.7.0+9.g14a18dc
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1441 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py .... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/examples/test_01-Getting-started.py . [ 12%]
tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%]
tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
..................................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 46%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.version)

.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/init.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/init.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 6 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_size[24B-True-True]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/categorify.py:1214: UserWarning: Category DataFrame (with columns: Index(['session_id', 'session_id_size'], dtype='object')) is 304 bytes in size. This is large compared to the suggested upper limit of 24 bytes!(12.5% of the total memory by default)
warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_size[24B-True-False]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/categorify.py:1214: UserWarning: Category DataFrame (with columns: Index(['session_id', 'session_id_size'], dtype='object')) is 240 bytes in size. This is large compared to the suggested upper limit of 24 bytes!(12.5% of the total memory by default)
warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_size[24B-False-True]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/categorify.py:1214: UserWarning: Category DataFrame (with columns: Index(['session_id', 'session_id_size'], dtype='object')) is 288 bytes in size. This is large compared to the suggested upper limit of 24 bytes!(12.5% of the total memory by default)
warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_size[24B-False-False]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/categorify.py:1214: UserWarning: Category DataFrame (with columns: Index(['session_id', 'session_id_size'], dtype='object')) is 160 bytes in size. This is large compared to the suggested upper limit of 24 bytes!(12.5% of the total memory by default)
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/transforms/init.py 1 1 0%
merlin/transforms/ops/init.py 1 1 0%

TOTAL 2 2 0%

=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: needs horovod
========== 1440 passed, 2 skipped, 262 warnings in 1083.37s (0:18:03) ==========
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
self._warn("No data was collected.", slug="no-data-collected")
/usr/local/lib/python3.8/dist-packages/coverage/data.py:130: CoverageWarning: Data file '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.coverage.10.20.17.231.8838.853661' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object
data._warn(str(exc))
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins15497597721057040107.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #1692 of commit 79886ef0e32859edcc0e9d840b45c14c268074f5, no merge conflicts.
Running as SYSTEM
Setting status of 79886ef0e32859edcc0e9d840b45c14c268074f5 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4749/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1692/*:refs/remotes/origin/pr/1692/* # timeout=10
 > git rev-parse 79886ef0e32859edcc0e9d840b45c14c268074f5^{commit} # timeout=10
Checking out Revision 79886ef0e32859edcc0e9d840b45c14c268074f5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 79886ef0e32859edcc0e9d840b45c14c268074f5 # timeout=10
Commit message: "Merge branch 'main' into refactor-categorify-2"
 > git rev-list --no-walk a79a37e2979f11f10a0ad02af9711b466974a00f # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins15613217120649781849.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+23.g79886ef0e.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.90,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.89,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e git+https://github.com/NVIDIA-Merlin/NVTabular.git@79886ef0e32859edcc0e9d840b45c14c268074f5#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.3,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='620898182'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-zpu16iu7
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-zpu16iu7
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 14a18dc0de5d5fd7737ecbadf9f6d7fa5d801b67
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.3.0)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (1.3.5)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (1.10.0)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (7.0.0)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.5.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (0.55.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (21.3)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (3.19.5)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.3.0)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (1.2.5)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (4.64.1)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (0.4.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (0.12.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (5.4.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.0.4)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.0.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (8.1.3)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (6.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (3.1.2)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.7.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.4.0)
Requirement already satisfied: numpy<1.22,>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (1.20.3)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (0.38.1)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (65.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+9.g14a18dc) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (2022.2.1)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.52.0)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.0.1)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.1.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (4.0.0)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.7.0+9.g14a18dc) (6.0.1)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.7.0+9.g14a18dc-py3-none-any.whl size=118253 sha256=b1346a6831e7bbadeb97888405cc54228b50abe4775e9a31f43a9c0cb619298e
  Stored in directory: /tmp/pip-ephem-wheel-cache-ob6nej2l/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
  Attempting uninstall: merlin-core
    Found existing installation: merlin-core 0.3.0+12.g78ecddd
    Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
    Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.7.0+9.g14a18dc
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1441 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py .... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/examples/test_01-Getting-started.py . [ 12%]
tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%]
tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
..................................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 46%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.version)

.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/init.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/init.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 6 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_size[24B-True-True]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/categorify.py:1214: UserWarning: Category DataFrame (with columns: Index(['session_id', 'session_id_size'], dtype='object')) is 304 bytes in size. This is large compared to the suggested upper limit of 24 bytes!(12.5% of the total memory by default)
warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_size[24B-True-False]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/categorify.py:1214: UserWarning: Category DataFrame (with columns: Index(['session_id', 'session_id_size'], dtype='object')) is 240 bytes in size. This is large compared to the suggested upper limit of 24 bytes!(12.5% of the total memory by default)
warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_size[24B-False-True]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/categorify.py:1214: UserWarning: Category DataFrame (with columns: Index(['session_id', 'session_id_size'], dtype='object')) is 288 bytes in size. This is large compared to the suggested upper limit of 24 bytes!(12.5% of the total memory by default)
warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_size[24B-False-False]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/categorify.py:1214: UserWarning: Category DataFrame (with columns: Index(['session_id', 'session_id_size'], dtype='object')) is 160 bytes in size. This is large compared to the suggested upper limit of 24 bytes!(12.5% of the total memory by default)
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/transforms/init.py 1 1 0%
merlin/transforms/ops/init.py 1 1 0%

TOTAL 2 2 0%

=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: needs horovod
========== 1440 passed, 2 skipped, 262 warnings in 1068.15s (0:17:48) ==========
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
self._warn("No data was collected.", slug="no-data-collected")
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins11922725410530202878.sh

@nvidia-merlin-bot
Copy link
Contributor

Click to view CI Results
GitHub pull request #1692 of commit d8455893cf8198953324236c70df277521b6108a, no merge conflicts.
Running as SYSTEM
Setting status of d8455893cf8198953324236c70df277521b6108a to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4762/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1692/*:refs/remotes/origin/pr/1692/* # timeout=10
 > git rev-parse d8455893cf8198953324236c70df277521b6108a^{commit} # timeout=10
Checking out Revision d8455893cf8198953324236c70df277521b6108a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d8455893cf8198953324236c70df277521b6108a # timeout=10
Commit message: "improve test coverage"
 > git rev-list --no-walk 43381fbced6bcc08be660276c8f72d5b1d9c6029 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7404145749028268135.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.6.0+22.gd8455893c.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.26.2,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.28.2,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e git+https://github.com/NVIDIA-Merlin/NVTabular.git@d8455893cf8198953324236c70df277521b6108a#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3771611534'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-minx4e2t
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-minx4e2t
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 2c621a26a2b1b7ed786c99bd7c2790b9fc675098
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.8.0+3.g2c621a2) (2022.3.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.8.0+3.g2c621a2) (3.19.5)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.8.0+3.g2c621a2) (1.10.0)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.8.0+3.g2c621a2) (7.0.0)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.8.0+3.g2c621a2) (1.2.5)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.8.0+3.g2c621a2) (1.3.5)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.8.0+3.g2c621a2) (4.64.1)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.8.0+3.g2c621a2) (0.55.1)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.8.0+3.g2c621a2) (2022.5.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.8.0+3.g2c621a2) (21.3)
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.8.0+3.g2c621a2) (2022.3.0)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.8.0+3.g2c621a2) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.8.0+3.g2c621a2) (0.4.3)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (2.2.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (1.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (0.12.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (5.4.1)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (5.8.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (8.1.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (3.1.2)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (2.0.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (2.4.0)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (1.0.4)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (6.1)
Requirement already satisfied: numpy<1.22,>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.8.0+3.g2c621a2) (1.20.3)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.8.0+3.g2c621a2) (0.38.1)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.8.0+3.g2c621a2) (65.4.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.8.0+3.g2c621a2) (3.0.9)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.8.0+3.g2c621a2) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.8.0+3.g2c621a2) (2.8.2)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.8.0+3.g2c621a2) (1.2.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.8.0+3.g2c621a2) (1.52.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core==0.8.0+3.g2c621a2) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.8.0+3.g2c621a2) (6.0.2)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.8.0+3.g2c621a2) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.8.0+3.g2c621a2) (2.1.1)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.8.0+3.g2c621a2) (6.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.8.0+3.g2c621a2) (4.0.0)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.8.0+3.g2c621a2-py3-none-any.whl size=118254 sha256=67b6084de7dd068d8ac6efb760b652a9eeb85a64f9521e7f019212533d6a656c
  Stored in directory: /tmp/pip-ephem-wheel-cache-9xevb5sf/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
  Attempting uninstall: merlin-core
    Found existing installation: merlin-core 0.3.0+12.g78ecddd
    Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
    Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.8.0+3.g2c621a2

[notice] A new release of pip available: 22.2.2 -> 22.3
[notice] To update, run: pip install --upgrade pip
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 1447 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py .... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/examples/test_01-Getting-started.py . [ 12%]
tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%]
tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................F.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
......................................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 46%]
tests/unit/ops/test_groupyby.py ....................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 60%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 68%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
........................................................F. [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
____________________________ test_horovod_multigpu _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_horovod_multigpu0')

@pytest.mark.skipif(
    os.environ.get("NR_USER") is not None, reason="not working correctly in ci environment"
)
@pytest.mark.skipif(importlib.util.find_spec("horovod") is None, reason="needs horovod")
@pytest.mark.skipif(
    cupy and cupy.cuda.runtime.getDeviceCount() <= 1,
    reason="This unittest requires multiple gpu's to run",
)
def test_horovod_multigpu(tmpdir):
    json_sample = {
        "conts": {},
        "cats": {
            "genres": {
                "dtype": None,
                "cardinality": 50,
                "min_entry_size": 1,
                "max_entry_size": 5,
                "multi_min": 2,
                "multi_max": 4,
                "multi_avg": 3,
            },
            "movieId": {
                "dtype": None,
                "cardinality": 500,
                "min_entry_size": 1,
                "max_entry_size": 5,
            },
            "userId": {"dtype": None, "cardinality": 500, "min_entry_size": 1, "max_entry_size": 5},
        },
        "labels": {"rating": {"dtype": None, "cardinality": 2}},
    }
    cols = datagen._get_cols_from_schema(json_sample)
    df_gen = datagen.DatasetGen(datagen.UniformDistro(), gpu_frac=0.0001)
    target_path = os.path.join(tmpdir, "input/")
    os.mkdir(target_path)
    df_files = df_gen.full_df_create(10000, cols, output=target_path)
    # process them
    cat_features = nvt.ColumnSelector(["userId", "movieId", "genres"]) >> nvt.ops.Categorify()
    ratings = nvt.ColumnSelector(["rating"]) >> nvt.ops.LambdaOp(
        lambda col: (col > 3).astype("int8")
    )
    output = cat_features + ratings
    proc = nvt.Workflow(output)
    target_path_train = os.path.join(tmpdir, "train/")
    os.mkdir(target_path_train)
    proc.fit_transform(nvt.Dataset(df_files)).to_parquet(
        output_path=target_path_train, out_files_per_proc=5
    )
    # add new location
    target_path = os.path.join(tmpdir, "workflow/")
    os.mkdir(target_path)
    proc.save(target_path)
    curr_path = os.path.abspath(__file__)
    repo_root = os.path.relpath(os.path.normpath(os.path.join(curr_path, "../../../..")))
    hvd_wrap_path = os.path.join(repo_root, "examples/multi-gpu-movielens/hvd_wrapper.sh")
    hvd_exam_path = os.path.join(repo_root, "examples/multi-gpu-movielens/tf_trainer.py")
  with subprocess.Popen(
        [
            "horovodrun",
            "-np",
            "2",
            "-H",
            "localhost:2",
            "sh",
            hvd_wrap_path,
            "python",
            hvd_exam_path,
            "--dir_in",
            f"{tmpdir}",
            "--batch_size",
            "1024",
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    ) as process:

tests/unit/loader/test_tf_dataloader.py:585:


/usr/lib/python3.8/subprocess.py:858: in init
self._execute_child(args, executable, preexec_fn, close_fds,


self = <subprocess.Popen object at 0x7fa115dd51c0>
args = ['horovodrun', '-np', '2', '-H', 'localhost:2', 'sh', ...]
executable = b'horovodrun', preexec_fn = None, close_fds = True, pass_fds = ()
cwd = None, env = None, startupinfo = None, creationflags = 0, shell = False
p2cread = -1, p2cwrite = -1, c2pread = 101, c2pwrite = 102, errread = 103
errwrite = 104, restore_signals = True, start_new_session = False

def _execute_child(self, args, executable, preexec_fn, close_fds,
                   pass_fds, cwd, env,
                   startupinfo, creationflags, shell,
                   p2cread, p2cwrite,
                   c2pread, c2pwrite,
                   errread, errwrite,
                   restore_signals, start_new_session):
    """Execute program (POSIX version)"""

    if isinstance(args, (str, bytes)):
        args = [args]
    elif isinstance(args, os.PathLike):
        if shell:
            raise TypeError('path-like args is not allowed when '
                            'shell is true')
        args = [args]
    else:
        args = list(args)

    if shell:
        # On Android the default shell is at '/system/bin/sh'.
        unix_shell = ('/system/bin/sh' if
                  hasattr(sys, 'getandroidapilevel') else '/bin/sh')
        args = [unix_shell, "-c"] + args
        if executable:
            args[0] = executable

    if executable is None:
        executable = args[0]

    sys.audit("subprocess.Popen", executable, args, cwd, env)

    if (_USE_POSIX_SPAWN
            and os.path.dirname(executable)
            and preexec_fn is None
            and not close_fds
            and not pass_fds
            and cwd is None
            and (p2cread == -1 or p2cread > 2)
            and (c2pwrite == -1 or c2pwrite > 2)
            and (errwrite == -1 or errwrite > 2)
            and not start_new_session):
        self._posix_spawn(args, executable, env, restore_signals,
                          p2cread, p2cwrite,
                          c2pread, c2pwrite,
                          errread, errwrite)
        return

    orig_executable = executable

    # For transferring possible exec failure from child to parent.
    # Data format: "exception name:hex errno:description"
    # Pickle is not used; it is complex and involves memory allocation.
    errpipe_read, errpipe_write = os.pipe()
    # errpipe_write must not be in the standard io 0, 1, or 2 fd range.
    low_fds_to_close = []
    while errpipe_write < 3:
        low_fds_to_close.append(errpipe_write)
        errpipe_write = os.dup(errpipe_write)
    for low_fd in low_fds_to_close:
        os.close(low_fd)
    try:
        try:
            # We must avoid complex work that could involve
            # malloc or free in the child process to avoid
            # potential deadlocks, thus we do all this here.
            # and pass it to fork_exec()

            if env is not None:
                env_list = []
                for k, v in env.items():
                    k = os.fsencode(k)
                    if b'=' in k:
                        raise ValueError("illegal environment variable name")
                    env_list.append(k + b'=' + os.fsencode(v))
            else:
                env_list = None  # Use execv instead of execve.
            executable = os.fsencode(executable)
            if os.path.dirname(executable):
                executable_list = (executable,)
            else:
                # This matches the behavior of os._execvpe().
                executable_list = tuple(
                    os.path.join(os.fsencode(dir), executable)
                    for dir in os.get_exec_path(env))
            fds_to_keep = set(pass_fds)
            fds_to_keep.add(errpipe_write)
            self.pid = _posixsubprocess.fork_exec(
                    args, executable_list,
                    close_fds, tuple(sorted(map(int, fds_to_keep))),
                    cwd, env_list,
                    p2cread, p2cwrite, c2pread, c2pwrite,
                    errread, errwrite,
                    errpipe_read, errpipe_write,
                    restore_signals, start_new_session, preexec_fn)
            self._child_created = True
        finally:
            # be sure the FD is closed no matter what
            os.close(errpipe_write)

        self._close_pipe_fds(p2cread, p2cwrite,
                             c2pread, c2pwrite,
                             errread, errwrite)

        # Wait for exec to fail or succeed; possibly raising an
        # exception (limited in size)
        errpipe_data = bytearray()
        while True:
            part = os.read(errpipe_read, 50000)
            errpipe_data += part
            if not part or len(errpipe_data) > 50000:
                break
    finally:
        # be sure the FD is closed no matter what
        os.close(errpipe_read)

    if errpipe_data:
        try:
            pid, sts = os.waitpid(self.pid, 0)
            if pid == self.pid:
                self._handle_exitstatus(sts)
            else:
                self.returncode = sys.maxsize
        except ChildProcessError:
            pass

        try:
            exception_name, hex_errno, err_msg = (
                    errpipe_data.split(b':', 2))
            # The encoding here should match the encoding
            # written in by the subprocess implementations
            # like _posixsubprocess
            err_msg = err_msg.decode()
        except ValueError:
            exception_name = b'SubprocessError'
            hex_errno = b'0'
            err_msg = 'Bad exception data from child: {!r}'.format(
                          bytes(errpipe_data))
        child_exception_type = getattr(
                builtins, exception_name.decode('ascii'),
                SubprocessError)
        if issubclass(child_exception_type, OSError) and hex_errno:
            errno_num = int(hex_errno, 16)
            child_exec_never_called = (err_msg == "noexec")
            if child_exec_never_called:
                err_msg = ""
                # The error must be from chdir(cwd).
                err_filename = cwd
            else:
                err_filename = orig_executable
            if errno_num != 0:
                err_msg = os.strerror(errno_num)
          raise child_exception_type(errno_num, err_msg, err_filename)

E FileNotFoundError: [Errno 2] No such file or directory: 'horovodrun'

/usr/lib/python3.8/subprocess.py:1704: FileNotFoundError
______________________ test_workflow_transform_ddf_dtypes ______________________

@pytest.mark.skipif(not cudf, reason="needs cudf")
def test_workflow_transform_ddf_dtypes():
    # Initial Dataset
    dtypes = {"name": str, "id": int, "x": float, "y": float}
  df = cudf.datasets.timeseries(dtypes=dtypes).reset_index()

tests/unit/workflow/test_workflow.py:613:


/usr/local/lib/python3.8/dist-packages/cudf/datasets.py:67: in timeseries
gdf = cudf.from_pandas(df)
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:6711: in from_pandas
return DataFrame.from_pandas(obj, nan_as_null=nan_as_null)
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:4547: in from_pandas
df[col_name] = column.as_column(
/usr/local/lib/python3.8/dist-packages/cudf/core/column/column.py:1966: in as_column
data = as_column(
/usr/local/lib/python3.8/dist-packages/cudf/core/column/column.py:1760: in as_column
col = ColumnBase.from_arrow(arbitrary)
/usr/local/lib/python3.8/dist-packages/cudf/core/column/column.py:297: in from_arrow
result = libcudf.interop.from_arrow(data)[0]


???
E MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /opt/rapids/rmm/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory

cudf/_lib/interop.pyx:150: MemoryError
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.version)

.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/init.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/init.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 6 warnings
tests/unit/workflow/test_workflow.py: 77 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/loader/test_tf_dataloader.py::test_horovod_multigpu
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 5 files.
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/transforms/init.py 1 1 0%
merlin/transforms/ops/init.py 1 1 0%

TOTAL 2 2 0%

=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
===== 2 failed, 1445 passed, 1 skipped, 258 warnings in 1156.11s (0:19:16) =====
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
self._warn("No data was collected.", slug="no-data-collected")
/usr/local/lib/python3.8/dist-packages/coverage/data.py:130: CoverageWarning: Couldn't use data file '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.coverage.10.20.17.231.20685.800840': database disk image is malformed
data._warn(str(exc))
ERROR: InvocationError for command /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins12080688404583274341.sh

@rjzamora rjzamora marked this pull request as ready for review March 23, 2023 18:11
Comment on lines +242 to +250
# Warn user if they set num_buckets without setting max_size or
# freq_threshold - This setting used to hash everything, but will
# now just use multiple indices for OOV encodings at transform time
if num_buckets and not (max_size or freq_threshold):
warnings.warn(
"You are setting num_buckets without using max_size or "
"freq_threshold to restrict the number of distinct "
"categories. Are you sure this is what you want?"
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnyak - This is one detail I am still unsure about. I really don't want to make it easy for users to "hash everything" (what currently happens when the user only sets num_buckets). However, it is still strange for the user to *only set num_buckets (now for a slightly-different reason). My issue with the original behavior is that there is really no reason for Catogorify to be used at all for the case of "hash everything," because we are doing a lot of unnecessary work that can be accomplished by a simple lambda op.

Copy link
Contributor

@rnyak rnyak Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjzamora I am ok to take num_buckets out of Categorify. So far, we dont know a Merlin user who only wants to use num_buckets (IIRC). Not sure anyone wants to really hash entire item catalog only with num_buckets.. then how they will reverse back to original ids is a question mark. but I leave the decision to you and @karlhigley. And as you said if this can be easily done with a lambda op, we can just do it that way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any strong opinions here; removing it sounds fine to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok to take num_buckets out of Categorify

Just to clarify, I'm not really proposing that we remove the num_buckets option, but that we slightly change it's meaning to only do what it used to do when max_size or freq_threshold was set:

  • OLD meaning: We hash everything, UNLESS max_size or freq_threshold is also set (in which case we use num_buckets to be the number of hash-buckets to use for infrequent OOV values at transformation time).
  • NEW meaning: We always take num_buckets to be the number of hash-buckets to use for infrequent OOV values at transformation time. Even if neither max_size nor freq_threshold is set, we will still use a hash+modulo for OOV values at transformation time if num_buckets>1.

@karlhigley karlhigley added this to the Merlin 23.04 milestone Apr 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change enhancement New feature or request MultiGPU Multi GPU support ops
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants