[BUG] Saving workflows with Categorify or TargetEncoding fails to write stats files #1837

Closed
nv-alaiacano opened this issue Jun 7, 2023 · 2 comments

Describe the bug

Saving a workflow that uses ops which write intermediate stats files fails while copying those files to the destination path. The error appears to come from the _copy_storage helper function in categorify.py, which is used by:

  • Categorify
  • TargetEncoding
  • JoinGroupby

The save fails with:

E   IsADirectoryError: [Errno 21] Is a directory: '/workspace//categories/cat_stats.user_id.parquet'
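
For readers unfamiliar with this error class, here is a minimal sketch of one way a copy helper can hit IsADirectoryError on a POSIX system. It is not the actual _copy_storage code, and the issue does not state the root cause. The function names copy_file_buggy and copy_file_fixed are hypothetical, and the "mistake" shown (creating the destination file path itself as a directory) is only an assumed illustration.

```python
import os
import shutil
import tempfile


def copy_file_buggy(src, dest):
    # Hypothetical mistake: this creates `dest` itself as a directory,
    # so opening `dest` for writing afterwards fails on POSIX systems.
    os.makedirs(dest, exist_ok=True)
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def copy_file_fixed(src, dest):
    # Create only the parent directory, then write the file itself.
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for a stats file; the bytes are not real parquet data.
        src = os.path.join(root, "cat_stats.user_id.parquet")
        with open(src, "wb") as f:
            f.write(b"stats")

        bad_dest = os.path.join(root, "bad", "categories", "cat_stats.user_id.parquet")
        good_dest = os.path.join(root, "good", "categories", "cat_stats.user_id.parquet")

        try:
            copy_file_buggy(src, bad_dest)
            buggy_error = None
        except IsADirectoryError as exc:
            buggy_error = type(exc).__name__

        copy_file_fixed(src, good_dest)
        with open(good_dest, "rb") as f:
            copied = f.read()

    return buggy_error, copied


if __name__ == "__main__":
    print(demo())
```

Whether the real helper fails for this reason or another (for example, a filesystem abstraction treating the path differently) is not established by this report; see the linked fix for the actual change.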

Steps/Code to reproduce bug

The following test produces the error:

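# Imports assumed from the NVTabular / Merlin test suite; the original
# snippet omitted them, so these paths are a best guess.
import numpy as np
import nvtabular as nvt
from merlin.core.dispatch import make_df
from merlin.io import Dataset


def test_categorify_saves(tmpdir):
    N = 100

    # Small random dataset with one categorical column and two ID columns.
    dataset = Dataset(
        make_df(
            {
                "a": np.random.randint(0, 100000, N),
                "item_id": np.random.randint(0, 100, N),
                "user_id": np.random.randint(0, 100, N),
                "click": np.random.randint(0, 2, N),
            }
        ),
    )

    # Categorify writes category stats files during fit.
    outputs = ["a"] >> nvt.ops.Categorify()

    # TargetEncoding also writes intermediate stats files.
    continuous = (
        ["user_id", "item_id"]
        >> nvt.ops.TargetEncoding(["click"], kfold=1, p_smooth=20)
        >> nvt.ops.Normalize()
    )
    outputs += continuous
    wf = nvt.Workflow(outputs)

    wf.fit(dataset)
    # Saving copies the stats files to tmpdir; this call raises the error.
    wf.save(tmpdir)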

Expected behavior

The stats files are copied to the destination path and wf.save completes without error.

@nv-alaiacano nv-alaiacano added the bug Something isn't working label Jun 7, 2023
@nv-alaiacano nv-alaiacano self-assigned this Jun 7, 2023
@nv-alaiacano nv-alaiacano added this to the Merlin 23.06 milestone Jun 7, 2023
@nv-alaiacano nv-alaiacano added the P0 label Jun 7, 2023
@viswa-nvidia

@nv-alaiacano please link the PR and close this

@nv-alaiacano

Resolved: #1838
