[Python][CI] Appveyor CI is failing on test_fs.py::test_get_file_info_with_selector #37675

Closed
raulcd opened this issue Sep 12, 2023 · 2 comments

Comments

@raulcd
Member

raulcd commented Sep 12, 2023

Describe the bug, including details regarding any error messages, version, and platform.

As seen here on main:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/48012043
and here on a PR:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/48015180

Currently our Appveyor CI is failing on python/pyarrow/tests/test_fs.py::test_get_file_info_with_selector with:

================================== FAILURES ===================================
_ test_get_file_info_with_selector[PyFileSystem(FSSpecHandler(fsspec.LocalFileSystem()))] _
fs = <pyarrow._fs.PyFileSystem object at 0x0000027D2A466CE0>
pathfn = <function py_fsspec_localfs.<locals>.<lambda> at 0x0000027D2A373E30>
    def test_get_file_info_with_selector(fs, pathfn):
        base_dir = pathfn('selector-dir/')
        file_a = pathfn('selector-dir/test_file_a')
        file_b = pathfn('selector-dir/test_file_b')
        dir_a = pathfn('selector-dir/test_dir_a')
        file_c = pathfn('selector-dir/test_dir_a/test_file_c')
        dir_b = pathfn('selector-dir/test_dir_b')
    
        try:
            fs.create_dir(base_dir)
            with fs.open_output_stream(file_a):
                pass
            with fs.open_output_stream(file_b):
                pass
            fs.create_dir(dir_a)
            with fs.open_output_stream(file_c):
                pass
            fs.create_dir(dir_b)
    
            # recursive selector
            selector = FileSelector(base_dir, allow_not_found=False,
                                    recursive=True)
            assert selector.base_dir == base_dir
    
            infos = fs.get_file_info(selector)
            if fs.type_name == "py::fsspec+s3":
                # s3fs only lists directories if they are not empty, but depending
                # on the s3fs/fsspec version combo, it includes the base_dir
                # (https://github.com/dask/s3fs/issues/393)
                assert (len(infos) == 4) or (len(infos) == 5)
            else:
>               assert len(infos) == 5
E               AssertionError: assert 6 == 5
E                +  where 6 = len([<FileInfo for 'C:/Users/appveyor/AppData/Local/Temp/1/pytest-of-appveyor/pytest-0/test_get_file_info_with_select8/sel...p/1/pytest-of-appveyor/pytest-0/test_get_file_info_with_select8/selector-dir/test_file_b': type=FileType.File, size=0>])
pyarrow\tests\test_fs.py:707: AssertionError
_ test_get_file_info_with_selector[PyFileSystem(FSSpecHandler(fsspec.filesystem("memory")))] _
fs = <pyarrow._fs.PyFileSystem object at 0x0000027D2A681000>
pathfn = <function py_fsspec_memoryfs.<locals>.<lambda> at 0x0000027D2A7381B0>
    def test_get_file_info_with_selector(fs, pathfn):
        base_dir = pathfn('selector-dir/')
        file_a = pathfn('selector-dir/test_file_a')
        file_b = pathfn('selector-dir/test_file_b')
        dir_a = pathfn('selector-dir/test_dir_a')
        file_c = pathfn('selector-dir/test_dir_a/test_file_c')
        dir_b = pathfn('selector-dir/test_dir_b')
    
        try:
            fs.create_dir(base_dir)
            with fs.open_output_stream(file_a):
                pass
            with fs.open_output_stream(file_b):
                pass
            fs.create_dir(dir_a)
            with fs.open_output_stream(file_c):
                pass
            fs.create_dir(dir_b)
    
            # recursive selector
            selector = FileSelector(base_dir, allow_not_found=False,
                                    recursive=True)
            assert selector.base_dir == base_dir
    
            infos = fs.get_file_info(selector)
            if fs.type_name == "py::fsspec+s3":
                # s3fs only lists directories if they are not empty, but depending
                # on the s3fs/fsspec version combo, it includes the base_dir
                # (https://github.com/dask/s3fs/issues/393)
                assert (len(infos) == 4) or (len(infos) == 5)
            else:
>               assert len(infos) == 5
E               AssertionError: assert 6 == 5
E                +  where 6 = len([<FileInfo for '/selector-dir': type=FileType.Directory>, <FileInfo for '/selector-dir/test_dir_a': type=FileType.Dire...-dir/test_file_a': type=FileType.File, size=0>, <FileInfo for '/selector-dir/test_file_b': type=FileType.File, size=0>])
pyarrow\tests\test_fs.py:707: AssertionError
_ test_get_file_info_with_selector[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] _
fs = <pyarrow._fs.PyFileSystem object at 0x0000027D2A687E80>
pathfn = <method-wrapper '__add__' of str object at 0x0000027D1CFF0760>
    def test_get_file_info_with_selector(fs, pathfn):
        base_dir = pathfn('selector-dir/')
        file_a = pathfn('selector-dir/test_file_a')
        file_b = pathfn('selector-dir/test_file_b')
        dir_a = pathfn('selector-dir/test_dir_a')
        file_c = pathfn('selector-dir/test_dir_a/test_file_c')
        dir_b = pathfn('selector-dir/test_dir_b')
    
        try:
            fs.create_dir(base_dir)
            with fs.open_output_stream(file_a):
                pass
            with fs.open_output_stream(file_b):
                pass
            fs.create_dir(dir_a)
            with fs.open_output_stream(file_c):
                pass
            fs.create_dir(dir_b)
    
            # recursive selector
            selector = FileSelector(base_dir, allow_not_found=False,
                                    recursive=True)
            assert selector.base_dir == base_dir
    
            infos = fs.get_file_info(selector)
            if fs.type_name == "py::fsspec+s3":
                # s3fs only lists directories if they are not empty, but depending
                # on the s3fs/fsspec version combo, it includes the base_dir
                # (https://github.com/dask/s3fs/issues/393)
                assert (len(infos) == 4) or (len(infos) == 5)
            else:
                assert len(infos) == 5
    
            for info in infos:
                if (info.path.endswith(file_a) or info.path.endswith(file_b) or
                        info.path.endswith(file_c)):
                    assert info.type == FileType.File
                elif (info.path.rstrip("/").endswith(dir_a) or
                      info.path.rstrip("/").endswith(dir_b)):
                    assert info.type == FileType.Directory
                elif (fs.type_name == "py::fsspec+s3" and
                      info.path.rstrip("/").endswith("selector-dir")):
                    # s3fs can include base dir, see above
                    assert info.type == FileType.Directory
                else:
                    raise ValueError('unexpected path {}'.format(info.path))
                check_mtime_or_absent(info)
    
            # non-recursive selector -> not selecting the nested file_c
            selector = FileSelector(base_dir, recursive=False)
    
            infos = fs.get_file_info(selector)
            if fs.type_name == "py::fsspec+s3":
                # s3fs only lists directories if they are not empty
                # + for s3fs 0.5.2 all directories are dropped because of buggy
                # side-effect of previous find() call
                # (https://github.com/dask/s3fs/issues/410)
>               assert (len(infos) == 3) or (len(infos) == 2)
E               AssertionError: assert (4 == 3 or 4 == 2)
E                +  where 4 = len([<FileInfo for 'pyarrow-filesystem/selector-dir': type=FileType.Directory>, <FileInfo for 'pyarrow-filesystem/selector... type=FileType.File, size=0>, <FileInfo for 'pyarrow-filesystem/selector-dir/test_file_b': type=FileType.File, size=0>])
E                +  and   4 = len([<FileInfo for 'pyarrow-filesystem/selector-dir': type=FileType.Directory>, <FileInfo for 'pyarrow-filesystem/selector... type=FileType.File, size=0>, <FileInfo for 'pyarrow-filesystem/selector-dir/test_file_b': type=FileType.File, size=0
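In the memory and s3fs cases, the assertion output lists the selector's base directory (`selector-dir`) as an entry of its own, which would account for the extra item (6 instead of 5, 4 instead of 3). The following is a minimal sketch, not the fix from the linked PR, of how a listing could be normalized before counting. `drop_base_dir` is a hypothetical helper, and the path list is illustrative: it is rebuilt from the test's path variables because the log elides the middle entries.

```python
def drop_base_dir(paths, base_dir):
    """Return the paths without the entry that is the base directory itself.

    Hypothetical helper for illustration; trailing slashes are ignored
    when comparing, since the test builds base_dir as 'selector-dir/'.
    """
    base = base_dir.rstrip("/")
    return [p for p in paths if p.rstrip("/") != base]


# Illustrative listing modeled on the memory-filesystem failure:
# the base directory appears alongside its five children.
listed = [
    "/selector-dir",
    "/selector-dir/test_dir_a",
    "/selector-dir/test_dir_b",
    "/selector-dir/test_dir_a/test_file_c",
    "/selector-dir/test_file_a",
    "/selector-dir/test_file_b",
]

children = drop_base_dir(listed, "/selector-dir/")
```

With the base directory filtered out, only the five entries the test created remain, so a count-based assertion would no longer depend on whether the backend reports the base directory.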

Component(s)

Continuous Integration, Python

@AlenkaF
Member

AlenkaF commented Sep 12, 2023

There is already an open PR for this failure: #37558

@AlenkaF
Member

AlenkaF commented Sep 12, 2023

Duplicate of #37555

@AlenkaF AlenkaF marked this as a duplicate of #37555 Sep 12, 2023
@raulcd raulcd closed this as completed Sep 12, 2023
@raulcd raulcd closed this as not planned Sep 12, 2023