Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[main] Rebuild for orc 2.1.0 #1697

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

regro-cf-autotick-bot
Copy link
Contributor

This PR has been triggered in an effort to update orc210.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself don't close this PR until the your rebuild has been merged.


If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/13137423292 - please use this URL for debugging.

@conda-forge-admin
Copy link
Contributor

conda-forge-admin commented Feb 4, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13431821547. Examine the logs at this URL for more detail.

@h-vetinari h-vetinari changed the title Rebuild for orc 2.1.0 [main] Rebuild for orc 2.1.0 Feb 4, 2025
@h-vetinari
Copy link
Member

h-vetinari commented Feb 4, 2025

@kou, any idea why a previously working recipe for unchanged versions (affects all versions we build: 19.x, 18.x, 17.x, 16.x) would suddenly start failing over some aws config:

CMake Error at cmake_modules/ThirdpartyToolchain.cmake:4941 (include):
  include could not find requested file:

    AWSSDKVariables
Call Stack (most recent call first):
  CMakeLists.txt:544 (include)


CMake Error at cmake_modules/ThirdpartyToolchain.cmake:310 (find_package):
  By not providing "FindAWSSDKAlt.cmake" in CMAKE_MODULE_PATH this project
  has asked CMake to find a package configuration file provided by
  "AWSSDKAlt", but CMake did not find one.

  Could not find a package configuration file provided by "AWSSDKAlt" with
  any of the following names:

    AWSSDKAltConfig.cmake
    awssdkalt-config.cmake

  Add the installation prefix of "AWSSDKAlt" to CMAKE_PREFIX_PATH or set
  "AWSSDKAlt_DIR" to a directory containing one of the above files.  If
  "AWSSDKAlt" provides a separate development package or SDK, be sure it has
  been installed.
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:5324 (resolve_dependency)
  CMakeLists.txt:544 (include)

This file ($SP_DIR/cmake_modules/AWSSDKVariables.cmake) is actually something pyarrow has been shipping itself, so it's not even that the change would be explainable by a new version of aws-sdk-cpp or cmake...?

@h-vetinari
Copy link
Member

Ping @kou @pitrou @raulcd how to fix the AWSSDKVariables issue that cropped up

@raulcd
Copy link
Member

raulcd commented Feb 11, 2025

Strangely enough, it seems the orc update has something to do with it because I triggered a no change build here:

And it's able to build AWS SDK without problems.
Could it be related to this? This fix is not part of 2.1.0

FYI @wgtmac

@raulcd
Copy link
Member

raulcd commented Feb 11, 2025

Yes, we had this PR for Arrow to use ORC 2.1.0, which is applying that patch when building orc from source:

@wgtmac
Copy link

wgtmac commented Feb 11, 2025

It should be highly relevant. Apache ORC 2.1.1 is planned in this April. Should we wait for it or get it released ASAP? @raulcd @h-vetinari

cc @dongjoon-hyun @williamhyun

@raulcd
Copy link
Member

raulcd commented Feb 12, 2025

I am unsure whether this should be fixed on the ORC recipe or we want to add the patch from the PR I pointed above here. I suppose this could happen for other projects, right? @h-vetinari I am happy to temporarily add the patch here if you think this should be the approach.

@h-vetinari
Copy link
Member

We'd need to patch orc itself, if we're comfortable with backporting the patch instead of waiting for v2.1.1. If we don't want to backport this to orc more generally, then version 2.1.0 is essentially not usable, and we'll just have to sit it out until the next version.

@h-vetinari h-vetinari force-pushed the rebuild-orc210-0-1_hca57c9 branch from 8e703c9 to e9a976f Compare February 15, 2025 20:35
@williamhyun
Copy link

Thank you, @wgtmac, for bringing this issue to my attention. I am open to discussing the possibility of releasing ORC version 2.1.1 earlier than the planned date in April. I will bring this matter to the ORC community for further discussion.

@williamhyun
Copy link

Here is the discussion in the ORC dev community, feel free to follow along.
https://lists.apache.org/thread/ljm7zyjqgfmgvpgkkhl4bfqr9g52rmfk

@wgtmac
Copy link

wgtmac commented Feb 19, 2025

@h-vetinari @raulcd How can we verify that patching apache/orc@3eb423a is sufficient? I think we need to figure it out before releasing Apache ORC 2.1.1 in case of any followup fix.

@h-vetinari h-vetinari force-pushed the rebuild-orc210-0-1_hca57c9 branch from e9a976f to 7b5c4d8 Compare February 19, 2025 02:38
@h-vetinari
Copy link
Member

h-vetinari commented Feb 19, 2025

How can we verify that patching apache/orc@3eb423a is sufficient?

I backported that commit in conda-forge/orc-feedstock#85 (without publishing this to the main channel so that it starts getting installed everywhere), and it appears that it is not sufficient, unfortunately.

@wgtmac
Copy link

wgtmac commented Feb 19, 2025

How can I debug it?

@h-vetinari
Copy link
Member

h-vetinari commented Feb 20, 2025

OK, I think I have the fix. Looking at apache/arrow@7fc8222 and

-- SNAPPY_HOME: $PREFIX
-- Found the Snappy header: $PREFIX/include/snappy.h
-- Found the Snappy library: $PREFIX/lib/libsnappy.so
-- ZSTD_HOME: 
-- Could not find zstd in system search paths.
CMake Warning at cmake_modules/FindorcAlt.cmake:29 (find_package):
  Found package configuration file:

    $PREFIX/lib/cmake/orc/orcConfig.cmake

  but it set orc_FOUND to FALSE so package "orc" is considered to be NOT
  FOUND.  Reason given by package:

  orc could not be found because dependency ZSTD could not be found.

I added some ZSTD_* variables to the CMake invocations, which are apparently enough for orc to load successfully through cmake_modules/FindorcAlt.cmake. IMO, orc should not depend on these bare variables, but first try find_package(zstd).

@h-vetinari h-vetinari force-pushed the rebuild-orc210-0-1_hca57c9 branch from 16c9f81 to f9096f4 Compare February 20, 2025 03:21
@wgtmac
Copy link

wgtmac commented Feb 20, 2025

@h-vetinari Is it possible to remove these workarounds by set(ORC_PACKAGE_KIND conan) or set(ORC_PACKAGE_KIND vcpkg)? If any works, we should add conda as well. See https://github.com/apache/orc/blob/main/cmake_modules/ThirdpartyToolchain.cmake#L268-L282

@h-vetinari
Copy link
Member

Is it possible to remove these workarounds by set(ORC_PACKAGE_KIND conan) or set(ORC_PACKAGE_KIND vcpkg)?

I can try, but runtime (from the POV of orc; i.e. when building arrow) is the wrong time to set this. We should be setting this once when building orc, and then orc should just use the selected method of dependency discovery in the packaged CMake metadata.

@h-vetinari
Copy link
Member

Is it possible to remove these workarounds by set(ORC_PACKAGE_KIND conan) or set(ORC_PACKAGE_KIND vcpkg)?

Where do you expect me to set this? In arrow's CMake files? I wanted to avoid patching, so I passed -DORC_PACKAGE_KIND=conan to cmake, but that just goes back to failing to find zstd.

@wgtmac
Copy link

wgtmac commented Feb 20, 2025

I can try, but runtime (from the POV of orc; i.e. when building arrow) is the wrong time to set this

Does it mean that building arrow will also trigger the build of orc? Doesn't it use a built orc?

Where do you expect me to set this? In arrow's CMake files? I wanted to avoid patching, so I passed -DORC_PACKAGE_KIND=conan to cmake, but that just goes back to failing to find zstd.

Patching is anyway needed for 2.1.0 :(. I mean you may not need to add ZSTD_* variables if the CMake variable ORC_PACKAGE_KIND has been set.

@h-vetinari
Copy link
Member

Does it mean that building arrow will also trigger the build of orc? Doesn't it use a built orc?

The arrow build is supposed to use a pre-built (by conda-forge) orc. This is why I find it confusing to talk about ORC_PACKAGE_KIND in the context of building arrow - at this point, orc should already be built and with unambiguous metadata (i.e. after building orc in conda-forge, we should know we can do find_package(...) for all the support libraries unconditionally).

Patching is anyway needed for 2.1.0

Well, yes, different places need to be patched. Orc needs apache/orc@3eb423a and arrow needs apache/arrow@7fc8222 (for the test changes). What I don't understand is where exactly (e.g. in orc or arrow) you imagine the patch ORC_PACKAGE_KIND patch. While I'm sure there's still something left to do, I'd prefer that no ORC_PACKAGE_KIND specification is necessary for consuming orc.

@wgtmac
Copy link

wgtmac commented Feb 21, 2025

arrow needs apache/arrow@7fc8222 (for the test changes)

Could you elaborate for the test changes?

What I don't understand is where exactly (e.g. in orc or arrow) you imagine the patch ORC_PACKAGE_KIND patch. While I'm sure there's still something left to do, I'd prefer that no ORC_PACKAGE_KIND specification is necessary for consuming orc.

Building Arrow does not require ORC_PACKAGE_KIND if ORC is not bundled. Patching apache/orc@3eb423a and set(ORC_PACKAGE_KIND conan) (or vcpkg) should be sufficient to build ORC and Arrow should have no issue to consume it. Could you confirm that?

@h-vetinari
Copy link
Member

arrow needs apache/arrow@7fc8222 (for the test changes)

Could you elaborate for the test changes?

Without the test changes from that commit, we run into:

[----------] 3 tests from TestAdapterRead
[ RUN      ] TestAdapterRead.ReadIntAndStringFileMultipleStripes
unknown file: Failure
C++ exception with description "Compression block size must be a multiple of memory block size." thrown in the test body.

[  FAILED  ] TestAdapterRead.ReadIntAndStringFileMultipleStripes (9 ms)
[ RUN      ] TestAdapterRead.ReadCharAndVarcharType
unknown file: Failure
C++ exception with description "Compression block size must be a multiple of memory block size." thrown in the test body.

[  FAILED  ] TestAdapterRead.ReadCharAndVarcharType (6 ms)
[ RUN      ] TestAdapterRead.ReadFieldAttributes
unknown file: Failure
C++ exception with description "Compression block size must be a multiple of memory block size." thrown in the test body.

[  FAILED  ] TestAdapterRead.ReadFieldAttributes (9 ms)
[----------] 3 tests from TestAdapterRead (25 ms total)

Building Arrow does not require ORC_PACKAGE_KIND if ORC is not bundled. Patching apache/orc@3eb423a and set(ORC_PACKAGE_KIND conan) (or vcpkg) should be sufficient to build ORC and Arrow should have no issue to consume it. Could you confirm that?

Please provide a diff where you expect set(ORC_PACKAGE_KIND conan) to be added, and I'll test it.

@h-vetinari
Copy link
Member

Here is the discussion in the ORC dev community, feel free to follow along.
https://lists.apache.org/thread/ljm7zyjqgfmgvpgkkhl4bfqr9g52rmfk

@williamhyun @wgtmac, I read the updates in that thread, and from my POV, an orc 2.1.1 release with apache/orc@3eb423a soon would be preferable to spending a bunch of time integrating conda-specifics (I'd still be happy to do that of course, but on a longer timeline). Right now my priority is to unblock arrow from not being compatible with orc 2.1.x, and if that requires us to set ZSTD_* and LZ4_* variables, then so be it.

To attempt another clarification on the open question (and why I think it's not worth dealing with this for 2.1.1): It is still not clear to me where you expect set(ORC_PACKAGE_KIND conan) to be added (hence my question above whether you can provide a diff that I can test).

AFAICT, it should not be hardcoded in a project's CMakeLists.txt, because projects can be built with different build systems -- that goes for orc, arrow and everything on top. So the project source CMake code needs to be generic, but when packaging for conda-forge, we are able to become much more specific, because we know how a conda-forge-packaged orc will be used. What I want is that while building orc for conda-forge, I get to hard-code the conan/vcpkg/conda-forge style of dependency discovery into the .cmake metadata that gets packaged, because any package in conda-forge relying on orc will have the required dependencies (+ the necessary metadata to do find_package(foo)) available.

@wgtmac
Copy link

wgtmac commented Feb 26, 2025

Now I understand why test changes are required. Yes, it is still required for Arrow until next release.

set(ORC_PACKAGE_KIND conan) or set(ORC_PACKAGE_KIND vcpkg) should be set like https://github.com/apache/orc/blob/main/conan/all/conanfile.py#L142 so I suppose the best place is https://github.com/conda-forge/orc-feedstock/blob/main/recipe/build.sh#L24. It would be good to test this against the latest commit from branch-2.1: https://github.com/apache/orc/tree/branch-2.1.

@h-vetinari
Copy link
Member

set(ORC_PACKAGE_KIND conan) or set(ORC_PACKAGE_KIND vcpkg) should be set like https://github.com/apache/orc/blob/main/conan/all/conanfile.py#L142 so I suppose the best place is https://github.com/conda-forge/orc-feedstock/blob/main/recipe/build.sh#L24.

This is what confused me, because the set(...) syntax is for CMakeLists.txt, not for the CMake invocation cmake -D... etc.

And indeed, passing -DORC_PACKAGE_KIND=conan to the cmake ... call without the specific variables like ZLIB_HOME doesn't work, even for the top of the branch-2.1 branch: conda-forge/orc-feedstock#86

@wgtmac
Copy link

wgtmac commented Feb 26, 2025

Could you please try vcpkg instead? For conan mode, find_package (ZLIB REQUIRED CONFIG) is called so we enforce the config mode but vcpkg mode does not enforce config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants