forked from facebookincubator/velox
Parquet connector refactor #50
Open: devavret wants to merge 94 commits into rapidsai:velox-cudf from devavret:parquet-connector-refactor
+2,927 −1,523
… And add special handling for single value that's out of column type's range
…lso make use of subfield filters to make AST now that it's available to our datasource
…incubator#14735) Summary: Pull Request resolved: facebookincubator#14735 SortingWriter::outputBatchRows() returns 0 if maxOutputBytesConfig_ is less than the estimated row size, which makes a subsequent check fail. Returning 0 is never correct behavior for this method, so the result is now floored at 1. Reviewed By: xiaoxmeng, amitkdutta Differential Revision: D81729964 fbshipit-source-id: 5f7ae20e3618e1f3ea5738c4ace565863b4ea511
…4779) Summary: Pull Request resolved: facebookincubator#14779 X-link: facebookexperimental/verax#371 Reviewed By: Yuhta Differential Revision: D81923560 fbshipit-source-id: bf400ae9d2002f03bd350cd21bb8e0c0b36ab2aa
…ebookincubator#14772) Summary: Pull Request resolved: facebookincubator#14772 Index lookup join doesn't fill the match column properly: it sets the fill end bit to the number of bits to fill, which is wrong. This was discovered by an extension to a Meta-internal use case. The existing index join unit test can't catch this because (1) the test always generates the hit probe rows first, and (2) the match verification logic only checks that the lookup value is null when the match value is false. This PR fixes these issues: (1) randomize the probe hit rows; (2) check the lookup value in both the null and non-null cases. Verified that with this change, the improved index join unit test catches the issue. Reviewed By: zacw7, mbasmanova Differential Revision: D80915369 fbshipit-source-id: f3788694542c5d66777ba40c04a492eb544c5432
…bator#14783) Summary: Pull Request resolved: facebookincubator#14783 misc: Clean up QDigest registration in FunctionBaseTest Reviewed By: duxiao1212 Differential Revision: D81950600 fbshipit-source-id: 2c04ab6cdcf5003f258c8fa4017927aa4b0234fc
…acebookincubator#14796) Summary: Pull Request resolved: facebookincubator#14796 Using utility casts.h function Reviewed By: xiaoxmeng Differential Revision: D81986594 fbshipit-source-id: c7303f50ad86c83f7495a1885f2b0170dcd44058
…kincubator#14784) Summary: Pull Request resolved: facebookincubator#14784 Connector factories are used only by Prestissimo to create multiple instances of the same kind of connector for different catalogs. These factories are not needed for other use cases. A follow-up would be to move connector factories out of Velox into Prestissimo. Reviewed By: xiaoxmeng Differential Revision: D81960874 fbshipit-source-id: 0688a4c2b24f9cf41f12bbf081ac5b0426a7db3e
Summary: We have been testing it in CI for a long time now and it has big benefits in binary sizes etc. so we should make it the default! Pull Request resolved: facebookincubator#14663 Reviewed By: xiaoxmeng Differential Revision: D81973981 Pulled By: Yuhta fbshipit-source-id: 350238d0c0da4263d6eae095fb0a4d7da7754856
Summary: This PR adds the Spark `timestamp_seconds` function. A key difference between `timestamp_seconds` and other variations of this function, such as `timestamp_millis` and `timestamp_micros`, is that the `seconds` input parameter can be fractional (whereas the `milliseconds` and `microseconds` input parameters are integers). Spark doc: https://spark.apache.org/docs/latest/api/sql/index.html#timestamp_seconds Spark code: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L596 Pull Request resolved: facebookincubator#14222 Reviewed By: mbasmanova Differential Revision: D81921635 Pulled By: Yuhta fbshipit-source-id: c01a3f77260443a385133e57cacac1ab7e0d58ef
Summary: Fixes facebookincubator#14275 Fixes facebookincubator#14756 This is not complete CMake package support. It works only when: * `VELOX_BUILD_SHARED=ON` * `VELOX_BUILD_MINIMAL_WITH_DWIO=ON` * All dependencies are resolved from the system (no bundled dependencies) FYI: I want to use this for Nimble: facebookincubator/nimble#215 Nimble uses `VELOX_BUILD_MINIMAL_WITH_DWIO=ON`. This is disabled by default. We can enable it by specifying `VELOX_BUILD_CMAKE_PACKAGE=ON`. Users can then find Velox with `find_package(Velox)`. We can expand the supported cases step by step. How about this as the first step? Pull Request resolved: facebookincubator#14738 Reviewed By: mbasmanova Differential Revision: D81923615 Pulled By: Yuhta fbshipit-source-id: 8a7cb55c5e57c3b87b696fa7298cbd171e80d40d
Summary: This is follow-up work for facebookincubator#14375 (comment). Pull Request resolved: facebookincubator#14698 Reviewed By: mbasmanova Differential Revision: D81962221 Pulled By: Yuhta fbshipit-source-id: 572ba75309b75a101410b569528037245f0d9178
Summary: This fixes two categories of errors: - hidden declaration of a virtual function - initializer list would use explicit constructor The latter was previously addressed in tests but new tests were added that caused a re-occurrence of this problem. Pull Request resolved: facebookincubator#14781 Reviewed By: mbasmanova Differential Revision: D81962301 Pulled By: Yuhta fbshipit-source-id: 22eb32f6fc4b503cd71c6077f1ac38a565e4bf5e
Summary: Pull Request resolved: facebookincubator#14626 Add data type P4HyperLogLog Add casting of both from/to varbinary https://prestodb.io/docs/current/language/types.html#hyperloglog (check second subsection) https://prestodb.io/docs/current/language/types.html#khyperloglog Reviewed By: kagamiori Differential Revision: D81148399 fbshipit-source-id: de6b8c9491ebbd9f7380f846eee0a2b92eb7cece
…bookincubator#14706) Summary: Previously the actor (the workflow initiator) was used to try and determine the merge base commit sha. However, a user pushing to a different repo becomes the owner and $OWNER:$HEAD_REF does not exist in that scenario. Instead, try to use the PR creator name. We’ve seen HTTP 404 from GH CLI when the actor does not match the repo name. Pull Request resolved: facebookincubator#14706 Reviewed By: mbasmanova Differential Revision: D81922077 Pulled By: Yuhta fbshipit-source-id: d2c81f38e0b37b62288a2b797b8954950a7e9fd0
…kincubator#14791) Summary: Pull Request resolved: facebookincubator#14791 There are cases where a RowVector may be missing subfields (particularly in the dwio field reader usage when a field is not projected). The previous setType() implementation doesn't handle the missing-subfield case. Reviewed By: Yuhta Differential Revision: D81987618 fbshipit-source-id: 8e8350120d598953b7a547fe7778cac847a84ef2
Summary: Pull Request resolved: facebookincubator#14806 Part of facebookincubator#14802 Reviewed By: pansatadru Differential Revision: D82057681 fbshipit-source-id: eb6afb55c52f8cdd324c316e944fe07ed0eae5c1
…14545) Summary: Fix facebookincubator#14530 Pull Request resolved: facebookincubator#14545 Test Plan: Imported from GitHub, without a `Test Plan:` line. Rollback Plan: Reviewed By: kagamiori Differential Revision: D81503154 Pulled By: peterenescu fbshipit-source-id: 953440f510e839c9a0627beba7964e2ab88f0374
Summary: X-link: facebookexperimental/verax#382 Pull Request resolved: facebookincubator#14809 Fixes facebookincubator#14802 Reviewed By: pansatadru, xiaoxmeng Differential Revision: D82071237 fbshipit-source-id: c78e3b634e0c08575a3bdf50bb5b0256472c1cff
…14813) Summary: Pull Request resolved: facebookincubator#14813 Add IndexLookupJoinBuilder to PlanBuilder.cc to unify the use of the builder style and make the code more consistent with other node builders, such as table scan. Reviewed By: xiaoxmeng Differential Revision: D82078011 fbshipit-source-id: c542c57b4f4de6a1d83cf7571ba9b1d1fcf778af
…large projections (facebookincubator#14403) Summary: …and keep it similar for small. The memory overhead is just an additional ~16.125 bytes per type in the row. Also, I don't think lazy is good here, but I made it this way because it was a TODO in the previous attempt. It would be nice to have some benchmarks for this. Context: facebookexperimental/verax#118 (comment) Pull Request resolved: facebookincubator#14403 Reviewed By: mbasmanova Differential Revision: D81515241 Pulled By: bikramSingh91 fbshipit-source-id: bc127521757550161fe6703a373b68a44a57df14
…incubator#14818) Summary: Pull Request resolved: facebookincubator#14818 Continuation of facebookincubator#14784 bypass-github-export-checks Reviewed By: amitkdutta Differential Revision: D82104883 fbshipit-source-id: dccb98143c27c1c8f5183de522d5a9e6025eeb84
Summary: Pull Request resolved: facebookincubator#14253 Adds `geometry_to_bing_tiles` UDF to velox. Also uses namespace `functions::geospatial` in `BingTileType.cpp` to reduce verbosity from frequent usage of geospatial constants. Reviewed By: jagill, Yuhta Differential Revision: D78950042 fbshipit-source-id: 07c362d4595e6d976b98c726afa29302656c8f9a
…t-connector-refactor
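For readers skimming the commit list, the `SortingWriter::outputBatchRows()` fix (facebookincubator#14735) can be illustrated with a minimal sketch. This is not Velox's actual code: the function name `outputBatchRowsSketch`, its parameters, and the assumption that the row count comes from an integer division of the byte budget by the estimated row size are all hypothetical, chosen only to show why the result can reach 0 and how flooring at 1 avoids it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a batch-size computation; not Velox's implementation.
// Returns the number of rows to emit per output batch, never less than 1.
inline uint32_t outputBatchRowsSketch(
    uint64_t maxOutputBytes,
    uint64_t estimatedRowSize,
    uint32_t maxOutputRows) {
  // Assumed precondition: the row cap itself is positive.
  assert(maxOutputRows > 0);
  if (estimatedRowSize == 0) {
    // No size estimate available: fall back to the row cap.
    return maxOutputRows;
  }
  // Integer division yields 0 when the byte budget is smaller than one row.
  const uint64_t rowsByBytes = maxOutputBytes / estimatedRowSize;
  const uint64_t capped = std::min<uint64_t>(rowsByBytes, maxOutputRows);
  // Floor at 1 so callers that require a non-zero batch size keep working.
  return static_cast<uint32_t>(std::max<uint64_t>(capped, 1));
}
```

With a 10-byte budget and a 100-byte estimated row, the division alone gives 0; the final `std::max` turns that into a single-row batch.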
A copy of facebookincubator#14294
Depends on #49