Search before asking
I have searched in the issues and found no similar issues.
Describe the bug
Hi,
Using Kyuubi 1.10.1 with Spark 3.5.2, there seems to be a regression compared to Kyuubi with Spark 3.4.4. I have a view with a row filter, and when the view is queried as two subqueries of itself I get the error shown in the engine log.
I was able to reproduce this minimally using the source tag v1.10.1, a default build of Kyuubi, and Ranger running in Docker.
To reproduce the error, create tables from the parquet files in the attached test-data.zip. Here is the SQL to create the tables:
create table if not exists Album
USING org.apache.spark.sql.parquet
OPTIONS (
  path ("/tmp/chinook/alb.parquet")
);
create table if not exists Artist
USING org.apache.spark.sql.parquet
OPTIONS (
  path ("/tmp/chinook/art.parquet")
);
create table if not exists Track
USING org.apache.spark.sql.parquet
OPTIONS (
  path ("/tmp/chinook/trk.parquet")
);
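Then you create a view on top of these tables. The original view DDL is not reproduced in this rendering; the sketch below is reconstructed from the analyzed plan in the engine log (the output column names and the left-outer-join keys come from that plan, while the table aliases alb/art/trk are only illustrative):
create view if not exists myview as
select
  art.ArtistId as ArtistId,
  art.Name     as ArtistName,
  alb.AlbumId  as AlbumId,
  alb.Title    as AlbumTitle,
  trk.TrackId  as TrackId,
  trk.Name     as TrackName
from Album alb
left outer join Artist art on alb.ArtistId = art.ArtistId
left outer join Track trk on trk.AlbumId = alb.AlbumId;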
Then a row filter should be added to Ranger like so:
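(The Ranger policy screenshot is not reproduced here. Judging from the Filter nodes in the analyzed plan in the engine log, the row-level filter expression on myview appears to evaluate to AlbumId = 117 for the querying user; treat the exact policy definition as an assumption.)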
The query that causes the error is this:
SELECT T0.C1, T1.F1 FROM (
  select a.TrackName C1 from myview a
) T0
LEFT OUTER JOIN (
  select b.TrackName F1 from myview b
) T1 ON T0.C1 = T1.F1
The strange thing is that changing the case of a single character in the second subquery makes the query work:
SELECT T0.C1, T1.F1 FROM (
  select a.TrackName C1 from myview a
) T0
LEFT OUTER JOIN (
  select b.TrackName F1 from Myview b
) T1 ON T0.C1 = T1.F1
Unfortunately, I do not have control over this.
I tested in our k8s environment against Spark 3.4.4 and the issue does not occur. I have not yet tested against a local build for Spark 3.4; I will provide those details once the build completes.
Affects Version(s)
1.10.1
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
org.apache.spark.sql.AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] Resolved attribute(s) "TrackName" missing from "ArtistId", "ArtistName", "AlbumId", "AlbumTitle", "TrackId", "TrackName" in operator !Project [TrackName#331 AS F1#319]. Attribute(s) with the same name appear in the operation: "TrackName". Please check if the right attribute(s) are used.; line 1 pos 88;
Project [C1#318, F1#319]
+- Join LeftOuter, (C1#318 = F1#319)
   :- SubqueryAlias T0
   :  +- Project [TrackName#331 AS C1#318]
   :     +- SubqueryAlias a
   :        +- SubqueryAlias spark_catalog.default.myview
   :           +- Filter (albumid#328L = cast(117 as bigint))
   :              +- RowFilterMarker
   :                 +- PermanentViewMarker
   :                    +- View (`spark_catalog`.`default`.`myview`, [ArtistId#326L, ArtistName#327, AlbumId#328L, AlbumTitle#329, TrackId#330L, TrackName#331])
   :                       +- Project [cast(ArtistId#320L as bigint) AS ArtistId#326L, cast(ArtistName#321 as string) AS ArtistName#327, cast(AlbumId#322L as bigint) AS AlbumId#328L, cast(AlbumTitle#323 as string) AS AlbumTitle#329, cast(TrackId#324L as bigint) AS TrackId#330L, cast(TrackName#325 as string) AS TrackName#331]
   :                          +- Project [ArtistId#91L AS ArtistId#320L, Name#92 AS ArtistName#321, AlbumId#88L AS AlbumId#322L, Title#89 AS AlbumTitle#323, TrackId#93L AS TrackId#324L, Name#94 AS TrackName#325]
   :                             +- Join LeftOuter, (AlbumId#95L = AlbumId#88L)
   :                                :- Join LeftOuter, (ArtistId#90L = ArtistId#91L)
   :                                :  :- SubqueryAlias E95675
   :                                :  :  +- SubqueryAlias spark_catalog.default.album
   :                                :  :     +- Relation spark_catalog.default.album[AlbumId#88L,Title#89,ArtistId#90L] parquet
   :                                :  +- SubqueryAlias E95676
   :                                :     +- SubqueryAlias spark_catalog.default.artist
   :                                :        +- Relation spark_catalog.default.artist[ArtistId#91L,Name#92] parquet
   :                                +- SubqueryAlias E95685
   :                                   +- SubqueryAlias spark_catalog.default.track
   :                                      +- Relation spark_catalog.default.track[TrackId#93L,Name#94,AlbumId#95L,MediaTypeId#96L,GenreId#97L,Composer#98,Milliseconds#99L,Bytes#100L,UnitPrice#101] parquet
   +- SubqueryAlias T1
      +- !Project [TrackName#331 AS F1#319]
         +- SubqueryAlias b
            +- SubqueryAlias spark_catalog.default.myview
               +- Filter (albumid#348L = cast(117 as bigint))
                  +- RowFilterMarker
                     +- PermanentViewMarker
                        +- Project [cast(ArtistId#326L as bigint) AS ArtistId#346L, cast(ArtistName#327 as string) AS ArtistName#347, cast(AlbumId#328L as bigint) AS AlbumId#348L, cast(AlbumTitle#329 as string) AS AlbumTitle#349, cast(TrackId#330L as bigint) AS TrackId#350L, cast(TrackName#331 as string) AS TrackName#351]
                           +- View (`spark_catalog`.`default`.`myview`, [ArtistId#326L, ArtistName#327, AlbumId#328L, AlbumTitle#329, TrackId#330L, TrackName#331])
                              +- Project [cast(ArtistId#320L as bigint) AS ArtistId#326L, cast(ArtistName#321 as string) AS ArtistName#327, cast(AlbumId#322L as bigint) AS AlbumId#328L, cast(AlbumTitle#323 as string) AS AlbumTitle#329, cast(TrackId#324L as bigint) AS TrackId#330L, cast(TrackName#325 as string) AS TrackName#331]
                                 +- Project [ArtistId#335L AS ArtistId#320L, Name#336 AS ArtistName#321, AlbumId#332L AS AlbumId#322L, Title#333 AS AlbumTitle#323, TrackId#337L AS TrackId#324L, Name#338 AS TrackName#325]
                                    +- Join LeftOuter, (AlbumId#339L = AlbumId#332L)
                                       :- Join LeftOuter, (ArtistId#334L = ArtistId#335L)
                                       :  :- SubqueryAlias E95675
                                       :  :  +- SubqueryAlias spark_catalog.default.album
                                       :  :     +- Relation spark_catalog.default.album[AlbumId#332L,Title#333,ArtistId#334L] parquet
                                       :  +- SubqueryAlias E95676
                                       :     +- SubqueryAlias spark_catalog.default.artist
                                       :        +- Relation spark_catalog.default.artist[ArtistId#335L,Name#336] parquet
                                       +- SubqueryAlias E95685
                                          +- SubqueryAlias spark_catalog.default.track
                                             +- Relation spark_catalog.default.track[TrackId#337L,Name#338,AlbumId#339L,MediaTypeId#340L,GenreId#341L,Composer#342,Milliseconds#343L,Bytes#344L,UnitPrice#345] parquet
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:711)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:215)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:215)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:197)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:202)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:193)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:171)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:202)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:225)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:222)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:90)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:174)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:158)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:85)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:113)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Kyuubi Server Configurations
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

## Kyuubi Configurations

# kyuubi.authentication NONE
#kyuubi.frontend.bind.host 0.0.0.0
# kyuubi.frontend.protocols THRIFT_BINARY,REST
# kyuubi.frontend.thrift.binary.bind.port 10009
# kyuubi.frontend.rest.bind.port 10099
#
# kyuubi.engine.type SPARK_SQL
# kyuubi.engine.share.level USER
# kyuubi.session.engine.initialize.timeout PT3M
#kyuubi.ha.addresses localhost:2181
# kyuubi.ha.namespace kyuubi
#
# Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html

spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
spark.executor.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/apiguardian-api-1.1.2.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/gethostname4j-1.0.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-annotations-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-core-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-databind-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-base-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jersey-bundle-1.19.4.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-platform-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar
# spark.executor.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar
spark.driver.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/apiguardian-api-1.1.2.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/gethostname4j-1.0.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-annotations-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-core-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-databind-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-base-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jersey-bundle-1.19.4.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jna-platform-5.13.0.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar
# spark.driver.extraClassPath=/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/kyuubi-spark-authz-shaded_2.12-1.10.1.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-1.9.13.jar:/home/me/git/hub/kyuubi/extensions/spark/kyuubi-spark-authz-shaded/target/jackson-jaxrs-json-provider-2.15.0.jar
Kyuubi Engine Configurations
## ranger-spark-security.xml
<configuration>
  <property>
    <name>ranger.plugin.spark.policy.rest.url</name>
    <value>http://localhost:6080</value>
  </property>
  <property>
    <name>ranger.plugin.spark.service.name</name>
    <value>spark</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.cache.dir</name>
    <value>/tmp/policycache</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.pollIntervalMs</name>
    <value>1000</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.source.impl</name>
    <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
  </property>
  <property>
    <name>ranger.plugin.spark.enable.implicit.userstore.enricher</name>
    <value>true</value>
    <description>Enable UserStoreEnricher for fetching user and group attributes if using macros or scripts in row-filters since Ranger 2.3</description>
  </property>
  <property>
    <name>ranger.plugin.hive.policy.cache.dir</name>
    <value>/tmp/policycache</value>
    <description>As Authz plugin reuses hive service def, a policy cache path is required for caching UserStore and Tags for "hive" service def, while "ranger.plugin.spark.policy.cache.dir" config is the path for caching policies in service.</description>
  </property>
</configuration>
Additional context
No response
Are you willing to submit PR?
Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
No. I cannot submit a PR at this time.
Just did a build for Spark 3.4 with ./build/mvn clean package -Pspark-3.4 -DskipTests and can confirm the error does not happen there.
lanklaas changed the title from "AuthZ RowFilter causes org.apache.spark.sql.AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] in spark 3.5" to "[Bug] AuthZ RowFilter causes org.apache.spark.sql.AnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION] in spark 3.5" on Jan 13, 2025