Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skipping the timestamp check for permission failures #877

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pallavia7
Copy link
Collaborator

@pallavia7 pallavia7 commented Nov 18, 2024

Summary

Moved the permission check before timestamp check. If permission fails, skipped the timestamp check

Why / Goal

User reported an issue in analyzer: the job failed open when users don't have permission to certain Hive tables. The correct behavior should be that any table permission issues should be caught in the analyzer step.
The root cause is that the timestamp check is enabled by default, and it runs before the table permission check. Since timestamp check requires accessing data, it failed open.

Test Plan

[+ ] Added Unit Tests

Checklist

  • Documentation update

Reviewers


@pengyu-hou
Copy link
Collaborator

@pallavia7 you could run sbt scalafmt for the CI failure.

@pallavia7 pallavia7 force-pushed the pallavi--analyzer-permvalidation branch from 9eb30a1 to b9e9f72 Compare November 18, 2024 21:51
if (!skipTimestampCheck) {
val gbTables = ListBuffer[String]()
joinConf.joinParts.toScala.foreach { part =>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change


// join parts
val joinPart = Builders.GroupBy(
sources = Seq(getTestGBSourceWithTs(namespace=namespace)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure how access denial is replayed here. Could you explain? Thanks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used Mockito.spy on tableUtils objects. when(tableUtils.checkTablePermission(any(), any())).thenReturn(false) modifies checkPermission behavior. So when tableUtils.checkTablePermission is invoked, the method returns false which happens when there is no access to table.

@pallavia7 pallavia7 force-pushed the pallavi--analyzer-permvalidation branch from b9e9f72 to d582acf Compare November 19, 2024 20:57
runTablePermissionValidation((gbTables.toList ++ List(joinConf.left.table)).toSet)
} else Set()

if (!skipTimestampCheck && noAccessTables.isEmpty) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we print a warning error if we are skipping the timestamp check because there are tables with permission issues?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added warning message

def testJoinAnalyzerInvalidTablePermissions(): Unit = {
val spark: SparkSession = SparkSessionBuilder.build("AnalyzerTest" + "_" + Random.alphanumeric.take(6).mkString, local = true)
val tableUtils = spy(TableUtils(spark))
when(tableUtils.checkTablePermission(any(), any())).thenReturn(false)
Copy link
Collaborator

@hzding621 hzding621 Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider mocking the tableUtils.sql to throw a runtime exception to mimic table permission issue, since the timestamp check logic will try to access the table data using tableUtils.sql, and we want to ensure that this is not triggered and gated by the permission check first.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find a way to mock tableUtils.sql, as some sql queries like create etc we want to behave as existing and only certain sql statements to mock. So, I made sure that timestamp check logic function is not called by spying on mocked function with verify(analyzer, never()).runTimestampChecks(any(), any())

@pallavia7 pallavia7 force-pushed the pallavi--analyzer-permvalidation branch from d582acf to d202219 Compare January 6, 2025 18:29
Ran scalafmt

reformat

ran scalafmt
@pallavia7 pallavia7 force-pushed the pallavi--analyzer-permvalidation branch from 7ec9bdb to bd4a585 Compare January 10, 2025 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants