'msck repair table ...' is failing in Spark-SQL for tables with more than 100 partitions #48

Open
YevhenKv opened this issue Aug 9, 2021 · 2 comments

YevhenKv commented Aug 9, 2021

Hi,

We have tables with more than 100 partitions. When we run 'msck repair table ...' in Spark-SQL, it fails with the following error:

21/08/09 12:32:32 ERROR BatchCreatePartitionsHelper: BatchCreatePartitions failed to create 100 out of 100 partitions.
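
For context, a minimal way to trigger this from PySpark is just issuing the repair statement against the Glue-backed catalog (a sketch; my_db.my_table is a hypothetical name):

```python
from pyspark.sql import SparkSession

# Spark session with Hive support, backed by the Glue Data Catalog
# as configured on the EMR cluster.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# In our setup this fails with the BatchCreatePartitions error above
# once the table has more than 100 unregistered partitions.
spark.sql("MSCK REPAIR TABLE my_db.my_table")
```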

However, if we run 'msck repair table ...' with Hive, everything works properly.

EMR: 6.3.0
Glue Data Catalog: configured for Hive and Spark

We found a limitation in the Glue API reference (BatchCreatePartition accepts at most 100 partitions per call), but again, there is no error in Hive (not sure whether the error is suppressed or it simply works):
https://docs.aws.amazon.com/glue/latest/webapi/API_BatchCreatePartition.html
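
For anyone registering partitions directly against the API, the documented 100-item cap on PartitionInputList can be respected by chunking the calls, e.g. with boto3 (a sketch, assuming you already build the PartitionInput dicts yourself):

```python
import boto3

glue = boto3.client("glue")

def create_partitions_in_batches(database, table, partition_inputs, batch_size=100):
    """Call BatchCreatePartition in chunks that respect the
    documented 100-item limit on PartitionInputList."""
    for i in range(0, len(partition_inputs), batch_size):
        batch = partition_inputs[i:i + batch_size]
        response = glue.batch_create_partition(
            DatabaseName=database,
            TableName=table,
            PartitionInputList=batch,
        )
        # Per-partition failures come back in the Errors list rather
        # than being raised as exceptions, so surface them explicitly.
        for err in response.get("Errors", []):
            print(err["PartitionValues"], err["ErrorDetail"]["ErrorCode"])
```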

swiatek25 commented Jun 7, 2022

I have the same issue. @smoy, any updates on this one?

@swiatek25

I think I found the problem:

This is the default logging for all types of errors. Your particular case is caused by an AlreadyExistsException (the partition already exists), so it can basically be ignored.

One improvement could be to log a warning for already-exists exceptions when they are allowed (ifNotExists) instead of reporting them as errors with no context.
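
To illustrate, the post-processing of the batch response could downgrade that one error code when IF NOT EXISTS semantics apply. A Python sketch of the idea (the helper name and logging are hypothetical, not the connector's actual Java implementation):

```python
import logging

log = logging.getLogger("BatchCreatePartitionsHelper")

def split_batch_errors(errors, if_not_exists=True):
    """Separate benign already-exists errors from real failures in a
    BatchCreatePartition Errors list, logging each appropriately."""
    real_failures = []
    for err in errors:
        code = err["ErrorDetail"]["ErrorCode"]
        if if_not_exists and code == "AlreadyExistsException":
            # Benign under IF NOT EXISTS: the partition is already
            # registered in the catalog, so the repair still succeeded.
            log.warning("Partition %s already exists, skipping",
                        err["PartitionValues"])
        else:
            real_failures.append(err)
    if real_failures:
        log.error("BatchCreatePartitions failed to create %d partitions",
                  len(real_failures))
    return real_failures
```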
