
Deadlock issue when a large dataset is written to a Database using Spark CarbonJDBC provider #1931

Open
chanaka3d opened this issue Apr 6, 2021 · 1 comment

Comments

@chanaka3d

Description:
A deadlock occurs on the API_REQ_USER_BROW_SUMMARY table when a large dataset is written to the database through the Spark CarbonJDBC provider with more than one executor per worker.
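Two executors upserting overlapping summary keys in different row orders can acquire index locks in opposite order, which is the classic cause of this kind of deadlock. A commonly suggested mitigation (not something CarbonJDBC does today, as far as this report shows) is to sort each partition's rows by the upsert key before batching, so all writers lock index entries in the same order. A minimal sketch with a hypothetical row layout (`api`, `version`, minute bucket, value):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedBatchUpsert {
    // Hypothetical row shape: {api, version, minuteBucket, value}.
    // Sorting every batch by the same composite key means two concurrent
    // writers that hit overlapping keys acquire index locks in the same
    // order, removing the lock-order inversion behind the deadlock.
    static List<Object[]> sortByUpsertKey(List<Object[]> rows) {
        List<Object[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator
                .<Object[], String>comparing(r -> (String) r[0])   // api
                .thenComparing(r -> (String) r[1])                 // version
                .thenComparingLong(r -> (Long) r[2]));             // minute bucket
        return sorted;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[]{"b", "v1", 20L, 1});
        rows.add(new Object[]{"a", "v1", 10L, 2});
        rows.add(new Object[]{"a", "v1", 5L, 3});
        for (Object[] r : sortByUpsertKey(rows)) {
            System.out.println(r[0] + "/" + r[1] + "/" + r[2]);
        }
        // In a real writer, the sorted rows would then be fed to
        // PreparedStatement.addBatch() / executeBatch() in this order.
    }
}
```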

@maneeshaheshan

We can see the following WARN log with PostgreSQL:

```
TID: [-1] [] [2021-02-14 11:15:21,183] WARN {org.apache.spark.scheduler.TaskSetManager} - Lost task 1.0 in stage 22589.0 (TID 53615, localhost): java.sql.BatchUpdateException: Batch entry 80 INSERT INTO API_EXE_TIME_MIN_SUMMARY (api, version, tenantDomain, apiPublisher, apiResponseTime, context, securityLatency, throttlingLatency, requestMediationLatency, responseMediationLatency, backendLatency, otherLatency, year, month, day, hour, minutes, time) VALUES ('XXX', 'v1', 'carbon.super', 'admin@carbon', 63052, '/callbacks/v1', 11, 0, 13, 0, 0, 0, 2017, 5, 17, 8, 31, 1495024319999) ON CONFLICT (api,version,tenantDomain,apiPublisher,context,year,month,day,hour,minutes) DO UPDATE SET apiResponseTime=EXCLUDED.apiResponseTime, securityLatency=EXCLUDED.securityLatency, throttlingLatency=EXCLUDED.throttlingLatency, requestMediationLatency=EXCLUDED.requestMediationLatency, responseMediationLatency=EXCLUDED.responseMediationLatency, backendLatency=EXCLUDED.backendLatency, otherLatency=EXCLUDED.otherLatency, time=EXCLUDED.time was aborted: ERROR: deadlock detected
Detail: Process 6742 waits for ShareLock on transaction 1157337689; blocked by process 6740.
Process 6740 waits for ShareLock on transaction 1157337691; blocked by process 6742.
Hint: See server log for query details.
Where: while inserting index tuple (1493,33) in relation "api_exe_time_min_summary"  Call getNextException to see other errors in the batch.
at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:145)
at org.postgresql.core.ResultHandlerDelegate.handleError(ResultHandlerDelegate.java:50)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2126)
at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1261)
at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1286)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:455)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:791)
at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1547)
at org.apache.spark.sql.jdbc.carbon.package$CarbonJDBCWrite$.savePartition(carbon.scala:149)
at org.apache.spark.sql.jdbc.carbon.package$CarbonJDBCWrite$$anonfun$saveTable$1.apply(carbon.scala:72)
at org.apache.spark.sql.jdbc.carbon.package$CarbonJDBCWrite$$anonfun$saveTable$1.apply(carbon.scala:71)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
Detail: Process 6742 waits for ShareLock on transaction 1157337689; blocked by process 6740.
Process 6740 waits for ShareLock on transaction 1157337691; blocked by process 6742.
Hint: See server log for query details.
Where: while inserting index tuple (1493,33) in relation "api_exe_time_min_summary"
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2412)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2125)
... 18 more
{org.apache.spark.scheduler.TaskSetManager}
TID: [-1234] [] [2017-11-14 11:15:26,079] INFO {org.wso2.carbon.event.output.adapter.logger.LoggerEventAdapter} - Unique ID: request-logger-publisher,
Event: meta_clientType:external,
```
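Both databases report this as a retryable error: PostgreSQL's `deadlock detected` carries SQLSTATE 40P01, and MySQL's message explicitly says "try restarting transaction" (SQLSTATE 40001). A workaround seen in other JDBC writers is to retry the failed batch with backoff instead of failing the task. A minimal sketch, not the CarbonJDBC implementation (the helper and its names are hypothetical):

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

public class DeadlockRetry {
    // SQLSTATE 40001 = serialization failure / MySQL deadlock,
    // 40P01 = PostgreSQL deadlock_detected.
    static boolean isDeadlock(SQLException e) {
        String state = e.getSQLState();
        return "40001".equals(state) || "40P01".equals(state);
    }

    // Runs `op` up to maxAttempts times, retrying only on deadlock errors
    // with a simple linear backoff; any other exception is rethrown as-is.
    static <T> T withRetry(Callable<T> op, int maxAttempts) throws Exception {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SQLException e) {
                if (!isDeadlock(e)) throw e;
                last = e;
                Thread.sleep(50L * attempt);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Fake "batch" that deadlocks twice, then succeeds; in the real
        // writer `op` would wrap PreparedStatement.executeBatch().
        int[] calls = {0};
        int updated = withRetry(() -> {
            if (++calls[0] < 3) throw new SQLException("deadlock detected", "40P01");
            return 80;
        }, 5);
        System.out.println("attempts=" + calls[0] + " updated=" + updated);
    }
}
```

Note that retrying is only safe here because the statement is an idempotent upsert; a plain INSERT batch would need deduplication before being replayed.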

We also see the same error with MySQL:

```
TID: [-1] [] [2017-11-07 21:28:35,676] ERROR {org.apache.spark.scheduler.TaskSetManager} - Task 15 in stage 277735.0 failed 4 times; aborting job {org.apache.spark.scheduler.TaskSetManager}
TID: [-1234] [] [2017-11-07 21:28:35,678] ERROR {org.apache.spark.sql.jdbc.carbon.CarbonJDBCRelation} - Error while saving data to the table API_REQ_GEO_LOC_SUMMARY : Job aborted due to stage failure: Task 15 in stage 277735.0 failed 4 times, most recent failure: Lost task 15.3 in stage 277735.0 (TID 238889, localhost): java.sql.BatchUpdateException: Deadlock found when trying to get lock; try restarting transaction
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1805)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277)
at org.apache.spark.sql.jdbc.carbon.package$CarbonJDBCWrite$.savePartition(carbon.scala:149)
at org.apache.spark.sql.jdbc.carbon.package$CarbonJDBCWrite$$anonfun$saveTable$1.apply(carbon.scala:72)
at org.apache.spark.sql.jdbc.carbon.package$CarbonJDBCWrite$$anonfun$saveTable$1.apply(carbon.scala:71)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at sun.reflect.GeneratedConstructorAccessor126.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:985)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2141)
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1773)
... 14 more
```
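Since the report says the deadlock only appears with more than one executor per worker, a blunt interim workaround is to cap the Spark cluster at a single executor so only one partition writes to a summary table at a time, trading throughput for safety. A sketch using standard Spark standalone-mode properties (the exact file and property names should be verified against the product's Spark configuration docs):

```
# spark-defaults.conf (sketch): limit the app to one single-core executor
# so summary-table batches are never upserted concurrently.
spark.executor.cores  1
spark.cores.max       1
```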
