[BUG] SparkMagic pyspark kernel magic(%%sql) hangs when running with Papermill. #833
🤖 Hi there,

Thank you for the detailed report. This seems to be a complex issue involving the interaction between SparkMagic, Livy, and Papermill. From your description, it appears that the …

When you run a cell with SparkMagic in JupyterLab, it sends a request to Livy and then polls for the result. This works fine in an interactive environment like JupyterLab, but when you run a notebook non-interactively with Papermill, it might not wait long enough for the result to come back from Livy before moving on to the next cell. You might want to try adjusting the Livy session timeout or the polling interval in SparkMagic's configuration to see if that helps. Note, however, that this is a common issue when running SparkMagic in non-interactive environments, and there may not be a straightforward solution.

As for the codebase, SparkMagic handles the execution of … The SQLQuery object has an …

I hope this information helps. If you have any further questions or need more clarification, please let me know.

Best,
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
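A minimal sketch of the configuration change suggested above, assuming SparkMagic reads its settings from `~/.sparkmagic/config.json`. The key names (`retry_policy`, `retry_seconds_to_sleep_list`, `configurable_retry_policy_max_retries`, `livy_session_startup_timeout_seconds`) follow SparkMagic's example configuration, and the values are only illustrative; check both against your installed version. `merge_overrides` and `update_config_file` are hypothetical helpers, not SparkMagic functions.

```python
import json
from pathlib import Path

# Illustrative values; key names follow SparkMagic's example_config.json
# (verify them against the SparkMagic version you have installed).
POLLING_OVERRIDES = {
    "retry_policy": "configurable",
    "retry_seconds_to_sleep_list": [0.2, 0.5, 1, 3, 5],
    "configurable_retry_policy_max_retries": 8,
    "livy_session_startup_timeout_seconds": 120,
}


def merge_overrides(config: dict, overrides: dict) -> dict:
    """Hypothetical helper: return a copy of config with the overrides applied."""
    merged = dict(config)
    merged.update(overrides)
    return merged


def update_config_file(path: Path, overrides: dict) -> dict:
    """Hypothetical helper: apply overrides to the JSON file at path and save it."""
    # Start from the existing settings if the file is there, else from scratch.
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = merge_overrides(config, overrides)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2))
    return merged
```

For example, `update_config_file(Path.home() / ".sparkmagic" / "config.json", POLLING_OVERRIDES)`, then restart the kernel so the new values are picked up. Existing keys not listed in the overrides are left untouched.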
Thanks, Dosu, for the quick reply. I tried to troubleshoot by adding more logging in command.py. Ideally, each retry adds one line to the notebook cell output area. I noticed that sometimes the notebook cell does not show all the retry logs before the statement moves from the running to the available state. This might align with what you mentioned about the timing issue. However, changing configurable_retry_policy_max_retries did not help; I saw similar behavior. I don't fully understand how Papermill behaves in this case.
Actually, it did not move on to the next cell; it got stuck in the %%sql magic cell.
🤖 Hello,

Thank you for providing a detailed description of the issue you're experiencing. This seems to be related to the asynchronous nature of the SparkMagic …

To resolve this issue, you can try to modify the SparkMagic … Here's a modified `execute_sqlquery`:

```python
def execute_sqlquery(
    self,
    cell,
    samplemethod,
    maxrows,
    samplefraction,
    session,
    output_var,
    quiet,
    coerce,
):
    sqlquery = self._sqlquery(cell, samplemethod, maxrows, samplefraction, coerce)
    df = self.spark_controller.run_sqlquery(sqlquery, session)
    # Add this line to wait for the Spark job to finish.
    # Caution: as noted later in this thread, spark_controller has no
    # wait_for_idle() method, so this call fails as written.
    self.spark_controller.wait_for_idle()
    if output_var is not None:
        # Expose the result DataFrame in the user's namespace.
        self.shell.user_ns[output_var] = df
    if quiet:
        return None
    return df
```

Please note that this modification will make the …

I hope this helps! Let me know if you have any other questions or if there's anything else I can do to assist you.
Awesome! Thanks a lot.
Hi Dosu,
Since spark_controller does not have wait_for_idle, I made a change to command.py instead. Between lines 57 and 58, I added the following lines:
I also added logging for the session status.
I can see that the session waits for some time, and then no further log lines appear. In the Papermill backend, I see more log lines until the session is idle again.
In the output notebook:
In the Papermill log, the session reaches idle.
It seems Papermill does not wait for wait_for_idle to complete. Any suggestions?
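Since `spark_controller` has no `wait_for_idle`, one alternative is to poll the session state directly. The sketch below assumes Livy's REST endpoint `GET /sessions/{id}/state`, which returns JSON with a `state` field; `wait_until_idle` and `livy_state_fetcher` are hypothetical helpers, not SparkMagic functions, and the set of terminal states is an assumption based on Livy's documented session states. The sleep and clock functions are injectable so the loop can be exercised without a live cluster.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed Livy session states from which a session will not return to "idle".
TERMINAL_SESSION_STATES = {"shutting_down", "error", "dead", "killed", "success"}


def livy_state_fetcher(livy_url: str, session_id: int) -> Callable[[], str]:
    """Hypothetical helper: build a callable that reads GET /sessions/{id}/state."""
    url = f"{livy_url.rstrip('/')}/sessions/{session_id}/state"

    def fetch() -> str:
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["state"]

    return fetch


def wait_until_idle(
    fetch_state: Callable[[], str],
    timeout_seconds: float = 300.0,
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll fetch_state() until it reports "idle"; raise on a terminal state or timeout."""
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state == "idle":
            return
        if state in TERMINAL_SESSION_STATES:
            raise RuntimeError(f"Livy session ended in state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"session still {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

Usage would look like `wait_until_idle(livy_state_fetcher("http://livy-host:8998", 0))`, where the host and session id are placeholders. Raising on timeout, rather than waiting indefinitely, keeps a stalled session from silently blocking a non-interactive run.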
🤖 Hello,

Thank you for providing detailed information about the issue you're experiencing. It seems like the problem might be due to the asynchronous nature of the …

To resolve this issue, you can modify the … Here's a modified `execute` method:

```python
# Requires `import time` at module level (used by the polling loop below).
def execute(self, session):
    self._spark_events.emit_sql_execution_start_event(
        session.guid,
        session.kind,
        session.id,
        self.guid,
        self.samplemethod,
        self.maxrows,
        self.samplefraction,
    )
    command_guid = ""
    try:
        command = self.to_command(session.kind, session.sql_context_variable_name)
        command_guid = command.guid
        (success, records_text, mimetype) = command.execute(session)
        # Poll the status of the command until it's done.
        # Caution: command_guid is SparkMagic's own GUID, not the integer
        # statement id Livy assigns, and command.execute() has already
        # returned the statement output by this point.
        while True:
            statement = session.http_client.get_statement(session.id, command_guid)
            if statement["state"] in ["available", "error", "cancelled"]:
                break
            time.sleep(1)  # Wait a second before polling the status again.
        if not success:
            raise BadUserDataException(records_text)
        result = records_to_dataframe(records_text, session.kind, self._coerce)
    except Exception as e:
        self._spark_events.emit_sql_execution_end_event(
            session.guid,
            session.kind,
            session.id,
            self.guid,
            command_guid,
            False,
            e.__class__.__name__,
            str(e),
        )
        raise
    else:
        self._spark_events.emit_sql_execution_end_event(
            session.guid,
            session.kind,
            session.id,
            self.guid,
            command_guid,
            True,
            "",
            "",
        )
        return result
```

This modification will make the …

Please note that this is a simplified example, and the actual implementation might need to be adjusted based on the specifics of your Spark cluster and the rest of your code. For example, you might want to add error handling for cases where the Spark job fails or takes too long to complete.

I hope this helps! Let me know if you have any other questions or need further clarification.

Best,
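The polling loop above keys on `command_guid`, but Livy identifies statements by the integer `id` it returns when the statement is created. Below is a standalone sketch of a corrected loop. It assumes Livy's `GET /sessions/{sessionId}/statements/{statementId}` endpoint and treats `available`, `error` and `cancelled` as final statement states; `get_json` stands in for whatever HTTP client you use and, like `poll_statement`, is hypothetical. It also logs each attempt, similar to the per-retry logging described earlier in the thread, and gives up after a timeout instead of looping forever.

```python
import time
from typing import Callable, Dict

# Statement states after which Livy will not change the statement again.
FINAL_STATEMENT_STATES = {"available", "error", "cancelled"}


def poll_statement(
    get_json: Callable[[str], Dict],
    session_id: int,
    statement_id: int,
    timeout_seconds: float = 600.0,
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    log: Callable[[str], None] = print,
) -> Dict:
    """Poll one Livy statement, identified by Livy's integer id, until it is final."""
    path = f"/sessions/{session_id}/statements/{statement_id}"
    deadline = clock() + timeout_seconds
    attempt = 0
    while True:
        statement = get_json(path)
        state = statement["state"]
        # One log line per attempt makes stalls visible in the output.
        log(f"attempt {attempt}: statement {statement_id} is {state}")
        if state in FINAL_STATEMENT_STATES:
            return statement
        if clock() >= deadline:
            raise TimeoutError(f"statement {statement_id} still {state!r}")
        attempt += 1
        sleep(poll_seconds)
```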
Thanks, Dosu. When executing with Papermill, the log does not show up in the notebook cell output. It seems the cell output stops updating after some time, and the subsequent cells do not run either.
Can you explain how the asynchronous execution works? It would be great if you could share the code that calls the Livy API in asynchronous mode. I am trying to understand how the execution pattern of the %%sql magic differs. The following piece works. I am wondering whether we can use the same call pattern to run the %%sql logic.
Execution log:
🤖 Hello,

Thank you for providing a detailed description of your issue. It seems like you're experiencing a problem with the asynchronous execution of the … The …

As for asynchronous execution in SparkMagic, it is handled by sending commands to the Livy server and waiting for the result. The commands themselves execute asynchronously on the Livy server, allowing the Jupyter notebook to continue with other tasks.

Regarding your issue with Papermill, it seems that Papermill is not waiting for the …

One possible solution could be to increase the cell execution timeout in Papermill. You can do this by passing the …

```shell
papermill pm_sparkmagic_test.ipynb output1.ipynb --kernel pysparkkernel --log-level DEBUG --request-timeout 600
```

This will set the cell execution timeout to 600 seconds (10 minutes). You can adjust this value as needed.

I hope this helps! If you have any other questions or need further clarification, please let me know.
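To illustrate the submit-then-poll pattern described above, here is a sketch against Livy's REST API. It is not SparkMagic's own code, which wraps SQL in generated PySpark code rather than sending it as-is. It assumes Livy 0.5 or later, where `POST /sessions/{id}/statements` accepts a per-statement `kind` such as `"sql"` and returns immediately with the new statement's `id`; the client then polls `GET /sessions/{id}/statements/{statementId}` until the statement is final. `livy_request` and `run_statement` are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Optional

FINAL_STATES = {"available", "error", "cancelled"}


def livy_request(base_url: str, method: str, path: str, body: Optional[Dict] = None) -> Dict:
    """Hypothetical helper: send one JSON request to Livy and decode the reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_statement(
    request: Callable[[str, str, Optional[Dict]], Dict],
    session_id: int,
    code: str,
    kind: str = "sql",
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Submit code to a Livy session, then block until the statement finishes."""
    # Step 1: the POST returns at once; Livy runs the code asynchronously.
    submitted = request("POST", f"/sessions/{session_id}/statements", {"code": code, "kind": kind})
    statement_id = submitted["id"]
    # Step 2: the client polls until the statement reaches a final state.
    while True:
        statement = request("GET", f"/sessions/{session_id}/statements/{statement_id}", None)
        if statement["state"] in FINAL_STATES:
            return statement
        sleep(poll_seconds)
```

For example, `run_statement(lambda m, p, b: livy_request("http://livy-host:8998", m, p, b), 0, "SELECT 1")` blocks until the statement finishes (host and session id are placeholders). There is no timeout in this sketch, so add one if a statement can stall.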
`--request-timeout` is not supported by Papermill. `--execution-timeout` is supported; I tried it, but it did not work.
Can you share how the PySpark code is executed? Does it follow the async mode?
Hi @edwardps, glad Dosu (the bot) was able to help you narrow down the cause. I believe the bot is correct that the issue has something to do with the async execution of the cell magic. I actually encountered this issue when implementing ipykernel >= 6 support and left a comment on the related papermill issue; however, the final implementation fixed the papermill issue for me 🤔 Maybe try changing your …
Thanks @devstein. I will give it a try and get back to you soon. BTW, Dosu works great!
🤖 Hello,

The … On the other hand, direct PySpark code gives you more control over the execution process: you can create a SparkSession, execute transformations and actions on RDDs/DataFrames, and manage resources manually.

To modify the … However, modifying the SparkMagic codebase might not be the best way to resolve the hanging issue. Instead, you could try the following: …

I hope this helps! Let me know if you have any other questions.

Best,
@edwardps Sounds good, let me know.
Thank you! Let me know if you have other open source projects that could benefit from it. (I'm still teaching it when it should and shouldn't respond 😅)
FYI - @devstein
I downgraded ipykernel to 5.5.6 (ipykernel has no v7) but had no luck.
Is it possible to have %%sql follow the call pattern used for PySpark code execution, since PySpark code cells work properly?
FYI - @devstein If I comment out the following two lines, it works.
From my observation, this disables the Plotly-based visualization.
Thanks @edwardps, that's helpful... That means it's failing to execute this code while the kernel is being created. I ran into this issue when I tried to make the … Is there any difference in behavior if the first cell isn't … For example:
I initially reported this as a Papermill issue (I'm not quite sure it is one). I am copying that issue to the SparkMagic community to see whether any experts can advise on how to unblock it. Please feel free to close this if it is not a SparkMagic issue. Thanks in advance.
Describe the bug
Our use case is to use SparkMagic wrapper kernels with Papermill notebook execution.
Most functions work as expected except the %%sql magic, which gets stuck during execution. SparkMagic works properly in interactive mode in JupyterLab; the issue only happens with the %%sql magic when running with Papermill.
From the debugging log (attached), I can see that the %%sql logic was executed and the response was retrieved. The execution state returned to idle at the end. However, the output of the %%sql cell was not updated properly, and the following cells were not executed.
The following content was printed by Papermill, which shows that the %%sql cell executed properly. This content was not rendered into the cell output.
To Reproduce
The following is the list of packages that are likely most relevant. I also attached a text file containing all the packages.
Expected behavior
The %%sql cell should not hang, and the following cells should proceed to execute.
Screenshots
Output notebook from Papermill:
Expected output (from JupyterLab):
Versions:
Additional context
log and other files.zip contains: