
SNOW-1556467: Querying table works from Snowflake UI, but _fetch_pandas_all hangs and crashes kernel #2007

Open
giacomo-mason opened this issue Jul 25, 2024 · 3 comments
Labels: bug, status-information_needed (Additional information is required from the reporter), status-triage (Issue is under initial triage)

Comments

@giacomo-mason

Python version

Python 3.10.2 (main, Apr 4 2022, 11:53:00) [Clang 13.1.6 (clang-1316.0.21.2)]

Operating system and processor architecture

macOS-14.5-arm64-arm-64bit

Installed packages

anyio==4.4.0
appdirs==1.4.4
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asn1crypto==1.5.1
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
autopep8==1.6.0
Babel==2.15.0
backcall==0.2.0
beautifulsoup4==4.12.3
black==22.12.0
bleach==6.1.0
bytecode==0.15.1
cattrs==23.2.3
certifi==2024.7.4
cffi==1.16.0
cfgv==3.4.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorama==0.4.6
comm==0.2.2
contourpy==1.1.1
cryptography==42.0.8
cycler==0.12.1
Cython==3.0.0a10
ddsketch==3.0.1
ddtrace==1.19.0
debugpy==1.8.2
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.14
diff-cover==7.7.0
distlib==0.3.8
envier==0.5.1
exceptiongroup==1.2.2
executing==2.0.1
fastjsonschema==2.20.0
filelock==3.15.4
flake8==4.0.1
fonttools==4.53.1
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
identify==2.6.0
idna==3.7
importlib_metadata==8.1.0
iniconfig==2.0.0
ipykernel==6.29.5
ipython==8.12.3
ipywidgets==8.1.3
isoduration==20.11.0
isort==5.13.2
jaraco.classes==3.4.0
jedi==0.19.1
Jinja2==3.1.4
joblib==1.4.2
json5==0.9.25
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.2
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.4
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.11
keyring==24.3.1
kiwisolver==1.4.5
llvmlite==0.41.1
MarkupSafe==2.1.5
matplotlib==3.7.5
matplotlib-inline==0.1.7
mccabe==0.6.1
mistune==3.0.2
more-itertools==10.3.0
mypy-extensions==1.0.0
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nbqa==1.8.5
nest-asyncio==1.6.0
nodeenv==1.9.1
notebook==7.2.1
notebook_shim==0.2.4
numba==0.58.1
numexpr==2.10.0
numpy==1.24.4
opentelemetry-api==1.16.0
overrides==7.7.0
packaging==24.1
pandas==2.0.3
pandocfilters==1.5.1
parso==0.8.4
pathspec==0.12.1
patsy==0.5.6
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.4.0
platformdirs==4.2.2
pluggy==1.5.0
pre-commit==3.5.0
prometheus_client==0.20.0
prompt_toolkit==3.0.47
protobuf==5.27.0
psutil==6.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==17.0.0
pycodestyle==2.8.0
pycparser==2.22
pyflakes==2.4.0
Pygments==2.18.0
PyJWT==2.8.0
pyOpenSSL==24.2.1
pyparsing==3.1.2
pytest==8.3.1
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.3
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.19.0
scikit-learn==1.3.2
scipy==1.9.3
seaborn==0.13.2
Send2Trash==1.8.3
shap==0.44.1
six==1.16.0
slicer==0.0.7
sniffio==1.3.1
snowflake-connector-python==3.12.0
sortedcontainers==2.4.0
soupsieve==2.5
sqlfluff==2.3.5
stack-data==0.6.3
statsmodels==0.14.1
tabulate==0.9.0
tblib==3.0.0
terminado==0.18.1
threadpoolctl==3.5.0
tinycss2==1.3.0
tokenize-rt==5.2.0
toml==0.10.2
tomli==2.0.1
tomlkit==0.13.0
tornado==6.4.1
tqdm==4.66.4
traitlets==5.14.3
types-python-dateutil==2.9.0.20240316
typing_extensions==4.12.2
tzdata==2024.1
uri-template==1.3.0
urllib3==1.26.19
virtualenv==20.26.3
wcwidth==0.2.13
webcolors==24.6.0
webencodings==0.5.1
websocket-client==1.8.0
widgetsnbextension==4.0.11
wrapt==1.16.0
xgboost==2.1.0
xmltodict==0.13.0
zipp==3.19.2

What did you do?

import logging

import pandas as pd  # required by fetch_pandas_all() (the connector's [pandas] extra)
import snowflake.connector

# Send DEBUG-level connector logs to stderr
for logger_name in ('snowflake.connector',):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)

# Connect to Snowflake (SSO through the browser, so no password is passed)
conn = snowflake.connector.connect(
    user="***",
    account='***',
    warehouse='***',
    authenticator='externalbrowser',
)

# Create a cursor object
cursor = conn.cursor()

# Execute the query
cursor.execute("SELECT * FROM my_table")

# Fetch all the results into a pandas DataFrame
df: pd.DataFrame = cursor.fetch_pandas_all()

# Close the cursor and connection
cursor.close()
conn.close()

# Display the first rows (Jupyter renders the last expression in a cell)
df.head()

What did you expect to see?

I have a relatively large Snowflake table (4,155,216 rows and 177 columns).

I want to pull the entire table into a Pandas dataframe.
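To put "relatively large" in numbers, the sketch below computes a lower bound on the in-memory size of the full result. It assumes every one of the 177 columns is an 8-byte numeric type (the actual column types are not given here), and it ignores the index and any temporary copies made while converting from Arrow; string columns stored as Python objects would take considerably more. By the same estimate, LIMIT 10000 needs only about 13.5 MiB, so memory alone would not obviously explain why that smaller query fails too.

```python
def estimate_min_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Lower-bound size of a DataFrame whose columns are all 8-byte numerics.

    Assumption: every column is int64/float64. Object (string) columns,
    the index, and conversion-time copies are not counted.
    """
    return rows * cols * bytes_per_value


full_table = estimate_min_bytes(4_155_216, 177)
limit_10k = estimate_min_bytes(10_000, 177)

print(f"full table:  {full_table / 2**30:.2f} GiB")  # 5.48 GiB
print(f"LIMIT 10000: {limit_10k / 2**20:.2f} MiB")   # 13.50 MiB
```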

From the Snowflake UI, I can successfully do

SELECT * FROM my_table;

When running the same query from a Jupyter notebook (see above), I expected df to contain all the data from the table.

Instead, the script runs for a bit and then hangs. The Python kernel dies and needs to be restarted.

I get the same behavior with anything but the smallest samples from that table: for example, LIMIT 1000 works fine, but LIMIT 10000 runs into the same issue.
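As a diagnostic (not a fix), the result can be streamed one batch at a time with the connector's cursor.fetch_pandas_batches(), which yields pandas DataFrames instead of building one large frame. The sketch below assumes cursor is a snowflake.connector cursor on which execute() has already been called; the helper name count_rows_in_batches and its max_batches parameter are hypothetical. Printing progress with flush=True shows whether a hang starts on the first batch or partway through.

```python
from typing import Any, Optional


def count_rows_in_batches(cursor: Any, max_batches: Optional[int] = None) -> int:
    """Walk a result set batch by batch and report progress (hypothetical helper).

    `cursor` is assumed to be a snowflake.connector cursor after execute();
    fetch_pandas_batches() yields one pandas DataFrame per batch, and this
    loop keeps only the current batch referenced.
    """
    total = 0
    for i, batch in enumerate(cursor.fetch_pandas_batches()):
        total += len(batch)
        # flush=True so the last line printed before a hang or crash is visible
        print(f"batch {i}: {len(batch)} rows, {total} so far", flush=True)
        if max_batches is not None and i + 1 >= max_batches:
            break
    return total
```

For example, calling count_rows_in_batches(cursor) right after cursor.execute("SELECT * FROM my_table LIMIT 10000") would show how many batches arrive before the kernel stops responding.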

I have attached the debug logs from the code above, with the initial part removed to omit confidential information.

connector_logs2.txt
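Because a dying kernel can take unflushed notebook output with it, it may help to write the same DEBUG records to a file as well. The sketch below does that with the standard-library logging.FileHandler, reusing the format string from the issue; the function name log_connector_to_file and the file name are hypothetical.

```python
import logging

# Same format string as the StreamHandler setup in the issue
LOG_FORMAT = (
    "%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - "
    "%(funcName)s() - %(levelname)s - %(message)s"
)


def log_connector_to_file(path: str) -> logging.FileHandler:
    """Write snowflake.connector DEBUG records to the file at `path`."""
    logger = logging.getLogger("snowflake.connector")
    logger.setLevel(logging.DEBUG)
    # mode="w" starts a fresh file per run; FileHandler flushes after every record
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return handler
```

Calling handler = log_connector_to_file("connector_logs.txt") before connect() means each record reaches the file as soon as it is logged, so the file should hold everything written up to the point where the kernel dies.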

github-actions bot changed the title from "Querying table works from Snowflake UI, but _fetch_pandas_all hangs and crashes kernel" to "SNOW-1556467: Querying table works from Snowflake UI, but _fetch_pandas_all hangs and crashes kernel" on Jul 25, 2024
@sfc-gh-yixie
Collaborator

@giacomo-mason did this happen in a Python stored procedure?

@giacomo-mason
Author

No, this happened when simply using the connector to pull data from Snowflake into memory, to use locally in a Jupyter notebook.

sfc-gh-dszmolka self-assigned this on Dec 16, 2024
sfc-gh-dszmolka added the status-information_needed (Additional information is required from the reporter) and status-triage (Issue is under initial triage) labels and removed the needs triage label on Dec 16, 2024
@sfc-gh-dszmolka
Contributor

Hi, first of all, apologies that we couldn't get to this sooner.
Second, to narrow down whether this issue is related to this library or not, could you please execute the same query outside of a Jupyter notebook (e.g. in an IDE or a standalone Python script) and see whether it crashes?
Thank you in advance!
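One way to run that check is sketched below: a standalone script (the file name repro_fetch.py, the LIMIT, and the masked credentials are assumptions) that performs the same fetch from a plain Python process. It also enables the standard-library faulthandler module, which prints a Python traceback for every thread if the interpreter is killed by a fatal signal such as SIGSEGV or SIGABRT, information a dying Jupyter kernel may not surface. It cannot help if the process receives SIGKILL (for example from an out-of-memory killer), since that signal cannot be caught.

```python
import faulthandler
import logging
import sys


def enable_crash_diagnostics() -> None:
    """Make a hard crash leave evidence behind (sketch)."""
    # Dump tracebacks of all threads on fatal signals; sys.__stderr__ is the
    # process's real stderr even if sys.stderr has been replaced.
    faulthandler.enable(file=sys.__stderr__, all_threads=True)
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(threadName)s %(name)s - %(levelname)s - %(message)s",
    )
    # Connector DEBUG records propagate to the root handler set up above
    logging.getLogger("snowflake.connector").setLevel(logging.DEBUG)


def main() -> int:
    enable_crash_diagnostics()
    # Imported here so enable_crash_diagnostics() works without the connector
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="***",
        account="***",
        warehouse="***",
        authenticator="externalbrowser",
    )
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM my_table LIMIT 10000")
        df = cursor.fetch_pandas_all()
        print(df.shape)
        cursor.close()
    finally:
        conn.close()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Running python -X faulthandler repro_fetch.py turns on the same traceback dump from the command line, without the explicit faulthandler.enable() call.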


3 participants