Skip to content

BUG: pd.read_sql returns fewer rows than manual cursor execution for the same query and connection #63574

@ayazurrashid

Description

@ayazurrashid

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# pandas version 1.4.1
import pandas as pd
from django.db import connection
from datetime import date
month = date.today().month
year = date.today().year

query = f"""
SELECT 
id AS ID,
CASE 
WHEN MAX(CASE WHEN site = 'A' THEN rating END) IS NULL OR MAX(CASE WHEN site = 'A' THEN rating END) <= 0
THEN '' 
ELSE MAX(CASE WHEN site = 'A' THEN rating END) 
END AS A_RATING,
CASE 
WHEN MAX(CASE WHEN site = 'A' THEN reviews END) IS NULL OR MAX(CASE WHEN site = 'A' THEN reviews END) <= 0
THEN '' 
ELSE MAX(CASE WHEN site = 'A' THEN FORMAT(IFNULL(reviews, ''), 0) END)
END AS A_REVIEWS,
MAX(CASE WHEN site = 'A' THEN slug END) AS A_URL,
CASE 
WHEN MAX(CASE WHEN site = 'B' THEN rating END) IS NULL OR MAX(CASE WHEN site = 'B' THEN rating END) <= 0
THEN '' 
ELSE MAX(CASE WHEN site = 'B' THEN rating END) 
END AS AR_RATING,
CASE 
WHEN MAX(CASE WHEN site = 'B' THEN reviews END) IS NULL OR MAX(CASE WHEN site = 'B' THEN reviews END) <= 0
THEN '' 
ELSE MAX(CASE WHEN site = 'B' THEN FORMAT(IFNULL(reviews, ''), 0) END)
END AS AR_REVIEWS, 
MAX(CASE WHEN site = 'B' THEN slug END) AS AR_URL 
FROM PropertyReviewInsights WHERE YEAR(added_on) = {year} AND MONTH(added_on) = {month} 
GROUP BY id ORDER BY id;
"""
df = pd.read_sql(query, connection)
file_name = f"ABC.csv"
file_path = f'/srv/{file_name}'
df.to_csv(file_path, index=False)
connection.close()

Issue Description

I am observing a discrepancy between the number of rows returned by pd.read_sql() and a manual DB-API cursor execution using the same SQL query and the same database connection.

Using a manual MySQL cursor execution, the query returns 117,464 rows

Using pd.read_sql(), the same query returns 116,383 rows

In addition to the reduced row count, I also observed that for approximately 5,000 rows, some columns are returned as blank / NULL values when using pd.read_sql(), whereas those same rows and columns contain valid data when fetched via the manual cursor approach.

The manual cursor approach consistently returns the full dataset with all columns populated correctly, while pd.read_sql() returns fewer rows and partially blank data without raising any error or warning.

This behavior is unexpected and may indicate an issue with how pandas iterates over or buffers database cursors, or how it handles row fetching and type conversion when interfacing with MySQL drivers.

Expected Behavior

pd.read_sql() should return all rows produced by the SQL query when using the same database connection and query as a manual DB-API cursor execution.

Specifically:

The row count returned by pd.read_sql() should exactly match the row count returned by a manual cursor.execute() followed by fetchall()

No rows should be dropped or skipped

No column values should be returned as blank or NULL if the underlying database row contains valid data

Any issues during data fetching, cursor iteration, or type conversion should raise a clear error or warning, rather than silently returning incomplete data

In this case, both approaches are expected to return 117,464 rows with fully populated columns, matching the database results exactly.

Installed Versions

Details

INSTALLED VERSIONS

commit : 06d2301
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-216-generic
Version : #236-Ubuntu SMP Fri Apr 11 19:53:21 UTC 2025
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.1
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 20.0.2
setuptools : 44.0.0
Cython : None
pytest : 7.0.1
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.6.0
pandas_datareader: None
bs4 : 4.13.4
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugNeeds TriageIssue that has not been reviewed by a pandas team member

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions