
Unable to upload entire file content to ADLS GEN2 using Python SDK #36462

Closed
nupoor01nawathey opened this issue Jul 12, 2024 · 8 comments

Labels
  • Client: This issue points to a problem in the data-plane of the library.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • Data Lake
  • issue-addressed: The Azure SDK team member assisting with this issue believes it to be addressed and ready to close.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
  • Service Attention: This issue is responsible by Azure service team.
  • Storage: Storage Service (Queues, Blobs, Files)

Comments

nupoor01nawathey commented Jul 12, 2024

  • Client: DataLakeServiceClient
  • Package version: 12.4.0
  • OS: Windows 11
  • Python version: 3.12.3 64-bit

Describe the bug
While uploading a file to ADLS Gen2 using the Python SDK's DataLakeServiceClient, the complete file contents are not uploaded to the ADLS Gen2 container, and there is no error in the output. When I print the file content I can see the full data, but the last 2 rows from the file are not loaded to the target ADLS Gen2 file. A small file uploads properly. I am not sure whether there is a content-length limitation here. I checked the append_data and flush_data methods but could not identify the root cause.
Content length: 359254
File type being uploaded: csv

To Reproduce
Steps to reproduce the behavior:

  1. Generate a file with a content length greater than 359254.
  2. Upload it using the append_data and flush_data methods with default parameters (a sketch of this pattern is shown below).
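
Since the author's actual snippet was only attached as an image later in the thread, the sketch below is a hypothetical reconstruction of an upload pattern that can show this symptom; the account URL, credential, container, and paths are all assumed placeholders. One plausible mechanism, consistent with the maintainer's later note about string encoding, is flushing at the character count of a str rather than at the encoded byte count:

# Hypothetical repro sketch, not the author's actual code
from azure.storage.filedatalake import DataLakeServiceClient

service_client = DataLakeServiceClient(
    "https://<account>.dfs.core.windows.net",  # placeholder account URL
    credential="<account-key>",                # placeholder credential
)
fs_client = service_client.get_file_system_client("my-container")
file_client = fs_client.get_file_client("folder/data.csv")

# Reading in text mode returns a str, not bytes
with open("data.csv", "r") as local_file:
    file_contents = local_file.read()

file_client.create_file()
file_client.append_data(file_contents, offset=0, length=len(file_contents))
# len(file_contents) counts characters; after encoding, the byte length can
# be larger, so flushing at the character count would silently truncate the
# tail of the file.
file_client.flush_data(len(file_contents))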

Expected behavior
The entire file content should be uploaded to the target container/folder.


Additional context
When the same file content is uploaded using BlobClient, it works fine.
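
For comparison, a minimal sketch of the BlobClient path described as working, under the same placeholder assumptions:

from azure.storage.blob import BlobClient

blob_client = BlobClient(
    account_url="https://<account>.blob.core.windows.net",  # placeholder
    container_name="my-container",
    blob_name="folder/data.csv",
    credential="<account-key>",
)

# upload_blob handles chunking and the final commit internally, so no
# manual offset/length bookkeeping is involved.
with open("data.csv", "rb") as local_file:
    blob_client.upload_blob(local_file, overwrite=True)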

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files) labels Jul 12, 2024
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jalauzon-msft @vincenttran-msft.

jalauzon-msft (Member) commented

Hi @nupoor01nawathey Nupoor, thanks for reaching out. Could you please share a code snippet showing how you are constructing your client and how you are uploading your data? It is difficult for us to reproduce without a code sample. Thanks.

@pvaneck pvaneck added needs-author-feedback More information is needed from author to address the issue. and removed needs-team-attention This issue needs attention from Azure service team or SDK team labels Jul 15, 2024

Hi @nupoor01nawathey. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario, please provide a response to the question or information requested above. This will help us more accurately address your issue.

nupoor01nawathey (Author) commented Jul 16, 2024

Sorry for the delay. Please find attached the sample code. There is no error in the upload process, but somehow the last 2 rows from the source file are not uploaded to the target file. I tried with BlobClient and it works; the issue I'm facing is only with DataLakeServiceClient.

upload_file_from_local_to_adls2

@github-actions github-actions bot added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. labels Jul 16, 2024
jalauzon-msft (Member) commented

Hi @nupoor01nawathey Nupoor, thanks for sharing the code. Here are several suggestions to try and address the issue. If none of these work, we can look more closely.

  • Firstly, you mentioned you are using version 12.4.0? That is a pretty old version, so I would recommend updating to the latest.
  • I would try opening your files in bytes mode rather than text mode and providing the append_data API a bytes object. You are currently providing a string, on which we will attempt some encoding; it is better to provide the raw bytes to upload. Additionally, I would recommend using a context manager for your file operations to ensure the file is closed and all contents are flushed. Another thing you could try is passing the SDK the file stream directly. Here are examples of both of these:
# Open and read the file into memory as bytes, then upload
with open(local_file_path, 'rb') as local_file:
    file_contents: bytes = local_file.read()

file_client = dir_client.create_file(file_name)
file_client.append_data(file_contents, offset=0, length=len(file_contents))
file_client.flush_data(len(file_contents))  # len() of bytes is the byte count

# Pass the file handle directly
import os

file_size = os.path.getsize(local_file_path)  # len() does not work on a file handle
file_client = dir_client.create_file(file_name)
with open(local_file_path, 'rb') as local_file:
    file_client.append_data(local_file, offset=0, length=file_size)
file_client.flush_data(file_size)  # the method is flush_data, not flush
  • Yet another option, if you just want to upload a file, is the upload_data API, which will create the file and upload its contents fully. If your file is large, this avoids having to read the whole file into memory and even offers parallelism.
with open(local_file_path, 'rb') as local_file:
    # You need to pass overwrite=True to create the file.
    file_client.upload_data(local_file, overwrite=True)
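
As an illustration of the parallelism mentioned above, the same call can be tuned for chunked, concurrent uploads. chunk_size and max_concurrency are keyword arguments accepted by upload_data; the values here are illustrative only:

with open(local_file_path, 'rb') as local_file:
    file_client.upload_data(
        local_file,
        overwrite=True,
        chunk_size=4 * 1024 * 1024,  # upload in 4 MiB chunks
        max_concurrency=4,           # up to 4 chunks in flight at once
    )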

Hopefully one of these suggestions helps, but please let us know if not. Thanks.

nupoor01nawathey (Author) commented Jul 17, 2024

@pvaneck Thanks for the quick response; the third solution you provided is working fine.

with open(local_file_path, 'rb') as local_file:
    file_client.upload_data(local_file, overwrite=True)

@xiangyan99 xiangyan99 added the issue-addressed The Azure SDK team member assisting with this issue believes it to be addressed and ready to close. label Jul 18, 2024

Hi @nupoor01nawathey. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@github-actions github-actions bot removed the needs-team-attention This issue needs attention from Azure service team or SDK team label Jul 18, 2024

Hi @nupoor01nawathey, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.
