
SNOW-1859664: Issues uploading data via PUT to s3 on driver v1.8 or above (including 1.12.1) #1279

Open
tobischo opened this issue Dec 17, 2024 · 9 comments

@tobischo

1. What version of GO driver are you using?
   1.12.1

2. What operating system and processor architecture are you using?
   Debian Linux, x86

3. What version of GO are you using?
   1.23.3

4. Server version?
   8.46.1

5. What did you do?

We have been using the Snowflake Go driver on v1.7.1 for quite a long time now, as we first encountered issues with v1.8. Back then we downgraded and did not follow up on the matter, under the assumption that it might just be a client issue and would be fixed in the next versions. Recently we updated to 1.12.1 to verify whether it works, and received the following error on the client:

```
Failed to upload data to snowflake via PUT: 264003: unexpected error while retrieving header: operation error S3: HeadObject, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Head "https://<s3url>": write tcp <internal IP>-><aws ip>:443: write: broken pipe
```

Downgrading back to 1.7.1 fixes the issue for now.

6. What did you expect to see?

   v1.12.1 to work the same as v1.7.1

7. Can you set logging to DEBUG and collect the logs?

   Not easily

@tobischo tobischo added the bug Erroneous or unexpected behaviour label Dec 17, 2024
@github-actions github-actions bot changed the title Issues uploading data via PUT to s3 on driver v1.8 or above (including 1.12.1) SNOW-1859664: Issues uploading data via PUT to s3 on driver v1.8 or above (including 1.12.1) Dec 17, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Dec 17, 2024
@sfc-gh-dszmolka
Contributor

sfc-gh-dszmolka commented Dec 17, 2024

hi - thanks for filing this issue with us. One likely change that influences the behaviour you're seeing between 1.7.1 and 1.8.0 is #991, where we stopped swallowing the errors we get back from cloud storage :)
edit: also, with 1.12.1 came #1242, where we changed the behaviour from "don't raise PUT/GET errors even if they fail" to "do raise PUT/GET errors if they fail".

So no extra error was introduced; we simply surface an error now that was always there, instead of silently ignoring it.

I just confirmed that this works by PUTting a file to my internal stage on a test account on AWS EU Frankfurt with 1.12.1.

A Snowflake PUT consists of two phases when using this Go driver (sketched after this list):

  • phase 1 - the driver sends an HTTP PUT with PutObject in the request path to, well, upload the file :)
  • phase 2 - after that, the driver sends an HTTP HEAD against the just-uploaded file, to verify that it was really uploaded

It is the second operation that seems to fail in your case; the file itself is likely uploaded (phase 1 succeeded), and only the verification step fails.
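For illustration only - this is not the driver's actual code, and the bucket, key, and file path are made up - the two phases roughly map onto these two AWS SDK for Go v2 calls, with phase 2 being the HeadObject call that fails in the report above:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	f, err := os.Open("/tmp/data.csv") // hypothetical file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Phase 1: upload the file (HTTP PUT / PutObject).
	_, err = client.PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String("my-stage-bucket"), // hypothetical stage bucket
		Key:    aws.String("data.csv"),
		Body:   f,
	})
	if err != nil {
		log.Fatalf("phase 1 (PutObject) failed: %v", err)
	}

	// Phase 2: verify the upload (HTTP HEAD / HeadObject).
	// This is the call that fails with "broken pipe" in the report above.
	_, err = client.HeadObject(context.TODO(), &s3.HeadObjectInput{
		Bucket: aws.String("my-stage-bucket"),
		Key:    aws.String("data.csv"),
	})
	if err != nil {
		log.Fatalf("phase 2 (HeadObject) failed: %v", err)
	}
	log.Println("upload verified")
}
```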

As a next step, you can:

  • review whether any policy, application-level firewall, or anything else on your network prevents HEAD or HeadObject requests from being issued to your internal stage. A tcpdump or similar tool can help here too.
  • make sure that the STAGE type endpoints returned from SELECT SYSTEM$ALLOWLIST(); are allowed on your network (a helper snippet follows this list)
  • alternatively, use the latest driver and enable S3 client logging, in the hope of getting more details on the error itself
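If you want to check the allowlist programmatically, here is a minimal sketch (the DSN is hypothetical) that queries SYSTEM$ALLOWLIST through the driver and prints the JSON result listing the endpoints to allow:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/snowflakedb/gosnowflake"
)

func main() {
	// Hypothetical DSN; replace with your account, user, and password.
	db, err := sql.Open("snowflake", "user:password@myaccount.eu-central-1/mydb/public")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var allowlist string
	if err := db.QueryRow("SELECT SYSTEM$ALLOWLIST()").Scan(&allowlist); err != nil {
		log.Fatal(err)
	}
	// JSON array of {type, host, port} entries; check the STAGE entries
	// against your firewall/proxy rules.
	fmt.Println(allowlist)
}
```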

Also, instead of fixing the underlying infrastructure issue, you can choose to keep ignoring PUT/GET errors by setting RaisePutGetError: false.

Please let me know how it goes.

@sfc-gh-dszmolka sfc-gh-dszmolka added question Issue is a usage/other question rather than a bug status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter and removed bug Erroneous or unexpected behaviour labels Dec 17, 2024
@tobischo
Author

tobischo commented Dec 17, 2024

I can confirm that HEAD is not blocked: those requests go through successfully, using curl -I <url>, from the same machines the code runs on.

The STAGE type endpoints are also not blocked (verified that as well).

I assume all of the hosts are 100% Snowflake-owned, so there should be nothing on the receiving side blocking this?

Based on what you are saying, RaisePutGetError: false is basically what has been happening so far, and that worked perfectly? Can we expect any downside in the near future from going that route (for now)?

Logger: would we still get the detailed log output when setting RaisePutGetError: false?
If yes, my preferred approach would be to add a logger matching our logging scheme and to disable raising the errors for now, so that everything keeps working.

@sfc-gh-dszmolka
Contributor

sfc-gh-dszmolka commented Dec 17, 2024

RaisePutGetError only governs whether errors encountered during PUT or GET are raised to the user or silently ignored; that is the only difference, and you can tune it to your own preference.
It is independent of S3LoggingMode (S3 client logging).
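To illustrate how the two knobs combine (a sketch, not authoritative driver documentation: GetLogger().SetLogLevel and the file-transfer options exist in gosnowflake as far as I know, while the DSN, file path, and stage name are made up):

```go
package main

import (
	"context"
	"database/sql"
	"log"

	sf "github.com/snowflakedb/gosnowflake"
)

func main() {
	// Hypothetical DSN.
	db, err := sql.Open("snowflake", "user:password@myaccount.eu-central-1/mydb/public")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Driver-level debug logging; independent of RaisePutGetError.
	if err := sf.GetLogger().SetLogLevel("debug"); err != nil {
		log.Fatal(err)
	}

	// Suppress PUT/GET errors for this statement's context only
	// (RaisePutGetError defaults to true on recent driver versions).
	ctx := sf.WithFileTransferOptions(context.Background(),
		&sf.SnowflakeFileTransferOptions{RaisePutGetError: false})
	if _, err := db.ExecContext(ctx, "PUT file:///tmp/data.csv @~/my_stage"); err != nil { // hypothetical file/stage
		log.Fatal(err)
	}
}
```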

But ignoring or surfacing the same errors does not fix them. If they originated from the Snowflake end, I would expect lots of reports from other users, but there aren't any (besides this one).
So for now, I assume this is a network/connectivity issue somewhere between the source host running gosnowflake and the AWS S3 blob storage. Hence the tcpdump mentioned earlier could be very helpful for your troubleshooting.

I still would like to re-do my test in the same Snowflake deployment as yours, just to be sure. Can you please share in which Snowflake deployment you're experiencing this problem?

@tobischo
Author

tobischo commented Dec 17, 2024

This is for the *.eu-central-1.snowflakecomputing.com Snowflake deployment.

The connections are coming from a data center (not AWS), so maybe something is blocked on the side of the Snowflake deployment, but I am very certain that nothing is blocked from the data center side.

I understand that hiding the error doesn't fix it... I want a reasonably safe way to get this into a release, so that I don't have to roll back five minutes after seeing the first issue, given that this apparently isn't a critical element right now.
According to your response, nothing changed in how the driver talks to Snowflake aside from surfacing the error. That means it must already have been failing all along, which means it should keep working while silenced, and we can safely look into the errors under realistic conditions.

@sfc-gh-dszmolka
Contributor

> This is for the *.eu-central-1.snowflakecomputing.com Snowflake deployment.

Perfect - my tests were run on the exact same deployment:

> I just confirmed by PUTting a file to my internal stage on a test account on AWS EU Frankfurt with 1.12.1.

and it worked, so I can confirm the flow works on the Snowflake side. The next step is probably for you to work with your network team to trace down on which hop the socket gets closed, which in turn leads to the broken pipe.
This issue is not associated with the library where it was filed (gosnowflake), and since it lies outside the Snowflake-controlled environment, we unfortunately cannot fix it.

We can of course help by giving hints (here); I already suggested some tools for this (tcpdump, to see where the socket is closed and, generally, what happens to a packet issued towards S3), but your network team can of course use any other tool at their disposal.

Since you mentioned the source is a data center, I would also suggest checking with your network team whether you're perhaps using an S3 Gateway, which can contribute to this behaviour if it's not configured to transparently allow all kinds of requests to the Snowflake STAGE S3 endpoints.

> According to your response, nothing changed in how the driver talks to Snowflake aside from surfacing the error. That means it must already have been failing all along, which means it should keep working while silenced, and we can safely look into the errors under realistic conditions.

That is entirely correct and I agree with you.

One thing you should consider verifying: are the files actually uploaded correctly to the Snowflake internal stage with 1.8.0 / 1.12.1 too, the same way as with 1.7.1? You can check, e.g., by downloading the same file with GET and comparing it with the file that was uploaded with PUT just before (a sketch follows below).
If it matches, then you have verified that the file upload flow works regardless of the 'new' error, and you can perhaps more comfortably use RaisePutGetError: false to suppress it. Please also be aware that the true value became the default for a reason: there might be other errors in the future which you will want to see and address, so if it's an option, it is a good idea to eventually eliminate the root cause that triggers this error.
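A minimal sketch of such a round-trip check (the DSN, stage name, and paths are hypothetical; AUTO_COMPRESS=FALSE keeps the byte-for-byte comparison simple):

```go
package main

import (
	"bytes"
	"context"
	"database/sql"
	"log"
	"os"

	_ "github.com/snowflakedb/gosnowflake"
)

func main() {
	db, err := sql.Open("snowflake", "user:password@myaccount.eu-central-1/mydb/public") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	ctx := context.Background()

	// Upload without compression so the round-tripped file is comparable as-is.
	if _, err := db.ExecContext(ctx, "PUT file:///tmp/data.csv @~/verify_test AUTO_COMPRESS=FALSE"); err != nil {
		log.Fatalf("PUT failed: %v", err)
	}
	// Download the same file back into a local directory.
	if err := os.MkdirAll("/tmp/verify", 0o755); err != nil {
		log.Fatal(err)
	}
	if _, err := db.ExecContext(ctx, "GET @~/verify_test/data.csv file:///tmp/verify/"); err != nil {
		log.Fatalf("GET failed: %v", err)
	}

	orig, err := os.ReadFile("/tmp/data.csv")
	if err != nil {
		log.Fatal(err)
	}
	got, err := os.ReadFile("/tmp/verify/data.csv")
	if err != nil {
		log.Fatal(err)
	}
	if bytes.Equal(orig, got) {
		log.Println("round trip OK: the upload worked despite the raised error")
	} else {
		log.Println("round trip mismatch: the upload itself is broken")
	}
}
```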

I think that's all I can add to this issue for now. If you need Snowflake's help looking into information you might not want to share here (driver logs, which can contain sensitive information, the packet capture, etc.), you can file an official case with Snowflake Support and we can take it from there.

Please understand, however, that Snowflake cannot fix the issue, regardless of whether it's filed on GitHub or in an official Support ticket, when it relates to non-Snowflake infrastructure. Thank you for your kind understanding.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter labels Dec 18, 2024
@tobischo
Author

No worries, I understand. This information has already been very valuable for understanding the underlying issue better.

I am 100% certain that we do not have an S3 proxy in play here, and I also checked that HEAD requests generally work from the machines making those requests. That's why I assumed it is probably either a change in what the driver does or something not working on your end. So from our data center side, nothing is blocking this.

I will look into it and provide more details if it turns out to be driver-related after all (which I doubt, based on your information).

@sfc-gh-dszmolka
Contributor

What you just wrote is interesting:

> [..] if the HEAD requests are generally working from the machines making those requests

This is new information. If the HEAD requests are generally working, but you have a particular flow which always (or sometimes) fails and triggers the issue, then I would like to test that exact same flow on my setup and see whether I can reproduce it locally, from a different network.

Do you perhaps have a reproduction setup you can share, which I can try on my end? So far I was testing general functionality with a small file, but apparently general functionality works for you too. Without this bit of information I was under the impression that general functionality doesn't work; the error suggested so:

> S3: HeadObject, exceeded maximum number of attempts, 3

meaning that even after 3 attempts, HEAD was unsuccessful.

Any information might be helpful, e.g. whether the file must be bigger than X MB to hit the issue - any details. Best of all would of course be a repro code snippet, a runnable program, or a repro GitHub repo, if that's an option here; a minimal skeleton is sketched below.
We really want to rule out potential issues on the Snowflake end with edge cases that most people may not be running into.
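For reference, a minimal repro skeleton might look like this (the DSN, file, and stage are placeholders to fill in with the failing setup):

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/snowflakedb/gosnowflake"
)

func main() {
	// Placeholder DSN - fill in the failing account/user.
	db, err := sql.Open("snowflake", "user:password@myaccount.eu-central-1/mydb/public")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// On recent driver versions PUT/GET errors are raised by default,
	// so the HeadObject failure (if it reproduces) surfaces right here.
	if _, err := db.ExecContext(context.Background(), "PUT file:///tmp/repro_file @~/repro_stage"); err != nil {
		log.Fatalf("PUT failed: %v", err)
	}
	log.Println("PUT succeeded")
}
```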

Thank you in advance!

@sfc-gh-pfus
Collaborator

sfc-gh-pfus commented Dec 18, 2024

Hi, regarding this one:

> We are using PUT through the SQL interface, so just setting RaisePutGetError: false might be a tiny bit more complicated.

This is perfectly fine; it is the only way to use PUTs with the Go driver. You can set this value like so (assuming gosnowflake is imported as sf):

```go
ctx := sf.WithFileTransferOptions(context.Background(),
    &sf.SnowflakeFileTransferOptions{RaisePutGetError: false})
db.ExecContext(ctx, "PUT ...")
```

@tobischo
Author

I ran the HEAD requests using curl. The suggestion was that this would be a network issue, but the request did not fail with a network error; it failed because I didn't pass the credentials. That means it should not generally be something on the request's source network blocking anything.

Sure, I can try to stitch together an example case.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter and removed status-triage_done Initial triage done, will be further handled by the driver team labels Dec 19, 2024