
SNOW-719759: Is there subset of requirements that include Python Pandas that would fit as a lambda layer in AWS (<= 50MB) #1384

Closed
ArashMehraban opened this issue Dec 27, 2022 · 9 comments

Comments

@ArashMehraban

Is there a subset of these requirements: https://github.com/snowflakedb/snowflake-connector-python/blob/main/tested_requirements/requirements_38.reqs that could include Python pandas (and NumPy as a prerequisite for pandas) and still build to less than 50 MB zipped, so it could be used as a layer in AWS Lambda? The requirements listed above build a 33 MB zipped folder, and pandas/NumPy add another 25 MB. Together they exceed the 50 MB allowed for a Lambda layer.
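
For what it's worth, here is a minimal sketch for checking whether a candidate dependency set fits: it zips a local python/ directory (the top-level layout Lambda expects for a Python layer) and reports the compressed size. The directory and file names are assumptions, not an official recipe:

import zipfile
from pathlib import Path

def zip_layer(src_dir: str, out_zip: str) -> int:
    """Zip src_dir recursively and return the archive size in bytes."""
    src = Path(src_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in src.rglob("*"):
            if path.is_file():
                # Keep the top-level python/ prefix Lambda expects.
                zf.write(path, path.relative_to(src.parent))
    return Path(out_zip).stat().st_size

size = zip_layer("python", "layer.zip")  # "python/" holds the pip --target output
print(f"zipped layer: {size / 2**20:.1f} MiB (direct-upload limit: 50 MB)")

Running this after each pruning step shows how close a given dependency set is to the ceiling.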

github-actions bot changed the title to "SNOW-719759: Is there subset of requirements that include Python Pandas that would fit as a lambda layer in AWS (<= 50MB)" Dec 27, 2022
@sfc-gh-yixie
Collaborator

@sfc-gh-aling You may have more information.

@sfc-gh-sfan closed this as not planned on Mar 9, 2023
@sfc-gh-sfan reopened this on Mar 9, 2023
@sfc-gh-sfan removed the Stale label on Mar 9, 2023
@sfc-gh-sfan
Contributor

The close/reopen above was triggered by a test run of a stale-bot workflow.

@sfc-gh-achandrasekaran
Contributor

@ArashMehraban We are actively working on unlocking this functionality, with a current ETA of June 2023. Unfortunately, there is no workaround that works today.

@ArashMehraban
Author

@sfc-gh-achandrasekaran Thanks for the reply! By "this functionality", do you mean a list of Snowflake requirements that, together with pandas and NumPy, would be less than 50 MB zipped so it could be used in an AWS Lambda layer?

@sfc-gh-achandrasekaran
Contributor

We cannot create a subset of the requirements for the connector. That said, we are actively working on reducing the overall size of the Python connector to support scenarios like yours.

Lambda has an unzipped size limit of 250 MB (as opposed to the 50 MB limit for zip files uploaded directly). Does that work for you in the meantime?
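
For anyone blocked on the 50 MB cap in the meantime: that cap applies to zips uploaded directly, while layers published from S3 are bound only by the 250 MB unzipped limit. A minimal boto3 sketch, with hypothetical bucket, key, and layer names:

import boto3

s3 = boto3.client("s3")
lam = boto3.client("lambda")

# Direct zip uploads are capped at 50 MB, but S3-backed layers are not;
# only the 250 MB unzipped limit still applies.
s3.upload_file("layer.zip", "my-layer-bucket", "layers/snowflake.zip")

resp = lam.publish_layer_version(
    LayerName="snowflake-connector",
    Content={"S3Bucket": "my-layer-bucket", "S3Key": "layers/snowflake.zip"},
    CompatibleRuntimes=["python3.8"],
)
print(resp["LayerVersionArn"])

Terraform's aws_lambda_layer_version resource supports the same route via its s3_bucket/s3_key arguments instead of filename.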

@ArashMehraban
Author

@sfc-gh-achandrasekaran

  • Reducing the size of the connector by June will work for me! Meanwhile, I have also found ways to reduce the size of numpy/pandas by deleting the docs, etc. that ship with those packages (see the sketch after this list).

  • I have not used unzipped files/folders as Lambda layers in AWS, because I use Terraform to upload the layer, and it accepts the name of a zipped folder as the layer, not a list of folder names.

  • At any rate, once the connector shrinks to the point where it fits together with pandas and NumPy in a <=50 MB zipped folder in one Lambda layer, it will be super useful, and it will let me cut costs as well. Right now I use the connector layer in one Lambda to retrieve data from Snowflake, then a second Lambda to process that data in pandas. Obviously, these two Lambdas could be consolidated into one if the layer could include both the Snowflake connector and pandas/numpy.
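
The doc-deletion trick mentioned in the first bullet might look something like this minimal sketch. The python/ target directory is an assumption, and which files are safe to delete varies by package, so re-test your imports after pruning:

import shutil
from pathlib import Path

target = Path("python")  # hypothetical pip --target directory

# Remove compiled caches and test suites that ship inside the packages.
for pattern in ("**/__pycache__", "**/tests", "**/test"):
    for path in target.glob(pattern):
        if path.is_dir():
            shutil.rmtree(path)

# Drop stray bytecode and C sources left behind by some wheels.
for pattern in ("**/*.pyc", "**/*.pyx", "**/*.pxd"):
    for path in target.glob(pattern):
        if path.is_file():
            path.unlink()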

@sfc-gh-anugupta

Hi all,

We have released a new preview version of the connector with a reduced size thanks to nanoarrow, which you can read about in this blog post: https://medium.com/snowflake/supercharging-the-snowflake-python-connector-with-nanoarrow-8388cb57eeba

Do let us know your feedback. Note that this is still in preview, so we don't recommend using it in production.

Thanks
Anurag

@sfc-gh-aling
Collaborator

Hi all, we're thrilled to announce that snowflake-connector-python 3.5.0 has been released. It removes the hard pyarrow dependency and reduces the package size: https://pypi.org/project/snowflake-connector-python/3.5.0/

Please give it a try!
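
Once the connector plus pandas/NumPy fit in one layer, the two-Lambda pipeline described above could collapse into a single handler. A minimal sketch, assuming hypothetical environment variables for credentials and a placeholder query; fetch_pandas_all() needs the [pandas] extra installed:

import os

import snowflake.connector

def handler(event, context):
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    )
    try:
        cur = conn.cursor()
        cur.execute("SELECT * FROM my_table LIMIT 1000")  # placeholder query
        df = cur.fetch_pandas_all()  # result set straight into a DataFrame
    finally:
        conn.close()
    # Process in pandas right here instead of handing off to a second Lambda.
    return {"rows": len(df)}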

@Schambry

Hey guys,

I'm experiencing this as well. Obviously it's been a while since 3.5.0; I'm now using the latest, 3.12.3. I'm using the AWS-recommended method for creating a binary package to deploy as a layer:

pip install \
    --platform manylinux2014_x86_64 \
    --target= \
    --implementation cp \
    --python-version 3.12 \
    --only-binary=:all: --upgrade \
    "snowflake-connector-python[pandas]"

I can see that the dependency on pyarrow is gone, but the total uncompressed size is still far too large. Any recommendations?
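
One way to see where the bytes go is to sum per-package directory sizes inside the install target. A minimal sketch, where layer/python stands in for whatever path was passed to --target:

from pathlib import Path

target = Path("layer/python")  # hypothetical pip --target directory
sizes = {
    entry.name: sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
    for entry in target.iterdir()
    if entry.is_dir()
}
# Print the ten largest packages, biggest first.
for name, size in sorted(sizes.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{size / 2**20:7.1f} MiB  {name}")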
