Releases: snowflakedb/snowpark-python

1.0.0

01 Nov 04:32
1ec8b19

1.0.0 (2022-11-01)

New Features

  • Added Session.generator() to create a new DataFrame using the Generator table function.
  • Added a parameter secure to the functions that create a secure UDF or UDTF.
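
As a rough illustration of the secure parameter, the sketch below registers a UDF with secure=True. The handler add_one and the UDF name add_one_secure are made up for this example, and session is assumed to be an already connected snowflake.snowpark.Session. The SQL types are inferred from the Python type hints.

```python
def add_one(x: int) -> int:
    """Plain Python handler; the type hints let Snowpark infer the SQL types."""
    return x + 1


def register_secure_udf(session):
    """Register add_one as a secure UDF.

    `session` is assumed to be a connected snowflake.snowpark.Session.
    The UDF name "add_one_secure" is hypothetical.
    """
    return session.udf.register(add_one, name="add_one_secure", secure=True)
```

The returned object can then be used like any other UDF in DataFrame expressions.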

v0.12.0

14 Oct 21:15
87faa2f
Pre-release

0.12.0 (2022-10-14)

New Features

  • Added new APIs for async job:
    • Session.create_async_job() to create an AsyncJob instance from a query id.
    • AsyncJob.result() now accepts argument result_type to return the results in different formats.
    • AsyncJob.to_df() returns a DataFrame built from the result of this asynchronous job.
    • AsyncJob.query() returns the SQL text of the executed query.
  • DataFrame.agg() and RelationalGroupedDataFrame.agg() now accept variable-length arguments.
  • Added parameters lsuffix and rsuffix to DataFrame.join() and DataFrame.cross_join() to conveniently rename overlapping columns.
  • Added Table.drop_table() so you can drop the temp table after DataFrame.cache_result(). Table is also a context manager so you can use the with statement to drop the cache temp table after use.
  • Added Session.use_secondary_roles().
  • Added functions first_value() and last_value(). (contributed by @chasleslr)
  • Added on as an alias for using_columns and how as an alias for join_type in DataFrame.join().
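
A minimal sketch of how some of these additions might be used together. The column name customer_id, the suffix strings, and the helper names are assumptions made for illustration; the DataFrame and session arguments are assumed to come from an existing Snowpark session.

```python
def join_with_suffixes(orders, customers):
    """Left-join two Snowpark DataFrames using the `on` and `how` aliases.

    lsuffix and rsuffix are appended to overlapping column names from the
    left and right side. "customer_id" is a hypothetical join key.
    """
    return orders.join(
        customers,
        on="customer_id",
        how="left",
        lsuffix="_order",
        rsuffix="_customer",
    )


def count_with_temporary_cache(df):
    """Cache a DataFrame, use the cached Table, then drop the temp table.

    Leaving the `with` block drops the cache temp table.
    """
    with df.cache_result() as cached:
        return cached.count()


def fetch_async_result_as_pandas(session, query_id):
    """Rebuild an AsyncJob from a query id and return its result as pandas.

    Requires pandas to be installed alongside Snowpark.
    """
    job = session.create_async_job(query_id)
    return job.result("pandas")
```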

Bug Fixes

  • Fixed a bug in Session.create_dataframe() that raised an error when schema names had special characters.
  • Fixed a bug in which options set in Session.read.option() were not passed to DataFrame.copy_into_table() as default values.
  • Fixed a bug in which DataFrame.copy_into_table() raised an error when a copy option had single quotes in the value.

v0.11.0

29 Sep 16:55
7a8f511
Pre-release

0.11.0 (2022-09-28)

Behavior Changes:

  • Session.add_packages() now raises ValueError when the version of a package cannot be found in the Snowflake Anaconda channel. Previously, Session.add_packages() succeeded, and a SnowparkSQLException was raised later in the UDF/SP registration step.
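
Code that relied on the later SnowparkSQLException may want to handle the earlier ValueError instead. A small sketch, assuming session is a connected Snowpark session; the helper name try_add_packages is hypothetical.

```python
def try_add_packages(session, *packages):
    """Add packages to the session and report an unavailable version.

    Returns None on success, or the error message when Snowpark raises
    ValueError because a requested version cannot be found.
    """
    try:
        session.add_packages(*packages)
    except ValueError as exc:
        return str(exc)
    return None
```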

New Features:

  • Added method FileOperation.get_stream() to support downloading stage files as a stream.
  • Added support in functions.ntile() to accept an int argument.
  • Added the following aliases:
    • functions.call_function() for functions.call_builtin().
    • functions.function() for functions.builtin().
    • DataFrame.order_by() for DataFrame.sort().
    • DataFrame.orderBy() for DataFrame.sort().
  • Improved DataFrame.cache_result() to return a more accurate Table class instead of a DataFrame class.
  • Added support to allow session as the first argument when calling StoredProcedure.
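
For example, FileOperation.get_stream() can be reached through session.file. The sketch below reads a whole stage file into memory; the helper name is hypothetical and the stage path is whatever the caller supplies.

```python
def read_stage_file(session, stage_location):
    """Download a stage file as a stream and return its bytes.

    `stage_location` is the full stage path of the file, for example
    "@my_stage/data.csv" (a made-up path).
    """
    stream = session.file.get_stream(stage_location)
    try:
        return stream.read()
    finally:
        # Always release the underlying file handle.
        stream.close()
```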

Improvements:

  • Improved nested query generation by flattening queries when applicable.
    • This improvement can be enabled by setting Session.sql_simplifier_enabled = True.
    • DataFrame.select(), DataFrame.with_column(), DataFrame.drop() and other select-related APIs generate flatter SQL.
    • DataFrame.union(), DataFrame.union_all(), DataFrame.except_(), DataFrame.intersect() and DataFrame.union_by_name() generate flattened SQL when multiple set operators are chained.
  • Improved type annotations for async job APIs.
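
One way to observe the flattening is to enable the simplifier and inspect the SQL a chained DataFrame would run. This is only a sketch: the column names a and b are hypothetical, and the exact SQL text depends on the Snowpark version.

```python
def flattened_query_text(session, table_name):
    """Return the final SQL generated for two chained select() calls.

    The simplifier is switched on before the DataFrame is built so that
    the chained selects can be flattened into a single query.
    """
    session.sql_simplifier_enabled = True
    df = session.table(table_name).select("a", "b").select("a")
    # DataFrame.queries is a dict; the "queries" entry lists the SQL statements.
    return df.queries["queries"][-1]
```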

Bug Fixes:

  • Fixed a bug in which Table.update(), Table.delete() and Table.merge() tried to reference a temp table that did not exist.

v0.10.0

16 Sep 19:30
Pre-release

0.10.0 (2022-09-16)

New Features:

  • Added experimental APIs for evaluating Snowpark dataframes with asynchronous queries:
    • Added keyword argument block to the following action APIs on Snowpark dataframes (which execute queries) to allow asynchronous evaluations:
      • DataFrame.collect(), DataFrame.to_local_iterator(), DataFrame.to_pandas(), DataFrame.to_pandas_batches(), DataFrame.count(), DataFrame.first().
      • DataFrameWriter.save_as_table(), DataFrameWriter.copy_into_location().
      • Table.delete(), Table.update(), Table.merge().
    • Added method DataFrame.collect_nowait() to allow asynchronous evaluations.
    • Added class AsyncJob to retrieve results from asynchronously executed queries and check their status.
  • Added support for table_type in Session.write_pandas(). You can now choose from these table_type options: "temporary", "temp", and "transient".
  • Added support for using Python structured data (list, tuple and dict) as literal values in Snowpark.
  • Added keyword argument execute_as to functions.sproc() and session.sproc.register() to allow registering a stored procedure as a caller or owner.
  • Added support for specifying a pre-configured file format when reading files from a stage in Snowflake.
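
The asynchronous APIs and the new table_type option could be used roughly as follows. The helper names are made up for illustration, and df, session and pandas_df are assumed to exist already.

```python
def start_and_wait(df):
    """Run a DataFrame query asynchronously, then wait for its rows.

    collect_nowait() returns an AsyncJob; df.collect(block=False) is the
    keyword-argument form of the same request.
    """
    job = df.collect_nowait()
    # Other client-side work could happen here while the query runs.
    return job.query_id, job.result()


def write_transient(session, pandas_df, table_name):
    """Write a pandas DataFrame into a transient table, creating it if needed."""
    return session.write_pandas(
        pandas_df,
        table_name,
        auto_create_table=True,
        table_type="transient",
    )
```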

Improvements:

  • Added support for displaying details of a Snowpark session.

Bug Fixes:

  • Fixed a bug in which DataFrame.copy_into_table() and DataFrameWriter.save_as_table() mistakenly created a new table if the table name was fully qualified and the table already existed.

Deprecations:

  • Deprecated keyword argument create_temp_table in Session.write_pandas().
  • Deprecated invoking UDFs using arguments wrapped in a Python list or tuple. You can use variable-length arguments without a list or tuple.
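
In practice the second deprecation means passing UDF arguments directly rather than wrapping them in a list or tuple. A short sketch, with hypothetical column names a and b and a my_udf object assumed to be an already registered UDF:

```python
def call_udf_with_varargs(df, my_udf):
    """Apply a two-argument UDF to columns "a" and "b".

    Deprecated form: my_udf(["a", "b"])
    Preferred form:  my_udf("a", "b")
    """
    return df.select(my_udf("a", "b"))
```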

Dependency Updates:

  • Updated snowflake-connector-python to 2.7.12.

v0.9.0

31 Aug 00:52
d45eb5c
Pre-release

0.9.0 (2022-08-30)

New Features:

  • Added support for displaying source code as comments in the generated scripts when registering UDFs.
    This feature is turned on by default. To turn it off, pass the new keyword argument source_code_display as False when calling register() or @udf().
  • Added support for calling table functions from DataFrame.select(), DataFrame.with_column() and DataFrame.with_columns() which now take parameters of type table_function.TableFunctionCall for columns.
  • Added keyword argument overwrite to session.write_pandas() to allow overwriting contents of a Snowflake table with that of a Pandas DataFrame.
  • Added keyword argument column_order to df.write.save_as_table() to specify the matching rules when inserting data into a table in append mode.
  • Added method FileOperation.put_stream() to upload local files to a stage via file stream.
  • Added methods TableFunctionCall.alias() and TableFunctionCall.as_() to allow aliasing the names of columns that come from the output of table function joins.
  • Added function get_active_session() in module snowflake.snowpark.context to get the current active Snowpark session.
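
Two of these additions, sketched under assumptions: the helper names are hypothetical, session is a connected Snowpark session, and the stage path and table name are supplied by the caller.

```python
import io


def upload_bytes(session, payload, stage_location):
    """Upload in-memory bytes to a stage through a file stream.

    `stage_location` is the full stage path including the file name,
    for example "@my_stage/data.csv" (a made-up path).
    """
    return session.file.put_stream(
        io.BytesIO(payload),
        stage_location,
        auto_compress=False,
        overwrite=True,
    )


def append_matching_by_name(df, table_name):
    """Append rows to an existing table, matching columns by name."""
    df.write.save_as_table(table_name, mode="append", column_order="name")
```

Matching by name is useful when the DataFrame's columns are in a different order than the target table's columns.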

Bug Fixes:

  • Fixed a bug in which batch insert raised an error when statement_params was not passed to the function.
  • Fixed a bug in which column names were not quoted when session.create_dataframe() was called with dicts and a given schema.
  • Fixed a bug in which df.write.save_as_table() did not skip table creation when the table already existed and the write was in append mode.
  • Fixed a bug in which third-party packages with underscores could not be added when registering UDFs.

Improvements:

  • Improved functions.uniform() to infer the types of inputs max_ and min_ and cast the limits to IntegerType or FloatType accordingly.

v0.8.0

25 Jul 20:50
510b2b5
Pre-release

0.8.0 (2022-07-22)

New Features:

  • Added keyword-only argument statement_params to the following methods to allow for specifying statement-level parameters:
    • collect, to_local_iterator, to_pandas, to_pandas_batches, count, copy_into_table, show, create_or_replace_view, create_or_replace_temp_view, first, cache_result and random_split on class snowflake.snowpark.DataFrame.
    • update, delete and merge on class snowflake.snowpark.Table.
    • save_as_table and copy_into_location on class snowflake.snowpark.DataFrameWriter.
    • approx_quantile, corr, cov and crosstab on class snowflake.snowpark.DataFrameStatFunctions.
    • register and register_from_file on class snowflake.snowpark.udf.UDFRegistration.
    • register and register_from_file on class snowflake.snowpark.udtf.UDTFRegistration.
    • register and register_from_file on class snowflake.snowpark.stored_procedure.StoredProcedureRegistration.
    • udf, udtf and sproc in snowflake.snowpark.functions.
  • Added support for Column as an input argument to session.call().
  • Added support for table_type in df.write.save_as_table(). You can now choose from these table_type options: "temporary", "temp", and "transient".
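
A brief sketch of both additions. Using QUERY_TAG as the statement-level parameter is an assumption chosen for illustration, and the helper names are hypothetical.

```python
def collect_with_query_tag(df, tag):
    """Run the query with a statement-level parameter applied to it only.

    QUERY_TAG is used here as an example parameter (an assumption);
    other statement-level parameters are passed the same way.
    """
    return df.collect(statement_params={"QUERY_TAG": tag})


def save_as_transient(df, table_name):
    """Overwrite (or create) a transient table from a DataFrame."""
    df.write.save_as_table(table_name, mode="overwrite", table_type="transient")
```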

Improvements:

  • Added validation of object name in session.use_* methods.
  • Updated the query tag in SQL to escape it when it has special characters.
  • Added a check to see if Anaconda terms are acknowledged when adding missing packages.

Bug Fixes:

  • Fixed the limited length of the string column in session.create_dataframe().
  • Fixed a bug in which session.create_dataframe() mistakenly converted 0 and False to None when the input data was only a list.
  • Fixed a bug in which calling session.create_dataframe() using a large local dataset sometimes created a temp table twice.
  • Aligned the definition of functions.trim() with the SQL function definition.
  • Fixed an issue where snowpark-python would hang when the Python built-in sum was used instead of the Snowpark functions.sum().

v0.7.0

07 Jun 23:35
6bf1a1e
Pre-release

0.7.0

New Features:

  • Added support for user-defined table functions (UDTFs).
    • Use function snowflake.snowpark.functions.udtf() to register a UDTF, or use it as a decorator to register the UDTF.
      • You can also use Session.udtf.register() to register a UDTF.
    • Use Session.udtf.register_from_file() to register a UDTF from a Python file.
  • Updated APIs to query a table function, including both Snowflake built-in table functions and UDTFs.
    • Use function snowflake.snowpark.functions.table_function() to create a callable representing a table function and use it to call the table function in a query.
    • Alternatively, use function snowflake.snowpark.functions.call_table_function() to call a table function.
    • Added support for an over clause that specifies partition by and order by when lateral joining a table function.
    • Updated Session.table_function() and DataFrame.join_table_function() to accept TableFunctionCall instances.

Breaking Changes:

  • When creating a function with functions.udf() and functions.sproc(), you can now specify an empty list for the imports or packages argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.
  • Improved the __repr__ implementation of data types in types.py. The unused type_name property has been removed.
  • Added a Snowpark-specific exception class for SQL errors. This replaces the previous ProgrammingError from the Python connector.

Improvements:

  • Added a lock to a UDF or UDTF when it is called for the first time per thread.
  • Improved the error message for pickling errors that occurred during UDF creation.
  • Included the query ID when logging the failed query.

Bug Fixes:

  • Fixed a bug in which non-integral data (such as timestamps) was occasionally converted to integer when calling DataFrame.to_pandas().
  • Fixed a bug in which DataFrameReader.parquet() failed to read a parquet file when a column name contained spaces.
  • Fixed a bug in which DataFrame.copy_into_table() failed when the dataframe was created by reading a file with inferred schemas.

Deprecations:

  • Deprecated Session.flatten() and DataFrame.flatten().

Dependency Updates:

  • Restricted the version of cloudpickle <= 2.0.0.

v0.6.0

07 Jun 18:59
Pre-release

0.6.0

New Features:

  • Added support for vectorized UDFs with the input as a Pandas DataFrame or Pandas Series and the output as a Pandas Series. This improves the performance of UDFs in Snowpark.
  • Added support for inferring the schema of a DataFrame by default when it is created by reading a Parquet, Avro, or ORC file in the stage.
  • Added functions current_session(), current_statement(), current_user(), current_version(), current_warehouse(), date_from_parts(), date_trunc(), dayname(), dayofmonth(), dayofweek(), dayofyear(), grouping(), grouping_id(), hour(), last_day(), minute(), next_day(), previous_day(), second(), month(), monthname(), quarter(), year(), current_database(), current_role(), current_schema(), current_schemas(), current_region(), current_available_roles(), add_months(), any_value(), bitnot(), bitshiftleft(), bitshiftright(), convert_timezone(), uniform(), strtok_to_array(), sysdate(), time_from_parts(), timestamp_from_parts(), timestamp_ltz_from_parts(), timestamp_ntz_from_parts(), timestamp_tz_from_parts(), weekofyear(), percentile_cont() to snowflake.snowpark.functions.

Breaking Changes:

  • Expired deprecations:
    • Removed the following APIs that were deprecated in 0.4.0: DataFrame.groupByGroupingSets(), DataFrame.naturalJoin(), DataFrame.joinTableFunction(), DataFrame.withColumns(), Session.getImports(), Session.addImport(), Session.removeImport(), Session.clearImports(), Session.getSessionStage(), Session.getDefaultDatabase(), Session.getDefaultSchema(), Session.getCurrentDatabase(), Session.getCurrentSchema(), Session.getFullyQualifiedCurrentSchema().

Improvements:

  • Added support for creating an empty DataFrame with a specific schema using the Session.create_dataframe() method.
  • Changed the logging level from INFO to DEBUG for several logs (e.g., the executed query) when evaluating a dataframe.
  • Improved the error message when failing to create a UDF due to pickle errors.

Bug Fixes:

  • Removed pandas hard dependencies in the Session.create_dataframe() method.

Dependency Updates:

  • Added typing-extensions as a new dependency with the version >= 4.1.0.

v0.5.0

01 Apr 20:16
c8d5e2a
Pre-release

New Features:

  • Added stored procedures API.
    • Added Session.sproc property and sproc() to snowflake.snowpark.functions, so you can register stored procedures.
    • Added Session.call to call stored procedures by name.
  • Added UDFRegistration.register_from_file() to allow registering UDFs from Python source files or zip files directly.
  • Added UDFRegistration.describe() to describe a UDF.
  • Added DataFrame.random_split() to provide a way to randomly split a dataframe.
  • Added functions md5(), sha1(), sha2(), ascii(), initcap(), length(), lower(), lpad(), ltrim(), rpad(), rtrim(), repeat(), soundex(), regexp_count(), replace(), charindex(), collate(), collation(), insert(), left(), right(), endswith() to snowflake.snowpark.functions.
  • Allowed call_udf() to accept literal values.
  • Provided a distinct keyword in array_agg().

Bug Fixes:

  • Fixed an issue that caused DataFrame.to_pandas() to have a string column if Column.cast(IntegerType()) was used.
  • Fixed a bug in DataFrame.describe() when there is more than one string column.

v0.4.1

15 Mar 18:48
3b656d1
Pre-release

0.4.1 (2022-02-25)

Bug Fixes

  • Fixed a bug in DataFrame.describe() that raised an error when the DataFrame had more than one string column.