# Releases: snowflakedb/snowpark-python
## 1.20.0 (2024-07-17)
### Snowpark Python API Updates

#### Improvements

- Added distributed tracing using open telemetry APIs for the table stored procedure function in `DataFrame`: `_execute_and_get_query_id`.
- Added support for the `arrays_zip` function.
- Improved performance for binary column expressions and `df._in` by avoiding unnecessary casts for numeric values. You can enable this optimization by setting `session.eliminate_numeric_sql_value_cast_enabled = True`.
- Improved the error message for `write_pandas` when the target table does not exist and `auto_create_table=False`.
- Added open telemetry tracing on UDxF functions in Snowpark.
- Added open telemetry tracing on stored procedure registration in Snowpark.
- Added a new optional parameter called `format_json` to the `Session.SessionBuilder.app_name` function that sets the app name in the `Session.query_tag` in JSON format. By default, this parameter is set to `False`.
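For illustration, `format_json=True` makes the query tag parseable as JSON rather than a bare string; the snippet below shows the round-trip idea with the standard library (the key name `APPNAME` is a hypothetical shape, not necessarily the exact field Snowpark writes):

```python
import json

# Hypothetical JSON-format query tag: with format_json=True the app name is
# stored as JSON, so tooling can parse the tag back into a dict instead of
# treating it as an opaque string. The key name here is illustrative only.
tag = json.dumps({"APPNAME": "my_pipeline"})
parsed = json.loads(tag)
```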
#### Bug Fixes

- Fixed a bug where the SQL generated for `lag(x, 0)` was incorrect and failed with the error message `argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'`.
### Snowpark Local Testing Updates

#### New Features

- Added support for the following APIs:
  - snowflake.snowpark.functions
    - random
- Added new parameters to the `patch` function when registering a mocked function:
  - `distinct` allows an alternate function to be specified for when a SQL function should be distinct.
  - `pass_column_index` passes a named parameter `column_index` to the mocked function that contains the pandas.Index for the input data.
  - `pass_row_index` passes a named parameter `row_index` to the mocked function that is the 0-indexed row number the function is currently operating on.
  - `pass_input_data` passes a named parameter `input_data` to the mocked function that contains the entire input dataframe for the current expression.
- Added support for the `column_order` parameter to the method `DataFrameWriter.save_as_table`.
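To illustrate the calling convention these parameters imply (a plain-Python sketch of the shape, not the mock framework's internals), a function registered with `pass_row_index=True` would receive an extra `row_index` keyword argument for each row it evaluates:

```python
# Sketch: a mocked function that uses the row_index keyword argument which
# patch(..., pass_row_index=True) would supply. The framework (not shown)
# calls the mock once per row; we simulate that with a list comprehension.
def mock_seq(start, *, row_index):
    # row_index is the 0-indexed row number the mock is operating on.
    return start + row_index

results = [mock_seq(100, row_index=i) for i in range(3)]
```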
#### Bug Fixes

- Fixed a bug that caused `DecimalType` columns to be incorrectly truncated to integer precision when used in `BinaryExpression`s.
### Snowpark pandas API Updates

#### New Features

- Added support for `DataFrameGroupBy.all`, `SeriesGroupBy.all`, `DataFrameGroupBy.any`, and `SeriesGroupBy.any`.
- Added support for `DataFrame.nlargest`, `DataFrame.nsmallest`, `Series.nlargest` and `Series.nsmallest`.
- Added support for `replace` and `frac > 1` in `DataFrame.sample` and `Series.sample`.
- Added support for `read_excel` (uses local pandas for processing).
- Added support for `Series.at`, `Series.iat`, `DataFrame.at`, and `DataFrame.iat`.
- Added support for `Series.dt.isocalendar`.
- Added support for `Series.case_when` except when the condition or replacement is callable.
- Added documentation pages for `Index` and its APIs.
- Added support for `DataFrame.assign`.
- Added support for `DataFrame.stack`.
- Added support for `DataFrame.pivot` and `pd.pivot`.
- Added support for `DataFrame.to_csv` and `Series.to_csv`.
- Added partial support for `Series.str.translate` where the values in the `table` are single-codepoint strings.
- Added support for `DataFrame.corr`.
- Allow `df.plot()` and `series.plot()` to be called, materializing the data into the local client.
- Added support for `DataFrameGroupBy` and `SeriesGroupBy` aggregations `first` and `last`.
- Added support for `DataFrameGroupBy.get_group`.
- Added support for the `limit` parameter when the `method` parameter is used in `fillna`.
- Added support for `DataFrame.equals` and `Series.equals`.
- Added support for `DataFrame.reindex` and `Series.reindex`.
- Added support for `Index.astype`.
- Added support for `Index.unique` and `Index.nunique`.
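Two of the additions above mirror standard-library semantics: the single-codepoint restriction on `Series.str.translate` matches Python's own `str.translate` tables, and `Series.dt.isocalendar` follows `datetime.date.isocalendar`. Illustrated with plain Python rather than Snowpark pandas:

```python
from datetime import date

# str.translate with single-codepoint keys: each mapped character is replaced.
table = str.maketrans({"a": "4", "e": "3"})
translated = "meant".translate(table)

# isocalendar() returns (ISO year, ISO week, ISO weekday); Jan 1, 2024 fell
# on a Monday, the first day of ISO week 1.
iso_year, iso_week, iso_weekday = date(2024, 1, 1).isocalendar()
```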
#### Bug Fixes

- Fixed an issue when using `np.where` and `df.where` when the scalar `other` is the literal 0.
- Fixed a bug regarding precision loss when converting to a Snowpark pandas `DataFrame` or `Series` with `dtype=np.uint64`.
- Fixed a bug where `values` is set to `index` when `index` and `columns` contain all columns in the DataFrame during `pivot_table`.
#### Improvements

- Added support for `Index.copy()`.
- Added support for Index APIs: `dtype`, `values`, `item()`, `tolist()`, `to_series()` and `to_frame()`.
- Expanded support for DataFrames with no rows in `pd.pivot_table` and `DataFrame.pivot_table`.
- Added support for the `inplace` parameter in `DataFrame.sort_index` and `Series.sort_index`.
## 1.19.0 (2024-06-25)
### Snowpark Python API Updates

#### New Features

- Added support for the `to_boolean` function.
- Added documentation pages for `Index` and its APIs.
#### Bug Fixes

- Fixed a bug where a Python stored procedure with a table return type fails when run in a task.
- Fixed a bug where `df.dropna` fails due to `RecursionError: maximum recursion depth exceeded` when the DataFrame has more than 500 columns.
- Fixed a bug where `AsyncJob.result("no_result")` doesn't wait for the query to finish execution.
### Snowpark Local Testing Updates

#### New Features

- Added support for the `strict` parameter when registering UDFs and stored procedures.
#### Bug Fixes

- Fixed a bug in `convert_timezone` that made setting the `source_timezone` parameter return an error.
- Fixed a bug where creating a DataFrame with empty data of type `DateType` raises `AttributeError`.
- Fixed a bug where table merge fails when an update clause exists but no update takes place.
- Fixed a bug in the mock implementation of `to_char` that raises `IndexError` when the incoming column has a nonconsecutive row index.
- Fixed a bug in the handling of `CaseExpr` expressions that raises `IndexError` when the incoming column has a nonconsecutive row index.
- Fixed a bug in the implementation of `Column.like` that raises `IndexError` when the incoming column has a nonconsecutive row index.
#### Improvements

- Added support for type coercion in the implementations of `DataFrame.replace`, `DataFrame.dropna` and the mock function `iff`.
### Snowpark pandas API Updates

#### New Features

- Added partial support for `DataFrame.pct_change` and `Series.pct_change` without the `freq` and `limit` parameters.
- Added support for `Series.str.get`.
- Added support for `Series.dt.dayofweek`, `Series.dt.day_of_week`, `Series.dt.dayofyear`, and `Series.dt.day_of_year`.
- Added support for `Series.str.__getitem__` (`Series.str[...]`).
- Added support for `Series.str.lstrip` and `Series.str.rstrip`.
- Added support for `DataFrameGroupBy.size` and `SeriesGroupBy.size`.
- Added support for `DataFrame.expanding` and `Series.expanding` for aggregations `count`, `sum`, `min`, `max`, `mean`, `std`, and `var` with `axis=0`.
- Added support for `DataFrame.rolling` and `Series.rolling` for aggregation `count` with `axis=0`.
- Added support for `Series.str.match`.
- Added support for `DataFrame.resample` and `Series.resample` for aggregation `size`.
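`pct_change` computes the fractional change between an element and its predecessor, `(x[i] - x[i-1]) / x[i-1]`, with no value defined for the first element. A plain-Python sketch of the default (`axis=0`, no `freq`/`limit`) behavior, not the Snowpark implementation:

```python
def pct_change(values):
    # The first element has no predecessor, so it is None (pandas yields NaN).
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) / prev)
    return out

changes = pct_change([2.0, 3.0, 1.5])
```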
#### Bug Fixes

- Fixed a bug that caused the output columns of `GroupBy.aggregate` to be ordered incorrectly.
- Fixed a bug where `DataFrame.describe` on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.
- Fixed a bug in `DataFrame.rolling` and `Series.rolling` so `window=0` now throws `NotImplementedError` instead of `ValueError`.
#### Improvements

- Added support for named aggregations in `DataFrame.aggregate` and `Series.aggregate` with `axis=0`.
- `pd.read_csv` reads using the native pandas CSV parser, then uploads data to Snowflake using parquet. This enables most of the parameters supported by `read_csv`, including date parsing and numeric conversions. Uploading via parquet is roughly twice as fast as uploading via CSV.
- Initial work to support a `pd.Index` directly in Snowpark pandas. Support for `pd.Index` as a first-class component of Snowpark pandas is coming soon.
- Added a lazy index constructor and support for `len`, `shape`, `size`, `empty`, `to_pandas()` and `names`. For `df.index`, Snowpark pandas creates a lazy index object.
- For `df.columns`, Snowpark pandas supports a non-lazy version of an `Index` since the data is already stored locally.
## 1.18.0 (2024-05-28)
### Snowpark pandas API Updates

#### New Features

- Added `DataFrame.cache_result` and `Series.cache_result` methods for users to persist DataFrames and Series to a temporary table lasting the duration of the session, to improve latency of subsequent operations.
#### Improvements

- Added partial support for `DataFrame.pivot_table` with no `index` parameter, as well as for the `margins` parameter.
- Updated the signature of `DataFrame.shift`/`Series.shift`/`DataFrameGroupBy.shift`/`SeriesGroupBy.shift` to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added `suffix` argument, or sequence values of `periods`.
- Re-added support for `Series.str.split`.
#### Bug Fixes

- Fixed how we support mixed columns for string methods (`Series.str.*`).
### Snowpark Local Testing Updates

#### New Features

- Added support for the following DataFrameReader read options for file formats `csv` and `json`:
  - PURGE
  - PATTERN
  - INFER_SCHEMA with value being `False`
  - ENCODING with value being `UTF8`
- Added support for `DataFrame.analytics.moving_agg` and `DataFrame.analytics.cumulative_agg`.
- Added support for the `if_not_exists` parameter during UDF and stored procedure registration.
#### Bug Fixes

- Fixed a bug where the fractional second part was not handled properly when processing time formats.
- Fixed a bug that caused function calls on `*` to fail.
- Fixed a bug that prevented creation of map and struct type objects.
- Fixed a bug where the function `date_add` was unable to handle some numeric types.
- Fixed a bug where `TimestampType` casting resulted in incorrect data.
- Fixed a bug that caused `DecimalType` data to have incorrect precision in some cases.
- Fixed a bug where referencing a missing table or view raised a confusing `IndexError`.
- Fixed a bug where the mocked function `to_timestamp_ntz` could not handle None data.
- Fixed a bug where mocked UDFs handled output data of None improperly.
- Fixed a bug where `DataFrame.with_column_renamed` ignored attributes from parent DataFrames after join operations.
- Fixed a bug where the integer precision of large values was lost when converting to a pandas DataFrame.
- Fixed a bug where the schema of a datetime object was wrong when creating a DataFrame from a pandas DataFrame.
- Fixed a bug in the implementation of `Column.equal_nan` where null data was handled incorrectly.
- Fixed a bug where `DataFrame.drop` ignored attributes from parent DataFrames after join operations.
- Fixed a bug in the mocked function `date_part` where the Column type was set incorrectly.
- Fixed a bug where `DataFrameWriter.save_as_table` did not raise exceptions when inserting null data into non-nullable columns.
- Fixed bugs in the implementation of `DataFrameWriter.save_as_table` where:
  - Append or Truncate fails when incoming data has a different schema than the existing table.
  - Truncate fails when incoming data does not specify columns that are nullable.
#### Improvements

- Removed the dependency check for `pyarrow` as it is not used.
- Improved target type coverage of `Column.cast`, adding support for casting to boolean and all integral types.
- Aligned the error experience when calling UDFs and stored procedures.
- Added appropriate error messages for the `is_permanent` and `anonymous` options in UDF and stored procedure registration to make it clear that those features are not yet supported.
- File read operations with unsupported options and values now raise `NotImplementedError` instead of warnings and unclear error information.
## 1.17.0 (2024-05-21)
### Snowpark Python API Updates

#### New Features

- Added support for adding a comment on tables and views using the functions listed below:
  - `DataFrameWriter.save_as_table`
  - `DataFrame.create_or_replace_view`
  - `DataFrame.create_or_replace_temp_view`
  - `DataFrame.create_or_replace_dynamic_table`

#### Improvements

- Improved the error message to remind users to set `{"infer_schema": True}` when reading a CSV file without specifying its schema.
### Snowpark pandas API Updates

#### New Features

- Start of Public Preview of the Snowpark pandas API. Refer to the Snowpark pandas API Docs for more details.
### Snowpark Local Testing Updates

#### New Features

- Added support for NumericType and VariantType data conversion in the mocked functions `to_timestamp_ltz`, `to_timestamp_ntz`, `to_timestamp_tz` and `to_timestamp`.
- Added support for DecimalType, BinaryType, ArrayType, MapType, TimestampType, DateType and TimeType data conversion in the mocked function `to_char`.
- Added support for the following APIs:
  - snowflake.snowpark.functions:
    - to_varchar
  - snowflake.snowpark.DataFrame:
    - pivot
  - snowflake.snowpark.Session:
    - cancel_all
- Introduced a new exception class `snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException`.
- Added support for casting to FloatType.
#### Bug Fixes

- Fixed a bug where stored procedures and UDFs removed imports already in `sys.path` during the clean-up step.
- Fixed a bug where the fractional second part was not handled properly when processing datetime formats.
- Fixed a bug on the Windows platform where file operations were unable to properly handle the file separator in directory names.
- Fixed a bug on the Windows platform where an IntervalType column with integer data could not be processed when reading a pandas dataframe.
- Fixed a bug that prevented users from being able to select multiple columns with the same alias.
- Fixed a bug where `Session.get_current_[schema|database|role|user|account|warehouse]` returned upper-cased identifiers when identifiers were quoted.
- Fixed a bug where the functions `substr` and `substring` could not handle a 0-based `start_expr`.
#### Improvements

- Standardized the error experience by raising `SnowparkLocalTestingException` in error cases, on par with the `SnowparkSQLException` raised in non-local execution.
- Improved the error experience of the `Session.write_pandas` method so that `NotImplementedError` is raised when it is called.
- Aligned the error experience with reusing a closed session in non-local execution.
## 1.16.0 (2024-05-07)
### New Features

- Added `snowflake.snowpark.Session.lineage.trace` to explore the data lineage of Snowflake objects.
- Added support for stored procedure registration with packages given as Python modules.
- Added support for structured type schema parsing.

### Bug Fixes

- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already have single quotes.
### Local Testing Updates

#### New Features

- Added support for StringType, TimestampType and VariantType data conversion in the mocked function `to_date`.
- Added support for the following APIs:
  - snowflake.snowpark.functions
    - get
    - concat
    - concat_ws
#### Bug Fixes

- Fixed a bug that caused NaT and NaN values to not be recognized.
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already have single quotes.
- Fixed a bug where `DataFrameReader.csv` was unable to handle quoted values containing a delimiter.
- Fixed a bug where, when there is a `None` value in an arithmetic calculation, the output should remain `None` instead of `math.nan`.
- Fixed a bug in the functions `sum` and `covar_pop` where, when there is `math.nan` in the data, the output should also be `math.nan`.
- Fixed a bug where stage operations could not handle directories.
- Fixed a bug where `DataFrame.to_pandas` should take Snowflake numeric types with precision 38 as `int64`.
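The two arithmetic fixes above hinge on the distinction between missing data (`None`, i.e. SQL NULL, which propagates as missing) and the float `math.nan` (which propagates as NaN through float arithmetic). A minimal plain-Python sketch of that distinction:

```python
import math

def null_safe_add(a, b):
    # SQL-style NULL propagation: any None operand yields None, not NaN.
    if a is None or b is None:
        return None
    return a + b

missing = null_safe_add(1.0, None)  # stays None, not math.nan
nan_sum = math.nan + 1.0            # NaN propagates through float arithmetic
```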
## 1.15.0 (2024-04-24)
### New Features

- Added `truncate` save mode in `DataFrameWriter` to overwrite existing tables by truncating the underlying table instead of dropping it.
- Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.
- Added the functions below to unload data from a `DataFrame` into one or more files in a stage:
  - `DataFrame.write.json`
  - `DataFrame.write.csv`
  - `DataFrame.write.parquet`
- Added distributed tracing using open telemetry APIs for action functions in `DataFrame` and `DataFrameWriter`:
  - snowflake.snowpark.DataFrame:
    - collect
    - collect_nowait
    - to_pandas
    - count
    - show
  - snowflake.snowpark.DataFrameWriter:
    - save_as_table
- Added support for snow:// URLs to `snowflake.snowpark.Session.file.get` and `snowflake.snowpark.Session.file.get_stream`.
- Added support to register stored procedures and UDxFs with a `comment`.
- UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
- Added support for dynamic pivot. This feature is currently in private preview.
### Improvements

- Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). It is still an experimental feature not enabled by default, and can be enabled by setting `session.cte_optimization_enabled` to `True`.
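The idea behind the CTE optimization is to hoist a subquery that appears more than once into a single `WITH` clause referenced multiple times. A toy string-level illustration of that rewrite (the real feature operates on Snowpark's query plan, and the CTE name below is made up):

```python
def dedupe_subquery(query: str, subquery: str) -> str:
    # Hoist a repeated parenthesized subquery into a CTE and replace each
    # occurrence with the CTE name. Purely illustrative string surgery.
    pattern = f"({subquery})"
    if query.count(pattern) < 2:
        return query
    cte_name = "SNOWPARK_TEMP_CTE"  # made-up name for illustration
    return f"WITH {cte_name} AS ({subquery}) " + query.replace(pattern, cte_name)

sql = dedupe_subquery(
    "SELECT * FROM (SELECT a FROM t) JOIN (SELECT a FROM t) USING (a)",
    "SELECT a FROM t",
)
```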
### Bug Fixes

- Fixed a bug where `statement_params` was not passed to query executions that register stored procedures and user-defined functions.
- Fixed a bug causing `snowflake.snowpark.Session.file.get_stream` to fail for quoted stage locations.
- Fixed a bug where an internal type hint in `utils.py` might raise `AttributeError` when the underlying module cannot be found.
### Local Testing Updates

#### New Features

- Added support for registering UDFs and stored procedures.
- Added support for the following APIs:
  - snowflake.snowpark.Session:
    - file.put
    - file.put_stream
    - file.get
    - file.get_stream
    - read.json
    - add_import
    - remove_import
    - get_imports
    - clear_imports
    - add_packages
    - add_requirements
    - clear_packages
    - remove_package
    - udf.register
    - udf.register_from_file
    - sproc.register
    - sproc.register_from_file
  - snowflake.snowpark.functions:
    - current_database
    - current_session
    - date_trunc
    - object_construct
    - object_construct_keep_null
    - pow
    - sqrt
    - udf
    - sproc
- Added support for StringType, TimestampType and VariantType data conversion in the mocked function `to_time`.
#### Bug Fixes

- Fixed a bug where columns were null-filled for constant functions.
- Fixed the implementations of `to_object`, `to_array` and `to_binary` to better handle null inputs.
- Fixed a bug where timestamp data comparison could not handle years beyond 2262.
- Fixed a bug where `Session.builder.getOrCreate` did not return the created mock session.
## 1.14.0 (2024-03-20)
### New Features

- Added support for creating vectorized UDTFs with a `process` method.
- Added support for dataframe functions:
  - to_timestamp_ltz
  - to_timestamp_ntz
  - to_timestamp_tz
  - locate
- Added support for the ASOF JOIN type.
- Added support for the following local testing APIs:
  - snowflake.snowpark.functions:
    - to_double
    - to_timestamp
    - to_timestamp_ltz
    - to_timestamp_ntz
    - to_timestamp_tz
    - greatest
    - least
    - convert_timezone
    - dateadd
    - date_part
  - snowflake.snowpark.Session:
    - get_current_account
    - get_current_warehouse
    - get_current_role
    - use_schema
    - use_warehouse
    - use_database
    - use_role
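An ASOF join matches each left row to the right row with the greatest key less than or equal to the left key, rather than requiring exact equality (the typical time-series use case). A minimal sketch with `bisect`, illustrative only rather than Snowpark's implementation:

```python
from bisect import bisect_right

def asof_join(left_keys, right_rows):
    # right_rows: (key, value) pairs sorted ascending by key. For each left
    # key, pick the right value with the greatest key <= it, else None.
    keys = [k for k, _ in right_rows]
    out = []
    for k in left_keys:
        i = bisect_right(keys, k) - 1
        out.append(right_rows[i][1] if i >= 0 else None)
    return out

matches = asof_join([5, 12, 2], [(3, "a"), (10, "b")])
```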
### Bug Fixes

- Fixed a bug in `SnowflakePlanBuilder` where `save_as_table` did not correctly filter columns whose names start with '$' followed by a number.
- Fixed a bug where statement parameters may have no effect when resolving imports and packages.
- Fixed bugs in local testing:
  - LEFT ANTI and LEFT SEMI joins drop rows with null values.
  - `DataFrameReader.csv` incorrectly parses data when the optional parameter `field_optionally_enclosed_by` is specified.
  - `Column.regexp` only considers the first entry when `pattern` is a `Column`.
  - `Table.update` raises `KeyError` when updating null values in the rows.
  - VARIANT columns raise errors at `DataFrame.collect`.
  - `count_distinct` does not work correctly when counting.
  - Null values in integer columns raise `TypeError`.
### Improvements

- Added telemetry to local testing.
- Improved the error message of `DataFrameReader` to raise a `FileNotFound` error when reading a path that does not exist or when there are no files under the path.
## 1.13.0 (2024-02-26)
### New Features

- Added support for an optional `date_part` argument in the function `last_day`.
- `SessionBuilder.app_name` will set the query_tag after the session is created.
- Added support for the following local testing functions:
  - current_timestamp
  - current_date
  - current_time
  - strip_null_value
  - upper
  - lower
  - length
  - initcap
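`last_day` returns the last day of the date part containing its input (the month, by default); the month case can be sketched with the standard library (illustrative, not the mocked implementation):

```python
import calendar
from datetime import date

def last_day_of_month(d: date) -> date:
    # calendar.monthrange returns (weekday of day 1, number of days in month).
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

leap_feb = last_day_of_month(date(2024, 2, 10))  # 2024 is a leap year
```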
### Improvements

- Added cleanup logic at interpreter shutdown to close all active sessions.
### Bug Fixes

- Fixed a bug in `DataFrame.to_local_iterator` where the iterator could yield wrong results if another query is executed before the iterator finishes, due to a wrong isolation level. For details, please see #945.
- Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.
- Fixed a bug where `Session.range` returned an empty result when the range is large.
## 1.12.1 (2024-02-08)
### Improvements

- Use `split_blocks=True` by default during `to_pandas` conversion, for optimal memory allocation. This parameter is passed to `pyarrow.Table.to_pandas`, which enables `PyArrow` to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.
### Bug Fixes

- Fixed a bug in `DataFrame.to_pandas` that caused an error when evaluating on a DataFrame with an `IntegerType` column with null values.
## 1.12.0 (2024-01-30)
### New Features

- Exposed `statement_params` in `StoredProcedure.__call__`.
- Added two optional arguments to `Session.add_import`:
  - `chunk_size`: The number of bytes to hash per chunk of the uploaded files.
  - `whole_file_hash`: By default only the first chunk of the uploaded import is hashed to save time. When this is set to True, each uploaded file is fully hashed instead.
- Added parameters `external_access_integrations` and `secrets` when creating a UDAF from Snowpark Python to allow integration with external access.
- Added a new method `Session.append_query_tag`. Allows an additional tag to be added to the current query tag by appending it as a comma-separated value.
- Added a new method `Session.update_query_tag`. Allows updates to a JSON-encoded dictionary query tag.
- `SessionBuilder.getOrCreate` will now attempt to replace the singleton it returns when token expiration has been detected.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `array_except`
  - `create_map`
  - `sign`/`signum`
- Added the following functions to `DataFrame.analytics`:
  - Added the `moving_agg` function in `DataFrame.analytics` to enable moving aggregations like sums and averages with multiple window sizes.
  - Added the `cumulative_agg` function in `DataFrame.analytics` to enable cumulative aggregations like sums and averages.
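The two query-tag methods have complementary semantics: `append_query_tag` treats the tag as a comma-separated list, while `update_query_tag` treats it as a JSON-encoded dictionary and merges updates into it. Sketched here on plain strings (not the actual `Session` API):

```python
import json

def append_tag(current: str, extra: str) -> str:
    # Comma-append semantics, handling an initially empty tag.
    return f"{current},{extra}" if current else extra

def update_tag(current: str, updates: dict) -> str:
    # JSON-merge semantics: decode, update the dict, re-encode.
    tag = json.loads(current) if current else {}
    tag.update(updates)
    return json.dumps(tag)

appended = append_tag("etl", "nightly")
updated = update_tag('{"team": "data"}', {"job": "load"})
```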
### Bug Fixes

- Fixed a bug in `DataFrame.na.fill` that caused Boolean values to erroneously override integer values.
- Fixed a bug in `Session.create_dataframe` where Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:
  - Earlier, timestamp columns without a timezone would be converted to nanosecond epochs and inferred as `LongType()`; they will now be correctly maintained as timestamp values and inferred as `TimestampType(TimestampTimeZone.NTZ)`.
  - Earlier, timestamp columns with a timezone would be inferred as `TimestampType(TimestampTimeZone.NTZ)` and lose timezone information; they will now be correctly inferred as `TimestampType(TimestampTimeZone.LTZ)`, and timezone information is retained correctly.
  - Set the session parameter `PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME` to revert to the old behavior. It is recommended that you update your code to align with the correct behavior because the parameter will be removed in the future.
- Fixed a bug where `DataFrame.to_pandas` got a decimal type when the scale is not 0, and created an object dtype in `pandas`. Instead, we cast the value to a float64 type.
- Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
  - `DataFrame.filter()` is called after `DataFrame.sort().limit()`.
  - `DataFrame.sort()` or `filter()` is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance, `df.select("a", seq1().alias("b")).select("a", "b").sort("a")` won't flatten the sort clause anymore.
  - A window or sequence-dependent data generator column is used after `DataFrame.limit()`. For instance, `df.limit(10).select(row_number().over())` won't flatten the limit and select in the generated SQL.
- Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance:

      df = df.select(col("a").alias("b"))
      df = copy(df)
      df.select(col("b").alias("c"))  # threw an error. Now it's fixed.

- Fixed a bug in `Session.create_dataframe` where the non-nullable field in a schema was not respected for boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
- Fixed a bug in the SQL simplifier where non-select statements in `session.sql` dropped a SQL query when used with `limit()`.
- Fixed a bug that raised an exception when the session parameter `ERROR_ON_NONDETERMINISTIC_UPDATE` is true.
### Behavior Changes (API Compatible)

- When parsing data types during a `to_pandas` operation, we rely on the GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as `int8` gets returned as `int64`. Users can fix this by explicitly specifying precision values for their return column.
- Aligned the behavior of `Session.call` in the case of table stored procedures, where running `Session.call` would not trigger the stored procedure unless a `collect()` operation was performed.
- `StoredProcedureRegistration` will now automatically add `snowflake-snowpark-python` as a package dependency. The added dependency will be on the client's local version of the library, and an error is thrown if the server cannot support that version.