Releases: snowflakedb/snowpark-python
1.0.0
1.0.0 (2022-11-01)
New Features
- Added `Session.generator()` to create a new `DataFrame` using the Generator table function.
- Added a parameter `secure` to the functions that create a secure UDF or UDTF.
v0.12.0
0.12.0 (2022-10-14)
New Features
- Added new APIs for async job:
  - `Session.create_async_job()` to create an `AsyncJob` instance from a query ID.
  - `AsyncJob.result()` now accepts argument `result_type` to return the results in different formats.
  - `AsyncJob.to_df()` returns a `DataFrame` built from the result of this asynchronous job.
  - `AsyncJob.query()` returns the SQL text of the executed query.
- `DataFrame.agg()` and `RelationalGroupedDataFrame.agg()` now accept variable-length arguments.
- Added parameters `lsuffix` and `rsuffix` to `DataFrame.join()` and `DataFrame.cross_join()` to conveniently rename overlapping columns.
- Added `Table.drop_table()` so you can drop the temp table after `DataFrame.cache_result()`. `Table` is also a context manager so you can use the `with` statement to drop the cache temp table after use.
- Added `Session.use_secondary_roles()`.
- Added functions `first_value()` and `last_value()`. (contributed by @chasleslr)
- Added `on` as an alias for `using_columns` and `how` as an alias for `join_type` in `DataFrame.join()`.
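To illustrate what renaming overlapping columns with `lsuffix` and `rsuffix` means, here is a minimal pure-Python sketch. The helper `suffix_overlapping_columns` is hypothetical and is not part of Snowpark. It assumes that only names present on both sides receive a suffix, and the library's actual handling (quoting, join keys passed through `using_columns`) may differ.

```python
from typing import List, Tuple


def suffix_overlapping_columns(
    left: List[str], right: List[str], lsuffix: str = "", rsuffix: str = ""
) -> Tuple[List[str], List[str]]:
    """Return the output column names for both sides of a join.

    Hypothetical helper, not part of Snowpark: a column name found on both
    sides gets the matching suffix appended; all other names are unchanged.
    """
    overlap = set(left) & set(right)
    new_left = [c + lsuffix if c in overlap else c for c in left]
    new_right = [c + rsuffix if c in overlap else c for c in right]
    return new_left, new_right


left_cols, right_cols = suffix_overlapping_columns(
    ["id", "value"], ["id", "score"], lsuffix="_l", rsuffix="_r"
)
print(left_cols)   # ['id_l', 'value']
print(right_cols)  # ['id_r', 'score']
```

With a live session, the same idea would be expressed by passing `lsuffix` and `rsuffix` directly to `DataFrame.join()`.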
Bug Fixes
- Fixed a bug in `Session.create_dataframe()` that raised an error when `schema` names had special characters.
- Fixed a bug in which options set in `Session.read.option()` were not passed to `DataFrame.copy_into_table()` as default values.
- Fixed a bug in which `DataFrame.copy_into_table()` raises an error when a copy option has single quotes in the value.
v0.11.0
0.11.0 (2022-09-28)
Behavior Changes:
- `Session.add_packages()` now raises `ValueError` when the version of a package cannot be found in the Snowflake Anaconda channel. Previously, `Session.add_packages()` succeeded, and a `SnowparkSQLException` was raised later in the UDF/SP registration step.
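The behavior change above moves the failure from registration time to the moment a package is added. The sketch below shows that fail-fast pattern in plain Python. The catalog `AVAILABLE_VERSIONS`, the version numbers in it, and this standalone `add_packages` function are hypothetical stand-ins, not the Snowpark implementation, which checks the Snowflake Anaconda channel.

```python
from typing import Dict, List, Set

# Hypothetical catalog standing in for the package versions the server offers.
AVAILABLE_VERSIONS: Dict[str, Set[str]] = {"numpy": {"1.21.5", "1.23.1"}}


def add_packages(requested: List[str], registry: Dict[str, str]) -> None:
    """Validate every requested package up front, then record them all."""
    validated: Dict[str, str] = {}
    for spec in requested:
        name, _, version = spec.partition("==")
        known = AVAILABLE_VERSIONS.get(name, set())
        if not known or (version and version not in known):
            # Fail immediately instead of deferring the error to registration.
            raise ValueError(
                f"Cannot add package {spec!r}: no matching version is available."
            )
        validated[name] = version or "latest"
    registry.update(validated)


packages: Dict[str, str] = {}
add_packages(["numpy==1.21.5"], packages)

try:
    add_packages(["numpy==9.9.9"], packages)
    failed_early = False
except ValueError:
    failed_early = True
```

Validating before touching `registry` keeps the recorded packages unchanged when a request is rejected.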
New Features:
- Added method `FileOperation.get_stream()` to support downloading stage files as a stream.
- Added support in `functions.ntile()` to accept an int argument.
- Added the following aliases:
  - `functions.call_function()` for `functions.call_builtin()`.
  - `functions.function()` for `functions.builtin()`.
  - `DataFrame.order_by()` for `DataFrame.sort()`.
  - `DataFrame.orderBy()` for `DataFrame.sort()`.
- Improved `DataFrame.cache_result()` to return a more accurate `Table` class instead of a `DataFrame` class.
- Added support to allow `session` as the first argument when calling `StoredProcedure`.
Improvements:
- Improved nested query generation by flattening queries when applicable.
  - This improvement can be enabled by setting `Session.sql_simplifier_enabled = True`.
  - `DataFrame.select()`, `DataFrame.with_column()`, `DataFrame.drop()` and other select-related APIs have more flattened SQLs.
  - `DataFrame.union()`, `DataFrame.union_all()`, `DataFrame.except_()`, `DataFrame.intersect()`, `DataFrame.union_by_name()` have flattened SQLs generated when multiple set operators are chained.
- Improved type annotations for async job APIs.
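As a rough picture of what flattening chained set operators means, the sketch below collapses nested `UNION ALL` nodes into one flat node before generating SQL. The `Select` and `UnionAll` classes and the `flatten` and `to_sql` functions are hypothetical and far simpler than Snowpark's query planner. The sketch is limited to `UNION ALL`, which is associative, so regrouping its operands does not change the result.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Select:
    sql: str  # a leaf query, e.g. "SELECT * FROM a"


@dataclass
class UnionAll:
    children: List["Node"]


Node = Union[Select, UnionAll]


def flatten(node: Node) -> Node:
    """Merge directly nested UnionAll nodes into a single UnionAll."""
    if isinstance(node, Select):
        return node
    flat: List[Node] = []
    for child in node.children:
        child = flatten(child)
        if isinstance(child, UnionAll):
            flat.extend(child.children)  # lift grandchildren one level up
        else:
            flat.append(child)
    return UnionAll(flat)


def to_sql(node: Node) -> str:
    if isinstance(node, Select):
        return node.sql
    return " UNION ALL ".join(f"({to_sql(c)})" for c in node.children)


nested = UnionAll(
    [
        UnionAll([Select("SELECT * FROM a"), Select("SELECT * FROM b")]),
        Select("SELECT * FROM c"),
    ]
)
nested_sql = to_sql(nested)
flat_sql = to_sql(flatten(nested))
print(nested_sql)
print(flat_sql)
```

The nested form wraps the first two queries in an extra pair of parentheses, while the flattened form lists all three operands at one level.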
Bug Fixes:
- Fixed a bug in which `Table.update()`, `Table.delete()` and `Table.merge()` tried to reference a temp table that does not exist.
v0.10.0
0.10.0 (2022-09-16)
New Features:
- Added experimental APIs for evaluating Snowpark dataframes with asynchronous queries:
  - Added keyword argument `block` to the following action APIs on Snowpark dataframes (which execute queries) to allow asynchronous evaluations:
    - `DataFrame.collect()`, `DataFrame.to_local_iterator()`, `DataFrame.to_pandas()`, `DataFrame.to_pandas_batches()`, `DataFrame.count()`, `DataFrame.first()`.
    - `DataFrameWriter.save_as_table()`, `DataFrameWriter.copy_into_location()`.
    - `Table.delete()`, `Table.update()`, `Table.merge()`.
  - Added method `DataFrame.collect_nowait()` to allow asynchronous evaluations.
  - Added class `AsyncJob` to retrieve results from asynchronously executed queries and check their status.
- Added support for `table_type` in `Session.write_pandas()`. You can now choose from these `table_type` options: `"temporary"`, `"temp"`, and `"transient"`.
- Added support for using Python structured data (`list`, `tuple` and `dict`) as literal values in Snowpark.
- Added keyword argument `execute_as` to `functions.sproc()` and `session.sproc.register()` to allow registering a stored procedure as a caller or owner.
- Added support for specifying a pre-configured file format when reading files from a stage in Snowflake.
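The `block` keyword described above switches an action between returning its result directly and returning a handle to poll later. The sketch below imitates that pattern with a thread pool. `FakeAsyncJob` and this standalone `collect` are hypothetical stand-ins that run a local callable rather than a Snowflake query, so they show only the calling convention.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Union

_executor = ThreadPoolExecutor(max_workers=2)


class FakeAsyncJob:
    """Hypothetical stand-in for a handle to a running query."""

    def __init__(self, future: "Future[List[int]]") -> None:
        self._future = future

    def is_done(self) -> bool:
        return self._future.done()

    def result(self) -> List[int]:
        # Blocks until the work has finished, then returns its rows.
        return self._future.result()


def collect(
    run_query: Callable[[], List[int]], block: bool = True
) -> Union[List[int], FakeAsyncJob]:
    """Run the query now (block=True) or hand back a job handle (block=False)."""
    if block:
        return run_query()
    return FakeAsyncJob(_executor.submit(run_query))


rows = collect(lambda: [1, 2, 3])              # returns the rows directly
job = collect(lambda: [1, 2, 3], block=False)  # returns immediately
async_rows = job.result()                      # wait for the rows when needed
_executor.shutdown()
```

Returning a handle lets the caller start several queries and only wait at the point where each result is actually needed.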
Improvements:
- Added support for displaying details of a Snowpark session.
Bug Fixes:
- Fixed a bug in which `DataFrame.copy_into_table()` and `DataFrameWriter.save_as_table()` mistakenly created a new table when the table name was fully qualified and the table already existed.
Deprecations:
- Deprecated keyword argument `create_temp_table` in `Session.write_pandas()`.
- Deprecated invoking UDFs using arguments wrapped in a Python list or tuple. You can use variable-length arguments without a list or tuple.
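One common way to deprecate list-wrapped arguments while still accepting them is to unwrap the list and emit a `DeprecationWarning`. The sketch below shows that pattern with a hypothetical decorator `accept_varargs` applied to a plain Python function. It is not the Snowpark UDF call path.

```python
import warnings
from typing import Any, Callable


def accept_varargs(func: Callable[..., Any]) -> Callable[..., Any]:
    """Accept f(a, b) and, with a warning, the legacy form f([a, b])."""

    def wrapper(*args: Any) -> Any:
        if len(args) == 1 and isinstance(args[0], (list, tuple)):
            warnings.warn(
                "Passing arguments wrapped in a list or tuple is deprecated; "
                "pass them as separate positional arguments instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            args = tuple(args[0])
        return func(*args)

    return wrapper


@accept_varargs
def add(a: int, b: int) -> int:
    return a + b


preferred = add(1, 2)  # variable-length arguments, no warning

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = add([1, 2])  # still works, but warns
```

Note that this simple check cannot distinguish a legacy call from a function whose single real argument is itself a list, which is one reason to retire the wrapped form.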
Dependency updates
- Updated `snowflake-connector-python` to 2.7.12.
v0.9.0
0.9.0 (2022-08-30)
New Features:
- Added support for displaying source code as comments in the generated scripts when registering UDFs. This feature is turned on by default. To turn it off, pass the new keyword argument `source_code_display` as `False` when calling `register()` or `@udf()`.
- Added support for calling table functions from `DataFrame.select()`, `DataFrame.with_column()` and `DataFrame.with_columns()` which now take parameters of type `table_function.TableFunctionCall` for columns.
- Added keyword argument `overwrite` to `session.write_pandas()` to allow overwriting contents of a Snowflake table with that of a Pandas DataFrame.
- Added keyword argument `column_order` to `df.write.save_as_table()` to specify the matching rules when inserting data into table in append mode.
- Added method `FileOperation.put_stream()` to upload local files to a stage via file stream.
- Added methods `TableFunctionCall.alias()` and `TableFunctionCall.as_()` to allow aliasing the names of columns that come from the output of table function joins.
- Added function `get_active_session()` in module `snowflake.snowpark.context` to get the current active Snowpark session.
Bug Fixes:
- Fixed a bug in which batch insert raised an error when `statement_params` was not passed to the function.
- Fixed a bug in which column names were not quoted when `session.create_dataframe()` was called with dicts and a given schema.
- Fixed a bug in which table creation was not skipped when calling `df.write.save_as_table()` in append mode on a table that already exists.
- Fixed a bug in which third-party packages with underscores could not be added when registering UDFs.
Improvements:
- Improved function `functions.uniform()` to infer the types of inputs `max_` and `min_` and cast the limits to `IntegerType` or `FloatType` correspondingly.
v0.8.0
0.8.0 (2022-07-22)
New Features:
- Added keyword-only argument `statement_params` to the following methods to allow for specifying statement level parameters:
  - `collect`, `to_local_iterator`, `to_pandas`, `to_pandas_batches`, `count`, `copy_into_table`, `show`, `create_or_replace_view`, `create_or_replace_temp_view`, `first`, `cache_result` and `random_split` on class `snowflake.snowpark.DataFrame`.
  - `update`, `delete` and `merge` on class `snowflake.snowpark.Table`.
  - `save_as_table` and `copy_into_location` on class `snowflake.snowpark.DataFrameWriter`.
  - `approx_quantile`, `statement_params`, `cov` and `crosstab` on class `snowflake.snowpark.DataFrameStatFunctions`.
  - `register` and `register_from_file` on class `snowflake.snowpark.udf.UDFRegistration`.
  - `register` and `register_from_file` on class `snowflake.snowpark.udtf.UDTFRegistration`.
  - `register` and `register_from_file` on class `snowflake.snowpark.stored_procedure.StoredProcedureRegistration`.
  - `udf`, `udtf` and `sproc` in `snowflake.snowpark.functions`.
- Added support for `Column` as an input argument to `session.call()`.
- Added support for `table_type` in `df.write.save_as_table()`. You can now choose from these `table_type` options: `"temporary"`, `"temp"`, and `"transient"`.
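A keyword-only argument such as `statement_params` must be passed by name. The sketch below shows the Python syntax with a hypothetical stand-in `collect` that only echoes the parameters it receives. The `QUERY_TAG` key and its value are used purely as an illustration of a statement-level parameter.

```python
from typing import Dict, List, Optional


def collect(*, statement_params: Optional[Dict[str, str]] = None) -> List[str]:
    """Hypothetical stand-in for an action method; it only echoes its input."""
    params = statement_params or {}
    return [f"{key}={value}" for key, value in sorted(params.items())]


# Passing the argument by name works.
applied = collect(statement_params={"QUERY_TAG": "nightly_load"})

# Passing it positionally is rejected by Python itself.
try:
    collect({"QUERY_TAG": "nightly_load"})
    rejected = False
except TypeError:
    rejected = True
```

Making the argument keyword-only keeps existing positional call sites unaffected when it is added to a method that already takes other parameters.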
Improvements:
- Added validation of object name in `session.use_*` methods.
- Updated the query tag in SQL to escape it when it has special characters.
- Added a check to see if Anaconda terms are acknowledged when adding missing packages.
Bug Fixes:
- Fixed the limited length of the string column in `session.create_dataframe()`.
- Fixed a bug in which `session.create_dataframe()` mistakenly converted 0 and `False` to `None` when the input data was only a list.
- Fixed a bug in which calling `session.create_dataframe()` using a large local dataset sometimes created a temp table twice.
- Aligned the definition of `functions.trim()` with the SQL function definition.
- Fixed an issue where snowpark-python would hang when using the Python system-defined (built-in function) `sum` vs. the Snowpark `functions.sum()`.
v0.7.0
0.7.0
New Features:
- Added support for user-defined table functions (UDTFs).
  - Use function `snowflake.snowpark.functions.udtf()` to register a UDTF, or use it as a decorator to register the UDTF.
    - You can also use `Session.udtf.register()` to register a UDTF.
  - Use `Session.udtf.register_from_file()` to register a UDTF from a Python file.
- Updated APIs to query a table function, including both Snowflake built-in table functions and UDTFs.
  - Use function `snowflake.snowpark.functions.table_function()` to create a callable representing a table function and use it to call the table function in a query.
  - Alternatively, use function `snowflake.snowpark.functions.call_table_function()` to call a table function.
  - Added support for `over` clause that specifies `partition by` and `order by` when lateral joining a table function.
  - Updated `Session.table_function()` and `DataFrame.join_table_function()` to accept `TableFunctionCall` instances.
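A Python UDTF is built around a handler class whose `process` method is called per input row and yields output rows as tuples. The sketch below shows only such a handler, exercised locally. The class name `SplitWords` and its logic are illustrative assumptions, and registration is left out because it needs a live Snowflake session.

```python
from typing import Iterable, Tuple


class SplitWords:
    """Handler class in the shape a Python UDTF expects."""

    def process(self, text: str) -> Iterable[Tuple[str, int]]:
        # Called once per input row; each yielded tuple is one output row.
        for word in text.split():
            yield (word, len(word))


# Exercise the handler locally, without Snowflake.
rows = list(SplitWords().process("hello snowpark udtf"))
print(rows)  # [('hello', 5), ('snowpark', 8), ('udtf', 4)]
```

Registering the handler through `snowflake.snowpark.functions.udtf()` or `Session.udtf.register()` additionally requires describing the output columns and input types.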
Breaking Changes:
- When creating a function with `functions.udf()` and `functions.sproc()`, you can now specify an empty list for the `imports` or `packages` argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.
- Improved the `__repr__` implementation of data types in `types.py`. The unused `type_name` property has been removed.
- Added a Snowpark-specific exception class for SQL errors. This replaces the previous `ProgrammingError` from the Python connector.
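The change to `imports` and `packages` comes down to telling "not specified" apart from "explicitly empty". The sketch below shows that distinction with a hypothetical helper `resolve_imports`. It assumes that an unspecified argument is represented as `None` and falls back to the session-level list, and it is not the Snowpark implementation.

```python
from typing import List, Optional


def resolve_imports(
    function_level: Optional[List[str]], session_level: List[str]
) -> List[str]:
    """Hypothetical helper: pick the imports a UDF should use.

    None means "not specified", so the session-level imports apply.
    An empty list is an explicit request for no imports at all.
    """
    if function_level is None:
        return list(session_level)
    return list(function_level)


session_imports = ["@stage/helpers.py"]
inherited = resolve_imports(None, session_imports)
none_used = resolve_imports([], session_imports)
explicit = resolve_imports(["@stage/a.py"], session_imports)
print(inherited)  # ['@stage/helpers.py']
print(none_used)  # []
print(explicit)   # ['@stage/a.py']
```

A truthiness check such as `if not function_level` would treat `[]` the same as `None`, which matches the previous behavior described in the entry above.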
Improvements:
- Added a lock to a UDF or UDTF when it is called for the first time per thread.
- Improved the error message for pickling errors that occurred during UDF creation.
- Included the query ID when logging the failed query.
Bug Fixes:
- Fixed a bug in which non-integral data (such as timestamps) was occasionally converted to integer when calling `DataFrame.to_pandas()`.
- Fixed a bug in which `DataFrameReader.parquet()` failed to read a parquet file when its column contained spaces.
- Fixed a bug in which `DataFrame.copy_into_table()` failed when the dataframe is created by reading a file with inferred schemas.
Deprecations
- `Session.flatten()` and `DataFrame.flatten()`.
Dependency Updates:
- Restricted the version of `cloudpickle` to `<=2.0.0`.
v0.6.0
0.6.0
New Features:
- Added support for vectorized UDFs with the input as a Pandas DataFrame or Pandas Series and the output as a Pandas Series. This improves the performance of UDFs in Snowpark.
- Added support for inferring the schema of a DataFrame by default when it is created by reading a Parquet, Avro, or ORC file in the stage.
- Added functions `current_session()`, `current_statement()`, `current_user()`, `current_version()`, `current_warehouse()`, `date_from_parts()`, `date_trunc()`, `dayname()`, `dayofmonth()`, `dayofweek()`, `dayofyear()`, `grouping()`, `grouping_id()`, `hour()`, `last_day()`, `minute()`, `next_day()`, `previous_day()`, `second()`, `month()`, `monthname()`, `quarter()`, `year()`, `current_database()`, `current_role()`, `current_schema()`, `current_schemas()`, `current_region()`, `current_available_roles()`, `add_months()`, `any_value()`, `bitnot()`, `bitshiftleft()`, `bitshiftright()`, `convert_timezone()`, `uniform()`, `strtok_to_array()`, `sysdate()`, `time_from_parts()`, `timestamp_from_parts()`, `timestamp_ltz_from_parts()`, `timestamp_ntz_from_parts()`, `timestamp_tz_from_parts()`, `weekofyear()`, `percentile_cont()` to `snowflake.snowpark.functions`.
Breaking Changes:
- Expired deprecations:
  - Removed the following APIs that were deprecated in 0.4.0: `DataFrame.groupByGroupingSets()`, `DataFrame.naturalJoin()`, `DataFrame.joinTableFunction`, `DataFrame.withColumns()`, `Session.getImports()`, `Session.addImport()`, `Session.removeImport()`, `Session.clearImports()`, `Session.getSessionStage()`, `Session.getDefaultDatabase()`, `Session.getDefaultSchema()`, `Session.getCurrentDatabase()`, `Session.getCurrentSchema()`, `Session.getFullyQualifiedCurrentSchema()`.
Improvements:
- Added support for creating an empty `DataFrame` with a specific schema using the `Session.create_dataframe()` method.
- Changed the logging level from `INFO` to `DEBUG` for several logs (e.g., the executed query) when evaluating a dataframe.
- Improved the error message when failing to create a UDF due to pickle errors.
Bug Fixes:
- Removed pandas hard dependencies in the `Session.create_dataframe()` method.
Dependency Updates:
- Added `typing-extensions` as a new dependency with the version `>=4.1.0`.
v0.5.0
New Features
- Added stored procedures API.
  - Added `Session.sproc` property and `sproc()` to `snowflake.snowpark.functions`, so you can register stored procedures.
  - Added `Session.call` to call stored procedures by name.
- Added `UDFRegistration.register_from_file()` to allow registering UDFs from Python source files or zip files directly.
- Added `UDFRegistration.describe()` to describe a UDF.
- Added `DataFrame.random_split()` to provide a way to randomly split a dataframe.
- Added functions `md5()`, `sha1()`, `sha2()`, `ascii()`, `initcap()`, `length()`, `lower()`, `lpad()`, `ltrim()`, `rpad()`, `rtrim()`, `repeat()`, `soundex()`, `regexp_count()`, `replace()`, `charindex()`, `collate()`, `collation()`, `insert()`, `left()`, `right()`, `endswith()` to `snowflake.snowpark.functions`.
- Allowed `call_udf()` to accept literal values.
- Provided a `distinct` keyword in `array_agg()`.
Bug Fixes:
- Fixed an issue that caused `DataFrame.to_pandas()` to have a string column if `Column.cast(IntegerType())` was used.
- Fixed a bug in `DataFrame.describe()` when there is more than one string column.
v0.4.1
0.4.1 (2022-02-25)
Bug Fixes
- Fixed a bug in `DataFrame.describe()` that raised an error when the `DataFrame` has more than one string column.