This demo assumes you have VS Code installed, along with the Python extension, so that you can create a Miniconda environment. If you do not have VS Code installed, you can download it here.
bash install_miniconda.sh
conda create --name cortex_forecast --override-channels -c https://repo.anaconda.com/pkgs/snowflake python=3.11 -y
conda activate cortex_forecast
pip install --upgrade pip setuptools wheel
pip install .
streamlit run streamlit_app.py
Note: inside streamlit_app.py you can set default Snowflake connection parameters on line 105 of the file; otherwise the Streamlit app will ask for your credentials when you run it.
if st.session_state.snowpark_connection is None:
    try:
        # Use default SIS connection parameters
        connection_config = {
            'user': os.getenv('SNOWFLAKE_USER', ''),
            'password': os.getenv('SNOWFLAKE_PASSWORD', ''),
            'account': os.getenv('SNOWFLAKE_ACCOUNT', ''),
            'database': 'CORTEX',
            'warehouse': 'CORTEX_WH',
            'schema': 'DEV',
            'role': 'CORTEX_USER_ROLE'
        }
Install the Snowflake CLI (see the official installation guide).
Once you have the CLI installed, you can run the following commands to deploy the Streamlit app to Snowflake.
# Set initial snowflake CLI connection
snow connection add
# Set default connection
snow connection set-default <NEW_CONNECTION_NAME>
# Test snowflake connection
snow connection test
# Deploy streamlit app to Snowflake
snow streamlit deploy --replace
This will provide you with a URL that you can use to access the Streamlit app running in Snowflake.
The Streamlit application provides a user-friendly interface to the SnowflakeMLForecast functionality.
- Choose between your own data or a pre-loaded example dataset
- Step-by-step forecast configuration
- Interactive visualization of results
To run the Streamlit application, use the following command in your terminal:
streamlit run streamlit_app.py
This will start the Streamlit server and open the application in your default web browser.
1. Data Selection
   - Choose between your Snowflake data or the example dataset
   - If using your data, select the database, schema, and table/view
   - Specify timestamp, target, and optional series/exogenous columns
2. Forecast Configuration
   - Set model name, tags, and comments
   - Configure forecast parameters (training days, forecast horizon)
   - Set advanced options like evaluation settings
3. Model Execution
   - Review settings and run the forecast
   - View results including charts and statistics
- Ensure consistent frequency in your time series data
- Include sufficient historical data to capture seasonal patterns
- Experiment with different configuration settings
- For your own data, carefully select relevant exogenous variables
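As a quick sanity check on the first tip, here is a small pandas sketch (illustrative only, not part of this project) that surfaces gaps in a daily series before you hand the data to a forecast model:

```python
import pandas as pd

# Toy daily series with a missing day (2024-01-03) -- illustrative data only.
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"])
s = pd.Series([10.0, 12.0, 15.0], index=dates)

# Reindex against a complete daily range; any dates left over are gaps.
full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
missing = full_range.difference(s.index)
print(missing.tolist())  # the gap on 2024-01-03
```

If `missing` is non-empty, consider filling or aggregating before training, since irregular frequency degrades forecast quality.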
Remember, these code snippets are integrated into the Streamlit application, allowing for an interactive experience. Users can modify parameters, load their own data, and see results in real-time through the web interface.
Note: This is a POC and is not production ready. It’s designed to get you 80-90% of the way there for simple forecasting use cases, serving as a starting point for your forecasting journey.
The project consists of three main components:
- SnowparkConnection Class: A Python class that manages the connection to Snowflake, handling authentication and session management.
- SnowflakeMLForecast Class: A Python class that interfaces with Snowflake to create, manage, and analyze forecast models.
- Streamlit Application: A web-based interface that makes it easy to configure and run forecasts without writing code.
- Dynamic forecast model creation using Snowflake's `CREATE SNOWFLAKE.ML.FORECAST` functionality
- Visualization capabilities for forecast results
- Tag management in Snowflake
- Configurable via YAML files
- Robust error handling
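To make the first feature concrete, here is a hedged sketch of the kind of statement the class generates under the hood (the model, table, and column names are illustrative, and the exact SQL this project emits may differ):

```python
# Build a CREATE SNOWFLAKE.ML.FORECAST statement as a string; in the real
# class this would be executed through a Snowpark session (session.sql(...)).
model_name = "my_forecast_model"  # illustrative name
create_sql = f"""
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST {model_name}(
    INPUT_DATA => TABLE(storage_usage_train),
    TIMESTAMP_COLNAME => 'USAGE_DATE',
    TARGET_COLNAME => 'STORAGE_GB'
)
""".strip()
print(create_sql)
```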
1. Clone the repository
2. Install dependencies:
pip install snowflake-connector-python pandas altair streamlit
from snowflake_ml_forecast import SnowflakeMLForecast
# Setup connection config
connection_config = {
'user': 'your_user',
'password': 'your_password',
'account': 'your_account',
'database': 'your_database',
'warehouse': 'your_warehouse',
'schema': 'your_schema',
'role': 'your_role'
}
# Create forecast model
forecast_model = SnowflakeMLForecast(
config_file='path/to/your/config.yaml',
connection_config=connection_config
)
# Run forecast and visualize
forecast_data = forecast_model.create_and_run_forecast()
forecast_model.generate_forecast_and_visualization()
# Clean up
forecast_model.cleanup()
This guide provides a detailed explanation of all configuration options available for the Snowflake ML Forecast tool. The configuration is typically defined in a YAML file.
The configuration is divided into four main sections:

- `model`: Defines the model's metadata
- `input_data`: Specifies the input data source and structure
- `forecast_config`: Sets up the forecasting parameters
- `output`: Defines where the forecast results will be stored
Below is a detailed breakdown of each section:
```yaml
model:
  name: my_taxi_forecast_model
  tags:
    environment: production
    team: data_science
  comment: "Forecast model for predicting sales trends."
```
- `name` (required): A unique identifier for your model.
- `tags` (optional): Key-value pairs for categorizing and organizing your models.
  - `environment`: Typically set to 'development', 'production', or 'testing'.
  - `team`: The team responsible for this model.
- `comment` (optional): A description of the model's purpose or any relevant notes.
```yaml
input_data:
  table: ny_taxi_rides_h3_train
  table_type: table
  timestamp_column: pickup_time
  target_column: pickups
  series_column: h3
  exogenous_columns:
    - PUBLIC_HOLIDAY
    - SPORT_EVENT
```
- `table` (required): The name of the table or view containing your input data.
- `table_type` (optional): Specifies whether the input is a 'table' or a 'view'. Default is 'table'.
- `timestamp_column` (required): The name of the column containing timestamp data.
- `target_column` (required): The name of the column containing the values you want to forecast.
- `series_column` (optional): For multi-series forecasting, the column that identifies different series.
- `exogenous_columns` (optional): A list of additional columns to be used as features in the forecast model.
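A common failure mode is a config that names columns that don't exist in the input table. This hedged sketch (column and table names are illustrative; in practice the column set would come from something like `session.table(...).columns`) shows the shape of such a pre-flight check:

```python
# Columns named in a hypothetical input_data config section.
configured = {"timestamp_column": "PICKUP_TIME", "target_column": "PICKUPS"}

# Columns actually present in the input table (illustrative stand-in for
# what a Snowpark session would report for the table).
table_columns = {"PICKUP_TIME", "PICKUPS", "H3"}

# Any configured column missing from the table is a config error.
missing = [col for col in configured.values() if col not in table_columns]
print(missing)
```

An empty `missing` list means the configured columns all exist; catching this before model creation gives a clearer error than a failed SQL statement.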
```yaml
forecast_config:
  training_days: 30
  table: ny_taxi_rides_h3_predict
  config_object:
    on_error: skip
    evaluate: true
    evaluation_config:
      n_splits: 2
      gap: 0
      prediction_interval: 0.95
```
- `training_days` (optional): The number of days of historical data to use for training. If not specified, all available data will be used.
- `table` (optional): If specified, the model will create predictions for the data in this table instead of forecasting future dates.
- `config_object`: Advanced configuration options for the forecasting process.
  - `on_error` (optional): Determines behavior when an error occurs. Options are 'skip' or 'fail'. Default is 'skip'.
  - `evaluate` (optional): Whether to perform model evaluation. Default is true.
  - `evaluation_config`: Settings for the evaluation process.
    - `n_splits` (optional): Number of splits for cross-validation. Default is 2.
    - `test_size` (optional): Size of the test set in days. If not specified, it will be automatically determined.
    - `gap` (optional): Number of days between the training and test sets. Default is 0.
    - `prediction_interval` (optional): Confidence level for prediction intervals. Default is 0.95 (95% confidence).
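For intuition on how `n_splits`, `test_size`, and `gap` interact, here is a conceptual sketch of expanding-window time-series cross-validation splits. It mimics the general technique, not Snowflake's exact internal algorithm, and the function name is made up for illustration:

```python
def time_series_splits(n_points, n_splits=2, test_size=2, gap=0):
    """Return (train_indices, test_indices) pairs, oldest test window first.

    Each split trains on everything before the test window (minus `gap`
    points), so later splits see more history -- an expanding window.
    """
    splits = []
    for i in range(n_splits, 0, -1):
        test_end = n_points - (i - 1) * test_size
        test_start = test_end - test_size
        train_end = test_start - gap
        splits.append((list(range(0, train_end)),
                       list(range(test_start, test_end))))
    return splits

for train, test in time_series_splits(10, n_splits=2, test_size=2, gap=0):
    print(f"train={train[0]}..{train[-1]} test={test[0]}..{test[-1]}")
```

With 10 daily points, `n_splits: 2`, and `gap: 0`, the first split trains on points 0..5 and tests on 6..7; the second trains on 0..7 and tests on 8..9.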
```yaml
output:
  table: taxi_forecast_results
```

- `table` (required): The name of the table where forecast results will be stored.
- Ensure all required fields are filled out correctly.
- For multi-series forecasting, make sure to specify the `series_column`.
- Choose `exogenous_columns` carefully: they should be variables that you expect to have a significant impact on your target variable.
- When using `training_days`, ensure you have enough historical data to cover the specified period.
- If you specify a `table` in `forecast_config`, make sure it contains future dates or scenarios you want to predict.
- Adjust `evaluation_config` parameters based on your data characteristics and forecast horizon.
The class uses a YAML file for configuration. Here's a simple example:
```yaml
model:
  name: my_forecast_model
  tags:
    environment: production
    team: data_science
  comment: "Forecast model for predicting trends."

input_data:
  table: storage_usage_train
  timestamp_column: usage_date
  target_column: storage_gb

forecast_config:
  training_days: 180
  forecast_days: 30
  config_object:
    evaluate: true
    evaluation_config:
      prediction_interval: 0.95

output:
  table: storage_forecast_results
```
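Since the class reads this file at startup, a small sketch can show how you might validate a config before passing it in. This assumes PyYAML is installed; the required-key list mirrors the fields this guide documents as required:

```python
import yaml  # PyYAML -- assumed to be available alongside the project

# Inline copy of a minimal config, mirroring the example above.
config_text = """
model:
  name: my_forecast_model
input_data:
  table: storage_usage_train
  timestamp_column: usage_date
  target_column: storage_gb
output:
  table: storage_forecast_results
"""

config = yaml.safe_load(config_text)

# Required fields per the configuration guide.
required = {
    "model": ["name"],
    "input_data": ["table", "timestamp_column", "target_column"],
    "output": ["table"],
}
for section, keys in required.items():
    for key in keys:
        assert key in (config.get(section) or {}), f"missing {section}.{key}"
print("config OK")
```

Failing fast on a missing key here is friendlier than an opaque SQL error once the forecast job is already running.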
While this project showcases a powerful Streamlit application for interactive forecasting, it’s important to note that the core functionality is highly versatile and can be easily integrated into various workflows:
1. Python Scripts and Notebooks: The SnowflakeMLForecast class can be imported and used in your own Python scripts or Jupyter notebooks, allowing for seamless integration with your existing data processing pipelines.
2. Automated Reporting: Leverage the forecasting capabilities in automated reporting systems, scheduling regular forecast updates without manual intervention.
3. Custom Applications: Build your own applications around the SnowflakeMLForecast class, tailoring the user interface and functionality to your specific needs.
4. Baseline for Advanced Models: Use this project as a starting point for more complex forecasting models, adding your own custom features or integrating with other machine learning libraries.
Remember, while the Streamlit application provides a great out-of-the-box experience, the underlying SnowflakeMLForecast class is designed to be flexible and extensible. We encourage you to explore beyond the Streamlit interface and adapt the code to fit your unique forecasting requirements. Whether you're using the ready-made Streamlit application or integrating the core functionality into your own systems, this tool provides a solid foundation for leveraging Snowflake's ML capabilities for time series forecasting.
Here’s a simplified overview of the project structure:
cortex_forecast/
├── cortex_forecast/
│ ├── __init__.py
│ ├── connection.py
│ ├── forecast.py
│ └── files/
│ └── yaml/
│ ├── storage_forecast_config.yaml
│ └── taxi_forecast_config.yaml
├── docs/
│ ├── index.ipynb
│ └── forecast_storage_example.ipynb
├── nbs/
│ ├── 00_connection.ipynb
│ └── 01_cortex_forecast.ipynb
├── pages/
│ ├── 00_table_selection_creation_page.py
│ ├── 01_create_forecast_config.py
│ └── 02_modeling.py
├── streamlit_app.py
├── requirements.txt
└── setup.py
Key components:
- `cortex_forecast/`: Core Python package with the main functionality.
- `docs/`: Documentation notebooks and examples.
- `nbs/`: Development notebooks (using nbdev).
- `pages/`: Streamlit application pages.
- `streamlit_app.py`: Main Streamlit application entry point.
This structure separates the core functionality (`cortex_forecast/`) from the Streamlit application (`pages/` and `streamlit_app.py`), while keeping documentation (`docs/`) and development notebooks (`nbs/`) organized.
Whether you’re using the SnowflakeMLForecast class in your Python scripts or leveraging the Streamlit application for quick forecasts, this tool provides a flexible and powerful way to generate forecasts using Snowflake’s ML capabilities. Start with the example data to get familiar with the process, then apply it to your own datasets for valuable insights.
Happy forecasting!
See the docs/ folder for two examples of this in action: one for storage usage and the other for taxi pickups in NYC.
from snowflake.snowpark.version import VERSION
from cortex_forecast.forecast import SnowflakeMLForecast
import os
Note: Make sure you create a YAML configuration file so that SnowflakeMLForecast can read your forecast settings from it and build your forecast.
forecast_model = SnowflakeMLForecast(
config='./cortex_forecast/files/yaml/storage_forecast_config.yaml',
connection_config={
'user': os.getenv('SNOWFLAKE_USER'),
'password': os.getenv('SNOWFLAKE_PASSWORD'),
'account': os.getenv('SNOWFLAKE_ACCOUNT'),
'database': 'CORTEX',
'warehouse': 'CORTEX_WH',
'schema': 'DEV',
'role': 'CORTEX_USER_ROLE' # Use the desired role
},
is_streamlit=False
)
snowflake_environment = forecast_model.session.sql('SELECT current_user(), current_version()').collect()
snowpark_version = VERSION
print('\nConnection Established with the following parameters:')
print('Snowflake version : {}'.format(snowflake_environment[0][1]))
print('Snowpark for Python version : {}.{}.{}'.format(snowpark_version[0], snowpark_version[1], snowpark_version[2]))
# Create Training Data
training_days = 365
forecast_model.session.sql(f'''CREATE OR REPLACE TABLE storage_usage_train AS
SELECT
TO_TIMESTAMP_NTZ(usage_date) AS usage_date,
storage_bytes / POWER(1024, 3) AS storage_gb
FROM
(
SELECT *
FROM snowflake.account_usage.storage_usage
WHERE usage_date < CURRENT_DATE()
)
WHERE TO_TIMESTAMP_NTZ(usage_date) > DATEADD(day, -{training_days}, CURRENT_DATE())
''').collect()
forecast_model.session.sql('SELECT * FROM storage_usage_train ORDER BY usage_date DESC LIMIT 10').show()
import matplotlib.pyplot as plt
df = forecast_model.session.sql('SELECT * FROM storage_usage_train ORDER BY usage_date').to_pandas()
df.head()
df = df.set_index('USAGE_DATE')
df['STORAGE_GB'].plot(figsize=(10, 6), title='Storage GB Over Time')
# Show the plot
plt.xlabel('Date')
plt.ylabel('Storage GB')
plt.grid(True)
plt.show()
This will use the settings from the YAML file you passed to the SnowflakeMLForecast object.
# Run Forecast
forecast_data = forecast_model.create_and_run_forecast()
forecast_data.head()
forecast_model.generate_forecast_and_visualization(show_historical=True)