Feature/snowflake utils (#1108)
* deprecated databricks password/email login

* WIP Docker image generation

* update test

* WIP docs

* add missing base docker file and fast API server

* support for OCR, more install params

* Docs WIP

* refactored tests

* docs nav update

* visual support

* added helper funcs for uploading/downloading files and generic health check for OCR container

* removed debug code and added missing dependency

* Docs updated

* visual container tests added

* docker utils docs updated

* nlp.send_file_to_server attach to nlp namespace

* snowflake utilities for UDF creation
C-K-Loan committed Apr 5, 2024
1 parent aa999f3 commit 5a3c1ff
Showing 10 changed files with 866 additions and 3 deletions.
4 changes: 4 additions & 0 deletions docs/_data/navigation.yml
@@ -341,6 +341,10 @@ jsl:
   url: /docs/en/jsl/start-a-sparksession
 - title: Settings & Cache Folder
   url: /docs/en/jsl/john-snow-labs-home
+- title: Utilities for Docker
+  url: /docs/en/jsl/docker-utils
+- title: Utilities for Snowflake
+  url: /docs/en/jsl/snowflake-utils
 - title: Utilities for Databricks
   url: /docs/en/jsl/databricks-utils
 - title: Utilities for AWS EMR
188 changes: 188 additions & 0 deletions docs/en/jsl/snowflake_utils.md
@@ -0,0 +1,188 @@
---
layout: docs
seotitle: NLP | John Snow Labs
title: Utilities for Snowflake
permalink: /docs/en/jsl/snowflake-utils
key: docs-install
modify_date: "2020-05-26"
header: true
show_nav: true
sidebar:
nav: jsl
---
<div class="main-docs" markdown="1">

You can easily deploy any John Snow Labs model within the Snowpark Container Services ecosystem via `nlp.deploy_as_snowflake_udf()`.


## Setup Snowflake Resources

To create a Role, Database, Warehouse, Schema, Compute Pool and Image Repository for John Snow Labs models, you can run
`nlp.snowflake_common_setup`, which automatically reproduces the [Common Setup for Snowpark Container Services Tutorials](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/tutorials/common-setup#introduction)
with the same resource names as in the tutorial.

You must have the [snowflake-connector-python](https://pypi.org/project/snowflake-connector-python/) library installed beforehand.


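To verify the connector and your credentials first, here is a minimal sketch (the connection parameters are placeholders for your own values):

```python
# Optional sanity check before running nlp.snowflake_common_setup:
# confirms snowflake-connector-python is installed and the credentials work.
import snowflake.connector

conn = snowflake.connector.connect(
    user='my_snowflake_user',
    account='my_snowflake_account',
    password='my_snowflake_password',
)
print(conn.cursor().execute('SELECT CURRENT_VERSION()').fetchone())
conn.close()
```

With the connector in place, run the common setup: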
```python
from johnsnowlabs import nlp
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = nlp.snowflake_common_setup(
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
)
```
This will create the following resources:
- role_name=`test_role`
- schema_name=`data_schema`
- repo_name=`tutorial_repository`
- stage_name=`tutorial_stage`
- db_name=`tutorial_db`
- warehouse_name=`tutorial_warehouse`
- compute_pool_name=`tutorial_compute_pool`

You can specify a custom name for any resource by passing it as a keyword argument.

```python
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = nlp.snowflake_common_setup(
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
    role_name='my_test_role',
    schema_name='my_data_schema',
    repo_name='my_tutorial_repository',
    stage_name='my_tutorial_stage',
    db_name='my_tutorial_db',
    warehouse_name='my_tutorial_warehouse',
    compute_pool_name='my_tutorial_compute_pool',
)

```

## Deploy Model as Snowflake Container Services UDF

`nlp.deploy_as_snowflake_udf()` will build, tag & push a John Snow Labs model server to your
Snowflake image repository and finally create a service & UDF from the model and test it.
Role, Database, Warehouse, Schema, Compute Pool and Image Repository must be created beforehand and passed as arguments.
```python
# Either run `nlp.snowflake_common_setup` or manually create & specify these resources
from johnsnowlabs import nlp
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
nlp.deploy_as_snowflake_udf(
    nlu_ref='en.de_identify.clinical_pipeline',
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
    license_path='path/to/my/jsl_license.json',
    repo_url=repo_url,
    role_name=role_name,
    database_name=db_name,
    warehouse_name=warehouse_name,
    schema_name=schema_name,
    compute_pool_name=compute_pool_name,
)

```
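
Unless overridden (see below), the created service and UDF names are derived from the `nlu_ref`: for the example above, `en_de_identify_clinical_pipeline_service` and `en_de_identify_clinical_pipeline_udf`, which the SQL snippets below refer to.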

You can also optionally specify the names of the created service & UDF:

```python
# Either run `nlp.snowflake_common_setup` or manually create & specify these resources
from johnsnowlabs import nlp
role_name, db_name, warehouse_name, schema_name, compute_pool_name, repo_url = ...
nlp.deploy_as_snowflake_udf(
    nlu_ref='en.de_identify.clinical_pipeline',
    snowflake_user='my_snowflake_user',
    snowflake_account='my_snowflake_account',
    snowflake_password='my_snowflake_password',
    license_path='path/to/my/jsl_license.json',
    repo_url=repo_url,
    role_name=role_name,
    database_name=db_name,
    warehouse_name=warehouse_name,
    schema_name=schema_name,
    compute_pool_name=compute_pool_name,
    udf_name='my_udf',
    service_name='my_service',
)
```

You can now use the `en_de_identify_clinical_pipeline_udf()` function within your Snowflake SQL and Python Worksheets
when using the created role, database, warehouse, and schema.

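A minimal sketch of querying the UDF from a Python Worksheet (the handler follows the default worksheet template; the UDF name assumes the default generated from the `nlu_ref` above):

```python
# Hedged sketch: call the deployed UDF from a Snowflake Python Worksheet.
import snowflake.snowpark as snowpark


def main(session: snowpark.Session):
    text = "The patient was prescribed Amlodopine Vallarta 10-320mg."
    # Escape single quotes before embedding the text in a SQL literal
    escaped = text.replace("'", "''")
    return session.sql(f"SELECT en_de_identify_clinical_pipeline_udf('{escaped}')")
```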

You can run the following commands in Snowflake to get the status of the service and query the UDF:
```sql
-- Set context
USE ROLE test_role;
USE DATABASE tutorial_db;
USE WAREHOUSE tutorial_warehouse;
USE SCHEMA data_schema;

-- Describe UDF
DESCRIBE FUNCTION en_de_identify_clinical_pipeline_udf(varchar);

-- Get service status of UDF backend
SELECT SYSTEM$GET_SERVICE_STATUS('en_de_identify_clinical_pipeline_service');

-- Describe service
DESCRIBE SERVICE en_de_identify_clinical_pipeline_service;

-- Get Logs of container service
CALL SYSTEM$GET_SERVICE_LOGS('en_de_identify_clinical_pipeline_service', '0', 'jsl-container', 1000);

-- Call UDF
SELECT en_de_identify_clinical_pipeline_udf('The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera. The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet.');
```

## Streamlit Example with Snowpark Services

Once you have created a UDF in Snowflake, you can access it from Streamlit apps.
Make sure to select the same resources to host your Streamlit app as were used for hosting the UDF.

This is a small example of a simple Streamlit app you can now build:
1. Go to the Streamlit section in `Projects` within your Snowflake account.
2. In the bottom left, click on your username, then on `Switch Role`, and select the role we just created. The default is `test_role`.
3. In the sidebar, click on Streamlit and then on the `+ Streamlit App` button. Specify a Database, Schema and Warehouse. The defaults are `TUTORIAL_DB`, `DATA_SCHEMA`, `TUTORIAL_WAREHOUSE`.
4. Copy and paste the following script into your Streamlit app and run it:
```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

session = get_active_session()
data = st.text_area("Type Your Text", value='Sample text', height=200)
# Escape single quotes before embedding the text in a SQL literal
escaped = data.replace("'", "''")
udf_response = session.sql(f"SELECT en_de_identify_clinical_pipeline_udf('{escaped}')")
st.write(udf_response.collect()[0].as_dict())
```
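
The single-quote escaping above is a minimal guard for embedding free text in a SQL literal; depending on your Snowpark version, parameter binding via `session.sql()` may be a cleaner option.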

For a more advanced Streamlit example, see [here](todo).



</div>
2 changes: 1 addition & 1 deletion docs/index.md
@@ -2,7 +2,7 @@
 layout: landing
 comment: no
 title: 'John Snow Labs <span>State of the Art Natural Language Processing in Python</span>'
-excerpt: John Snow Labs' NLP & LLM ecosystem include software libraries for state-of-the-art AI at scale, Responsible AI, No-Code AI, and access to over 20,000 models for Healthcare, Legal, Finance, and Visual NLP.
+excerpt: John Snow Labs' NLP & LLM ecosystem include software libraries for state-of-the-art AI at scale, Responsible AI, No-Code AI, and access to over 40,000 models for Healthcare, Legal, Finance, and Visual NLP.
 seotitle: Spark NLP – State of the Art NLP in Python, Java, and Scala – John Snow Labs.
 permalink: /
 header: true
2 changes: 1 addition & 1 deletion johnsnowlabs/auto_install/docker/build/base_dockerfile
@@ -27,7 +27,7 @@ RUN mkdir /app
 RUN mkdir /app/model_cache
 
 # Install Johnsnowlabs libraries
-RUN pip install johnsnowlabs==5.3.3 fastapi uvicorn python-multipart nbformat
+RUN pip install johnsnowlabs fastapi uvicorn python-multipart nbformat
 COPY installer.py /app/installer.py
 RUN python3 /app/installer.py

