Merge branch 'master' into ompp-model-docs
Showing
41 changed files
with
558 additions
and
395 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -165,41 +165,45 @@ resource "kubernetes_secret" "aaw-<acronym>-prod-sp-secret" { | |
|
||
#### c. Add bucket info: | ||
|
||
|
||
Add the following to `resource "kubectl_manifest" "fdi-aaw-configuration-data"`, in one of: | ||
Add the following to `resource "kubectl_manifest" "fdi-aaw-configuration-data"`, in one of the following, depending on the classification of the bucket: | ||
|
||
1. `fdi-protected-b-external.json: |` or | ||
2. `fdi-unclassified-external.json: |` or | ||
3. `fdi-protected-b-internal.json: |` or | ||
4. `fdi-unclassified-internal.json: |` | ||
|
||
depending on the classification of the bucket. | ||
|
||
``` | ||
{ | ||
"bucketName": "<should-be-provided-for-you>", | ||
"pvName": "<acronym>-eprotb", | ||
"subfolder": "", | ||
"readers": ["<name-of-kubeflow-profile>"], | ||
"writers": ["<name-of-kubeflow-profile>"], | ||
"spn": "aaw-<acronym>-prod-sp" | ||
}, | ||
{ | ||
"bucketName": "<should-be-provided-for-you>-transit", | ||
"pvName": "<acronym>-inbox-eprotb", | ||
"subfolder": "from-de", | ||
"readers": ["<name-of-kubeflow-profile>"], | ||
"writers": ["<name-of-kubeflow-profile>"], | ||
"spn": "aaw-<acronym>-prod-sp" | ||
}, | ||
{ | ||
"bucketName": "<should-be-provided-for-you>-transit", | ||
"pvName": "<acronym>-outbox-eprotb", | ||
"subfolder": "to-vers", | ||
"readers": ["<name-of-kubeflow-profile>"], | ||
"writers": ["<name-of-kubeflow-profile>"], | ||
"spn": "aaw-<acronym>-prod-sp" | ||
} | ||
{ | ||
"bucketName": "<should-be-provided-for-you>", | ||
"pvName": "<acronym>-eprotb", | ||
"subfolder": "", | ||
"readers": ["<name-of-kubeflow-profile>"], | ||
"writers": ["<name-of-kubeflow-profile>"], | ||
"spn": "aaw-<acronym>-prod-sp" | ||
} | ||
``` | ||
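An entry like the one above can be sanity-checked before it is pasted into the Terraform manifest. The following is a minimal sketch and not part of the AAW tooling: the `validate_bucket_entry` helper and the sample values (`my-bucket`, `abc`, `my-profile`) are hypothetical, and only the key names come from the template.

```python
import json

# Key names taken from the bucket entry template above.
REQUIRED_KEYS = {"bucketName", "pvName", "subfolder", "readers", "writers", "spn"}


def validate_bucket_entry(raw):
    """Parse one bucket entry and check that it has exactly the template's keys."""
    entry = json.loads(raw)
    present = set(entry)
    missing = REQUIRED_KEYS - present
    extra = present - REQUIRED_KEYS
    if missing or extra:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}"
        )
    # readers and writers are lists of Kubeflow profile names in the template.
    for field in ("readers", "writers"):
        if not isinstance(entry[field], list):
            raise ValueError(f"{field} must be a list of profile names")
    return entry


# Hypothetical sample values, for illustration only.
sample = """
{
  "bucketName": "my-bucket",
  "pvName": "abc-eprotb",
  "subfolder": "",
  "readers": ["my-profile"],
  "writers": ["my-profile"],
  "spn": "aaw-abc-prod-sp"
}
"""
entry = validate_bucket_entry(sample)
print(entry["pvName"])
```

A check like this catches a missing key or a trailing-comma syntax error early, before Terraform applies the manifest.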
|
||
##### Transit Containers | ||
|
||
If the storage solution requires transit containers, add the following entries as well. Not all solutions require them. | ||
|
||
``` | ||
{ | ||
"bucketName": "<should-be-provided-for-you>-transit", | ||
"pvName": "<acronym>-inbox-eprotb", | ||
"subfolder": "from-de", | ||
"readers": ["<name-of-kubeflow-profile>"], | ||
"writers": ["<name-of-kubeflow-profile>"], | ||
"spn": "aaw-<acronym>-prod-sp" | ||
}, | ||
{ | ||
"bucketName": "<should-be-provided-for-you>-transit", | ||
"pvName": "<acronym>-outbox-eprotb", | ||
"subfolder": "to-vers", | ||
"readers": ["<name-of-kubeflow-profile>"], | ||
"writers": ["<name-of-kubeflow-profile>"], | ||
"spn": "aaw-<acronym>-prod-sp" | ||
} | ||
``` | ||
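Because the two transit entries differ only in the `pvName` infix and the `subfolder`, they can be generated rather than typed by hand. This is a sketch under assumptions: the `transit_entries` helper and its sample arguments are hypothetical, while the naming patterns (`-transit`, `-inbox-eprotb`, `-outbox-eprotb`, `from-de`, `to-vers`) are copied from the template above.

```python
import json


def transit_entries(bucket_name, acronym, profile):
    """Build the inbox and outbox entries following the template above."""
    # (pvName infix, subfolder) pairs copied from the template.
    directions = [("inbox", "from-de"), ("outbox", "to-vers")]
    return [
        {
            "bucketName": f"{bucket_name}-transit",
            "pvName": f"{acronym}-{kind}-eprotb",
            "subfolder": subfolder,
            "readers": [profile],
            "writers": [profile],
            "spn": f"aaw-{acronym}-prod-sp",
        }
        for kind, subfolder in directions
    ]


# Hypothetical sample values, for illustration only.
entries = transit_entries("my-bucket", "abc", "my-profile")
print(json.dumps(entries, indent=2))
```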
|
||
##### Info | ||
|
@@ -214,19 +218,22 @@ depending on the classification of the bucket. | |
> | ||
> `writers:` use the kubeflow profile name for this | ||
> | ||
> `spn:` this has to be created by YOU. Send a JIRA ticket to the Cloud Team. | ||
> `spn:` this has to be obtained by you by sending a Jira ticket to the Cloud Team. See below for an example SPN request. | ||
> | ||
##### Example Cloud Ticket | ||
|
||
To obtain the SPN, send a Jira ticket to the Cloud Team using the template below: | ||
|
||
> Hi, | ||
> | ||
> Can I get a service principal named aaw-\<acronym\>-prod-sp created please? | ||
> | ||
> The owners should be: | ||
> | ||
> [email protected] | ||
> [email protected] | ||
> - [email protected] | ||
> - [email protected] | ||
> | ||
> More info: https://jirab.statcan.ca/browse/?????-???? | ||
> | ||
> Thanks! | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,104 +1,102 @@ | ||
# Overview | ||
# Azure Blob Storage (Containers) | ||
|
||
[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data. | ||
|
||
Azure Blob Storage Containers are good at three things: | ||
|
||
- Large amounts of data - Containers can be huge: way bigger than hard drives. And they are still fast. | ||
- Accessible by multiple consumers at once - You can access the same data source from multiple Notebook Servers and pipelines at the same time without needing to duplicate the data. | ||
- Sharing - Project namespaces can share a container. This is great for sharing data with people outside of your workspace. | ||
|
||
# Setup | ||
Azure Blob Storage Containers have the following advantages over Kubeflow Volumes (Disks): | ||
|
||
1. **Capacity:** Containers can be huge: way bigger than hard drives. And they are still fast. | ||
2. **Simultaneity:** You can access the same data source from multiple Notebook Servers and pipelines at the same time without needing to duplicate the data. | ||
3. **Shareability:** Project namespaces can share a container. This is great for sharing data with people outside of your workspace. | ||
|
||
<!-- prettier-ignore --> | ||
!!! warning "Azure Blob Storage containers and buckets mount will be replacing the Minio Buckets and Minio storage mounts" | ||
Users will be responsible for migrating data from Minio Buckets to the Azure Storage folders. For larger files, users may contact AAW for assistance. | ||
!!! warning "Azure Blob Storage containers and buckets have replaced MinIO storage and buckets." | ||
Users are responsible for migrating data from MinIO buckets to the Azure Storage folders. See [how to migrate from MinIO to Azure Blob Storage](#how-to-migrate-from-minio-to-azure-blob-storage) for instructions. For larger files, users may [contact AAW for assistance](https://statcan-aaw.slack.com). | ||
|
||
## Blob Container Mounted on a Notebook Server | ||
## Setup | ||
|
||
<!-- prettier-ignore --> | ||
### Accessing Blob Container from JupyterLab | ||
|
||
The Blob CSI volumes are persisted under `/home/jovyan/buckets` when creating a Notebook Server. Files under `~/buckets` are backed by Blob storage. All AAW notebooks will have the `~/buckets` mounted to the file-system, making data accessible from everywhere. | ||
The Blob CSI volumes are persisted under `~/buckets` when creating a Notebook Server. Files under `~/buckets` are backed by Blob storage. All AAW notebooks will have the `~/buckets` mounted to the file-system, making data accessible from everywhere. | ||
|
||
![Blob folders mounted as Jupyter Notebook directories](../images/container-mount.png) | ||
These folders can be used like any other - you can copy files to/from using the file browser, write from Python/R, etc. The only difference is that the data is being stored in the Blob storage container rather than on a local disk (and is thus accessible wherever you can access your Kubeflow notebook). | ||
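Writing to a mounted folder from Python looks the same as writing to any local directory. The sketch below is illustrative only: the `save_report` helper is hypothetical, and a temporary directory stands in for the `~/buckets/aaw-unclassified` mount so that the example runs outside a Notebook Server.

```python
import tempfile
from pathlib import Path


def save_report(bucket_dir, name, text):
    """Write a text file into a mounted container folder and return its path."""
    bucket_dir.mkdir(parents=True, exist_ok=True)
    target = bucket_dir / name
    target.write_text(text, encoding="utf-8")
    return target


# On a Notebook Server this would be Path.home() / "buckets" / "aaw-unclassified".
# A temporary directory is used here as a stand-in for the mount.
mount = Path(tempfile.mkdtemp()) / "buckets" / "aaw-unclassified"
path = save_report(mount, "hello.txt", "stored in Blob storage\n")
print(path.read_text(encoding="utf-8"))
```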
|
||
# Unclassified Notebook AAW folder mount | ||
![Unclassified notebook folders mounted in Jupyter Notebook directories](../images/unclassified-mount.png) | ||
![Blob folders mounted as directories](../images/container-mount.png) | ||
|
||
# Protected-b Notebook AAW folder mount | ||
![Protected-b notebooks mounted as Jupyter Notebook directories](../images/protectedb-mount.png) | ||
#### Unclassified Containers | ||
|
||
These folders can be used like any other - you can copy files to/from using the file browser, write from Python/R, etc. The only difference is that the data is being stored in the Blob storage container rather than on a local disk (and is thus accessible wherever you can access your Kubeflow notebook). | ||
Unclassified blob storage containers will appear as follows in the `~/buckets` folder. | ||
|
||
## How to Migrate from MinIO to Azure Blob Storage | ||
![Unclassified notebook folders mounted as directories in JupyterLab](../images/unclassified-mount.png) | ||
|
||
First, import the environmental variables stored in your secrets vault. You will either import from `minio-gateway` or `fdi-gateway` depending on where your data was ingested. | ||
#### Protected B Containers | ||
|
||
``` | ||
jovyan@rstudio-0:~$ source /vault/secrets/fdi-gateway-protected-b | ||
``` | ||
Protected B blob storage containers will appear as follows in the `~/buckets` folder. | ||
|
||
Then you create an alias to access your data. | ||
![Protected B notebooks mounted as directories in JupyterLab](../images/protectedb-mount.png) | ||
|
||
``` | ||
jovyan@rstudio-0:~$ mc alias set minio $MINIO_URL $MINIO_ACCESS_KEY $MINIO_SECRET_KEY | ||
``` | ||
### Container Types | ||
|
||
List the contents of your data folder with `mc ls`. | ||
The following Blob containers are available. Accessing all Blob containers is the same. The difference between containers is the storage type behind them: | ||
|
||
``` | ||
jovyan@rstudio-0:~$ mc ls minio | ||
``` | ||
- **aaw-unclassified:** By default, use this one to store unclassified data. | ||
- **aaw-protected-b:** Use this one to store sensitive, Protected B data. | ||
- **aaw-unclassified-ro:** This container is classified Protected B but provides read-only access, so users can view unclassified data from within a Protected B notebook. | ||
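The choice of container can be made explicit in code so that data is never written to the wrong folder by accident. This is a minimal sketch: the `target_folder` helper and the classification labels used as dictionary keys are assumptions for illustration, and only the container names come from the list above.

```python
from pathlib import Path

# Writable containers named in the list above. aaw-unclassified-ro is
# left out because it is read-only.
WRITABLE = {
    "unclassified": "aaw-unclassified",
    "protected-b": "aaw-protected-b",
}


def target_folder(classification, buckets_root=None):
    """Return the mounted folder to write data of the given classification to."""
    if buckets_root is None:
        buckets_root = Path.home() / "buckets"
    if classification not in WRITABLE:
        raise ValueError(f"unknown classification: {classification!r}")
    return buckets_root / WRITABLE[classification]


print(target_folder("unclassified"))
```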
|
||
Finally, copy your MinIO data into your Azure Blob Storage directory with `mc cp --recursive`. | ||
### Accessing Internal Data | ||
|
||
``` | ||
jovyan@rstudio-0:~$ mc cp --recursive minio ~/buckets/aaw-unclassified | ||
``` | ||
Internal data is accessed through the DAS common storage connection, which serves internal and external users who require access to unclassified or Protected B data. The following containers can be provisioned: | ||
|
||
If you have protected-b data, you can copy your data into the protected-b bucket. | ||
- **external-unclassified:** Unclassified and accessible by both StatCan and non-StatCan employees. | ||
- **external-protected-b:** Protected B and accessible by both StatCan and non-StatCan employees. | ||
- **internal-unclassified:** Unclassified and accessible by StatCan employees only. | ||
- **internal-protected-b:** Protected B and accessible by StatCan employees, only. | ||
|
||
``` | ||
jovyan@rstudio-0:~$ mc cp --recursive minio ~/buckets/aaw-protected-b | ||
``` | ||
The above containers follow the same convention as the AAW containers in terms of data. However, there is a layer of isolation between StatCan employees and non-StatCan employees: non-StatCan employees are only allowed in **external** containers, while StatCan employees can have access to any container. | ||
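The isolation rule above can be written as a small predicate. This is a sketch of the rule as described, not of how access is enforced on the platform: the `can_access` function is hypothetical, and it models eligibility only, not whether access has actually been granted.

```python
# Container names from the list above.
CONTAINERS = {
    "external-unclassified",
    "external-protected-b",
    "internal-unclassified",
    "internal-protected-b",
}


def can_access(container, is_statcan_employee):
    """Return True if the user is eligible for the container under the rule above."""
    if container not in CONTAINERS:
        raise ValueError(f"unknown container: {container!r}")
    if is_statcan_employee:
        # StatCan employees can have access to any container.
        return True
    # Non-StatCan employees are only allowed in external containers.
    return container.startswith("external-")


for name in sorted(CONTAINERS):
    print(name, can_access(name, is_statcan_employee=False))
```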
|
||
AAW has an integration with the FAIR Data Infrastructure team that allows users to transfer unclassified and Protected B data to Azure Storage Accounts, thus allowing users to access this data from Notebook Servers. | ||
|
||
<!-- prettier-ignore --> | ||
Please reach out to the FAIR Data Infrastructure team if you have a use case for this data. | ||
|
||
## Container Types | ||
## Pricing | ||
|
||
The following Blob containers are available: | ||
<!-- prettier-ignore --> | ||
!!! info "Pricing models are based on CPU and Memory usage" | ||
Pricing is covered by KubeCost for user namespaces (In Kubeflow at the bottom of the Notebooks tab). | ||
|
||
Accessing all Blob containers is the same. The difference between containers is the storage type behind them: | ||
In general, Blob Storage is much cheaper than [Azure Managed Disks](https://azure.microsoft.com/en-us/pricing/details/managed-disks/) and has better I/O than managed SSD. | ||
|
||
- **aaw-unclassified:** By default, use this one. Stores unclassified data. | ||
## The Azure Storage Explorer | ||
|
||
- **aaw-protected-b:** Stores sensitive protected-b data. | ||
Our friends over at the Collaborative Analytics Environment (CAE) have some documentation on accessing your Azure Blob Storage from your AVD using the [Azure Storage Explorer](https://statcan.github.io/cae-eac/en/AzureStorageExplorer/). | ||
|
||
- **aaw-unclassified-ro:** This classification is protected-b but read-only access. This is so users can view unclassified data within a protected-b notebook. | ||
## How to Migrate from MinIO to Azure Blob Storage | ||
|
||
<!-- prettier-ignore --> | ||
First, `source` the environment variables stored in your secrets vault. You will either `source` from **minio-gateway** or **fdi-gateway** depending on where your data was ingested: | ||
|
||
## Accessing Internal Data | ||
``` | ||
source /vault/secrets/fdi-gateway-protected-b | ||
``` | ||
|
||
<!-- prettier-ignore --> | ||
Accessing internal data uses the DAS common storage connection which has use for internal and external users that require access to unclassified or protected-b data. The following containers can be provisioned: | ||
Then you create an alias to access your data: | ||
|
||
- **external-unclassified** | ||
- **external-protected-b** | ||
- **internal-unclassified** | ||
- **internal-protected-b** | ||
``` | ||
mc alias set minio $MINIO_URL $MINIO_ACCESS_KEY $MINIO_SECRET_KEY | ||
``` | ||
|
||
They follow the same convention as the AAW containers above in terms of data, however there is a layer of isolation between StatCan employees and non-StatCan employees. Non-StatCan employees are only allowed in **external** containers, while StatCan employees can have access to any container. | ||
List the contents of your data folder with `mc ls`: | ||
|
||
AAW has an integration with the FAIR Data Infrastructure team that allows users to transfer unclassified and protected-b data to Azure Storage Accounts, thus allowing users to access this data from Notebook Servers. | ||
``` | ||
mc ls minio | ||
``` | ||
|
||
Please reach out to the FAIR Data Infrastructure team if you have a use case for this data. | ||
Finally, copy your MinIO data into your Azure Blob Storage directory with `mc cp --recursive`: | ||
|
||
## Pricing | ||
``` | ||
mc cp --recursive minio ~/buckets/aaw-unclassified | ||
``` | ||
|
||
<!-- prettier-ignore --> | ||
!!! info "Pricing models are based on CPU and Memory usage" | ||
Pricing is covered by KubeCost for user namespaces (In Kubeflow at the bottom of the Notebooks tab). | ||
If you have Protected B data, you can copy your data into the Protected B bucket: | ||
|
||
In general, Blob Storage is much cheaper than [Azure Managed Disks](https://azure.microsoft.com/en-us/pricing/details/managed-disks/) and has better I/O than managed SSD. | ||
``` | ||
mc cp --recursive minio ~/buckets/aaw-protected-b | ||
``` |
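The migration steps above can be collected into one list of shell commands, for example to paste into a terminal or save as a script. This is a sketch under assumptions: the `migration_commands` helper is hypothetical and only assembles the commands shown above as strings; it does not run them, since `source` is a shell builtin and the `MINIO_*` variables only exist inside the sourcing shell.

```python
def migration_commands(secret_file, destination, alias="minio"):
    """Return the shell commands for the migration steps, in order."""
    return [
        # 1. Load the MinIO credentials from the secrets vault.
        f"source {secret_file}",
        # 2. Register an alias that points at the MinIO instance.
        f"mc alias set {alias} $MINIO_URL $MINIO_ACCESS_KEY $MINIO_SECRET_KEY",
        # 3. List the data that is about to be copied.
        f"mc ls {alias}",
        # 4. Copy everything into the mounted Azure Blob Storage folder.
        f"mc cp --recursive {alias} {destination}",
    ]


cmds = migration_commands(
    "/vault/secrets/fdi-gateway-protected-b",
    "~/buckets/aaw-protected-b",
)
print("\n".join(cmds))
```

The destination is left unquoted on purpose so that the shell still expands `~` when the commands are run.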