Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Copy Files From Source Repo (2024-04-12 18:16)
Showing 40 changed files with 32,934 additions and 22 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Contributing to Microsoft Learning Repositories | ||
|
||
MCT contributions are a key part of keeping the lab and demo content current as the Azure platform changes. We want to make it as easy as possible for you to contribute changes to the lab files. Here are a few guidelines to keep in mind as you contribute changes. | ||
|
||
## GitHub Use & Purpose | ||
|
||
Microsoft Learning is using GitHub to publish the lab steps and lab scripts for courses that cover cloud services like Azure. Using GitHub allows the course’s authors and MCTs to keep the lab content current with Azure platform changes. Using GitHub allows the MCTs to provide feedback and suggestions for lab changes, and then the course authors can update lab steps and scripts quickly and relatively easily. | ||
|
||
> When you prepare to teach these courses, you should ensure that you are using the latest lab steps and scripts by downloading the appropriate files from GitHub. GitHub should not be used to discuss technical content in the course, or how to prep. It should only be used to address changes in the labs. | ||
It is strongly recommended that MCTs and Partners access these materials and in turn, provide them separately to students. Pointing students directly to GitHub to access Lab steps as part of an ongoing class will require them to access yet another UI as part of the course, contributing to a confusing experience for the student. An explanation to the student regarding why they are receiving separate Lab instructions can highlight the nature of an always-changing cloud-based interface and platform. Microsoft Learning support for accessing files on GitHub and support for navigation of the GitHub site is limited to MCTs teaching this course only. | ||
|
||
> As an alternative to pointing students directly to the GitHub repository, you can point students to the GitHub Pages website to view the lab instructions. The URL for the GitHub Pages website can be found at the top of the repository. | ||
To address general comments about the course and demos, or how to prepare for a course delivery, please use the existing MCT forums. | ||
|
||
## Additional Resources | ||
|
||
A user guide has been provided for MCTs who are new to GitHub. It provides steps for connecting to GitHub, downloading and printing course materials, updating the scripts that students use in labs, and explaining how you can help ensure that this course’s content remains current. | ||
|
||
<https://microsoftlearning.github.io/MCT-User-Guide/> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Module: 00 | ||
## Lab/Demo: 00 | ||
### Task: 00 | ||
#### Step: 00 | ||
|
||
Description of issue | ||
|
||
Repro steps: | ||
|
||
1. | ||
1. | ||
1. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
--- | ||
name: Bug report | ||
about: Create a report to help us improve | ||
title: '' | ||
labels: '' | ||
assignees: '' | ||
|
||
--- | ||
|
||
# Module: 00 | ||
## Lab/Demo: 00 | ||
### Task: 00 | ||
#### Step: 00 | ||
|
||
Description of issue | ||
|
||
Repro steps: | ||
|
||
1. | ||
1. | ||
1. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Module: 00 | ||
## Lab/Demo: 00 | ||
|
||
Fixes # . | ||
|
||
Changes proposed in this pull request: | ||
|
||
- | ||
- | ||
- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
|
||
Instructions/Labs/09-real-time-analytics-eventstream.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{ | ||
"cSpell.words": [ | ||
"lakehouse" | ||
] | ||
} |
Binary file not shown.
Large diffs are not rendered by default.
Large diffs are not rendered by default.
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"cells":[{"cell_type":"markdown","id":"0a9715a0-c5f4-49d5-a5d7-949c445ad340","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["# Generate and visualize predictions\n","\n","In this notebook, you'll train a machine learning model that predicts a quantitative measure of diabetes. You'll register the model to the Microsoft Fabric workspace and apply the model to a test dataset to generate new predictions. Finally, you'll retrieve the predictions from the delta table where they are saved."]},{"cell_type":"markdown","id":"f9355cd0-49f9-4ec4-a548-aa2e083876c3","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["## Train a machine learning model to predict diabetes\n","\n","First, let's train a machine learning model to predict diabetes. You'll then use the model to generate predictions on a test dataset. Let's start by loading the training dataset:\n"]},{"cell_type":"code","execution_count":null,"id":"4285efbc-3235-4b7e-8851-418140146a4e","metadata":{},"outputs":[],"source":["# Azure storage access info for open dataset diabetes\n","blob_account_name = \"azureopendatastorage\"\n","blob_container_name = \"mlsamples\"\n","blob_relative_path = \"diabetes\"\n","blob_sas_token = r\"\" # Blank since container is Anonymous access\n","\n","# Set Spark config to access blob storage\n","wasbs_path = f\"wasbs://%s@%s.blob.core.windows.net/%s\" % (blob_container_name, blob_account_name, blob_relative_path)\n","spark.conf.set(\"fs.azure.sas.%s.%s.blob.core.windows.net\" % (blob_container_name, blob_account_name), blob_sas_token)\n","print(\"Remote blob path: \" + wasbs_path)\n","\n","# Spark read parquet, note that it won't load any data yet by now\n","df = spark.read.parquet(wasbs_path)"]},{"cell_type":"markdown","id":"7060afcb-17bf-495d-a1e3-813eae542050","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Convert the Spark dataframe to a Pandas 
dataframe:"]},{"cell_type":"code","execution_count":null,"id":"7df7a421-d957-4b77-9e8a-f793c0ac1d13","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["import pandas as pd\n","df = df.toPandas()\n","df.head()"]},{"cell_type":"markdown","id":"ca037de4-8ddd-4f74-80b8-521b741a834f","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Split the data:"]},{"cell_type":"code","execution_count":null,"id":"5ee9af2e-f53f-485c-887a-416d47c728fc","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["from sklearn.model_selection import train_test_split\n","\n","X, y = df[['AGE','SEX','BMI','BP','S1','S2','S3','S4','S5','S6']].values, df['Y'].values\n","\n","X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)"]},{"cell_type":"markdown","id":"5b2a9819-418a-44a9-97a0-6fbdbabce0ec","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Create an experiment in the workspace:"]},{"cell_type":"code","execution_count":null,"id":"e79341af-9908-444e-9c3f-d3ce78a262f5","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["import mlflow\n","experiment_name = \"experiment-diabetes\"\n","mlflow.set_experiment(experiment_name)"]},{"cell_type":"markdown","id":"065c2501-64e1-4aea-be73-33a239360a75","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Train and track the model:"]},{"cell_type":"code","execution_count":null,"id":"b780186f-6033-4ed7-b543-dc4c8bf2438c","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["from sklearn.tree import DecisionTreeRegressor\n","from mlflow.models.signature import ModelSignature\n","from mlflow.types.schema import Schema, ColSpec\n","\n","with 
mlflow.start_run():\n"," mlflow.autolog(log_models=False)\n","\n"," model = DecisionTreeRegressor(max_depth=5) \n"," model.fit(X_train, y_train)\n","\n"," # create the signature manually\n"," input_schema = Schema([\n"," ColSpec(\"integer\", \"AGE\"),\n"," ColSpec(\"integer\", \"SEX\"),\n"," ColSpec(\"double\", \"BMI\"),\n"," ColSpec(\"double\", \"BP\"),\n"," ColSpec(\"integer\", \"S1\"),\n"," ColSpec(\"double\", \"S2\"),\n"," ColSpec(\"double\", \"S3\"),\n"," ColSpec(\"double\", \"S4\"),\n"," ColSpec(\"double\", \"S5\"),\n"," ColSpec(\"integer\", \"S6\"),\n"," ])\n","\n"," output_schema = Schema([ColSpec(\"integer\")])\n","\n"," # Create the signature object\n"," signature = ModelSignature(inputs=input_schema, outputs=output_schema)\n","\n"," # manually log the model\n"," mlflow.sklearn.log_model(model, \"model\", signature=signature)"]},{"cell_type":"markdown","id":"78812668-4e8e-4c14-a327-62029f0ee9c6","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["When the model is trained and tracked in an experiment, you can register the model from the latest experiment run output. 
Start by retrieving the latest run ID:"]},{"cell_type":"code","execution_count":null,"id":"d8c90ae8-2aaa-476b-a66a-b6a8a90cf5ff","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["exp = mlflow.get_experiment_by_name(experiment_name)\n","\n","last_run = mlflow.search_runs(exp.experiment_id, order_by=[\"start_time DESC\"], max_results=1)\n","\n","last_run_id = last_run.iloc[0][\"run_id\"]\n","\n","print(\"Last Run ID:\", last_run_id)"]},{"cell_type":"markdown","id":"36a9c127-1f80-49f4-a4da-3c0a336557e2","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Create the model URI by specifying the `model` output folder to which all model artifacts are stored and including the experiment run ID:"]},{"cell_type":"code","execution_count":null,"id":"624928ec-b0c2-430a-a0da-e9f82869a117","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["model_uri = \"runs:/{}/model\".format(last_run_id)"]},{"cell_type":"markdown","id":"fe1e8ee8-6287-4588-b09d-398f5fc80016","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Save the model by registering it to the workspace:"]},{"cell_type":"code","execution_count":null,"id":"270f9f4a-c62f-4ba9-ba48-8f72f1276f4a","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["mv = mlflow.register_model(model_uri, \"diabetes-model\")\n","\n","print(\"Name: {}\".format(mv.name))\n","print(\"Version: {}\".format(mv.version))"]},{"cell_type":"markdown","id":"21a53c35-880b-49f4-9d1b-f119bb113daf","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Your model is now saved in your workspace under the name `diabetes-model`. 
\n","\n","Optionally, you can use the browse feature in your workspace to find the model in the workspace and explore it using the UI."]},{"cell_type":"markdown","id":"d3dfefdb-b745-495b-bf68-824fe09db0ad","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["## Create a test dataset and save in a lakehouse\n","\n","Before running the cell below, complete the following steps:\n","\n","1. In the **Add lakehouse** pane, select **Add** to add a lakehouse.\n","1. Select **New lakehouse** and select **Add**.\n","1. Create a new **Lakehouse** with a name of your choice.\n","1. When asked to stop the current session, select **Stop now** to restart the notebook.\n","\n","When the lakehouse is created and attached to this notebook, run the following cell to create a test dataset:"]},{"cell_type":"markdown","id":"95f1ed2d-41fb-43a9-b926-0fc3d7f8f659","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Create the dataframe with test data:"]},{"cell_type":"code","execution_count":null,"id":"f21dd5a6-d800-468e-965d-f8834cc4ffa3","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["data = [\n"," (62, 2, 33.7, 101.0, 157, 93.2, 38.0, 4.0, 4.8598, 87),\n"," (50, 1, 22.7, 87.0, 183, 103.2, 70.0, 3.0, 3.8918, 69),\n"," (76, 2, 32.0, 93.0, 156, 93.6, 41.0, 4.0, 4.6728, 85),\n"," (25, 1, 26.6, 84.0, 198, 131.4, 40.0, 5.0, 4.8903, 89),\n"," (53, 1, 23.0, 101.0, 192, 125.4, 52.0, 4.0, 4.2905, 80),\n"," (24, 1, 23.7, 89.0, 139, 64.8, 61.0, 2.0, 4.1897, 68),\n"," (38, 2, 22.0, 90.0, 160, 99.6, 50.0, 3.0, 3.9512, 82),\n"," (69, 2, 27.5, 114.0, 255, 185.0, 56.0, 5.0, 4.2485, 92),\n"," (63, 2, 33.7, 83.0, 179, 119.4, 42.0, 4.0, 4.4773, 94),\n"," (30, 1, 30.0, 85.0, 180, 93.4, 43.0, 4.0, 5.3845, 88)\n","]\n","\n","columns = ['AGE','SEX','BMI','BP','S1','S2','S3','S4','S5','S6']\n","\n","df = spark.createDataFrame(data, 
schema=columns)\n","df.show()"]},{"cell_type":"markdown","id":"7733c515-d1fa-4856-86eb-a7bf7cc01659","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Visualize the data types of the columns:"]},{"cell_type":"code","execution_count":null,"id":"2b981cb2-d2a9-4c73-9586-61f45767c441","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["df.dtypes"]},{"cell_type":"markdown","id":"e4c1e822-56be-4253-9cba-ed9dc6e00b15","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Change the data types for the columns to align with the model's expected input:"]},{"cell_type":"code","execution_count":null,"id":"1a8c4784-9a4a-487e-a78d-509d6436aa44","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["from pyspark.sql.types import IntegerType, DoubleType\n","\n","df = df.withColumn(\"AGE\", df[\"AGE\"].cast(IntegerType()))\n","df = df.withColumn(\"SEX\", df[\"SEX\"].cast(IntegerType()))\n","df = df.withColumn(\"BMI\", df[\"BMI\"].cast(DoubleType()))\n","df = df.withColumn(\"BP\", df[\"BP\"].cast(DoubleType()))\n","df = df.withColumn(\"S1\", df[\"S1\"].cast(IntegerType()))\n","df = df.withColumn(\"S2\", df[\"S2\"].cast(DoubleType()))\n","df = df.withColumn(\"S3\", df[\"S3\"].cast(DoubleType()))\n","df = df.withColumn(\"S4\", df[\"S4\"].cast(DoubleType()))\n","df = df.withColumn(\"S5\", df[\"S5\"].cast(DoubleType()))\n","df = df.withColumn(\"S6\", df[\"S6\"].cast(IntegerType()))\n","\n","df.dtypes"]},{"cell_type":"markdown","id":"526403dc-7b8a-4bf9-9c41-b668e7aa46e4","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Save the test dataset in the lakehouse as a Delta table named 
`diabetes_test`:"]},{"cell_type":"code","execution_count":null,"id":"d537a640-c47e-4cc9-80f3-36d74ef0e4b0","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["table_name = \"diabetes_test\"\n","df.write.format(\"delta\").mode(\"overwrite\").save(f\"Tables/{table_name}\")\n","print(f\"Spark dataframe saved to delta table: {table_name}\")"]},{"cell_type":"markdown","id":"3433c6ad-183d-41ff-b7bc-50c70a218601","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["To view the delta table, select the **...** next to the **Tables** in the **Lakehouse explorer** pane, and select **Refresh**. The `diabetes_test` table should appear.\n","\n","Expand the `diabetes_test` table in the left pane to view all fields it includes. Note that there's **no** field named `predictions` **yet**."]},{"cell_type":"markdown","id":"7993c803-a1a1-4214-86a8-b9c40095f115","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["## Apply the model to generate predictions\n","\n","Finally, you can apply the model you trained."]},{"cell_type":"code","execution_count":null,"id":"9405589b-da33-414d-a4ce-b3a5ac66668d","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["import mlflow\n","from synapse.ml.predict import MLFlowTransformer\n","\n","df_test = spark.read.format(\"delta\").load(f\"Tables/{table_name}\")\n","\n","model = MLFlowTransformer(\n"," inputCols=[\"AGE\",\"SEX\",\"BMI\",\"BP\",\"S1\",\"S2\",\"S3\",\"S4\",\"S5\",\"S6\"],\n"," outputCol=\"predictions\",\n"," modelName=\"diabetes-model\",\n"," modelVersion=1\n",")\n","df_test = model.transform(df_test)\n","\n","df_test.write.format('delta').mode(\"overwrite\").option(\"mergeSchema\", 
\"true\").save(f\"Tables/{table_name}\")"]},{"cell_type":"markdown","id":"811bdae4-bbd9-49b8-9ea7-6715ae44c881","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["Select the **...** next to the `diabetes_test` table and select **Refresh**. A new field `predictions` has been added. \n","\n","Drag and drop the `diabetes_test` table to the field below. The necessary code to view the table's contents will appear. Run the cell to visualize the data."]},{"cell_type":"code","execution_count":null,"id":"9b8672f6-5b26-4bcc-95cd-136e70432aaa","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":[]}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"display_name":"Synapse PySpark","language":"Python","name":"synapse_pyspark"},"language_info":{"name":"python"},"microsoft":{"host":{"synapse_widget":{"state":{},"token":"4351ef1c-c8dc-44ed-a53d-13aac62a44a1"}},"language":"python","ms_spell_check":{"ms_spell_check_language":"en"}},"notebook_environment":{},"nteract":{"version":"[email protected]"},"save_output":true,"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{},"enableDebugMode":false}},"synapse_widget":{"state":{},"version":"0.1"},"trident":{"lakehouse":{}},"widgets":{}},"nbformat":4,"nbformat_minor":5} |
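The casting cell in the notebook diff above aligns each test column with the type declared in the model signature. A minimal pure-Python sketch of the same idea follows. The `EXPECTED_TYPES` mapping and the `coerce_row` helper are hypothetical names for illustration and are not part of the notebook; Python's `int` and `float` constructors stand in for Spark's `cast`.

```python
# Expected input types, copied from the ColSpec list in the notebook's signature.
# The mapping name and the helper below are illustrative assumptions.
EXPECTED_TYPES = {
    "AGE": int, "SEX": int, "BMI": float, "BP": float, "S1": int,
    "S2": float, "S3": float, "S4": float, "S5": float, "S6": int,
}


def coerce_row(row):
    """Cast each value in a row dict to the type the signature expects."""
    missing = [name for name in EXPECTED_TYPES if name not in row]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return {name: cast(row[name]) for name, cast in EXPECTED_TYPES.items()}


# First test row from the notebook, with BP deliberately given as an int
# to show the coercion to float.
columns = list(EXPECTED_TYPES)
values = (62, 2, 33.7, 101, 157, 93.2, 38.0, 4.0, 4.8598, 87)
row = coerce_row(dict(zip(columns, values)))
print(row)
```

Checking types before scoring in this way is only a sanity check; in the notebook itself the Spark `cast` calls do the actual conversion.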
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
>> Sales Shipped measure definition [1] | ||
================================================== | ||
Sales Shipped = | ||
CALCULATE ( | ||
SUM ( 'Sales'[Sales Amount] ), | ||
USERELATIONSHIP ( 'Date'[DateKey], 'Sales'[ShipDateKey] ) | ||
) | ||
|
||
|
||
>> Sales Shipped measure definition [2] | ||
================================================== | ||
Sales Shipped = | ||
CALCULATE ( | ||
SUM ( 'Sales'[Sales Amount] ), | ||
CROSSFILTER ( 'Date'[DateKey], 'Sales'[OrderDateKey], NONE ), | ||
TREATAS ( | ||
VALUES ( 'Date'[DateKey] ), | ||
'Ship Date'[ShipDateKey] | ||
) | ||
) | ||
|
||
|
||
>> Sales Unshipped measure definition | ||
================================================== | ||
Sales Unshipped = | ||
CALCULATE ( | ||
SUM ( 'Sales'[Sales Amount] ), | ||
ISBLANK ( 'Sales'[ShipDateKey] ) | ||
) |
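The measures above differ in which `Sales` rows contribute to the sum: the shipped variants filter by ship date, while the unshipped measure keeps rows with a blank `ShipDateKey`. A rough Python analogy follows. The sample rows and function names are hypothetical and not part of the lab files, and the sketch ignores the order-date filter that the model's active relationship would still apply to `Sales Unshipped`.

```python
# Hypothetical sales rows; None stands in for a blank ShipDateKey.
sales = [
    {"OrderDateKey": 20240101, "ShipDateKey": 20240105, "Sales Amount": 100.0},
    {"OrderDateKey": 20240102, "ShipDateKey": None, "Sales Amount": 40.0},
    {"OrderDateKey": 20240103, "ShipDateKey": 20240105, "Sales Amount": 60.0},
]


def sales_shipped(rows, date_keys):
    """Sum sales whose ship date is among the currently filtered date keys."""
    return sum(r["Sales Amount"] for r in rows if r["ShipDateKey"] in date_keys)


def sales_unshipped(rows):
    """Sum sales that have no ship date yet."""
    return sum(r["Sales Amount"] for r in rows if r["ShipDateKey"] is None)


print(sales_shipped(sales, {20240105}))
print(sales_unshipped(sales))
```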
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
This file is intentionally blank. |
Binary file added (+5.67 MB, not shown): Allfiles/Labs/15/Solution/Sales Analysis - Work with model relationships.pbix
Binary file added (+5.64 MB, not shown): Allfiles/Labs/15/Starter/Sales Analysis - Work with model relationships.pbix