A very important part of adopting any solution is understanding what it will cost. Whilst we will have some residual Azure costs, these will be negligible compared to the MGDC costs. A pipeline has been written to extract the total object counts for each dataset required for the Capacity scenario: Sites and Files.
Note:
The Files dataset is in private preview until 20th August. You will not be able to run this pipeline before this date unless your tenant has been enrolled in the private preview.
Currently, the forecasting pipeline requires a Spark pool to execute the notebook that extracts the total object counts from the job metadata returned to the storage account on successful execution of the MGDC job. The future aspiration is to have a different process extract this information and serve it up in some form of web interface, but for now you need a Spark pool. Sorry! If you already have a Spark pool in your Synapse workspace then great, you can use this.
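If you are curious what that notebook does, the core of it boils down to something like the sketch below: read the job metadata JSON that MGDC drops into the storage account and total up the object counts per dataset. This is a minimal PySpark illustration, not the actual notebook; the storage path and the datasetName/rowCount field names are assumptions, so treat the Forecast notebook in the template as the source of truth.

```python
# Minimal sketch of the forecast extraction, intended for a Synapse notebook cell.
# The metadata path and JSON field names are assumptions for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in a Synapse notebook

# Hypothetical ADLS Gen2 path where the MGDC job metadata lands
metadata_path = "abfss://mgdc@<storageaccount>.dfs.core.windows.net/metadata/*.json"

metadata_df = spark.read.json(metadata_path)

# Assumed shape: one JSON record per extraction job, carrying the dataset
# name and the number of objects (rows) extracted
totals = (
    metadata_df
    .groupBy("datasetName")                          # hypothetical field
    .agg(F.sum("rowCount").alias("totalObjects"))    # hypothetical field
)
totals.show()
```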
If you followed Jose's blog you should have an MGDC app that has permission to extract the Sites dataset.
For the Capacity scenario we need the following two datasets:
- Sites
- Files
Please navigate to your MGDC app in the Azure portal and validate that the app has permission to extract these datasets. Remember that if you make any changes you will need to re-approve the app in the Microsoft Admin Center (MAC), using another global admin account.
- Navigate to MGDC apps in the Azure portal and select your app
- Validate the datasets under Settings
- Navigate to MGDC apps in the Microsoft Admin Center: Settings > Org Settings > Security & Privacy
- Approve (if required): click the app and follow the approval flow.
You can create a Spark pool directly from your Synapse Workspace resource in the Azure portal:
- Create Spark Pool
- Give your Spark pool a name and a size. Small will be more than adequate for the forecast
- Review and Create > Create
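If you would rather script this step, the sketch below creates an equivalent Small pool with the azure-mgmt-synapse Python SDK. The subscription, resource group, workspace name and region are placeholders, and you should pick a Spark version your workspace supports.

```python
# Sketch: create a Small Spark pool via the azure-mgmt-synapse SDK.
# All resource names and the region below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.synapse.models import AutoPauseProperties, BigDataPoolResourceInfo

client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")

pool = BigDataPoolResourceInfo(
    location="<workspace-region>",
    node_size="Small",                    # Small is more than adequate for the forecast
    node_size_family="MemoryOptimized",
    node_count=3,                         # smallest allowed node count
    spark_version="3.3",                  # pick a version your workspace supports
    auto_pause=AutoPauseProperties(enabled=True, delay_in_minutes=15),
)

poller = client.big_data_pools.begin_create_or_update(
    "<resource-group>", "<workspace-name>", "forecastpool", pool
)
print(poller.result().provisioning_state)
```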
Log in to your Synapse Studio and import the pipeline.
- Download the Storage_Top1.zip
- From the Home menu, navigate to Integrate
- Import the pipeline from the + button and browse to the downloaded pipeline template.
- Select your Linked Services (created following Jose's blog) and click Open Pipeline. This will import 1 pipeline, 4 datasets and 1 notebook into your Synapse Studio.
- Before publishing, navigate to the Forecast notebook under the Develop tab and ensure that a Spark pool has been selected in the Attach to dropdown
- Click Publish all > Publish
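As a quick sanity check that the publish succeeded, you can list the published artifacts with the azure-synapse-artifacts Python SDK; the Storage_Top1 pipeline and the Forecast notebook should appear in the output. The workspace name is a placeholder.

```python
# Sketch: list published pipelines and notebooks to confirm the import.
from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient

client = ArtifactsClient(
    credential=DefaultAzureCredential(),
    endpoint="https://<workspace-name>.dev.azuresynapse.net",
)

print([p.name for p in client.pipeline.get_pipelines_by_workspace()])
print([n.name for n in client.notebook.get_notebooks_by_workspace()])
```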
Great, you are now ready to execute the pipeline to obtain a forecast. To obtain a full pull, provide the same date for both the start and end parameters.
- Navigate to the Integrate menu and select the Storage_Top1 pipeline. Click Add Trigger > Trigger now
- Populate the full pull parameters and click OK
- Navigate to the Monitor tab to see the execution details. Wait for the pipeline to Complete. This typically takes around 25 minutes.
- Once complete, we can check the details extracted in the notebook
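As an aside, the same full pull can be triggered and monitored from code with the azure-synapse-artifacts SDK, which is useful if you want to drive the forecast from outside Synapse Studio. The StartTime/EndTime parameter names below are assumptions; open the imported pipeline to confirm what the template actually calls its parameters.

```python
# Sketch: trigger the full pull and poll until the run reaches a terminal state.
# StartTime/EndTime are assumed parameter names; check the imported pipeline.
import time

from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient

client = ArtifactsClient(
    credential=DefaultAzureCredential(),
    endpoint="https://<workspace-name>.dev.azuresynapse.net",
)

# Full pull: start and end dates are the same
run = client.pipeline.create_pipeline_run(
    "Storage_Top1",
    parameters={
        "StartTime": "2024-07-01T00:00:00Z",
        "EndTime": "2024-07-01T00:00:00Z",
    },
)

# Poll roughly once a minute; a full pull typically takes around 25 minutes
while True:
    status = client.pipeline_run.get_pipeline_run(run.run_id).status
    if status in ("Succeeded", "Failed", "Cancelled"):
        break
    time.sleep(60)
print(status)
```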
Great, you are now ready to execute the pipeline again to obtain a delta forecast. To obtain a delta pull, provide a different start and end date.
- Navigate to the Integrate menu and select the Storage_Top1 pipeline. Click Add Trigger > Trigger now
- Populate the parameters and click OK. Notice that the dates are 7 days apart. This timescale should be adjusted to the expected cadence, e.g. if running bi-weekly then get a delta forecast for a 14-day period.
- Navigate to the Monitor tab to see the execution details. Wait for the pipeline to Complete. This typically takes around 25 minutes.
- Once complete, we can check the details extracted in the notebook
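If you script the delta pull, a few lines of Python compute a window that matches your cadence. This sketch assumes a weekly run and reuses the same hypothetical StartTime/EndTime parameter names as the full pull example above.

```python
# Sketch: compute delta pull dates for a weekly cadence (use 14 for bi-weekly).
from datetime import datetime, timedelta, timezone

cadence_days = 7
end = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
start = end - timedelta(days=cadence_days)

parameters = {
    "StartTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
    "EndTime": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
}
print(parameters)  # pass to create_pipeline_run as in the full pull example
```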