This Terraform module exports Azure cost-related data and forwards it to AWS S3. The supported datasets are described below:
- Cost Data: Daily Parquet files containing standardized cost and usage details in FOCUS format
- Azure Advisor Recommendations: Daily JSON files containing cost optimization recommendations from Azure Advisor
- Carbon Emissions Data: Monthly JSON reports with carbon footprint metrics across Scope 1 and Scope 3 emissions
Note
There is currently an issue with publishing Function App code on the Flex Consumption Plan using a managed identity. We have had to revert to using the storage account connection string for now. More details can be found here (behind a paywall, sadly).
This module creates a fully integrated solution for exporting multiple Azure datasets and forwarding them to AWS S3. The following diagram illustrates the data flow and component architecture for all three export types:
```mermaid
graph TD
    subgraph "Data Sources"
        CMF[Cost Management<br/>FOCUS Export]
        AAA[Azure Advisor API<br/>Daily Timer]
        COA[Carbon Optimization API<br/>Monthly Timer]
    end
    subgraph "Azure Storage"
        SA[Storage Account]
    end
    subgraph "Processing"
        QF[Queue: FOCUS]
        FAF[CostExportProcessor<br/>Function App]
        FAR[AdvisorRecommendationsExporter<br/>Function App]
        FAC[CarbonEmissionsExporter<br/>Function App]
    end
    subgraph "AWS"
        S3[S3 Bucket]
        APP[Entra ID App<br/>Registration<br/>for Upload Auth]
    end

    %% Data Flow
    CMF -->|Daily Parquet| SA
    AAA -->|Daily Timer| FAR
    COA -->|Monthly Timer| FAC
    SA -->|Blob Event| QF
    QF -->|Trigger| FAF

    %% Upload Flow with App Registration Authentication
    FAF -->|Upload via<br/>App Registration| S3
    FAR -->|Upload via<br/>App Registration| S3
    FAC -->|Upload via<br/>App Registration| S3
    FAF -.->|Uses for Auth| APP
    FAR -.->|Uses for Auth| APP
    FAC -.->|Uses for Auth| APP

    %% Styling
    classDef datasource fill:#4285f4,color:#fff
    classDef storage fill:#4285f4,color:#fff
    classDef queue fill:#00d4aa,color:#fff
    classDef function fill:#4285f4,color:#fff
    classDef aws fill:#ff9900,color:#fff
    classDef auth fill:#28a745,color:#fff
    class CMF,AAA,COA datasource
    class SA storage
    class QF queue
    class FAF,FAR,FAC function
    class S3 aws
    class APP auth
```
The module creates three distinct export pipelines, one per dataset (the partitioned S3 key layout they share is sketched after these lists):

**Cost data (FOCUS):**
- Daily Export: Cost Management exports daily FOCUS-format cost data (Parquet files) to Azure Storage
- Event Trigger: Blob creation events trigger the `CostExportProcessor` function via a storage queue
- Processing: The function processes and transforms the data (removes sensitive columns, restructures paths)
- Upload: Processed data is uploaded to S3 in a partitioned structure: `billing_period=YYYYMMDD/`

**Azure Advisor recommendations:**
- Daily Trigger: The `AdvisorRecommendationsExporter` function runs daily at 2 AM (timer trigger)
- API Call: The function calls the Azure Advisor Recommendations API for all subscriptions in scope, filtering for cost-category recommendations
- Processing: Response data is formatted as JSON with subscription tracking and date metadata
- Upload: JSON data is uploaded to S3 in a partitioned structure: `gds-recommendations-v1/billing_period=YYYYMMDD/`

**Carbon emissions:**
- Monthly Trigger: The `CarbonEmissionsExporter` function runs monthly on the 20th (timer trigger)
- API Call: The function calls the Azure Carbon Optimization API for the previous month's Scope 1 & 3 emissions
- Processing: Response data is formatted as JSON with dynamic date range validation (12-month rolling window)
- Upload: JSON data is uploaded to S3 in a partitioned structure: `billing_period=YYYYMMDD/`
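The partition prefixes above follow a Hive-style `billing_period=YYYYMMDD/` convention. Here is a minimal sketch of how such keys are built; the `partition_prefix` helper and the dates are illustrative, only the prefixes themselves are taken from this README:

```python
from datetime import date

def partition_prefix(day: date, dataset_prefix: str = "") -> str:
    """Build the Hive-style billing_period=YYYYMMDD/ partition prefix."""
    return f"{dataset_prefix}billing_period={day:%Y%m%d}/"

# Cost and carbon data land directly under the partition:
print(partition_prefix(date(2024, 10, 1)))
# -> billing_period=20241001/

# Advisor recommendations are nested under their dataset prefix:
print(partition_prefix(date(2024, 10, 1), "gds-recommendations-v1/"))
# -> gds-recommendations-v1/billing_period=20241001/
```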
The Carbon Optimization API provides a rolling 12-month window of emissions data. The available date range is calculated dynamically based on Microsoft's data availability policy:
- Data Availability: Previous month's data becomes available by the 19th of the current month
- Rolling Window: API provides access to exactly 12 months of historical data
- Dynamic Calculation: Date ranges are recalculated on each function execution (no hard-coded dates)
- Automatic Adjustment: Functions automatically use the most recent available data within the API's current range
Example: On October 30, 2024 (day ≥19), the API would provide data for September 2023 through September 2024. The same function running on January 15, 2025 would provide data for November 2023 through November 2024.
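A minimal sketch of this calculation, assuming only the documented rules (the previous month's data is available from the 19th, and the window reaches back 12 months from the newest available month). The function name is illustrative, not the module's actual implementation:

```python
from datetime import date

def carbon_date_range(today: date) -> tuple[date, date]:
    """Return the (start, end) months available from the Carbon Optimization API."""
    # Newest available month: the previous month once the 19th is reached,
    # otherwise the month before that.
    months_back = 1 if today.day >= 19 else 2
    end_year, end_month = today.year, today.month - months_back
    if end_month < 1:
        end_month += 12
        end_year -= 1
    end = date(end_year, end_month, 1)
    # Window start: 12 months before the newest available month.
    start = date(end.year - 1, end.month, 1)
    return start, end

# Matches the examples above:
assert carbon_date_range(date(2024, 10, 30)) == (date(2023, 9, 1), date(2024, 9, 1))
assert carbon_date_range(date(2025, 1, 15)) == (date(2023, 11, 1), date(2024, 11, 1))
```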
A test endpoint is available at `/api/carbon-date-range` to view the currently calculated date range.
- Function Apps use Managed Identity to authenticate with the Entra ID Application
- The Entra ID Application uses OIDC federation to assume an AWS IAM Role
- All data transfers are secured with cross-cloud federation (no long-lived AWS credentials)
- Application Insights provides telemetry and monitoring for all pipelines
- Private Networking: All components use private endpoints and VNet integration
- Zero Trust: No public network access (except during deployment if `deploy_from_external_network = true`)
- Managed Identity: Azure resources authenticate using system-assigned managed identities
- Cross-Cloud Federation: OIDC federation eliminates the need for long-lived AWS credentials (see the sketch below)
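The federation hop works roughly as follows: the Function App obtains an Entra ID token and exchanges it for short-lived AWS credentials via STS. This is a hedged sketch only; the role ARN, token audience, and object names are placeholders, and the module's actual upload code may differ:

```python
import boto3
from azure.identity import ManagedIdentityCredential

# Acquire an Entra ID token for the app registration
# (the api://<app-client-id>/.default audience is an assumption).
token = ManagedIdentityCredential().get_token("api://<app-client-id>/.default").token

# Exchange the token for short-lived AWS credentials; no stored AWS secrets.
sts = boto3.client("sts", region_name="eu-west-2")
creds = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::<aws-account-id>:role/<upload-role>",  # placeholder
    RoleSessionName="azure-cost-export",
    WebIdentityToken=token,
)["Credentials"]

# Upload with the temporary credentials only.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
s3.upload_file("export.parquet", "<bucket-name>", "billing_period=20241001/export.parquet")
```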
- An existing virtual network with two subnets, one of which has a delegation for Microsoft.App/environments (`function_app_subnet_id`)
- Role assignments:
  - Azure RBAC:
    - `Reader and Data Access`, `User Access Administrator` and `Contributor` at the subscription scope (where you will be provisioning resources)
    - `User Access Administrator` at the Tenant Root Group management group scope*
  - Billing:
    - Enterprise Agreement (EA): `EnrollmentReader` at the billing account scope (see Assign Enterprise Agreement roles to service principals)
    - Microsoft Customer Agreement (MCA): `Billing account contributor` at the billing account scope
Tip
* Role assignment privileges can be constrained to `Carbon Optimization Reader`, `Management Group Reader` and `Reader`
provider "azurerm" {
# These need to be explicitly registered
resource_providers_to_register = ["Microsoft.CostManagementExports", "Microsoft.App"]
features {}
}
module "example" {
source = "git::https://github.com/co-cddo/terraform-azure-focus?ref=1833bb30497da1b2faac808c0a4ba3adde71494e" # v0.0.2
aws_account_id = "<aws-account-id>"
billing_account_ids = ["<billing-account-id>"] # List of billing account IDs (applicable to FOCUS cost data only)
subnet_id = "/subscriptions/<subscription-id>/resourceGroups/existing-infra/providers/Microsoft.Network/virtualNetworks/existing-vnet/subnets/default"
function_app_subnet_id = "/subscriptions/<subscription-id>/resourceGroups/existing-infra/providers/Microsoft.Network/virtualNetworks/existing-vnet/subnets/functionapp"
virtual_network_name = "existing-vnet"
virtual_network_resource_group_name = "existing-infra"
resource_group_name = "rg-cost-export"
# Setting to false or omitting this argument assumes that you have private GitHub runners configured in the existing virtual network. It is not recommended to set this to true in production
deploy_from_external_network = false
}Tip
If you don't have a suitable existing virtual network with two subnets (one of which has a delegation to Microsoft.App/environments), please refer to the example configuration here, which provisions the prerequisite baseline infrastructure before consuming the module.
When `terraform apply` has completed, exports for each billing account should appear on the Exports blade in Cost Management + Billing. Search for 'focus-backfill', multi-select the reports and click 'Run now' in small batches.
Note
An alert will appear saying 'Failed to run one or more export (1 out of 1 failed)'. Sometimes this message appears to be wrong; other times you may need to retry some of the exports.
For historical carbon emissions data, use the backfill HTTP endpoint instead of running the timer function:
Endpoint: `POST /api/carbon-backfill`

Query Parameters:
- `force_overwrite=true` - Overwrite existing data files (default: `false`)
- `skip_existing=false` - Process all months regardless of existing data (default: `true`)

Examples:
- `POST /api/carbon-backfill` - Skip months that already have data (idempotent)
- `POST /api/carbon-backfill?force_overwrite=true` - Overwrite all existing data
- `POST /api/carbon-backfill?skip_existing=false` - Process all months, but don't overwrite existing data
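A hypothetical invocation from Python; the Function App hostname and function key are placeholders for your own deployment, and the auth header assumes function-level keys are in use:

```python
import requests

FUNCTION_APP = "https://<function-app-name>.azurewebsites.net"
HEADERS = {"x-functions-key": "<function-key>"}  # assumption: function-level auth

# Idempotent backfill: skip months that already have data in S3.
resp = requests.post(f"{FUNCTION_APP}/api/carbon-backfill", headers=HEADERS, timeout=600)
resp.raise_for_status()
print(resp.text)

# Re-process everything, overwriting existing files.
resp = requests.post(
    f"{FUNCTION_APP}/api/carbon-backfill",
    params={"force_overwrite": "true"},
    headers=HEADERS,
    timeout=600,
)
resp.raise_for_status()
```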
Check current API availability and existing data:
Endpoint: `GET /api/carbon-date-range`

Query Parameters:
- `check_existing=true` - Also check which months already have data in S3
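And the corresponding availability check (same hypothetical hostname and function key as in the backfill example above):

```python
import requests

resp = requests.get(
    "https://<function-app-name>.azurewebsites.net/api/carbon-date-range",
    params={"check_existing": "true"},  # also report which months already exist in S3
    headers={"x-functions-key": "<function-key>"},  # assumption: function-level auth
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # assumes the endpoint returns JSON
```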
Run the function named 'CarbonEmissionsExporter' once. Note that you will need to temporarily configure the firewall and CORS rules to allow this (add an entry for https://portal.azure.com).
Idempotency: Both the timer function and backfill endpoint are idempotent - they will skip processing if data already exists for a given month.
We don't provide a backfill for the Azure Advisor recommendations dataset.
The terraform-docs utility is used to generate this README. Follow the steps below to update it:
- Make changes to the `.terraform-docs.yml` file
- Fetch the `terraform-docs` binary (https://terraform-docs.io/user-guide/installation/)
- Run `terraform-docs markdown table --output-file ${PWD}/README.md --output-mode inject .`
| Name | Version |
|---|---|
| archive | >= 2.0 |
| azapi | >= 1.7.0 |
| azuread | > 2.0 |
| azurerm | > 4.0 |
| null | >= 3.0 |
| random | >= 3.0 |
| time | >= 0.7.0 |
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| aws_account_id | AWS account ID to use for the S3 bucket | `string` | n/a | yes |
| billing_account_ids | List of billing account IDs to create FOCUS cost exports for. Use the billing account ID format from the Azure portal (e.g., 'bdfa614c-3bed-5e6d-313b-b4bfa3cefe1d:16e4ddda-0100-468b-a32c-abbfc29019d8_2019-05-31') | `list(string)` | n/a | yes |
| function_app_subnet_id | ID of the subnet to connect the function app to. This subnet must have delegation configured for Microsoft.App/environments and must be in the same virtual network as the private endpoints | `string` | n/a | yes |
| resource_group_name | Name of the new resource group | `string` | n/a | yes |
| subnet_id | ID of the subnet to deploy the private endpoints to. Must be a subnet in the existing virtual network | `string` | n/a | yes |
| virtual_network_name | Name of the existing virtual network | `string` | n/a | yes |
| virtual_network_resource_group_name | Name of the existing resource group where the virtual network is located | `string` | n/a | yes |
| aws_region | AWS region for the S3 bucket | `string` | `"eu-west-2"` | no |
| aws_s3_bucket_name | Name of the AWS S3 bucket to store cost data | `string` | `"uk-gov-gds-cost-inbound-azure"` | no |
| deploy_from_external_network | If you don't have existing GitHub runners in the same virtual network, set this to true. This will enable 'public' access to the function app during deployment. This is added for convenience and is not recommended in production environments | `bool` | `false` | no |
| focus_dataset_version | Version of the cost and usage details (FOCUS) dataset to use | `string` | `"1.0r2"` | no |
| location | The Azure region where resources will be created | `string` | `"uksouth"` | no |
| Name | Description |
|---|---|
| aws_app_client_id | The aws app client id |
| billing_account_ids | Billing account IDs configured for cost reporting |
| billing_accounts_map | Map of billing account indices to IDs and scopes |
| carbon_container_name | The storage container name for carbon data (not used - carbon data goes directly to S3) |
| carbon_export_name | The name of the carbon optimization export (timer-triggered function) |
| focus_container_name | The storage container name for FOCUS cost data |
| publish_code_command | Publish code command for debugging |
| recommendations_export_name | The name of the Azure Advisor recommendations export (timer-triggered function) |
| report_scopes | Report scopes created for each billing account |
