Azure Data Lakehouse

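This module provisions a Data Lakehouse in Azure by employing the following Azure resources, each of which can be switched on or off through the features input:

API Management

Data Factory

Data Lake

Data Warehouse (SQL Server Database)

Key Vault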


Access Patterns

The module provisions Azure Roles and Role Assignments alongside Active Directory Groups to allow for the following access patterns:

Data Factory Contributor

Data Warehouse Admins


Considerations

Azure Data Factory GitHub Integration

Azure Data Factory has a GitHub integration that allows us to store our Data Factory configuration in a GitHub repository. The module can configure this integration, but it cannot authenticate against GitHub automatically.

Authentication is left to the user to do manually, by following the steps outlined in the Azure Data Factory GitHub Integration documentation.
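As an illustration of the configuration half of this integration, the following is a minimal sketch of a value for the adf_git_backend_config input, using the field names from the Inputs section. The account, repository, branch, and folder values are placeholders, and the git_url value assumes the public GitHub host; the manual authentication step still applies afterwards.

```hcl
# Sketch only: every value below is a placeholder, not taken from this module.
locals {
  adf_git_backend_config = {
    type = "github" # must be either "github" or "azuredevops"

    account_name    = "example-org"        # hypothetical GitHub account or organization
    branch_name     = "main"               # hypothetical collaboration branch
    repository_name = "example-adf-config" # hypothetical repository
    root_folder     = "/"

    # Required for GitHub (assumed here to be the public GitHub host)
    git_url = "https://github.com"
  }
}
```

The local can then be passed to the module as adf_git_backend_config = local.adf_git_backend_config.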

Troubleshooting

Key Vault access issues:

Symptom:

Sometimes, when applying, you might get an error like this:

Error: checking for presence of existing Key "[KEY NAME]" (Key Vault "[KEY VAULT FQDN]"):
keyvault.BaseClient#GetKey: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error.
Status=403 Code="Forbidden" Message="The user, group or application '[...]' does not have keys get permission on key vault '[NAME OF KEYVAULT];location=northeurope'.
[...]
InnerError={"code":"AccessDenied"}

Probable cause:

Key Vault Access Controls often take a while to propagate. This can cause issues when trying to access a newly created Key Vault.

Resolution:

Wait a few minutes and try again. If that does not work, review the Key Vault's access policies and make sure that the identity running the apply has the required key permissions.
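Where re-running the apply becomes tedious, one common workaround is to add an explicit delay between granting access and first using the vault. The sketch below is not something this module is known to do; it assumes the hashicorp/time provider is available, and the variable, resource names, permissions, and the 60 second duration are all hypothetical.

```hcl
# Sketch only: a delay between an access policy and the first key operation.
variable "key_vault_id" {
  type        = string
  description = "ID of an existing Key Vault (hypothetical input for this sketch)"
}

data "azurerm_client_config" "current" {}

# Grants the identity running Terraform permission to manage keys.
resource "azurerm_key_vault_access_policy" "deployer" {
  key_vault_id    = var.key_vault_id
  tenant_id       = data.azurerm_client_config.current.tenant_id
  object_id       = data.azurerm_client_config.current.object_id
  key_permissions = ["Get", "Create", "Delete"]
}

# Waits after the policy is created so that it has time to propagate.
resource "time_sleep" "wait_for_access_policy" {
  depends_on      = [azurerm_key_vault_access_policy.deployer]
  create_duration = "60s" # arbitrary value, tune as needed
}

# The key is only created once the wait has finished.
resource "azurerm_key_vault_key" "example" {
  name         = "example-key"
  key_vault_id = var.key_vault_id
  key_type     = "RSA"
  key_size     = 2048
  key_opts     = ["encrypt", "decrypt"]

  depends_on = [time_sleep.wait_for_access_policy]
}
```

A fixed delay does not guarantee propagation has completed, so retrying the apply remains the fallback.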

Key Vault Firewall issues:

Symptom:

Sometimes, when applying, you might get an error like this:

Error: checking for presence of existing Key "[KEY NAME]" (Key Vault "[KEY VAULT FQDN]"):
keyvault.BaseClient#GetKey: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error.
Status=403 Code="Forbidden" Message="Client address is not authorized and caller is not a trusted service.
[...]
InnerError={"code":"ForbiddenByFirewall"}

Probable cause:

The Network Access / Firewall rules for the Key Vault might be blocking your access.

Resolution:

Add your IP address to the Key Vault Firewall rules, for example through the keyvault_ip_whitelist input.
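The keyvault_ip_whitelist input listed under Inputs looks like the intended way to do this from Terraform. A minimal sketch, assuming that input feeds the Key Vault firewall rules and accepts plain IPv4 addresses; the address shown is a documentation placeholder and should be replaced with your own public IP.

```hcl
# Sketch only: 203.0.113.10 is a reserved documentation address, not a real client IP.
locals {
  keyvault_ip_whitelist = [
    "203.0.113.10", # replace with the public IP you are applying from
  ]
}
```

The local can then be passed to the module as keyvault_ip_whitelist = local.keyvault_ip_whitelist.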

Inputs

Name Description Type Default Required
budget_contact_emails Emails to send budget notifications to list(string) n/a yes
instance Instance name string n/a yes
org_code Organization code string n/a yes
platform_config workload_subscription_id - The ID of the subscription which we want to provision into.
platform_subscription_id - The ID of the platform subscription.
workload_management_group_name (optional) - The name of the management group which we want to provision our workload subscription into. If this is not set, the placement of the workload subscription inside the management group hierarchy will not be changed.
object({
workload_subscription_id = string
platform_subscription_id = string
workload_management_group_name = optional(string, "")
})
n/a yes
tier Tier of the environment string n/a yes
warehouse_config sku_name - The SKU name for the data warehouse. This controls the sku-name that will be used for the SQL Server Database. SKU names vary across regions and offerings; run az sql db list-editions -l [your region] -o table to see the available options.
max_size_gb (optional) - This controls the max-size-gb setting, i.e. how much storage to allocate for the SQL Server Database.
zone_redundant (optional) - Whether the data warehouse should be zone-redundant. This controls the zone-redundant setting that will be used for the SQL Server Database. Be aware that zone redundancy might not be available for all SKUs.
admin_group_name (optional) - The name of an existing AD Group that should be used as an admin of the data warehouse. If this is not set, a new AD Group will be created.
collation (optional) - The collation to use for the data warehouse. This controls the collation that will be used for the SQL Server Database.
ip_whitelist (optional) - A list of maps containing ip_address / name pairs to whitelist for the data lakehouse.
object({
sku_name = string
max_size_gb = optional(number, 500)
zone_redundant = optional(bool, true)
admin_group_name = optional(string, "")
collation = optional(string, "Icelandic_100_CI_AS")
ip_whitelist = optional(list(any), [])
})
n/a yes
adf_git_backend_config Configuration for the Git repository that backs Azure Data Factory (GitHub or Azure DevOps)
object({
type = string # must be either "github" or "azuredevops"

account_name = string
branch_name = string
repository_name = string
root_folder = string

# Required for Github
git_url = optional(string)

# Required for Azure DevOps
project_name = optional(string)
tenant_id = optional(string)
})
null no
api_management_config sku_name - SKU name for the API management instance
sku_capacity - SKU capacity for the API management instance
publisher_name - Publisher name for the API management instance
publisher_email - Publisher email for the API management instance
object({
sku_name = optional(string, "Developer")
sku_capacity = optional(number, 1)
publisher_name = optional(string, "")
publisher_email = optional(string, "")
})
null no
budget_for_resource_group Budget for the resource group number 50 no
datalake_whitelisted_cidrs A list of CIDRs to whitelist for the datalake list(string) [] no
datalakehouse_admins A list of Azure AD User Principal IDs that are allowed to administer the Data Lakehouse. list(string) [] no
datalakehouse_contributor_can_contribute_to_keyvault Whether the data engineers should be able to contribute to the key vault. bool false no
datalakehouse_contributor_group_name The name of an existing AD Group that should be used as a contributor to the datalakehouse.
If this is not set, a new AD Group will be created.
string "" no
datalakehouse_contributors Contributors to the datalakehouse list(string) [] no
existing_audit_keyvault_id An existing Keyvault to use for audit logs. If not provided, a new one will be created. string null no
existing_resource_group_info An existing Resource Group to use. If not provided, a new one will be created.
object({
id = string
name = string
location = string
})
null no
features Features to enable or disable.
object({
api_management = optional(bool, true)
datawarehouse = optional(bool, true)
data_factory = optional(bool, true)
datalake = optional(bool, true)
keyvault = optional(bool, true)
})
{
"api_management": true,
"data_factory": true,
"datalake": true,
"datawarehouse": true,
"keyvault": true
}
no
keyvault_ip_whitelist IP addresses to whitelist for the keyvault list(string) [] no
name_overrides Map of resource names to override. If not set, the name will be generated from the instance name.
This variable is an escape hatch for some naming scheme conflicts that can occur and should, ideally, not be used.
The schema for this variable is defined inside resource and service modules and is not documented here.
map(string) {} no
tags Any tags that should be present on created resources. Will get merged with local.default_tags map(string) {} no
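
To show how the required inputs fit together, here is a minimal sketch of a module call. The source path, the module label, and every value are placeholders; the SKU, size, and zone redundancy settings in particular are illustrative and should be checked against what your region offers.

```hcl
# Sketch only: the source path and every value below are placeholders.
module "datalakehouse" {
  source = "./path/to/datalakehouse" # hypothetical path to this module

  org_code = "acme"
  instance = "analytics"
  tier     = "dev"

  budget_contact_emails = ["data-team@example.com"]

  platform_config = {
    workload_subscription_id = "00000000-0000-0000-0000-000000000000"
    platform_subscription_id = "00000000-0000-0000-0000-000000000000"
    # workload_management_group_name is optional and omitted here
  }

  warehouse_config = {
    sku_name       = "S0"  # illustrative; list options with az sql db list-editions
    max_size_gb    = 250   # overrides the default of 500
    zone_redundant = false # overrides the default of true

    # ip_address / name pairs, as described for ip_whitelist
    ip_whitelist = [
      { name = "office", ip_address = "203.0.113.10" },
    ]
  }

  # Optional: switch off a feature; unspecified features keep their default of true
  features = {
    api_management = false
  }
}
```

All other inputs fall back to the defaults listed in the table above.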

Outputs

Name Description
datafactory_info The Data Factory Info
datalake_info The Data Lake Info
datalakehouse_contributor_group_info The Data Lakehouse Contributor group
datalakehouse_warehouse_admin_group_info The Data Lakehouse Warehouse Admin group
datalakehouse_warehouse_connection_info The Data Lakehouse Warehouse Connection Info
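
The shape of these outputs is not documented here, so the sketch below only passes two of them through unchanged from a calling configuration. It is a fragment: it assumes a module block labeled datalakehouse already exists in the same configuration, and it marks the connection info as sensitive as a precaution, since its contents are unknown.

```hcl
# Fragment only: assumes a module "datalakehouse" block exists in this configuration.
output "datalake_info" {
  description = "Pass-through of the module's Data Lake info"
  value       = module.datalakehouse.datalake_info
}

output "warehouse_connection_info" {
  description = "Pass-through of the module's warehouse connection info"
  value       = module.datalakehouse.datalakehouse_warehouse_connection_info
  sensitive   = true # precaution; the contents of this output are not documented here
}
```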

Resources

Name Type
azurerm_management_group_subscription_association.workload_subscription_association resource
azuread_user.datalakehouse_contributors data source
azurerm_client_config.current data source
azurerm_management_group.workload_mgtm_group data source
azurerm_subscription.workload_subscription data source

Modules

Name Source Version
api_management ../../modules/azure/api-management n/a
base_setup ../../modules/azure/base-setup n/a
data_engineer_group_role_assignments ../../modules/azure/role-assignment n/a
data_engineer_role ../../modules/azure/role-definition n/a
data_engineer_user_group ../../modules/azure/ad-group n/a
datafactory ../../modules/azure/datafactory n/a
datalake ../../modules/azure/datalake n/a
datawarehouse ../../modules/azure/datawarehouse n/a
keyvault ../../modules/azure/keyvault n/a
warehouse_admin_group ../../modules/azure/ad-group n/a

Requirements

Name Version
terraform >= 1.1
azuread >=2.47.0
azurerm >=3.0.0
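
For a calling configuration, these constraints could be declared roughly as follows. This is a sketch: the provider source addresses assume the official HashiCorp registry namespaces. Note that the input types above use optional(...) with default values, which, as far as I know, requires Terraform 1.3 or later, so a newer Terraform than the stated minimum is likely needed in practice.

```hcl
# Sketch only: mirrors the version constraints from the Requirements table.
terraform {
  required_version = ">= 1.1"

  required_providers {
    azuread = {
      source  = "hashicorp/azuread" # assumed registry namespace
      version = ">= 2.47.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm" # assumed registry namespace
      version = ">= 3.0.0"
    }
  }
}

# The azurerm provider requires a features block, even if empty.
provider "azurerm" {
  features {}
}
```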

Providers

Name Version
azuread 2.47.0
azurerm 3.92.0