copyright | lastupdated | subcollection | ||
---|---|---|---|---|
|
2022-01-17 |
AnalyticsEngine |
{:new_window: target="_blank"} {:shortdesc: .shortdesc} {:codeblock: .codeblock} {:screen: .screen} {:pre: .pre} {:external: target="_blank" .external}
{: #key-management-application-serverless}
This topic describes how to manage column encryption keys by application. It explains how to provide master keys and how to write and read encrypted data using these master keys.
{: #key-management-application-serverless-1}
To provide master keys:
-
Set the class implementing EncryptionPropertiesFactory:
parameter name: "parquet.crypto.factory.class" parameter value: "com.ibm.parquet.key.management.IBMKeyToolsFactory"
-
Pass the explicit master keys, in the following format:
parameter name: "parquet.encryption.key.list" parameter value: "<master key ID>:<master key (base64)> , <master key ID>:<master key (base64)>.."
For example:
sc.hadoopConfiguration.set("parquet.crypto.factory.class","com.ibm.parquet.key.management.IBMKeyToolsFactory") sc.hadoopConfiguration.set("parquet.encryption.key.list" , "k1:iKwfmI5rDf7HwVBcqeNE6w== , k2:LjxH/aXxMduX6IQcwQgOlw== , k3:rnZHCxhUHr79Y6zvQnxSEQ==")
The length of master keys before base64 encoding can be 16, 24 or 32 bytes (128, 192 or 256 bits).
{: #key-management-application-serverless-1}
To write encrypted data:
-
Specify which columns to encrypted, and which master keys to use:
parameter name: "parquet.encryption.column.keys" parameter value: "<master key ID>:<column>,<column>;<master key ID>:<column>,..."
-
Specify the footer key:
parameter name: "parquet.encryption.footer.key" parameter value: "<master key ID>"
For example:
dataFrame.write .option("parquet.encryption.footer.key" , "k1") .option("parquet.encryption.column.keys" , "k2:SSN,Address;k3:CreditCard") .parquet("<path to encrypted files>")
Note: If either the
"parquet.encryption.column.keys"
parameter or the"parquet.encryption.footer.key"
parameter is not set, an exception will be thrown.
{: #key-management-application-serverless-2}
The required metadata is stored in the encrypted Parquet files.
To read the encrypted data:
- Set the class implementing EncryptionPropertiesFactory:
sc.hadoopConfiguration.set("parquet.crypto.factory.class","com.ibm.parquet.key.management.IBMKeyToolsFactory")
- Provide the encryption keys:
sc.hadoopConfiguration.set("parquet.encryption.key.list" , "k1:iKwfmI5rDf7HwVBcqeNE6w== , k2:LjxH/aXxMduX6IQcwQgOlw== , k3:rnZHCxhUHr79Y6zvQnxSEQ==")
- Call the regular parquet read commands, such as:
val dataFrame = spark.read.parquet("<path to encrypted files>")
{: #key-rotation-key-mgt-application}
If key rotation is required, the following Hadoop configuration properties must be set:
- The parameters
"parquet.encryption.key.list"
,"parquet.encryption.new.key.list"
- The parameter
"parquet.encryption.key.material.store.internally"
must be set to"false"
- The parameter
"parquet.encryption.kms.client.class"
must be set to"com.ibm.parquet.key.management.InMemoryKMS"
- The parameter
"parquet.crypto.factory.class"
must be set to"com.ibm.parquet.key.management.IBMKeyToolsFactory"
For example:
sc.hadoopConfiguration.set("parquet.encryption.key.list", OLD_KEYS)
sc.hadoopConfiguration.set("parquet.encryption.new.key.list", NEW_KEYS)
sc.hadoopConfiguration.set("parquet.encryption.key.material.store.internally", "false")
sc.hadoopConfiguration.set("parquet.encryption.kms.client.class", "com.ibm.parquet.key.management.InMemoryKMS")
sc.hadoopConfiguration.set("parquet.crypto.factory.class","com.ibm.parquet.key.management.IBMKeyToolsFactory")
KeyToolkit.rotateMasterKeys("<path to encrypted files>", sc.hadoopConfiguration)
Note: Key rotation can be performed only for files written with "parquet.encryption.key.material.store.internally"
parameter set to "false"
.