Google recently released new features for its SaaS data warehouse BigQuery, including column-level encryption functions and dynamic masking of information. These features add a second layer of defence on top of access control to help secure and manage sensitive data.
Specifically, dynamic masking of information can be used for real-time transactions, whereas column-level encryption provides additional security for data at rest or in motion where real-time usability is not required.
These new features could be helpful for companies that store personally identifiable information (PII) and other sensitive data such as credit card data and biometric information. Companies that store and analyse data in countries where data regulation and privacy mandates are evolving may also benefit, because they face ongoing risks from data breaches and data leakage and need to control data access.
Column-level encryption enables the encryption and decryption of information at the column level, which means that the administrator can select which columns are encrypted and which are not. It supports the AES-GCM (non-deterministic) and AES-SIV (deterministic) encryption algorithms. The deterministic functions, which use AES-SIV, allow grouping, aggregation and joining on encrypted data. The feature covers use cases such as data that is natively encrypted in BigQuery and must be decrypted when accessed, and data that is encrypted externally, stored in BigQuery, and then decrypted when accessed.
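The difference between non-deterministic and deterministic encryption can be illustrated with a small sketch. The code below is a toy built only from the Python standard library; it is not AES-GCM or AES-SIV, it is not what BigQuery runs, and it must not be used to protect real data. All function names are made up for the illustration. It only shows the property that matters here: a random nonce gives a different ciphertext each time, while a nonce derived from the plaintext (a "synthetic IV") gives the same ciphertext for the same input.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_randomized(key: bytes, plaintext: bytes) -> bytes:
    """Non-deterministic: a fresh random nonce is drawn for every call."""
    nonce = os.urandom(16)
    return nonce + _xor(plaintext, _keystream(key, nonce, len(plaintext)))


def encrypt_deterministic(key: bytes, plaintext: bytes) -> bytes:
    """Deterministic: the nonce is computed from the plaintext itself."""
    nonce = hmac.new(key, b"siv" + plaintext, hashlib.sha256).digest()[:16]
    return nonce + _xor(plaintext, _keystream(key, nonce, len(plaintext)))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Both variants keep the nonce in the first 16 bytes of the ciphertext."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    return _xor(body, _keystream(key, nonce, len(body)))
```

Because the deterministic variant always maps equal plaintexts to equal ciphertexts, encrypted values can be compared, grouped and joined without being decrypted. The trade-off is that anyone who can read the column can also tell which rows share a value.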
Column-level encryption is integrated with Cloud Key Management Service (Cloud KMS) to give the administrator more control: encryption keys are managed in Cloud KMS, retrieved securely on access, and logged in detail. Cloud KMS can be used to generate the KEK (key encryption key), which encrypts the DEK (data encryption key), which in turn encrypts the data in BigQuery columns. Cloud KMS uses IAM (identity and access management) to define roles and permissions. The KEK is a symmetric encryption keyset stored in Cloud KMS, and referencing an encrypted keyset in BigQuery reduces the risk of key exposure.
The BigQuery documentation explains: “At query execution time, you provide the Cloud KMS resource path of the KEK and the ciphertext from the wrapped DEK. BigQuery calls Cloud KMS to unwrap the DEK and then uses that key to decrypt the data in your query. The unwrapped version of the DEK is only stored in memory for the duration of the query and then destroyed.”
In one example of a use case, the ZIP code is the data to be encrypted, and a non-deterministic function decrypts it on access when the function is called in the query that runs on the table.
In a second example, the deterministic AEAD function likewise decrypts data on access when it is called in the query, and it also supports aggregation and joins on the encrypted data.
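A query of this kind might look roughly like the sketch below, which only builds the SQL text and its parameters in Python and does not contact BigQuery. `AEAD.DECRYPT_STRING` and `KEYS.KEYSET_CHAIN` are BigQuery SQL functions; the helper name, the table and column names, and the choice to pass the KMS path and wrapped keyset as named query parameters are assumptions made for the illustration, and the generated SQL has not been run against a live project.

```python
import re

# Conservative patterns for identifiers that are interpolated into the SQL text.
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TABLE = re.compile(r"^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_]+){1,2}$")


def build_decrypt_query(table: str, column: str,
                        kms_resource_name: str, wrapped_keyset: bytes):
    """Return (sql, params) for decrypting `column` with a KMS-wrapped keyset.

    Hypothetical helper: it only assembles text and never calls BigQuery.
    """
    if not _TABLE.match(table) or not _COLUMN.match(column):
        raise ValueError("unexpected characters in table or column name")
    if not kms_resource_name.startswith("gcp-kms://"):
        raise ValueError("KMS resource path is expected to start with gcp-kms://")

    sql = (
        "SELECT\n"
        "  AEAD.DECRYPT_STRING(\n"
        "    KEYS.KEYSET_CHAIN(@kms_resource_name, @first_level_keyset),\n"
        f"    {column},\n"
        "    @additional_data\n"
        f"  ) AS {column}_plain\n"
        f"FROM `{table}`"
    )
    params = {
        "kms_resource_name": kms_resource_name,  # path of the KEK in Cloud KMS
        "first_level_keyset": wrapped_keyset,    # DEK, still wrapped by the KEK
        "additional_data": "",                   # must match the value used to encrypt
    }
    return sql, params
```

The point of the shape is that the query text never contains a usable key: it carries only the path to the KEK and the wrapped DEK, so a caller without permission to use the KEK in Cloud KMS cannot decrypt the column.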
In this way, even a user who is not allowed to decrypt the data can still perform a join on the encrypted column.
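Why a join works without the key can be shown with a short simulation. In this sketch a keyed HMAC token stands in for a deterministic ciphertext (it is not reversible and is not AES-SIV), and the tables, column names and helper functions are invented for the example. Only the data owner calls `det_token`; the analyst who runs the join and the count sees nothing but opaque tokens.

```python
import hashlib
import hmac
from collections import Counter


def det_token(key: bytes, value: str) -> str:
    """Stand-in for a deterministic ciphertext: same key and value, same token."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def join_on_token(left, right, column):
    """Inner-join two lists of dict rows on an encrypted column; no key needed."""
    index = {}
    for row in right:
        index.setdefault(row[column], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[column], [])]


# The data owner encrypts the ZIP codes before sharing the tables.
owner_key = b"held-by-the-data-owner-only"
customers = [
    {"customer": "a", "zip_enc": det_token(owner_key, "94043")},
    {"customer": "b", "zip_enc": det_token(owner_key, "10001")},
    {"customer": "c", "zip_enc": det_token(owner_key, "94043")},
]
stores = [
    {"store": "s1", "zip_enc": det_token(owner_key, "94043")},
    {"store": "s2", "zip_enc": det_token(owner_key, "60601")},
]

# The analyst joins and aggregates on the tokens without ever seeing a ZIP code.
joined = join_on_token(customers, stores, "zip_enc")
customers_per_zip = Counter(row["zip_enc"] for row in customers)
```

With a non-deterministic scheme the same ZIP code would produce a different ciphertext in every row, so neither the join nor the count would match anything.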
Before the release of the column-level encryption feature, administrators needed to make copies of datasets with the data obfuscated in order to give each group the right level of access. This created an inconsistent approach to protecting data, which can be expensive to operate. Column-level encryption increases the security level because each column can have its own encryption key instead of a single key for the entire database. It can also allow faster data access because less data has to be encrypted and decrypted.
Dynamic masking of information, released in preview, gives administrators more control by extending column-level security: combined with column-level access control, they can grant a user full access to a column, masked access, or no access at all. This capability selectively masks column-level data at query time based on the defined masking rules, user roles and privileges. This feature allows administrators to obfuscate sensitive data and control user access while mitigating the risk of data leakage.
This new feature makes sharing data easier because the administrators can hide information selectively, and the tables can be shared with large groups of users. At the application level, the developers don’t need to modify the query to hide sensitive data; after the data masking is configured at the BigQuery level, the existing query automatically hides the data based on the user’s roles. Last but not least, the application of security is easier because the administrator can write the security rule once and then apply it to any number of columns with tags.
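The behaviour described above can be sketched as a small in-memory simulation. Nothing here is a BigQuery API: the tag names, roles, rule names and the `read_row` helper are all hypothetical. The sketch only shows the idea that a rule is written once per tag, attached to any number of columns, and applied according to the caller's role while the caller's "query" stays the same.

```python
import hashlib
from enum import Enum


class Access(Enum):
    FULL = "full"      # caller sees the real value
    MASKED = "masked"  # caller sees the output of the masking rule
    NONE = "none"      # caller may not read the column at all


# Masking rules are defined once (rule names are illustrative).
MASKING_RULES = {
    "hash": lambda v: hashlib.sha256(str(v).encode()).hexdigest(),
    "nullify": lambda v: None,
}

# Each tag carries one rule plus the access level granted to each role.
TAG_POLICY = {
    "pii": {"rule": "hash",
            "access": {"admin": Access.FULL, "analyst": Access.MASKED}},
    "financial": {"rule": "nullify",
                  "access": {"admin": Access.FULL, "analyst": Access.MASKED}},
}

# One tag can be attached to any number of columns.
COLUMN_TAGS = {"email": "pii", "phone": "pii", "card_number": "financial"}


def read_row(row: dict, role: str) -> dict:
    """Return the row as `role` would see it; untagged columns pass through."""
    visible = {}
    for column, value in row.items():
        tag = COLUMN_TAGS.get(column)
        if tag is None:
            visible[column] = value
            continue
        policy = TAG_POLICY[tag]
        access = policy["access"].get(role, Access.NONE)
        if access is Access.NONE:
            raise PermissionError(f"role {role!r} may not read column {column!r}")
        if access is Access.FULL:
            visible[column] = value
        else:
            visible[column] = MASKING_RULES[policy["rule"]](value)
    return visible
```

In this model, tagging a new column as "pii" is a one-line change to `COLUMN_TAGS`, and every role immediately sees it under the existing rule, which mirrors the write-once, apply-many idea in the text.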
Any masking policies or encryption applied on the base tables are carried over to authorised views and materialised views, and masking or encryption is compatible with other security features such as row-level security.
Both new features can help organisations increase security, manage access control, comply with privacy laws, and create safe test environments. They also allow a more consistent way to manage tables with sensitive data: administrators no longer need to create multiple copies of a dataset, some encrypted and some not, and share each copy with the right users.