Databricks SQL PolicySync Overview and Configuration

One purpose of PolicySync for Databricks SQL is to limit user access to your entire Databricks data source or to portions of it, such as views, entire tables, or only certain columns or rows.

Planning and General Process#

The general process for connecting with JDBC to a Databricks SQL data source, creating a policy, and limiting user access is as follows. Gather the necessary information before you begin the specific steps described here.

  1. Add the privacera_tag service.
  2. Create an endpoint in Databricks SQL for PrivaceraCloud to connect to, with JDBC username, password, and URL.
  3. Add Databricks SQL as a service in PrivaceraCloud.
  4. Define a data source for the Databricks SQL endpoint in PrivaceraCloud using the JDBC values from step 2 and other desired fields.
  5. Define the Databricks SQL service.
  6. Determine the users, groups, or roles who need access from PrivaceraCloud to your Databricks SQL.
    1. Ensure that all users in PrivaceraCloud who will access Databricks SQL have an email address in their PrivaceraCloud account.
    2. Define those users with appropriate permissions in Databricks.
    3. Decide the depth of the data access you will give to users: views, source tables, columns, or rows. See Allowable Privileges.
    4. Create a resource policy to assign users, groups, or roles the necessary permissions to access the Databricks SQL data source at the appropriate depth.

Prerequisites: Privacera Tag Service and Databricks SQL Endpoint#

Make these configuration updates before you configure Databricks SQL PolicySync.

Enable PrivaceraCloud Tag Service#

In PrivaceraCloud, the administrator must add the privacera_tag service to enable PolicySync with Databricks SQL.

See the steps in Adding the privacera_tag Service.

Create Endpoint in Databricks SQL#

In Databricks SQL, an administrator must create a Databricks SQL endpoint for connecting from PrivaceraCloud. This process is described in Create an Endpoint in Databricks SQL.

Note the following values. You enter them into the fields in PrivaceraCloud as detailed in Add Data Source for PolicySync and Databricks SQL PolicySync Fields:

  • The email address of the user defined in the endpoint. This is the value of the JDBC username (Service jdbc username) in PrivaceraCloud.
  • The Databricks-generated access token. This is the value of the JDBC password (Service jdbc password) for the defined JDBC username in PrivaceraCloud.
  • The JDBC URL (Service jdbc url) defined for the endpoint.
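
Before entering these three values in PrivaceraCloud, it can help to sanity-check them. The following is a minimal sketch in Python, not part of PrivaceraCloud or Databricks. The function names are hypothetical, and the sample URL is a placeholder that assumes the common semicolon-delimited `key=value` layout of a Databricks JDBC URL; use the URL shown for your own endpoint.

```python
def parse_jdbc_url(url):
    """Split a semicolon-delimited JDBC URL into its base and key=value properties."""
    base, *pairs = url.split(";")
    properties = {}
    for pair in pairs:
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        properties[key] = value
    return {"base": base, "properties": properties}


def check_connection_values(username, token, url):
    """Return a list of problems found in the three noted values (empty if none)."""
    problems = []
    if "@" not in username:
        problems.append("JDBC username should be the endpoint user's email address")
    if not token:
        problems.append("JDBC password (access token) is empty")
    if not url.startswith("jdbc:"):
        problems.append("JDBC URL should start with 'jdbc:'")
    return problems


# Placeholder values for illustration only.
sample_url = (
    "jdbc:spark://example.cloud.databricks.com:443/default;"
    "transportMode=http;ssl=1;httpPath=/sql/1.0/endpoints/abc123"
)
parsed = parse_jdbc_url(sample_url)
issues = check_connection_values("svc-user@example.com", "placeholder-token", sample_url)
```

With the placeholder values above, `issues` is an empty list and `parsed["properties"]["httpPath"]` holds the endpoint path.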

Add Databricks SQL Service Configuration#

As described in Service Config, follow these steps to add a service configuration for Databricks SQL:

  1. Navigate to Access Manager > Service Config.
  2. Click Add service.
  3. For the type of service, select databricks_sql_analytics or hive. If you choose hive, you must configure additional fields. See Databricks SQL Hive Service Definition.

  4. Save the service configuration.

Add Data Source for PolicySync#

With the values for the JDBC username, JDBC password, and JDBC URL that you noted in Create Endpoint in Databricks SQL, define the data source connection in PrivaceraCloud to the Databricks SQL endpoint.

As described in Data Source Connectors, follow these steps to add a Databricks SQL data source:

  1. Navigate to Settings > Datasource.
  2. Click Add system.
  3. Enter a name for this system.
  4. Click Add Application.
  5. Select POLICY SYNC.
  6. From the Service dropdown, select DATABRICKSQL service.
  7. Add required fields. For a description of all fields that must or can be set for resource policy, see Databricks SQL PolicySync Fields.
  8. Save the application.
  9. For the Application, be sure to select DATABRICKS SQL.

Grant Databricks SQL Permissions to PrivaceraCloud Users#

For each PrivaceraCloud user who needs access to Databricks SQL, the administrator must define that user with appropriate access permissions in Databricks.

Ensure All PrivaceraCloud Users Have an Email Address#

All PrivaceraCloud users who will access Databricks SQL must have an email address in their user account on PrivaceraCloud. This email address is required to log in to Databricks SQL.
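
If you have many users, a quick check over an exported user list can catch missing addresses. This is a minimal sketch in Python. It assumes you have the users available as a list of records with `name` and `email` keys; that shape and the function name are hypothetical and do not reflect a PrivaceraCloud export format.

```python
def users_missing_email(users):
    """Return the names of users whose email address is absent or blank."""
    return [
        user["name"]
        for user in users
        if not (user.get("email") or "").strip()
    ]


# Hypothetical user records for illustration only.
users = [
    {"name": "alice", "email": "alice@example.com"},
    {"name": "bob", "email": ""},
    {"name": "carol"},
]
missing = users_missing_email(users)
```

Here `missing` lists `bob` and `carol`, the two users who would need an email address added before they can use Databricks SQL.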

Grant Databricks SQL Access#

In your Databricks account:

  1. Navigate to Data science and engineering.
  2. Click Workspace on the top right.
  3. To open the Admin Console, go to the top right of the Workspace, click the user account icon, and select Admin Console.
  4. In the Databricks SQL access column, select the checkbox for the user.

Grant Databricks SQL Endpoint Access#

In the Databricks SQL Dashboard:

  1. Navigate to SQL > Endpoints.
  2. Click the name of the Endpoint for which you want to add user permission.
  3. In the top right, click Permissions.
  4. In the SQL Endpoint Permissions dialog, select the desired user from the drop-down list.
  5. Give the user Can Use permission.
  6. Click Add.
  7. Click Save.

Define a Resource Policy#

In PrivaceraCloud, define a resource policy to grant access to the Databricks SQL data source to the desired users, groups, or roles.

Follow the steps in Resource Policies and the details about allowed privileges described here.

Allowable Privileges#

The following privileges can be specified for a Databricks SQL resource policy:

  • SELECT: Allows read access to an object.

  • CREATE: Provides ability to create an object (for example, a table in a database).

  • MODIFY: Provides ability to add, delete, and modify data to or from an object.

  • USAGE: An additional requirement to perform any action on a database object.

  • READ_METADATA: Provides ability to view an object and its metadata.

  • CREATE_NAMED_FUNCTION: Provides ability to create a named UDF in an existing catalog or database.

  • ALL PRIVILEGES: Gives all privileges, equivalent to all the above privileges.

  • Data_Admin Privilege for Secure Views: With the Data_Admin privilege, access policies are applied to source tables. If you want to restrict the access policies only to the views and not to the source tables, enable the following property in the PolicySync configuration, as detailed in Add Data Source for PolicySync and Databricks SQL PolicySync Fields:

Secure view Access By Table policies: true
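
To make the privilege names concrete, the sketch below builds the kind of `GRANT` statement that Databricks table access control accepts for each of them. It is an illustration only: the function and object names are hypothetical, and the exact statements PolicySync issues on your behalf are not described here and may differ.

```python
# Privilege names taken from the list above.
ALLOWED_PRIVILEGES = {
    "SELECT",
    "CREATE",
    "MODIFY",
    "USAGE",
    "READ_METADATA",
    "CREATE_NAMED_FUNCTION",
    "ALL PRIVILEGES",
}


def grant_statement(privilege, object_type, object_name, principal):
    """Build a GRANT statement string; the principal is quoted with backticks."""
    priv = privilege.upper()
    if priv not in ALLOWED_PRIVILEGES:
        raise ValueError(f"Unsupported privilege: {privilege}")
    return f"GRANT {priv} ON {object_type.upper()} {object_name} TO `{principal}`"


# Hypothetical database, table, and user for illustration only.
statements = [
    grant_statement("USAGE", "database", "sales", "analyst@example.com"),
    grant_statement("select", "table", "sales.orders", "analyst@example.com"),
]
```

The two statements pair USAGE on the database with SELECT on the table, reflecting that USAGE is an additional requirement for any action on a database object.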

Test the Policy#

To assign privileges to users, groups, or roles, follow the steps in Policy Definitions.

Then have a user other than the administrator test the effect.

Databricks SQL PolicySync Fields#

For a description of all fields that must or can be set for resource policy, see Databricks SQL PolicySync Fields.

Configuring Column-level Access Control#

To enable column-level access control, set the following when you define the PolicySync fields:

  • Column Level Access Control: true.

  • In custom fields, add the following, where # REDACTED # is any string of your choice:

    ranger.policysync.connector.4.access.control.number.value=0
    ranger.policysync.connector.4.access.control.double.value=0
    ranger.policysync.connector.4.access.control.text.value='# REDACTED #'
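
The source does not spell out how these three values are used. The sketch below assumes they are the substitutes shown in place of columns a user is not permitted to read, one per column data type. The function name, the sample row, and the `<hidden>` string are hypothetical and stand in for whatever text value you choose.

```python
# Assumed substitutes per column type, mirroring the three custom fields above.
DEFAULTS = {"number": 0, "double": 0, "text": "<hidden>"}


def apply_column_defaults(row, column_types, allowed_columns, defaults=DEFAULTS):
    """Return a copy of row with non-permitted columns replaced by type defaults."""
    result = {}
    for column, value in row.items():
        if column in allowed_columns:
            result[column] = value
        else:
            result[column] = defaults[column_types[column]]
    return result


# Hypothetical row and column types for illustration only.
row = {"id": 17, "salary": 5200.5, "name": "Dana"}
column_types = {"id": "number", "salary": "double", "name": "text"}
visible = apply_column_defaults(row, column_types, allowed_columns={"id"})
```

Under that assumption, a user allowed to read only `id` would see the real `id` while `salary` and `name` show the configured substitutes.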
    

View-based Masking Functions and Row-level Filtering#

For supported masking functions and supported row-level filtering, see Databricks SQL Masking Functions.


Last update: July 29, 2021