Privacera Documentation

Use a custom policy repository with Databricks

You can use a custom policy repository with Databricks if you use the fine-grained access control (FGAC) plug-in for access management. A custom policy repository uses a unique prefix, such as dev, which you specify as part of your Databricks cluster configuration.

The prefix string is injected into the Apache Ranger access control plug-in through an environment variable named SERVICE_NAME_PREFIX.

When configured, the FGAC plug-in can use access policies in custom policy repositories for the following service types:

  • Hive

  • S3

  • Files

  • ADLS

Prerequisites

To use Databricks with a custom policy repository, you must first create the custom policy repository.

  1. Log in to Privacera Portal.

  2. On the Privacera Portal home page, expand Access Management, and then click Resource Policies.

  3. Click the three-dot menu on the service for which you want to create a custom policy repository.

  4. Click Add Service.

  5. Enter values in the following fields, and then click Save.

    Service Name: The name of your service, in the form <prefix>_<service_name>, where <service_name> is hive, s3, files, and so on. For example, dev_hive.

    Select Tag Service: Select privacera_tag to apply tag-based policies to this custom repository.

    Username: The service user, for example, hive.

    Password: xxxxx

    jdbc.driverClassName: dummy or org.apache.hive.jdbc.HiveDriver

    jdbc.url: dummy
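The <prefix>_<service_name> naming convention can be sketched in Python. This is an illustration only: the helper name custom_repo_names and the lowercase service identifiers (hive, s3, files, adls) are assumptions, not part of any Privacera API, so confirm the exact service names in Privacera Portal.

```python
# Hypothetical helper: derives custom repository names of the form
# <prefix>_<service_name> for the service types listed in this topic.
SERVICE_TYPES = ("hive", "s3", "files", "adls")  # assumed identifiers


def custom_repo_names(prefix, service_types=SERVICE_TYPES):
    """Return a {service_type: repository_name} mapping for a prefix."""
    if not prefix or not prefix.strip():
        raise ValueError("prefix must be a non-empty string")
    prefix = prefix.strip()
    return {service: f"{prefix}_{service}" for service in service_types}


if __name__ == "__main__":
    # "dev" is the example prefix used in this topic.
    for service, repo in custom_repo_names("dev").items():
        print(f"{service}: {repo}")
```

Listing the expected names up front can help you check that each repository you create in the portal matches the prefix you later set on the cluster.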

Complete one of the following procedures to use Databricks with custom policy repositories:

Configure a custom policy repository for all Databricks clusters with cluster policy

You can use a Databricks cluster policy to inject the SERVICE_NAME_PREFIX environment variable into all clusters.

  1. Log in to your Databricks account.

  2. Define a cluster policy and specify the following JSON configuration:

    {
      "spark_env_vars.SERVICE_NAME_PREFIX": {
        "type": "fixed",
        "value": "<SERVICE_NAME_PREFIX>"
      }
    }

    Where:

    • <SERVICE_NAME_PREFIX>: Specifies the policy repository prefix, such as qa.

  3. To apply the new cluster policy, restart each Databricks cluster.
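If you manage several prefixes, you might generate the policy definition rather than edit it by hand. The following is a minimal sketch that reproduces the JSON from step 2; the function name build_cluster_policy is hypothetical, and the sketch only prints the definition, leaving it to you to paste it into Databricks.

```python
import json


def build_cluster_policy(prefix):
    """Return a cluster policy definition that fixes SERVICE_NAME_PREFIX."""
    if not prefix:
        raise ValueError("prefix must be a non-empty string")
    return {
        "spark_env_vars.SERVICE_NAME_PREFIX": {
            "type": "fixed",
            "value": prefix,
        }
    }


if __name__ == "__main__":
    # "qa" is an example prefix; substitute your own.
    print(json.dumps(build_cluster_policy("qa"), indent=2))
```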

Configure a custom policy repository for a single Databricks cluster with an environment variable

You can set SERVICE_NAME_PREFIX directly as an environment variable on a specific Databricks cluster.

  1. Log in to your Databricks account.

  2. From your list of Databricks clusters, edit the cluster you want to use with a custom policy repository.

  3. Update the cluster environment variables to include the following value:

    SERVICE_NAME_PREFIX=<SERVICE_NAME_PREFIX>

    Where:

    • <SERVICE_NAME_PREFIX>: Specifies the policy repository prefix, such as qa.

  4. To apply the new environment variable, restart the Databricks cluster.
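After the restart, you may want to confirm that the variable reached the cluster. The following sketch could be run in a Python notebook cell; it assumes that cluster environment variables are visible to the notebook's Python process, and the function name read_service_name_prefix is hypothetical.

```python
import os


def read_service_name_prefix(env=None):
    """Return the SERVICE_NAME_PREFIX value, or None if unset or blank."""
    env = os.environ if env is None else env
    value = env.get("SERVICE_NAME_PREFIX", "").strip()
    return value or None


if __name__ == "__main__":
    prefix = read_service_name_prefix()
    if prefix is None:
        print("SERVICE_NAME_PREFIX is not set on this cluster")
    else:
        print(f"SERVICE_NAME_PREFIX={prefix}")
```

The check only shows what the Python process sees; it does not verify that the FGAC plug-in loaded policies from the prefixed repositories.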

Configure a custom policy repository for a single Databricks cluster with an Init Script

You can edit the FGAC plug-in Init Script that your Databricks cluster runs so that it sets the SERVICE_NAME_PREFIX environment variable.

  1. Log in to the system where you installed Privacera Manager.

  2. Locate the privacera_databricks.sh script and open it in an editor.

  3. In the script, set the following variable:

    SERVICE_NAME_PREFIX=<SERVICE_NAME_PREFIX>

    Where:

    • <SERVICE_NAME_PREFIX>: Specifies the policy repository prefix, such as qa.

  4. To upload the modified script to DBFS, enter the following command:

    dbfs cp privacera_databricks.sh dbfs:/<PATH>/privacera_databricks.sh

    Where:

    • <PATH>: Specifies the DBFS path to copy the updated script to.

  5. To apply the updated Init Script, restart the Databricks cluster.
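The edit in step 3 could also be scripted. The following is a rough sketch under stated assumptions, not Privacera tooling: the function name set_service_name_prefix is hypothetical, the layout of privacera_databricks.sh is not known here (the sketch replaces an existing SERVICE_NAME_PREFIX= line if one exists and otherwise adds one after the shebang line), and the restriction on prefix characters is a conservative choice to avoid shell quoting problems.

```python
import re

# Matches an existing assignment at the start of a line, with optional "export".
_ASSIGNMENT = re.compile(r"^(export\s+)?SERVICE_NAME_PREFIX=.*$", re.MULTILINE)


def set_service_name_prefix(script_text, prefix):
    """Return script_text with SERVICE_NAME_PREFIX assigned to prefix."""
    # Assumption: limit the prefix to characters that need no shell quoting.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", prefix):
        raise ValueError("prefix should contain only letters, digits, '_' or '-'")
    new_line = f"SERVICE_NAME_PREFIX={prefix}"
    if _ASSIGNMENT.search(script_text):
        # Replace only the first assignment, keeping any "export" keyword.
        return _ASSIGNMENT.sub(
            lambda m: (m.group(1) or "") + new_line, script_text, count=1
        )
    lines = script_text.split("\n")
    # Assumption: when absent, the variable goes right after the shebang line.
    insert_at = 1 if lines and lines[0].startswith("#!") else 0
    lines.insert(insert_at, new_line)
    return "\n".join(lines)
```

You would read privacera_databricks.sh, pass its contents through the function, review the result, and write it back before running the dbfs cp command from step 4.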