
Connect Databricks to PrivaceraCloud

This topic describes how to connect the Databricks application to PrivaceraCloud on the AWS and Azure platforms. Privacera provides Spark Fine-Grained Access Control (FGAC) and Spark Object-Level Access Control (OLAC) plug-in solutions for access control in Databricks clusters.

Note

  • OLAC and FGAC methods are mutually exclusive and cannot be enabled on the same cluster.

  • If you are using SQL, Python, and R language notebooks, FGAC is recommended. See the Spark FGAC Plug-In for Databricks on AWS and Azure section.

  • The OLAC plugin was introduced as an alternative for Scala language clusters, since using the Scala language on Databricks Spark has some security concerns.

  1. Go to Settings > Applications.

  2. In the Applications screen, select Databricks.

  3. Select the platform type (Amazon AWS or Microsoft Azure) on which you want to configure the Databricks application.

  4. Enter the application Name and Description, and then click Save.

  5. Click the toggle button to enable Access Management for Databricks.

PrivaceraCloud integrates with Databricks using the plug-in integration method with an account-specific, cluster-scoped initialization script. Privacera's Spark plug-in is installed on the Databricks cluster, enabling FGAC. The script is added to your cluster as an init script that runs at cluster startup. Each time the cluster restarts, it runs the init script and connects to PrivaceraCloud.

Prerequisites

Ensure that the following prerequisites are met:

  • You must have an existing Databricks account and login credentials with sufficient privileges to manage your Databricks cluster.

  • PrivaceraCloud portal admin user access.

This setup is recommended for SQL, Python, and R language notebooks.

  • It provides FGAC on databases with row filtering and column masking features.

  • It uses the privacera_hive, privacera_s3, privacera_adls, and privacera_files services for resource-based access control, and the privacera_tag service for tag-based access control.

  • It uses the plugin implementation from Privacera.

Obtain Init Script for Databricks FGAC

  1. Log in to the PrivaceraCloud portal as an admin user (role ROLE_ACCOUNT_ADMIN).

  2. Generate a new API key and init script. For more information, see API Key.

  3. In the Databricks Init Script section, click DOWNLOAD SCRIPT.

    By default, this script is named privacera_databricks.sh. Save it to a local filesystem or shared storage.

  4. Log in to your Databricks account using credentials with sufficient account management privileges.

  5. Copy the Init script to your Databricks cluster. This can be done via the UI or using the Databricks CLI.

    1. Using the Databricks UI:

      1. On the left navigation, click the Data icon.

      2. Click the Add Data button from the upper right corner.

      3. In the Create New Table dialog, select Upload File, and then click browse.

      4. Select privacera_databricks.sh, and then click Open to upload it.

        Once the file is uploaded, the dialog will display the uploaded file path. This file path will be required in a later step.

        The file will be uploaded to /FileStore/tables/privacera_databricks.sh path, or similar.

    2. Using the Databricks CLI, copy the script to a location in DBFS:

      databricks fs cp ~/<sourcepath_privacera_databricks.sh> dbfs:/<destination_path>
      

      For example:

      databricks fs cp ~/Downloads/privacera_databricks.sh dbfs:/FileStore/tables/
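
      To confirm the upload, you can list the destination directory (this uses the same example path; adjust it to the destination you used):

      databricks fs ls dbfs:/FileStore/tables/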
      
  6. You can add PrivaceraCloud to an existing cluster, or create a new cluster and attach PrivaceraCloud to that cluster.

    a. In the Databricks navigation panel select Clusters.

    b. Choose a cluster name from the list provided and click Edit to open the configuration dialog page.

    c. Open Advanced Options and select the Init Scripts tab.

    d. Enter the DBFS init script path name you copied earlier.

    e. Click Add.

    f. From Advanced Options, select the Spark tab. Add the following Spark configuration content to the Spark Config edit window. For more information on the properties, see Spark FGAC properties.

    spark.databricks.isv.product privacera
    spark.databricks.cluster.profile serverless
    spark.databricks.delta.formatCheck.enabled false
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar 
    spark.databricks.repl.allowedLanguages sql,python,r
    
  7. Restart the Databricks cluster.
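
If you manage clusters programmatically, steps 6 and 7 can also be performed with the Databricks CLI. The following is a minimal sketch, assuming the legacy Databricks CLI is configured and using a hypothetical cluster ID; note that clusters edit replaces the entire cluster specification, so cluster.json must contain your full spec, not only the Privacera settings:

    # Fetch the current spec as a starting point (hypothetical cluster ID).
    databricks clusters get --cluster-id 0123-456789-abcde123 > cluster.json

    # Merge into cluster.json:
    #   "init_scripts": [{ "dbfs": { "destination": "dbfs:/FileStore/tables/privacera_databricks.sh" } }]
    #   "spark_conf":   the properties listed in step 6

    # Apply the updated spec and restart the cluster so the init script runs.
    databricks clusters edit --json-file cluster.json
    databricks clusters restart --cluster-id 0123-456789-abcde123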

Validate installation

To confirm the successful association of an access management policy to data in your Databricks installation, first make sure of the following:

  • You have created at least one resource policy associated with your data that gives a user access to the database. For an example of creating a policy for Databricks SQL, which is similar to policies for Databricks, see Example: Manage access to Databricks SQL with Privacera.

  • This resource policy must not be for the Databricks default database. You must configure the policy for any database other than the default.

Example steps: After you have applied a policy to a Databricks database, the user tries to access the database defined in the policy, and you confirm the results of the policy by looking at the events in the Access Manager > Audits logs.

  1. Log in to Databricks as a user who is defined in the resource policy.

  2. Create or open an existing notebook. Associate the Notebook with the Databricks cluster you secured in the steps above.

  3. Select the database to which you have associated the policy.

  4. Run a SQL show tables command in the notebook:

    %sql
    show tables;
  5. On PrivaceraCloud, go to Access Manager > Audits to view the success or failure of the resource policy. A successful access is indicated as Allowed.


As an additional check, you can create a Deny resource policy for a different user, run the same SQL access sequence as that user, and confirm a corresponding Denied event.
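
For example, assuming a hypothetical database sales_db and table customers covered by the resource policy (names are illustrative only), the notebook check might look like:

    %sql
    -- Hypothetical names; run this as the user named in the policy.
    select * from sales_db.customers limit 10;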

This section outlines the steps needed to set up OLAC in Databricks clusters. This setup is recommended for Scala language notebooks.

  • It provides OLAC on S3 locations accessed via Spark.

  • It uses privacera_s3 service for resource-based access control and privacera_tag service for tag-based access control.

  • It uses the signed-authorization implementation from Privacera.

Prerequisites

Ensure that the following prerequisites are met:

  • You must have an existing Databricks account and login credentials with sufficient privileges to manage your Databricks cluster.

  • PrivaceraCloud portal admin user access.

Set up OLAC in a Databricks cluster

Note

To work with Delta format files, configure the AWS S3 application using IAM role permissions.

  1. Create a new AWS S3 Databricks connection. For more information, see Connect S3 to PrivaceraCloud.

    After creating an S3 application, follow these steps:

    1. In the BASIC tab, provide Access Key, Secret Key, or an IAM Role.

    2. In the ADVANCED tab, add the following properties.

      • Optional: Set this property only if you are using the legacy workspace.

        dataserver.databricks.allowed.urls=<DATABRICKS_URL_LIST>

        where <DATABRICKS_URL_LIST> is a comma-separated list of the target Databricks cluster URLs.

        For example: dataserver.databricks.allowed.urls=https://dbc-yyyyyyyy-xxxx.cloud.databricks.com/

      • Optional: If you want to record the Service Principal name in audit logs, set the following property to true. Otherwise, the Service Principal ID is recorded.

        dataserver.dbx.olac.use.displayname=true
    3. Click Save.

  2. If you are updating an S3 application:

    1. Go to Settings > Applications > S3, and click the pen icon to edit properties.

    2. Click the Access Management toggle button.

    3. In the ADVANCED tab, add the following properties.

      • Optional: Set this property only if you are using the legacy workspace.

        dataserver.databricks.allowed.urls=<DATABRICKS_URL_LIST>

        where <DATABRICKS_URL_LIST> is a comma-separated list of the target Databricks cluster URLs.

        For example: dataserver.databricks.allowed.urls=https://dbc-yyyyyyyy-xxxx.cloud.databricks.com/

      • Optional: If you want to record the Service Principal name in audit logs, set the following property to true. Otherwise, the Service Principal ID is recorded.

        dataserver.dbx.olac.use.displayname=true
    4. Save your configuration.

  3. Download the Databricks init script:

    1. Log in to the PrivaceraCloud portal.

    2. Generate a new API key and init script. For more information, see API Key on PrivaceraCloud.

    3. On the Databricks Init Script section, click the DOWNLOAD SCRIPT button.

      By default, this script is named privacera_databricks.sh. Save it to a local filesystem or shared storage.

  4. Upload the Databricks init script to your Databricks clusters:

    1. Log in to your Databricks cluster using administrator privileges.

    2. On the left navigation, click the Data icon.

    3. Click Add Data from the upper right corner.

    4. From the Create New Table dialog box select Upload File, then select and open privacera_databricks.sh.

    5. Copy the full storage path onto your clipboard.

  5. Add the Databricks init script to your target Databricks clusters:

    1. In the Databricks navigation panel select Clusters.

    2. Choose a cluster name from the list provided and click Edit to open the configuration dialog page.

    3. Open Advanced Options and select the Init Scripts tab.

    4. Enter the DBFS init script path name you copied earlier.

    5. Click Add.

    6. From Advanced Options, select the Spark tab. Add the following Spark configuration content to the Spark Config edit window. For more information on the properties, see Spark FGAC properties.

      Note

      • If you are accessing only S3 and not accessing any other AWS services, do not associate any IAM role with the Databricks cluster. OLAC for S3 access relies on the Privacera Data Access Server.

        If you are accessing services other than S3, such as Glue or Kinesis, create an IAM role with minimal permissions for those services and associate it with the cluster.

      New properties:

      spark.databricks.isv.product privacera
      spark.databricks.repl.allowedLanguages sql,python,r,scala
      spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
      spark.executor.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
      spark.databricks.delta.formatCheck.enabled false

      Old properties:

      spark.databricks.isv.product privacera
      spark.databricks.repl.allowedLanguages sql,python,r,scala
      spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
      spark.hadoop.fs.s3.impl com.databricks.s3a.PrivaceraDatabricksS3AFileSystem
      spark.hadoop.fs.s3n.impl com.databricks.s3a.PrivaceraDatabricksS3AFileSystem
      spark.hadoop.fs.s3a.impl com.databricks.s3a.PrivaceraDatabricksS3AFileSystem
      spark.executor.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
      spark.hadoop.signed.url.enable true

      Note

      • From PrivaceraCloud release 4.1.0.1 onwards, it is recommended to replace the old properties with the new properties. However, the old properties will continue to work.

      • Old properties should only be used with Databricks versions 8.2 and lower because those versions are in extended support.

      • If you are upgrading the Databricks Runtime from an existing version (i.e., 6.4–8.2) to version 8.3 or higher, contact the Privacera technical sales representative for assistance.

      Add the following property in the Environment Variables text box:

      PRIVACERA_PLUGIN_TYPE=OLAC

      Properties to enable JWT Auth:

      privacera.jwt.oauth.enable true
      privacera.jwt.token /tmp/ptoken.dat
    7. Optional: In the legacy workspace, in addition to setting the allowed URLs property described above, disable the following flag:

      spark.hadoop.privacera.dbx.private.link.support.enable false
    8. Save and close.

    9. Restart the Databricks cluster.
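
If you configure clusters through the Databricks CLI rather than the UI, the settings above map onto fields of the cluster specification. A minimal hypothetical sketch (merge the fields into your full spec before applying it):

    # Fields to merge into cluster.json (values follow the steps above):
    #   "spark_conf":     the New properties from step 6
    #   "init_scripts":   [{ "dbfs": { "destination": "dbfs:/FileStore/tables/privacera_databricks.sh" } }]
    #   "spark_env_vars": { "PRIVACERA_PLUGIN_TYPE": "OLAC" }
    databricks clusters edit --json-file cluster.json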

Your S3 Databricks cluster data resource is now available for Access Manager Policy Management, under Access Manager > Resource Policies, Service "privacera_s3".
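
As a quick functional check, a notebook attached to the OLAC-enabled cluster can query an S3 location directly. A minimal sketch with a hypothetical bucket and prefix (replace them with a location covered by your privacera_s3 policies):

    %sql
    -- Hypothetical S3 location; the request is authorized through the Privacera Data Access Server
    -- and appears under Access Manager > Audits as Allowed or Denied.
    select * from csv.`s3a://example-bucket/sales/` limit 10;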

This section outlines the steps needed to set up OLAC in Databricks clusters. This setup is recommended for Scala language notebooks.

  • It provides OLAC on ADLS locations accessed via Spark.

  • It uses privacera_adls service for resource-based access control and privacera_tag service for tag-based access control.

  • It uses the signed-authorization implementation from Privacera.

Prerequisites

Ensure that the following prerequisites are met:

  • You must have an existing Databricks account and login credentials with sufficient privileges to manage your Databricks cluster.

  • PrivaceraCloud portal admin user access.

Set up OLAC in a Databricks cluster

  1. Create a new Azure ADLS Databricks connection. For more information, see Connect Azure Data Lake Storage Gen 2 (ADLS) to PrivaceraCloud.

    After creating an ADLS Gen2 application, follow these steps:

    1. In the BASIC tab, select a configuration type. See Connect Azure Data Lake Storage Gen 2 (ADLS) to PrivaceraCloud.

    2. In the ADVANCED tab, add the following properties.

      • dataserver.databricks.allowed.urls=<DATABRICKS_URL_LIST>

        where <DATABRICKS_URL_LIST> is a comma-separated list of the target Databricks cluster URLs.

        For example: dataserver.databricks.allowed.urls=https://dbc-yyyyyyyy-xxxx.cloud.databricks.com/

      • Optional: If you want to record the Service Principal name in audit logs, set the following property to true. Otherwise, the Service Principal ID is recorded.

        dataserver.dbx.olac.use.displayname=true
    3. Click Save.

  2. If you are updating an ADLS Gen2 application:

    1. Go to Settings > Applications > ADLS Gen2, and click the pen icon to edit properties.

    2. Click the Access Management toggle button.

    3. In the ADVANCED tab, add the following properties.

      • dataserver.databricks.allowed.urls=<DATABRICKS_URL_LIST>

        where <DATABRICKS_URL_LIST> is a comma-separated list of the target Databricks cluster URLs.

        For example: dataserver.databricks.allowed.urls=https://dbc-yyyyyyyy-xxxx.cloud.databricks.com/

      • Optional: If you want to record the Service Principal name in audit logs, set the following property to true. Otherwise, the Service Principal ID is recorded.

        dataserver.dbx.olac.use.displayname=true
    4. Save your configuration.

  3. Download the Databricks init script:

    1. Log in to the PrivaceraCloud portal.

    2. Generate a new API key and init script. For more information, see API Key on PrivaceraCloud.

    3. On the Databricks Init Script section, click the DOWNLOAD SCRIPT button.

      By default, this script is named privacera_databricks.sh. Save it to a local filesystem or shared storage.

  4. Upload the Databricks init script to your Databricks clusters:

    1. Log in to your Databricks cluster using administrator privileges.

    2. On the left navigation, click the Data icon.

    3. Click Add Data from the upper right corner.

    4. From the Create New Table dialog box select Upload File, then select and open privacera_databricks.sh.

    5. Copy the full storage path onto your clipboard.

  5. Add the Databricks init script to your target Databricks clusters:

    1. In the Databricks navigation panel select Clusters.

    2. Choose a cluster name from the list provided and click Edit to open the configuration dialog page.

    3. Open Advanced Options and select the Init Scripts tab.

    4. Enter the DBFS init script path name you copied earlier.

    5. Click Add.

    6. From Advanced Options, select the Spark tab. Add the following Spark configuration content to the Spark Config edit window. For more information on the properties, see Spark FGAC properties.

      Note

      • For OLAC on Azure to read ADLS files, make sure the Databricks cluster does not have the Azure shared key or passthrough configured. OLAC for ADLS access relies on the Privacera Data Access Server.

      New properties:

      spark.databricks.isv.product privacera
      spark.databricks.repl.allowedLanguages sql,python,r,scala
      spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
      spark.executor.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
      spark.databricks.delta.formatCheck.enabled false

      Old properties:

      spark.databricks.isv.product privacera
      spark.databricks.repl.allowedLanguages sql,python,r,scala
      spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
      spark.hadoop.fs.s3.impl com.databricks.s3a.PrivaceraDatabricksS3AFileSystem
      spark.hadoop.fs.s3n.impl com.databricks.s3a.PrivaceraDatabricksS3AFileSystem
      spark.hadoop.fs.s3a.impl com.databricks.s3a.PrivaceraDatabricksS3AFileSystem
      spark.executor.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
      spark.hadoop.signed.url.enable true

      Note

      • From PrivaceraCloud release 4.1.0.1 onwards, it is recommended to replace the old properties with the new properties. However, the old properties will continue to work.

      • Old properties should only be used with Databricks versions 8.2 and lower because those versions are in extended support.

      • If you are upgrading the Databricks Runtime from an existing version (i.e., 6.4–8.2) to version 8.3 or higher, contact the Privacera technical sales representative for assistance.

      Add the following property in the Environment Variables text box:

      PRIVACERA_PLUGIN_TYPE=OLAC

      Properties to enable JWT Auth:

      privacera.jwt.oauth.enable true
      privacera.jwt.token /tmp/ptoken.dat
    7. Save and close.

    8. Restart the Databricks cluster.

Your ADLS Gen2 Databricks cluster data resource is now available for Access Manager Policy Management, under Access Manager > Resource Policies, Service "privacera_adls".
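
As with the S3 setup, you can verify the ADLS configuration from a notebook attached to the OLAC-enabled cluster. A minimal sketch with a hypothetical storage account and container (replace them with a location covered by your privacera_adls policies):

    %sql
    -- Hypothetical ADLS Gen2 location; the access appears under Access Manager > Audits.
    select * from csv.`abfss://examplecontainer@examplestorage.dfs.core.windows.net/sales/` limit 10;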

Databricks cluster deployment matrix with Privacera plugin

Job/Workflow use-case for automated cluster:

Run-Now creates a new cluster based on the definition in the job description.
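
For reference, a minimal sketch of triggering such a job from the legacy Databricks CLI, using a hypothetical job ID:

    databricks jobs run-now --job-id 42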

Table 10. Jobs on automated clusters

| Job Type | Languages | FGAC / DBX version | OLAC / DBX version |
| --- | --- | --- | --- |
| Notebook | Python/R/SQL | Supported [7.3, 9.1, 10.4] | Supported [7.3, 9.1, 10.4] |
| JAR | Java/Scala | Not supported | Supported [7.3, 9.1, 10.4] |
| spark-submit | Java/Scala/Python | Not supported | Supported [7.3, 9.1, 10.4] |
| Python | Python | Supported [7.3, 9.1, 10.4] | Supported [7.3, 9.1, 10.4] |
| Python wheel | Python | Supported [9.1, 10.4] | Supported [9.1, 10.4] |
| Delta Live Tables pipeline | | Not supported | Not supported |



Job on existing cluster:

Run-Now uses the existing cluster specified in the job description.

Table 11. Jobs on an existing cluster

| Job Type | Languages | FGAC / DBX version | OLAC |
| --- | --- | --- | --- |
| Notebook | Python/R/SQL | Supported [7.3, 9.1, 10.4] | Not supported |
| JAR | Java/Scala | Not supported | Not supported |
| spark-submit | Java/Scala/Python | Not supported | Not supported |
| Python | Python | Not supported | Not supported |
| Python wheel | Python | Supported [9.1, 10.4] | Not supported |
| Delta Live Tables pipeline | | Not supported | Not supported |



Interactive use-case

The interactive use case is running a SQL or Python notebook on an interactive cluster.

Table 12. Interactive clusters

| Cluster Type | Languages | FGAC | OLAC |
| --- | --- | --- | --- |
| Standard clusters | Scala/Python/R/SQL | Not supported | Supported [7.3, 9.1, 10.4] |
| High Concurrency clusters | Python/R/SQL | Supported [7.3, 9.1, 10.4] | Supported [7.3, 9.1, 10.4] |
| Single Node | Scala/Python/R/SQL | Not supported | Supported [7.3, 9.1, 10.4] |