Privacera Documentation

Set up Discovery on Databricks for Privacera Platform

This topic covers the installation of Privacera Discovery on Databricks.

  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.databricks.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.databricks.yml
    
  3. If the Databricks plugin is not enabled, add the following properties to config/custom-vars/vars.discovery.databricks.yml and provide their values. To configure the Databricks plugin, see Configure Databricks Spark Fine-Grained Access Control Plugin [FGAC] [Python, SQL].

    DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
    DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
    
    DATABRICKS_WORKSPACES_LIST:
    - alias: DEFAULT
      databricks_host_url: "{{DATABRICKS_HOST_URL}}"
      token: "{{DATABRICKS_TOKEN}}"
    
  4. Edit the following properties for your cloud provider. For a description of each property, see Databricks Discovery configuration properties below.

    AWS

    DATABRICKS_DRIVER_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
    DATABRICKS_DISCOVERY_INSTANCE_PROFILE: "arn:aws:iam::<ACCOUNT_ID>:instance-profile/<DATABRICKS_CLUSTER_IAM_ROLE>"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE: "true"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN: "arn:aws:iam::<ACCOUNT_ID>:role/<DISCOVERY_IAM_ROLE>"
    

    Azure

    DATABRICKS_DRIVER_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
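
Before applying the configuration, it can help to confirm that the workspace accepts the host URL and token from step 3 and that it offers the runtime chosen in step 4. The following is a minimal sketch and is not part of Privacera Manager. It assumes `curl` is installed and uses the Databricks REST API endpoint `GET /api/2.0/clusters/spark-versions`. The function names are hypothetical.

```shell
# Hypothetical helper functions; not part of Privacera Manager.

# Build a REST API URL from the workspace host URL and an API path.
# One trailing slash on the host is dropped so the result has no "//".
databricks_api_url() {
  host="${1%/}"
  path="$2"
  printf '%s/api/2.0/%s\n' "$host" "$path"
}

# Ask the workspace for its available runtime versions (JSON on stdout).
# curl -f makes the call fail on HTTP errors such as a rejected token.
list_spark_versions() {
  curl -sf -H "Authorization: Bearer $2" \
    "$(databricks_api_url "$1" "clusters/spark-versions")"
}

# Read JSON from stdin and succeed if the quoted runtime key,
# for example "7.3.x-scala2.12", appears anywhere in it.
has_spark_version() {
  grep -qF "\"$1\""
}

# Example usage (replace the placeholders with your own values):
#   list_spark_versions "<DATABRICKS_HOST_URL>" "<DATABRICKS_TOKEN>" \
#     | has_spark_version "7.3.x-scala2.12" && echo "runtime available"
```

If the call fails, check the token and host URL first. If the call succeeds but the runtime is not found, the value of DATABRICKS_DISCOVERY_SPARK_VERSION may not be offered in that workspace.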

Note

PRIVACERA_DISCOVERY_DATABRICKS_DOWNLOAD_URL is no longer in use. The Discovery Databricks packages will be downloaded from PRIVACERA_BASE_DOWNLOAD_URL.

Databricks Discovery configuration properties

| Property | Description | Example |
| --- | --- | --- |
| DATABRICKS_DRIVER_INSTANCE_TYPE | Instance type of the driver node. For AWS, this can be "m5.xlarge" or "m5.2xlarge". For Azure, this can be "Standard_DS3_v2". | m5.xlarge |
| DATABRICKS_INSTANCE_TYPE | Instance type. For AWS, this can be "m5.xlarge" or "m5.2xlarge". For Azure, this can be "Standard_DS3_v2". | m5.xlarge |
| SETUP_DATABRICKS_JAR | | |
| USE_DATABRICKS_SPARK | | |
| DATABRICKS_ELASTIC_DISK | | |
| DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT | Set to true to create the Databricks init script. | false |
| DATABRICKS_DISCOVERY_WORKERS | | |
| DATABRICKS_DISCOVERY_JOB_NAME | | |
| DATABRICKS_DISCOVERY_SPARK_VERSION | Spark version. One of: 6.4.x-scala2.11 (Spark 2.4), 7.3.x-scala2.12 (Spark 3.0), 7.4.x-scala2.12 (Spark 3.0), 7.5.x-scala2.12 (Spark 3.0), 7.6.x-scala2.12 (Spark 3.0). | 7.3.x-scala2.12 |
| DATABRICKS_DISCOVERY_INSTANCE_PROFILE | Instance profile (instance role) for the Databricks instance nodes on which Discovery runs. | arn:aws:iam::1234564835:instance-profile/privacera_databricks_cluster_iam_role |
| DISCOVERY_AWS_CLOUD_ASSUME_ROLE | Grants Discovery access to AWS services so that it can perform the scanning operation. | true |
| DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN | ARN of the AWS IAM role. | arn:aws:iam::12345671758:role/DiscoveryCrossAccAssumeRole_k |
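
A quick sanity check of the edited values can catch typos before they reach the cluster. The following is a minimal sketch under stated assumptions, not a Privacera tool: the function names are hypothetical, the accepted runtime values are the ones listed above for DATABRICKS_DISCOVERY_SPARK_VERSION, and the ARN check tests only the general shape of the string in the standard `aws` partition, not whether the resource exists. `get_var` assumes simple top-level `KEY: "value"` lines like those shown in this topic.

```shell
# Hypothetical validation helpers; not part of Privacera Manager.

# Print the value of a top-level KEY: "value" line from a vars file.
# Usage: get_var FILE KEY
get_var() {
  sed -n 's/^'"$2"':[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$1"
}

# Succeed if the argument is one of the runtime versions listed for
# DATABRICKS_DISCOVERY_SPARK_VERSION.
is_supported_spark_version() {
  case "$1" in
    6.4.x-scala2.11|7.3.x-scala2.12|7.4.x-scala2.12|7.5.x-scala2.12|7.6.x-scala2.12)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Succeed if the first argument looks like an IAM ARN whose resource type
# equals the second argument ("instance-profile" or "role").
is_iam_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws:iam::[0-9]+:'"$2"'/.+$'
}

# Example usage:
#   f=config/custom-vars/vars.discovery.databricks.yml
#   is_supported_spark_version "$(get_var "$f" DATABRICKS_DISCOVERY_SPARK_VERSION)" \
#     || echo "unsupported Spark version"
#   is_iam_arn "$(get_var "$f" DATABRICKS_DISCOVERY_INSTANCE_PROFILE)" instance-profile \
#     || echo "instance profile ARN looks malformed"
```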