Configure Databricks Spark Object-level Access Control Plugin

Procedure
  1. Run the following commands.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.databricks.scala.yml config/custom-vars/
    vi config/custom-vars/vars.databricks.scala.yml
    
  2. Edit the following properties. For property details and descriptions, see the configuration properties table below; a filled-in example follows this procedure.

    DATASERVER_DATABRICKS_ALLOWED_URLS: "<PLEASE_UPDATE>"
    DATASERVER_AWS_STS_ROLE: "<PLEASE_CHANGE>"
    
  3. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
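
For illustration, a completed vars.databricks.scala.yml might contain values such as the following; the workspace URL is a placeholder for your own, and the role ARN follows the example in the properties table below.

    DATASERVER_DATABRICKS_ALLOWED_URLS: "https://dbc-xxxxxxxx-xxxx.cloud.databricks.com"
    DATASERVER_AWS_STS_ROLE: "arn:aws:iam::111111111111:role/assume-role"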
    

Databricks Spark object-level access control plugin configuration properties

DATABRICKS_SCALA_ENABLE

    Enables or disables the Databricks Scala plugin. This property is found under the Databricks Signed URL Configuration For Scala Clusters section.

DATASERVER_DATABRICKS_ALLOWED_URLS

    A URL or comma-separated list of URLs. Privacera Dataserver serves only the URLs listed in this property. For an E2 shared workspace, DATASERVER_DATABRICKS_ALLOWED_URLS is not required; for other Databricks configurations, set it to point to your workspace URL, for example:

    https://xxx-7xxxfaxx-xxxx.cloud.databricks.com

DATASERVER_AWS_STS_ROLE

    The instance profile ARN of the AWS role that can access Delta files in Databricks, for example:

    arn:aws:iam::111111111111:role/assume-role

DATABRICKS_MANAGE_INIT_SCRIPT

    Controls management of the init script. If enabled, Privacera Manager uploads the init script (ranger_enable_scala.sh) to the identified Databricks host. If disabled, Privacera Manager takes no action regarding the init script on the Databricks File System.

DATABRICKS_SCALA_CLUSTER_POLICY_SPARK_CONF

    Configures the Databricks cluster policy. Add JSON such as the following in the text area (each entry needs a distinct key):

    [{"Note":"First spark conf",
    "key":"spark.hadoop.first.spark.test",
    "value":"test1"},
    {"Note":"Second spark conf",
    "key":"spark.hadoop.second.spark.test",
    "value":"test2"}]

Managing the init script

Automatic Upload

If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "true", the init script is uploaded automatically to your Databricks host at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME set in vars.privacera.yml.
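
To confirm the upload, you can list the target directory with the Databricks CLI; a minimal check, assuming the 'privacera' profile configured as described below:

    dbfs ls dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ --profile privacera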

Manual Upload

If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "false", the init script must be uploaded manually to your Databricks host, as follows.

  1. Open a terminal and connect to your Databricks account using your Databricks login credentials or token.

    • Connect using login credentials:

      1. If you're using login credentials, run the following command.

        databricks configure --profile privacera
        
      2. Enter the Databricks URL.

        Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
        
      3. Enter the username and password.

        Username: email-id@yourdomain.com
        Password:
        
    • Connect using Databricks token:

      1. If you don't have a Databricks token, you can generate one. For more information, see Configure JSON Web Tokens for Databricks.

      2. If you're using a token, run the following command.

        databricks configure --token --profile privacera
        
      3. Enter the Databricks URL.

        Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
        
      4. Enter the token.

        Token:
        
  2. To check if the connection to your Databricks account is established, run the following command.

    dbfs ls dbfs:/ --profile privacera
    

    If you are connected to your account, the output lists the files in DBFS.

  3. Upload files manually to Databricks.

    1. Copy the following files, available on the Privacera Manager (PM) host at ~/privacera/privacera-manager/output/databricks, to DBFS:

      • ranger_enable_scala.sh

      • privacera_spark_scala_plugin.conf

      • privacera_spark_scala_plugin_job.conf

    2. Run the following commands, taking the value of <DEPLOYMENT_ENV_NAME> from ~/privacera/privacera-manager/config/vars.privacera.yml. (A combined script appears after these steps.)

      export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
      dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
      dbfs cp ranger_enable_scala.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_scala_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_scala_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      
    3. Verify that the files have been uploaded.

      dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      

      The init script is now available at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME in vars.privacera.yml.
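
As a convenience, the manual upload can be wrapped in a single script; a minimal sketch, assuming the 'privacera' CLI profile is configured and the script is run from ~/privacera/privacera-manager/output/databricks:

    #!/bin/bash
    # Upload the Privacera Scala plugin files to DBFS in one pass.
    set -euo pipefail

    # Value of DEPLOYMENT_ENV_NAME from vars.privacera.yml
    export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>

    dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
    for f in ranger_enable_scala.sh \
             privacera_spark_scala_plugin.conf \
             privacera_spark_scala_plugin_job.conf; do
      dbfs cp --overwrite "$f" dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
    done

    # Confirm the upload
    dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera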

Configure Databricks cluster

  1. Once the update completes successfully, log in to the Databricks console with your account and open the target cluster, or create a new target cluster.

  2. Open the cluster dialog and enter Edit mode.

  3. In the Configuration tab, open Advanced Options (at the bottom of the dialog) and then the Spark tab.

  4. Add the following content to the Spark Config edit box. (A scripted alternative to steps 4 through 7 appears after this procedure.)

    spark.databricks.isv.product privacera
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.executor.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.databricks.repl.allowedLanguages sql,python,r,scala
    spark.databricks.delta.formatCheck.enabled false
    
  5. Optional: For legacy workspaces, add the following to the Spark Config edit box:

    spark.hadoop.privacera.dbx.private.link.support.enable false

  6. Optional: To use regional endpoints for S3 access, add the following name/value pairs to the Spark Config edit box, where <region> is the desired AWS region.

    spark.hadoop.fs.s3a.endpoint https://s3.<region>.amazonaws.com
    spark.hadoop.fs.s3.endpoint https://s3.<region>.amazonaws.com
    spark.hadoop.fs.s3n.endpoint https://s3.<region>.amazonaws.com
    
  7. In the Configuration tab, open Advanced Options (at the bottom of the dialog) and set the init script path. For <DEPLOYMENT_ENV_NAME>, enter the deployment name as defined by the DEPLOYMENT_ENV_NAME variable in vars.privacera.yml.

    dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh
    
  8. For OLAC on AWS to read S3 files, whether an IAM role is needed depends on which services you are accessing:

    • If you are accessing only S3 and not accessing any other AWS services, do not associate any IAM role with the Databricks cluster. OLAC for S3 access relies on the Privacera Data Access Server.

    • If you are accessing services other than S3, such as Glue or Kinesis, create an IAM role with the minimal permissions required for those services and associate it with the Databricks cluster.

    For OLAC on Azure to read ADLS files, make sure the Databricks cluster does not have the Azure shared key or passthrough configured. OLAC for ADLS access relies on the Privacera Data Access Server.

  9. Save (Confirm) this configuration.

  10. Start (or Restart) the selected Databricks Cluster.
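
If you prefer to script steps 4 through 7 rather than use the console, the same settings can be supplied when creating a cluster through the Databricks CLI. The following is a minimal sketch, not the documented Privacera procedure: the cluster name, spark_version, node_type_id, and worker count are placeholder values to adjust for your environment. Save the spec as privacera-olac-cluster.json:

    {
      "cluster_name": "privacera-olac-cluster",
      "spark_version": "9.1.x-scala2.12",
      "node_type_id": "i3.xlarge",
      "num_workers": 2,
      "spark_conf": {
        "spark.databricks.isv.product": "privacera",
        "spark.driver.extraJavaOptions": "-javaagent:/databricks/jars/privacera-agent.jar",
        "spark.executor.extraJavaOptions": "-javaagent:/databricks/jars/privacera-agent.jar",
        "spark.databricks.repl.allowedLanguages": "sql,python,r,scala",
        "spark.databricks.delta.formatCheck.enabled": "false"
      },
      "init_scripts": [
        { "dbfs": { "destination": "dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh" } }
      ]
    }

Then create the cluster using the CLI profile configured earlier:

    databricks clusters create --json-file privacera-olac-cluster.json --profile privacera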
