Azure Data Factory Integration with Privacera Enabled Databricks Cluster

This topic describes how to integrate Azure Data Factory with a Privacera-enabled Databricks cluster.

Prerequisites

Create an Azure Data Factory. For more information, see Create a data factory.
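
If you prefer to script this prerequisite, the following is a minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription ID, resource group, factory name, and region shown are placeholders for your own values.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import Factory

    # Placeholder values -- replace with your own subscription, resource group,
    # factory name, and region.
    subscription_id = "<subscription-id>"
    resource_group = "my-resource-group"
    factory_name = "my-data-factory"

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

    # Create (or update) the data factory in the given resource group.
    factory = adf_client.factories.create_or_update(
        resource_group, factory_name, Factory(location="eastus")
    )
    print(factory.provisioning_state)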

Create a pipeline with a new Databricks cluster

  1. Open Azure Data Factory studio.

  2. Open Author > Factory Resources > Pipelines and click the ellipsis to create a new pipeline. For more information, see Create a new pipeline.

    The Properties section is displayed.

  3. Enter the Name and Description for the Pipeline.

  4. In the Activities section, select Databricks > Notebook, and then drag the Notebook activity to the right panel to configure the cluster.

  5. Select the Notebook activity and navigate to the Azure Databricks tab.

  6. Click +New to create a Databricks linked service.

    The New linked service section is displayed.

  7. Enter or choose appropriate values in the New linked service section:

    Table 3. New linked service fields and values (new job cluster)

    Name: Auto-populated, or enter a new name.
    Description: Enter a brief description.
    Connect via integration runtime: Auto-populated, or select a value from the dropdown.
    Account selection method: Select a value from the options.
    Azure subscription: Select your Azure subscription.
    Databricks Workspace URL: Enter the Databricks Workspace URL.
    Authentication Type: Select Managed service identity.
    Workspace resource ID: Auto-populated based on the Databricks Workspace URL.
    Select cluster: Select New job cluster.
    Cluster version: Select the appropriate cluster version.
    Cluster node type: Select the appropriate cluster node type.
    Python Version: Select the appropriate Python version.
    Worker options: Select Fixed.
    Workers: Enter the number of workers.
    Additional cluster settings: Enter the following Spark properties. For more information, see Spark FGAC properties.

    • spark.hadoop.privacera.fgac.use.cluster.owner true (default value: False)

    • spark.hadoop.privacera.fgac.use.cluster.ownertag owner_email (default value: Owner)

    Databricks init scripts: Add the Databricks init scripts. For more information, see Obtain Init Script for Databricks FGAC and Connect Databricks to PrivaceraCloud.
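
The linked service in Table 3 can also be defined programmatically. The sketch below uses the azure-mgmt-datafactory Python SDK to mirror the settings above (managed service identity, a new job cluster, the Privacera FGAC Spark properties, and an init script); all URLs, IDs, and paths are placeholders, and the model field names should be verified against your SDK version.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzureDatabricksLinkedService,
        LinkedServiceResource,
    )

    # Placeholder values -- replace with your own environment details.
    subscription_id = "<subscription-id>"
    resource_group = "my-resource-group"
    factory_name = "my-data-factory"

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

    # Linked service that creates a new job cluster with the Privacera FGAC
    # Spark properties and init script described in Table 3.
    databricks_ls = AzureDatabricksLinkedService(
        domain="https://<databricks-workspace-url>",
        authentication="MSI",  # Managed service identity
        workspace_resource_id="<databricks-workspace-resource-id>",
        new_cluster_version="<cluster-version>",
        new_cluster_node_type="<cluster-node-type>",
        new_cluster_num_of_worker="2",  # Fixed number of workers
        new_cluster_spark_conf={
            "spark.hadoop.privacera.fgac.use.cluster.owner": "true",
            "spark.hadoop.privacera.fgac.use.cluster.ownertag": "owner_email",
        },
        new_cluster_init_scripts=["<privacera-init-script-path>"],
    )

    adf_client.linked_services.create_or_update(
        resource_group,
        factory_name,
        "PrivaceraDatabricksLinkedService",
        LinkedServiceResource(properties=databricks_ls),
    )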



Create a pipeline with an existing Databricks cluster

  1. Open Azure Data Factory studio.

  2. Open Author > Factory Resources > Pipelines and click the ellipsis to create a new pipeline. For more information, see Create a new pipeline.

    The Properties section is displayed.

  3. Enter the Name and Description for the Pipeline.

  4. In the Activities section, select Databricks > Notebook, and then drag the Notebook activity to the right panel to configure the cluster.

  5. Select the Notebook activity and navigate to the Azure Databricks tab.

  6. Click +New to create a Databricks linked service.

    The New linked service section is displayed.

  7. Enter or choose appropriate values in the New linked service section:

    Table 4. New linked service fields and values (existing interactive cluster)

    Name: Auto-populated, or enter a new name.
    Description: Enter a brief description.
    Connect via integration runtime: Auto-populated, or select a value from the dropdown.
    Account selection method: Select a value from the options.
    Azure subscription: Select your Azure subscription.
    Databricks workspace: Select the appropriate Databricks workspace.
    Select cluster: Select Existing interactive cluster.
    Databricks Workspace URL: Enter the Databricks Workspace URL.
    Authentication Type: Select Managed service identity.
    Workspace resource ID: Auto-populated based on the Databricks Workspace URL.
    Existing cluster ID: Select the cluster ID of your existing cluster. Update the existing cluster with the following additional Spark properties. For more information, see Spark FGAC properties.

    • spark.hadoop.privacera.fgac.use.cluster.owner true (default value: False)

    • spark.hadoop.privacera.fgac.use.cluster.ownertag owner_email (default value: Owner)

    Databricks init scripts: Existing clusters already contain a Databricks init script. Update the cluster with the Privacera plugin. For more information, see Obtain Init Script for Databricks FGAC and Connect Databricks to PrivaceraCloud.
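
If you prefer to update the existing cluster from a script rather than the Databricks UI, the following sketch merges the Privacera FGAC Spark properties into the cluster configuration through the Databricks Clusters REST API. The workspace URL, token, and cluster ID are placeholders, and clusters that use autoscaling or instance pools need those fields carried over as well.

    import requests

    # Placeholder values -- replace with your workspace URL, access token, and cluster ID.
    DATABRICKS_HOST = "https://<databricks-workspace-url>"
    TOKEN = "<databricks-personal-access-token>"
    CLUSTER_ID = "<existing-cluster-id>"
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Read the current cluster definition (Clusters API 2.0).
    cluster = requests.get(
        f"{DATABRICKS_HOST}/api/2.0/clusters/get",
        headers=headers,
        params={"cluster_id": CLUSTER_ID},
    ).json()

    # Merge in the Privacera FGAC Spark properties from Table 4.
    spark_conf = dict(cluster.get("spark_conf", {}))
    spark_conf.update(
        {
            "spark.hadoop.privacera.fgac.use.cluster.owner": "true",
            "spark.hadoop.privacera.fgac.use.cluster.ownertag": "owner_email",
        }
    )

    # clusters/edit expects the full cluster spec; this carries over only the core
    # fields (add autoscaling, pool, and other custom settings if your cluster uses them).
    edit_payload = {
        "cluster_id": CLUSTER_ID,
        "cluster_name": cluster["cluster_name"],
        "spark_version": cluster["spark_version"],
        "node_type_id": cluster["node_type_id"],
        "num_workers": cluster.get("num_workers", 1),
        "spark_conf": spark_conf,
        # Keep the existing init scripts, which should already include the
        # Privacera plugin init script per the table above.
        "init_scripts": cluster.get("init_scripts", []),
    }

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/clusters/edit", headers=headers, json=edit_payload
    )
    resp.raise_for_status()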



  8. Click Test Connection. Once the connection is successful, click Create.
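
Once the linked service is created, the pipeline and its Notebook activity can also be defined from a script. The sketch below uses the azure-mgmt-datafactory Python SDK; the pipeline name, notebook path, and linked service name are placeholders.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        DatabricksNotebookActivity,
        LinkedServiceReference,
        PipelineResource,
    )

    # Placeholder values -- replace with your own environment details.
    subscription_id = "<subscription-id>"
    resource_group = "my-resource-group"
    factory_name = "my-data-factory"

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

    # Notebook activity that runs on the Privacera-enabled Databricks linked service.
    notebook_activity = DatabricksNotebookActivity(
        name="RunPrivaceraNotebook",
        notebook_path="/Users/<user>/<notebook>",
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference",
            reference_name="PrivaceraDatabricksLinkedService",
        ),
    )

    adf_client.pipelines.create_or_update(
        resource_group,
        factory_name,
        "PrivaceraDatabricksPipeline",
        PipelineResource(activities=[notebook_activity]),
    )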

Validate and debug the pipeline

To validate the pipeline, click Validate, and then click Debug or Trigger to run the pipeline. After the run succeeds, you can check the audit logs on the Privacera portal. For more information, see Validate and Debug Pipelines.
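
As a scripted alternative to the Debug/Trigger step, the following sketch (again using the azure-mgmt-datafactory Python SDK with placeholder names) triggers a pipeline run and polls its status.

    import time

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    # Placeholder values -- replace with your own environment details.
    subscription_id = "<subscription-id>"
    resource_group = "my-resource-group"
    factory_name = "my-data-factory"
    pipeline_name = "PrivaceraDatabricksPipeline"

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

    # Trigger a pipeline run and poll until it reaches a terminal state.
    run = adf_client.pipelines.create_run(resource_group, factory_name, pipeline_name)
    while True:
        pipeline_run = adf_client.pipeline_runs.get(
            resource_group, factory_name, run.run_id
        )
        if pipeline_run.status not in ("Queued", "InProgress"):
            break
        time.sleep(15)

    # Once the run succeeds, the corresponding access audits appear on the Privacera portal.
    print(f"Pipeline run finished with status: {pipeline_run.status}")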