
Connect Data Sources

Data repositories are connected to PrivaceraCloud by configuring connectors to data sources. 

PrivaceraCloud uses three different connector methods: Data Access Server, Policy Sync, and Plug-In.

The appropriate connector method depends on several factors, including the type of data resource and the type and level of control required.

A service must be added and activated. See Access Manager: Service Config for adding and activating a corresponding service.

Activation of the corresponding service also creates a corresponding resource service and service group in Access Manager: Resource Policies.

A default set of resource policies is automatically created for each newly created resource service, including an all-access default policy. Additional policies can be created and defined in PrivaceraCloud Access Manager: Resource Policies.

Data Access Server#

The Data Access Server integration method redirects data access requests through a Privacera authentication broker inserted into the control and data flow. At most one Data Access Server can be enabled at a time.

Data Access Server connectors are configured in Settings: Datasource. See the setup section for each supported data source.

Policy Sync#

A Policy Sync integration works by mapping PrivaceraCloud-defined Resource Policies to the native access control functions provided by the target data repository system. This approach is used for data repository systems that provide a sufficient native level of data control. This method is used for: Azure Synapse, MS SQL, AWS Redshift, Snowflake, and Postgres. PrivaceraCloud supports multiple concurrent Policy Sync connections, but only one Policy Sync connector of each data resource type.

To configure Policy Sync connectors, go to Settings: Datasource.

Plug-In Connections#

Databricks Spark, EMR PrestoDB, and EMR Hive have built-in support for external authorization using a plug-in architecture. Privacera inserts itself into the Databricks or EMR authorization control flow using a Plug-In module. Authorization for data access requests is directed to the PrivaceraCloud Plug-In component by the repository system itself. This is the most direct and efficient method and is transparent to the data users. PrivaceraCloud allows multiple concurrent plug-in connections. This method is used for:

AWS EMR: Hive, PrestoDB, PrestoSQL#

Use one of two methods for attaching PrivaceraCloud to your EMR clusters:

  • Method A: Attach PrivaceraCloud authorization in new EMR Clusters

  • Method B: Attach PrivaceraCloud authorization in an existing EMR cluster

Both methods start with obtaining an account-specific script from your PrivaceraCloud account, followed by adding a startup step to your EMR cluster.

Obtain Installation Script#

Obtain the account-unique <emr-script-download-url>. This script runs as a step in the cluster and completes the PrivaceraCloud installation.

Steps:

  1. Open Settings: Api Key.

  2. Use an existing Active Api Key or create a new one. Set Expiry = Never Expires.

  3. Open the Api Key Info box (click the (i) in the key row).

  4. Under AWS EMR Setup Script, click Copy Url. Save this value. It will be used as the <emr-script-download-url> in the following instructions.

Configure EMR Cluster#

From your AWS EMR web console:

  1. Find and open the AWS EMR Cluster Step:

    1. For new EMR Clusters (Method A), in the Create EMR dialog, open "Advanced Options" (click Go to advanced options).

    2. For existing EMR Clusters (Method B), locate and open the existing cluster for configuration update. Open the Steps tab. Click Add Step.

  2. In the Add Step dialog, complete the fields as follows:

    Step type: Custom JAR
    Name: Install PrivaceraCloud Plugin
    JAR location: command-runner.jar
    Arguments:

        bash -c "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh"
    

    Action on failure: Terminate cluster
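The console step above can also be scripted. The following is a sketch, not an official Privacera procedure: it submits the same step with the AWS CLI, where the cluster ID `j-XXXXXXXXXXXXX` is a placeholder and `<emr-script-download-url>` is the URL obtained earlier.

```shell
# Write the step definition to a local JSON file. The Args array mirrors the
# Arguments shown above; <emr-script-download-url> stays a placeholder here.
cat > privacera_step.json <<'EOF'
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Install PrivaceraCloud Plugin",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Jar": "command-runner.jar",
    "Args": [
      "bash", "-c",
      "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh"
    ]
  }
]
EOF

# Submit the step to a running cluster (replace the cluster ID with your own):
# aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps file://privacera_step.json
```

This mirrors Method B (adding a step to an existing cluster); for Method A the equivalent step can be supplied at cluster-creation time.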

The EMR Hive plug-in supports view-level access management via the Data_admin feature, and by default supports view-based row-level filtering and column masking.

By default, the PrestoSQL plug-in on EMR uses policies from the “privacera_hive” repository for access management. For more information, see AWS User Guide: EMR.

Validate Installation#

In PrivaceraCloud, open Access Manager: Audit and click the Plugin tab. Look for audit items reporting the status "Policies synced to plugin". This indicates that your EMR Hive, Presto, or Spark data resource is connected.

EMR Spark (Fine-Grained Access Control)#

These instructions enable Fine-Grained Access Control (FGAC) for an existing connected AWS S3 data resource. FGAC enables policies at the database, table, and column level to be defined in the "privacera_hive" service in Access Manager: Resource Policies. Either Object-Level Access Control (OLAC) or Fine-Grained Access Control (FGAC) can be added to an existing AWS S3 configuration, but not both.

Once installed and enabled, each data user query is first parsed by Spark and then authorized by the PrivaceraCloud Spark Plug-In. The requesting user must have access to all resources referenced by the query for it to be allowed.

Steps#

  1. In PrivaceraCloud, obtain your account-unique call-in <emr-script-download-url>, which allows the EMR cluster to obtain additional scripts and setup.
    1. Open Settings: Api Key.
    2. Use an existing Active Api Key or create a new one. Set Expiry = Never Expires.
    3. Open the Api Key Info box (click the (i) in the key row).
    4. Under AWS EMR Setup Script, click Copy Url. Save this value. It will be used as the <emr-script-download-url>, in the following instructions.

From your AWS EMR web console:

  1. Adding a step:

    1. For a new AWS EMR Cluster:

      1. In the AWS EMR console, begin creating a new cluster.

      2. Under "Advanced Options" (click Go to advanced options, next to Quick Options), select the appropriate EMR version and the desired associated applications.

        Skip ahead to the "Add Step" instruction.

    2. Adding a step to an existing AWS EMR Cluster:

      1. In AWS, EMR console, locate the existing Cluster.

      2. Open the Steps tab. Click Add Step.

  2. Add Step to install the Privacera Spark FGAC Plugin.

    1. For a new cluster, at the bottom of the configuration page, select Configure Step: Custom JAR.
      For an existing cluster, open the cluster for editing and click the Steps tab to open the Add Step dialog.
    2. Complete the fields as follows:

      • Step type: Custom JAR
      • Name:

        Install PrivaceraCloud Spark Plugin
        
      • JAR location: command-runner.jar

      • Arguments:

        bash -c "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
        
      • Action on failure: Terminate cluster

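As with the Hive/Presto install step, this step can be scripted. The sketch below uses the AWS CLI with a placeholder cluster ID; the only difference from the plain install step is the trailing `spark-fgac` argument, which selects the Spark FGAC plug-in.

```shell
# Step definition for the Spark FGAC plug-in; note the trailing "spark-fgac"
# argument passed to privacera_emr.sh.
cat > privacera_fgac_step.json <<'EOF'
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Install PrivaceraCloud Spark Plugin",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Jar": "command-runner.jar",
    "Args": [
      "bash", "-c",
      "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
    ]
  }
]
EOF

# Submit to a running cluster (replace the cluster ID with your own):
# aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps file://privacera_fgac_step.json
```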

Note

The Privacera plug-in also supports view-level access control using the Data admin feature, and view-based row-level filtering and column masking. To enable these, see Spark in Privacera.

Databricks - SQL#

PrivaceraCloud integrates with Databricks SQL using the Plug-In integration method with an account-specific, cluster-scoped initialization script. Privacera's Spark Plug-In is installed on the Databricks cluster, enabling Fine-Grained Access Control. You will add this script to your cluster as an init script to run at cluster startup.

Note

Accounts upgrading from PrivaceraCloud 2.0 to PrivaceraCloud 2.1 and intending to use Privacera Encryption with Databricks must re-install the init script to Databricks.

As your cluster restarts, it runs the init script and connects to PrivaceraCloud. From that point, access to cluster-associated resources is monitored and controlled by the policies defined in Resource Policies using the privacera_hive, privacera_s3, privacera_adls, and privacera_files resource services.

You must have an existing Databricks account and login credentials with sufficient privileges to manage your Databricks cluster.  

Steps#

  1. Log in to the PrivaceraCloud portal as an admin user (role ROLE_ACCOUNT_ADMIN).

  2. Generate the new API and Init Script. For more information, refer to the topic API Key.

  3. On the Databricks Init Script section, click DOWNLOAD SCRIPT.

    By default, this script is named privacera_databricks.sh.  Save it to a local filesystem or shared storage.

  4. Log in to your Databricks account using credentials with sufficient account management privileges. 

  5. Copy the init script to your Databricks cluster. This can be done via the UI or using the Databricks CLI.

    1. Using the Databricks UI:

      1. On the left navigation, click the Data icon.

      2. Click the Add Data button from the upper right corner.

      3. In the Create New Table dialog, select Upload File, and then click browse. 

      4. Select privacera_databricks.sh, and then click Open to upload it. 

        Once the file is uploaded, the dialog displays the uploaded file path. This file path will be required in a later step.

        The file will be uploaded to /FileStore/tables/privacera_databricks.sh path, or similar.

    2. Using the Databricks CLI, copy the script to a location in DBFS:

      databricks fs cp ~/<sourcepath_privacera_databricks.sh> dbfs:/<destination_path>
      

      For example:

      databricks fs cp ~/Downloads/privacera_databricks.sh dbfs:/FileStore/tables/
      

  6. You can add PrivaceraCloud to an existing cluster, or create a new cluster and attach PrivaceraCloud to it.
    Open the Clusters dialog (click Clusters in the Databricks navigation bar on the left).

    1. For a new cluster, click +Create Cluster. For an existing cluster, click the cluster name to open its configuration dialog, and then click Edit.

    2. In Edit mode, at the bottom of this dialog, open Advanced Options and, in the Init Scripts tab, add the full file path to the init script (for example, dbfs:/FileStore/tables/privacera_databricks.sh).

  7. In the same Advanced Options interface, open the Spark tab. Add the following Spark configuration content to the Spark Config edit window:

    spark.databricks.isv.product privacera 
    spark.databricks.cluster.profile serverless
    spark.databricks.delta.formatCheck.enabled false
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
    spark.databricks.repl.allowedLanguages sql,python,r
    
  8. Restart the Cluster.
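Steps 5 through 8 can also be checked or driven from the Databricks CLI, assuming it is installed and configured for your workspace. The cluster ID below is a placeholder; the DBFS path is the example location used in step 5.

```shell
# Placeholders: substitute your own cluster ID and, if different, the DBFS
# path reported when you uploaded privacera_databricks.sh.
INIT_SCRIPT="dbfs:/FileStore/tables/privacera_databricks.sh"
CLUSTER_ID="0000-000000-xxxxxxxx"

# Confirm the init script is present in DBFS (uploaded in step 5):
#   databricks fs ls "$INIT_SCRIPT"
# Restart the cluster so the init script runs (step 8):
#   databricks clusters restart --cluster-id "$CLUSTER_ID"
echo "cluster $CLUSTER_ID will start with init script $INIT_SCRIPT"
```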


Validate Installation#

Confirm connectivity by executing a simple data access sequence and then examining the PrivaceraCloud audit stream. 

You will see corresponding events in the Access Manager > Audits.

Example data access sequence:

  1. Create or open an existing Notebook. Associate the Notebook with the Databricks Cluster you secured in the steps above.

  2. Run a SQL show tables command in the Notebook.

  3. On PrivaceraCloud, go to Access Manager > Audits to view the monitored data access.

  4. Create a Deny policy, run this same SQL access sequence a second time, and confirm corresponding Denied events.

Qubole Presto#

Connecting your Qubole Presto cluster to PrivaceraCloud consists of three basic steps.

The first step is to create a service user on PrivaceraCloud for data user access control call-ins from Presto to PrivaceraCloud.

The second step is to create, or identify and reuse, the unique call-in authentication (access control) and audit URLs from your Qubole Presto cluster to PrivaceraCloud.

The third step is to configure your Qubole Presto cluster to load the necessary Privacera-hosted Apache Ranger Plug-In components on boot, and then execute the call-in for access control and audit.

See reference section Qubole Cluster Setup for specific Qubole configuration step-by-step instructions.

Starburst Enterprise (Presto)#

Connecting a Starburst Enterprise Platform (SEP) requires setup on the SEP host. See Starburst Enterprise Platform (SEP) Setup for setup and configuration.


Last update: October 15, 2021