PrivaceraCloud Documentation

Elastic MapReduce from Amazon

EMR: Hive, PrestoDB, PrestoSQL

This topic describes how to connect an EMR application to PrivaceraCloud.

Note

PrivaceraCloud supports EMR versions 6.x and higher with Kerberos enabled.

Connect EMR application

  1. Go to Settings > Applications.

  2. In the Applications screen, select EMR.

  3. Enter the application Name and Description, and then click Save.

  4. Click the toggle button to enable Access Management for your application.

Obtain Init script for EMR

  1. In the Edit Application screen, click the Copy URL button to obtain the installation script.

    Save this value; it is used as <emr-script-download-url> in later steps.

    EMR clusters can be connected to PrivaceraCloud in two ways:

    • Attach PrivaceraCloud authorization in new EMR clusters.

    • Attach PrivaceraCloud authorization in an existing EMR cluster.

    Both methods start with obtaining an account-specific script from your PrivaceraCloud account, followed by adding a startup step to your EMR cluster.

    Notice

    By default, PrestoDB blocks a few operations on the Hive catalog. These operations can be enabled by updating hive.properties.

  2. Click Save.

You can now use PrivaceraCloud to define fine-grained policies and control access to Hive and Presto resources within the EMR cluster.

Configure EMR cluster

From your AWS EMR web console:

  1. Open your AWS EMR cluster, then:

    1. For new EMR clusters, go to Create EMR > Advanced Options and click Go to advanced options.

    2. For existing EMR clusters, locate and open the existing cluster for configuration update. Open the Steps tab and click Add Step.

  2. In the Add Step dialog, complete the fields as follows:

    Step type: Custom JAR

    Name: Install PrivaceraCloud Plugin

    JAR location: command-runner.jar

    Arguments:

    bash -c "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh"

    Action on failure: Terminate cluster

    The EMR Hive plug-in supports view-level access management through the Data_admin feature. By default, it also supports view-based row-level filtering and column masking.

    • By default, the PrestoSQL plug-in on EMR will use policies from privacera-hive repository for Access Management.
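The Add Step dialog values above can also be expressed as a step definition for the AWS CLI. The following is a minimal sketch, not a Privacera-provided script: it writes the step as a JSON file in the form accepted by aws emr add-steps and prints the command instead of running it. The URL, the cluster ID, and the file name privacera_step.json are placeholders assumed for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholders (assumptions for this sketch): replace with your own values.
SCRIPT_URL="${SCRIPT_URL:-https://example.invalid/privacera_emr.sh}"
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"

# Write the step definition with the same values as the Add Step dialog.
cat > privacera_step.json <<EOF
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Install PrivaceraCloud Plugin",
    "Jar": "command-runner.jar",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Args": ["bash", "-c", "wget ${SCRIPT_URL} ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh"]
  }
]
EOF

# Print the command rather than running it, so the sketch has no side effects.
echo "aws emr add-steps --cluster-id ${CLUSTER_ID} --steps file://./privacera_step.json"
```

If the download URL contains double quotes or backslashes, it would need to be escaped before being embedded in the JSON file.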

Validate installation

In PrivaceraCloud, open Access Manager: Audit, and click the Plugin tab. Look for audit items reporting the status "Policies synced to plugin". This indicates that your EMR Hive, Presto, or Spark data resource is connected.

EMR Spark (Fine-Grained Access Control)

These instructions enable Fine-Grained Access Control (FGAC) for an existing connected AWS S3 data resource. FGAC enables policies at the database, table, and column level to be defined in the "privacera_hive" service in Access Manager: Resource Policies. Either Object Level Access Control (OLAC) or Fine-Grained Access Control (FGAC) can be added to an existing AWS S3 configuration, but not both.

Once installed and enabled, each data user query is first parsed by Spark and then authorized by the PrivaceraCloud Spark plug-in. The requesting user must have access to all resources referenced by the query for it to be allowed.

  1. In PrivaceraCloud, obtain your account-specific <emr-script-download-url>, which allows the EMR cluster to download additional scripts and setup files.

    1. Open Settings > API Key.

    2. Use an existing active API Key or generate a new one.

      Caution

      Make sure the Expiry column is set to "Never Expires".

    3. Click the i icon to get the scripts.

    4. Under AWS EMR Setup Script, click Copy Url. Save this value; it is used as <emr-script-download-url> in the following instructions.

  2. From the AWS EMR web console:

    • For new EMR clusters, go to Create EMR > Advanced Options and click Go to advanced options.

    • For existing EMR clusters, locate and open the existing cluster for configuration update. Open the Steps tab and click Add Step.

    Note

    To add multiple JWT configurations, see How to configure multiple JSON Web Tokens (JWTs) for EMR.

  3. Install the Privacera Spark FGAC Plugin:

    1. In a new cluster: select Configure Step > Custom JAR at the bottom of the configuration page.

      For an existing cluster: in Steps, select Custom JAR and click Add Step.

    2. Add the given values in the following fields and click Add.

      • Name: Install PrivaceraCloud Spark Plugin

      • JAR location: command-runner.jar

      • Arguments: add the following command:

        bash -c "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"

      • Action on failure: Terminate cluster

    3. (Optional) To specify custom service names for the Hive, Spark, or Trino services, export the following variables in the arguments:

      bash -c "export EMR_HIVE_SERVICE_NAME=<hive_repo_name>; export EMR_TRINO_HIVE_SERVICE_NAME=<trino_hive_repo_name>; export EMR_SPARK_HIVE_SERVICE_NAME=<spark_hive_repo_name>; wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

      where:

      hive_repo_name is the custom Hive service name for the Hive application in EMR.

      spark_hive_repo_name is the custom Hive service name for Spark applications in EMR.

      trino_hive_repo_name is the custom Hive service name for the Trino application in EMR.

      Note

      The Privacera plugin also supports view-level access control using the Data_admin feature, along with view-based row-level filtering and column masking.
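The optional service-name exports in step 3 can be assembled programmatically so that only the names you actually set are exported. The sketch below is illustrative only: build_fgac_command is a hypothetical helper, not part of the Privacera script, and the URL and repository names in the usage line are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_fgac_command URL [HIVE_REPO] [TRINO_REPO] [SPARK_REPO]
# Hypothetical helper: prints the command string to place after `bash -c`
# in the step arguments. Pass "" to skip a service name.
build_fgac_command() {
  local url="$1" hive="${2:-}" trino="${3:-}" spark="${4:-}"
  local exports=""

  # Export only the service names that were supplied.
  if [ -n "$hive" ];  then exports+="export EMR_HIVE_SERVICE_NAME=${hive}; "; fi
  if [ -n "$trino" ]; then exports+="export EMR_TRINO_HIVE_SERVICE_NAME=${trino}; "; fi
  if [ -n "$spark" ]; then exports+="export EMR_SPARK_HIVE_SERVICE_NAME=${spark}; "; fi

  # sudo -E preserves the exported variables for the install script.
  local sudo_cmd="sudo"
  if [ -n "$exports" ]; then sudo_cmd="sudo -E"; fi

  printf '%s\n' "${exports}wget ${url} ; chmod +x ./privacera_emr.sh ; ${sudo_cmd} ./privacera_emr.sh spark-fgac"
}

# Usage: custom Hive and Spark service names, default Trino service name.
build_fgac_command "https://example.invalid/privacera_emr.sh" my_hive "" my_spark_hive
```

With no service names supplied, the helper produces the same command as the basic spark-fgac step.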

EMR Spark (Object Level Access Control)

These instructions enable Object Level Access Control (OLAC) for existing connected AWS S3 resources. If AWS S3 is not already configured, do so by following the instructions here, then follow these additional configuration steps.

Either Object Level Access Control (OLAC) or Fine-Grained Access Control (FGAC) can be added to an existing AWS S3 configuration, but not both.

Two subcomponents are installed:

  • Privacera Credential Token Service (P-CTS) is installed on the targeted AWS EMR master node. P-CTS is a secure service running on an EMR master node which provides encrypted access tokens to the requesting user. Tokens are encrypted using a shared secret key with the Privacera Cloud Signing Server.

  • Privacera Signing Agent (P-SA) is installed on the targeted AWS EMR worker nodes. P-SA redirects Spark S3 requests to the Privacera Cloud Signing Server with a P-CTS access token in the request. P-SA then provides the appropriate signed response to Spark for accessing the S3 data if:

    (a) the incoming request has a valid P-CTS token; and

    (b) the requesting user has permissions on the S3 resource as defined in the "privacera_s3" service in Access Manager: Resource Policies.

Prerequisites

  1. Obtain or determine a character string to serve as a "shared key" between PrivaceraCloud and the AWS EMR cluster. We'll refer to this as <SHARED_KEY> in the configuration steps below.

  2. Obtain your account-specific <emr-script-download-url>, which allows the EMR cluster to download additional scripts and setup files from PrivaceraCloud:

    1. Open Settings > API Key.

    2. Use an existing active API Key or create a new one. Set Expiry to Never Expires.

    3. Open the API Key Info box (click the (i) in the key row).

    4. Under AWS EMR Setup Script, click Copy Url and store the value as <emr-script-download-url>.
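For the shared key in prerequisite 1, one option is to generate a random string locally. The following is a minimal sketch under an assumption: PrivaceraCloud's requirements for key length and character set are not stated here, so the 32-byte hex format is only an illustrative choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read 32 random bytes, hex-encode them, and keep only hex digits.
# The result is a 64-character string using 0-9 and a-f.
SHARED_KEY="$(head -c 32 /dev/urandom | od -An -tx1 | tr -dc '0-9a-f')"

# Report the length only; avoid printing the key itself to logs.
echo "Generated a shared key of ${#SHARED_KEY} characters"
```

Store the value securely; it stands in for <SHARED_KEY> in the configuration steps.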

AWS configuration steps

  1. Create an EMR Security Configuration for Kerberos Authentication:

    1. Open your AWS EMR web console.

    2. Click Security Configurations, then Create.

    3. Provide a name for this Security Configuration such as PRIVACERA_KDC. We'll refer to this same Security Configuration later.

    4. Under Authentication, select Enable Kerberos authentication and complete the fields as appropriate for your environment.

  2. Create a new EMR cluster and assign to it the new Security Configuration.

    1. In the AWS EMR Console, create a new cluster.

    2. In Advanced Options, click Go to advanced options.

    3. In the Software Configuration, select the appropriate EMR release and any associated applications.

    4. In Edit Software Settings, select Enter configuration, and add the following properties:

      [
        {
          "classification": "spark-defaults",
          "properties": {
            "spark.driver.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-signing-agent.jar",
            "spark.executor.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-signing-agent.jar"
          }
        }
      ]

    5. In Steps, select Custom JAR and click Add Step.

      Add a step to download and install the Privacera Credential Token Service. Complete the fields as shown below, substituting your <emr-script-download-url> value in the wget command. Click Add when all fields are complete.

      • Name: Install Privacera CTS

      • JAR location: command-runner.jar

      • Arguments: bash -c "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh priv-cts"

      • Action on failure: Continue

      Click Next.

  3. Configure hardware by selecting Networking, Node, and Instance values as appropriate for your environment.

  4. Configure general cluster settings by adding two scripts that install the Privacera Signing Agent on the master and worker nodes.

    1. Assign Cluster name, Logging, Debugging, and Termination protection as appropriate for your environment.

    2. Install the Master signing agent:

      1. Go to Additional Options > Bootstrap Actions and select bootstrap action "Run if" and click Configure and add to open the Add Bootstrap Action dialog.

      2. In this dialog, set the name to Privacera Signing Agent for Master, copy the following script into Optional Arguments, and click Add when done. Replace <emr-script-download-url> with your own value.

        instance.isMaster=true "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"

      3. The Worker signing agent is installed in the same way. Under Additional Options, expand Bootstrap Actions, select bootstrap action "Run if", and click Configure and add to open the Add Bootstrap Action dialog. In this dialog, set the name to Privacera Signing Agent for Worker and copy the following script into Optional Arguments. Replace <emr-script-download-url> with your own value.

        instance.isMaster=false "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"

  5. Configure security options:

    1. Complete Security Options as appropriate for your environment.

    2. Open Security Configuration, and select the configuration you created earlier, e.g. "PRIVACERA_KDC". Then, in the following fields, enter values:

      • Realm

      • KDC admin password

  6. Click Create cluster to complete.
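For clusters created from the command line instead of the console, the software settings and security options above correspond to options of aws emr create-cluster. The sketch below is an illustration under stated assumptions, not a complete cluster definition: it writes the spark-defaults classification to a file named privacera_spark_config.json (an arbitrary name), optionally checks that the file parses as JSON, and prints only the related options without running anything. The release label, instance settings, steps, and bootstrap actions are left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Path of the signing agent JAR, as given in the software settings step.
AGENT_JAR="/usr/lib/spark/jars/privacera-signing-agent.jar"

# Write the spark-defaults classification. AWS CLI configuration files
# use capitalized "Classification" and "Properties" keys.
cat > privacera_spark_config.json <<EOF
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraJavaOptions": "-javaagent:${AGENT_JAR}",
      "spark.executor.extraJavaOptions": "-javaagent:${AGENT_JAR}"
    }
  }
]
EOF

# Optional sanity check: confirm the file parses as JSON if python3 is present.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool privacera_spark_config.json >/dev/null
fi

# Print (do not run) the options that relate to the steps above.
echo "aws emr create-cluster --configurations file://./privacera_spark_config.json --security-configuration PRIVACERA_KDC --kerberos-attributes Realm=<REALM>,KdcAdminPassword=<KDC_ADMIN_PASSWORD>"
```

The JSON check is useful because a stray trailing comma in the properties object is enough to make the configuration invalid.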