EMR: Hive, PrestoDB, PrestoSQL#

This topic describes how to connect an EMR application to PrivaceraCloud.

Note

PrivaceraCloud supports EMR versions above 6.x with Kerberos enabled.

Connect Application#

  1. Go to Settings > Applications.

  2. In the Applications screen, select EMR.

  3. Enter the application Name and Description, and then click Save.

  4. Click the toggle button to enable Access Management for your application.

Obtain Installation Script#

  1. In the Edit Application screen, click the Copy URL button to obtain the installation script.

    Save this value. It will be used as the <emr-script-download-url> in the following instructions.

    You can connect your EMR clusters to the PrivaceraCloud portal using one of two methods:

    • Method A: Attach PrivaceraCloud authorization in new EMR clusters

    • Method B: Attach PrivaceraCloud authorization in an existing EMR cluster

    Both methods start with obtaining an account-specific script from your PrivaceraCloud account, followed by adding a startup step to your EMR cluster.

    Related Information

    For further reading, see:

    • By default, PrestoDB blocks some operations on the Hive catalog. These operations can be enabled by updating hive.properties. For more information, see PrestoDB.
  2. Click Save.

You can now use PrivaceraCloud to define fine-grained policies and control access to Hive and Presto resources within the EMR cluster.

Configure EMR Cluster#

From your AWS EMR web console:

  1. Find and open the AWS EMR cluster step configuration:

    1. For new EMR clusters (Method A), go to Create EMR > Advanced Options and click Go to advanced options.

    2. For existing EMR clusters (Method B), locate and open the existing cluster whose configuration you want to update. Open the Steps tab and click Add Step.

  2. In the Add Step dialog, complete the fields as follows:

    Step type: Custom JAR
    Name: Install PrivaceraCloud Plugin
    JAR location: command-runner.jar
    Arguments:

    bash -c "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh"
    

    Action on failure: Terminate cluster
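The Add Step fields above can also be expressed as a step definition for scripted cluster changes. The following is a minimal sketch, assuming you automate EMR with Python and boto3. It only builds the dictionary in the shape that boto3's `add_job_flow_steps` accepts and does not call AWS. The function name `build_privacera_step` and its `mode` parameter are illustrative and not part of PrivaceraCloud.

```python
import json


def build_privacera_step(script_url, mode=None,
                         name="Install PrivaceraCloud Plugin",
                         on_failure="TERMINATE_CLUSTER"):
    """Build an EMR step definition equivalent to the Add Step dialog fields.

    script_url: the <emr-script-download-url> copied from PrivaceraCloud.
    mode: optional argument passed to privacera_emr.sh (for example "spark-fgac").
    """
    command = f"wget {script_url} ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh"
    if mode:
        command += f" {mode}"
    return {
        "Name": name,
        "ActionOnFailure": on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # command-runner.jar runs these arguments as a command on the master node.
            "Args": ["bash", "-c", command],
        },
    }


step = build_privacera_step("<emr-script-download-url>")
print(json.dumps(step, indent=2))

# To submit the step (requires boto3 and AWS credentials; the cluster ID is a placeholder):
#   import boto3
#   boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Keeping the command in a single `bash -c` argument mirrors the Arguments field of the dialog, so the console and scripted variants stay easy to compare.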

The EMR Hive plug-in supports view-level access management via the Data_admin feature. By default, it supports view-based row-level filtering and column masking. For more information, see AWS User Guide: EMR.

  • This plug-in also supports view-level access management using the Data_admin feature, as well as view-based row-level filtering and column masking.

  • By default, the PrestoSQL plug-in on EMR uses policies from the "privacera_hive" repository for access management. For more information, see AWS User Guide: EMR.

Validate Installation#

In PrivaceraCloud, open Access Manager: Audit, and click the Plugin tab. Look for audit items reporting the status "Policies synced to plugin". This indicates that your EMR Hive, Presto, or Spark data resource is connected.

EMR Spark (Fine-Grained Access Control)#

These instructions enable Fine-Grained Access Control (FGAC) for an existing connected AWS S3 data resource. FGAC enables policies at the database, table, and column level to be defined in the "privacera_hive" service in Access Manager: Resource Policies. Either Object Level Access Control (OLAC) or Fine-Grained Access Control (FGAC) can be added to an existing AWS S3 configuration, but not both.

Once installed and enabled, each data user query is first parsed by Spark and then authenticated by the PrivaceraCloud Spark plug-in. The requesting user must have authenticated access to all resources referenced by the query for it to be allowed.

Steps#

  1. In PrivaceraCloud, obtain your account-specific <emr-script-download-url>, which allows the EMR cluster to obtain additional scripts and setup.

    1. Open Settings > API Key.

    2. Use an existing active API Key or generate a new one.

      Note

      Make sure the Expiry column is set to "Never Expires".

    3. Click the i icon to get the scripts.

    4. Under AWS EMR Setup Script, click Copy Url. Save this value. It will be used as the <emr-script-download-url> in the following instructions.

    From your AWS EMR web console:

    1. For new EMR clusters, go to Create EMR > Advanced Options and click Go to advanced options.

    2. For existing EMR clusters, locate and open the existing cluster whose configuration you want to update. Open the Steps tab and click Add Step.

  2. Install the Privacera Spark FGAC Plugin.

    1. In a new cluster: select Configure Step > Custom JAR at the bottom of the configuration page.

      For an existing cluster: in Steps, select Custom JAR and click Add Step.

    2. Add the given values in the following fields and click Add.

      • Name: Install PrivaceraCloud Spark Plugin
      • JAR location: command-runner.jar
      • Arguments: add the following command:

        bash -c "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
        
      • Action on failure: Terminate cluster

Note

The Privacera plugin also supports view-level access control using Data admin, as well as view-based row-level filtering and column masking. To enable these features, see Spark in Privacera.

EMR Spark (Object Level Access Control)#

These instructions enable Object Level Access Control (OLAC) on an existing connected AWS S3 resource. If AWS S3 is not already configured, do so by following the instructions in Connect Data Resource - AWS: S3 and Athena, then return here for additional configuration steps.

Either Object Level Access Control (OLAC) or Fine-Grained Access Control (FGAC) can be added to an existing AWS S3 configuration, but not both.

Two subcomponents are installed:

  • Privacera Credential Token Service (P-CTS) is installed on the targeted AWS EMR master node. P-CTS is a secure service running on an EMR master node which provides encrypted access tokens to the requesting user. Tokens are encrypted using a secret key shared with the Privacera Cloud Signing Server.

  • Privacera Signing Agent (P-SA) is installed on the targeted AWS EMR worker nodes. P-SA redirects Spark S3 requests to the Privacera Cloud Signing Server with a P-CTS access token in the request. P-SA then provides the appropriate signed response to Spark for accessing the S3 data if:
    (a) the incoming request has a valid P-CTS token; and
    (b) the requesting user has permissions on the S3 resource as defined in the "privacera_s3" service in Access Manager: Resource Policies.

These steps will:

  1. Create an AWS Kerberos-based Security Configuration.

  2. Establish a shared secret between PrivaceraCloud and the AWS EMR Kerberos-based Security Configuration.

  3. Create a new AWS cluster configured to use that Security Configuration. That cluster will link back to the Privacera Signing Agent (P-SA) and Privacera Credential Token Service (P-CTS).

Prerequisites#

  1. Obtain or determine a character string to serve as a "shared key" between PrivaceraCloud and the AWS EMR cluster. We'll refer to this as <SHARED_KEY> in the configuration steps below.

  2. Obtain your account-specific <emr-script-download-url>, which allows the EMR cluster to obtain additional scripts and setup from PrivaceraCloud. Steps:

    1. Open Settings > API Key.
    2. Use an existing active API Key or create a new one. Set Expiry to Never Expires.
    3. Open the API Key info box (click the (i) icon in the key row).
    4. Under AWS EMR Setup Script, click Copy Url and store the value as <emr-script-download-url>.
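For the shared key in prerequisite 1, any sufficiently random string works as a starting point. The following is a minimal sketch that generates one with Python's standard `secrets` module. The source does not state length or character-set requirements for <SHARED_KEY>, so the 64-character hexadecimal format is an assumption chosen to avoid special characters, and `generate_shared_key` is an illustrative name.

```python
import secrets


def generate_shared_key(num_bytes=32):
    """Return a cryptographically random hex string (two characters per byte)."""
    return secrets.token_hex(num_bytes)


shared_key = generate_shared_key()
# Show the property line that uses the key; store the value securely.
print(f"dataserver.shared.secret={shared_key}")
```

Generate the key once and reuse the same value wherever <SHARED_KEY> appears in the configuration steps.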

Steps#

  1. In the PrivaceraCloud console, go to Settings > Applications, select the existing AWS Data Server application (S3 or Athena), and click the edit (pen) icon.

  2. In the ADVANCED tab, add the following property:

    dataserver.shared.secret=<SHARED_KEY>
    
  3. Click Save.

From your AWS EMR web console:

  1. Create an EMR Security Configuration for Kerberos Authentication.

    1. Open your AWS EMR web console.

    2. Click Security Configurations, then Create.

    3. Provide a name for this Security Configuration such as PRIVACERA_KDC. We'll refer to this same Security Configuration later.

    4. Under Authentication, select Enable Kerberos authentication and complete the fields as appropriate for your environment.
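A Security Configuration like the one created above can also be described as a JSON document. The following is a minimal sketch, assuming a cluster-dedicated KDC with a 24-hour ticket lifetime; both choices are assumptions, since the appropriate Kerberos settings depend on your environment. It only builds the JSON and does not call AWS. `build_kerberos_security_configuration` is an illustrative name.

```python
import json


def build_kerberos_security_configuration(ticket_lifetime_hours=24):
    """Return the JSON text of an EMR security configuration that enables Kerberos.

    Assumes a cluster-dedicated KDC; an external KDC needs different settings.
    """
    config = {
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours,
                },
            }
        }
    }
    return json.dumps(config, indent=2)


security_configuration_json = build_kerberos_security_configuration()
print(security_configuration_json)

# To create it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("emr").create_security_configuration(
#       Name="PRIVACERA_KDC", SecurityConfiguration=security_configuration_json)
```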

  2. Create a new EMR cluster and assign to it the new Security Configuration.

    1. In the AWS EMR Console, create a new cluster.

    2. In Advanced Options, click Go to advanced options.

      Step 1: Software and Steps

      1. In the Software Configuration, select the appropriate EMR release and any associated applications.

      2. In Edit Software Settings, select Enter configuration, and add the following properties:

        [ {
            "classification":"spark-defaults",
            "properties":{
                "spark.driver.extraJavaOptions":"-javaagent:/usr/lib/spark/jars/privacera-signing-agent.jar",
                "spark.executor.extraJavaOptions":"-javaagent:/usr/lib/spark/jars/privacera-signing-agent.jar"
            }
        } ]
        
      3. In Steps, select Custom JAR and click Add Step.

        Add a step that downloads and installs the Privacera Credential Token Service. Complete the fields as shown below, substituting your <emr-script-download-url> value in the wget command. Click Add when all fields are complete.

        • Name: Install Privacera CTS

        • JAR location: command-runner.jar

        • Arguments:

          bash -c "wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh priv-cts"
          
        • Action on failure: Continue

      4. Click Next to progress to Step 2: Hardware
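The configuration entered under Edit Software Settings must be strictly valid JSON; for example, a trailing comma after the last property is rejected by strict parsers. The following is a minimal sketch for checking the text locally before pasting it into the console. `check_emr_configuration` is an illustrative helper, not an AWS or Privacera API, and it checks only the basic structure.

```python
import json

SPARK_DEFAULTS = """
[ {
    "classification": "spark-defaults",
    "properties": {
        "spark.driver.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-signing-agent.jar",
        "spark.executor.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-signing-agent.jar"
    }
} ]
"""


def check_emr_configuration(text):
    """Parse configuration JSON and return it; raise ValueError if it is malformed."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}") from err
    if not isinstance(config, list):
        raise ValueError("expected a JSON array of classification objects")
    for entry in config:
        if not isinstance(entry, dict):
            raise ValueError("each array element must be a JSON object")
        # Compare key names case-insensitively, since both spellings appear in practice.
        keys = {key.lower() for key in entry}
        if not {"classification", "properties"} <= keys:
            raise ValueError("each entry needs classification and properties")
    return config


config = check_emr_configuration(SPARK_DEFAULTS)
print(f"{len(config)} classification(s) parsed")
```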

      Step 2: Hardware

      1. In this step, select Networking, Node, and Instance values as appropriate for your environment.

      Step 3: General Cluster Settings

      Add two scripts that install the Privacera Signing Agent on the master and worker nodes.

      1. Assign Cluster name, Logging, Debugging, and Termination protection as appropriate for your environment.

      2. Install the Master signing agent:

        Go to Additional Options > Bootstrap Actions, select the bootstrap action Run if, and click Configure and add to open the Add Bootstrap Action dialog.

        In this dialog, set the name to Privacera Signing Agent for Master, paste the following script into Optional Arguments, using your own <emr-script-download-url>, and click Add when done.

        instance.isMaster=true "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"
        

        To enable Delta Lake on EMR, use the following script instead:

        instance.isMaster=true "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<EMR_SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fbac"
        

        <EMR_SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL> is the download URL of the Delta Lake core JAR. The Delta Lake core JAR version depends on the Spark version:

        • Find the appropriate version for your EMR release. See Delta Lake compatibility with Apache Spark.

        • Get the download link for the appropriate Delta Lake core JAR and update the property. See Delta Core.

        For example, for Spark version 3.1.x, the download URL is https://repo1.maven.org/maven2/io/delta/delta-core_2.12/1.0.1/delta-core_2.12-1.0.1.jar.

      3. The Worker signing agent is installed in the same way. Under Additional Options, expand Bootstrap Actions, select the bootstrap action Run if, and click Configure and add to open the Add Bootstrap Action dialog.

        In this dialog, set the name to Privacera Signing Agent for Worker, paste the following script into Optional Arguments, using your own <emr-script-download-url>, and click Add when done.

        instance.isMaster=false "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"
        

        To enable Delta Lake on EMR, use the following script instead:

        instance.isMaster=false "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<EMR_SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fbac"
        

        <EMR_SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL> is the download URL of the Delta Lake core JAR. The Delta Lake core JAR version depends on the Spark version:

        • Find the appropriate version for your EMR release. See Delta Lake compatibility with Apache Spark.

        • Get the download link for the appropriate Delta Lake core JAR and update the property. See Delta Core.

        For example, for Spark version 3.1.x, the download URL is https://repo1.maven.org/maven2/io/delta/delta-core_2.12/1.0.1/delta-core_2.12-1.0.1.jar.
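The master and worker bootstrap actions differ only in the `instance.isMaster` condition and the optional Delta Lake exports. The following is a minimal sketch that builds both definitions in the shape boto3's `run_job_flow` accepts for `BootstrapActions`, without calling AWS. The Run if script location `s3://elasticmapreduce/bootstrap-actions/run-if` is an assumption based on the AWS-provided bootstrap action; verify it for your region. The helper names are illustrative.

```python
# Assumed location of the AWS-provided "Run if" bootstrap action; verify for your region.
RUN_IF_PATH = "s3://elasticmapreduce/bootstrap-actions/run-if"


def delta_core_jar_url(delta_version, scala_version="2.12"):
    """Build the Maven Central URL of a delta-core JAR from its version numbers."""
    artifact = f"delta-core_{scala_version}"
    return (f"https://repo1.maven.org/maven2/io/delta/"
            f"{artifact}/{delta_version}/{artifact}-{delta_version}.jar")


def build_signing_agent_bootstrap(script_url, is_master, delta_jar_url=None):
    """Build one bootstrap action that installs the Privacera Signing Agent."""
    role = "Master" if is_master else "Worker"
    condition = "instance.isMaster=" + ("true" if is_master else "false")
    if delta_jar_url:
        # sudo -E keeps the exported variables visible to privacera_emr.sh.
        command = ("export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; "
                   f"export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL={delta_jar_url}; "
                   f"wget {script_url}; chmod +x ./privacera_emr.sh ; "
                   "sudo -E ./privacera_emr.sh spark-fbac")
    else:
        command = (f"wget {script_url}; chmod +x ./privacera_emr.sh ; "
                   "sudo ./privacera_emr.sh spark-fbac")
    return {
        "Name": f"Privacera Signing Agent for {role}",
        "ScriptBootstrapAction": {"Path": RUN_IF_PATH, "Args": [condition, command]},
    }


jar_url = delta_core_jar_url("1.0.1")
bootstrap_actions = [
    build_signing_agent_bootstrap("<emr-script-download-url>", True, jar_url),
    build_signing_agent_bootstrap("<emr-script-download-url>", False, jar_url),
]
for action in bootstrap_actions:
    print(action["Name"], action["ScriptBootstrapAction"]["Args"][0])
```

Building both actions from one function keeps the master and worker commands identical apart from the condition, which is easy to get wrong when pasting them by hand.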

      Step 4: Security

      1. Complete Security Options as appropriate for your environment.

      2. Open Security Configuration, and select the configuration you created earlier, e.g. "PRIVACERA_KDC". Enter values in the following fields:

        • Realm

        • KDC admin password

    3. Click Create cluster to complete.
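When the cluster is created from a script instead of the console, the Step 4 fields map onto two arguments of boto3's `run_job_flow`. The following is a minimal sketch that builds only those security-related arguments and does not call AWS. The realm `EXAMPLE.COM`, the environment variable `KDC_ADMIN_PASSWORD`, and the function name are placeholders chosen for illustration.

```python
import os


def build_security_arguments(security_configuration, realm, kdc_admin_password):
    """Return the security-related keyword arguments for boto3's run_job_flow."""
    if not realm or not kdc_admin_password:
        raise ValueError("Realm and KDC admin password are both required")
    return {
        # Name of the Security Configuration created earlier.
        "SecurityConfiguration": security_configuration,
        "KerberosAttributes": {
            "Realm": realm,
            "KdcAdminPassword": kdc_admin_password,
        },
    }


security_arguments = build_security_arguments(
    "PRIVACERA_KDC",
    realm="EXAMPLE.COM",  # placeholder realm
    # Read the password from the environment so it is not hard-coded in the script.
    kdc_admin_password=os.environ.get("KDC_ADMIN_PASSWORD", "change-me"),
)
print(sorted(security_arguments))
```

These arguments would be merged with the remaining `run_job_flow` parameters (release label, instances, steps, bootstrap actions, and configurations), which are omitted here.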


Last update: March 25, 2022