Privacera Documentation

Connect AWS EMR with Native Apache Ranger to Privacera Platform

AWS EMR provides native Apache Ranger integration with the open source Apache Ranger plugins for Apache Spark and Hive. By connecting EMR’s native Ranger with Privacera’s Ranger-based data access governance, you can:

  • Sync your existing policies with your EMR solution.

  • Extend Apache Ranger’s open source capabilities to take advantage of Privacera’s centralized enterprise-ready solution.

Note

Supported EMR versions: 5.32 and later in the EMR 5.x series.

Prerequisites

The Ranger Admin and Ranger plugin certificates are stored in AWS Secrets Manager. Two secrets are required:

  • ranger-admin-pub-cert

  • ranger-plugin-private-keypair

To create these secrets in AWS Secrets Manager, do the following for each secret:

  1. Log in to the AWS console, select Secrets Manager, and then click Store a new secret.

  2. For the secret type, select Other type of secrets.

  3. Go to the Plaintext tab. Keep the Default value unchanged. You will get the actual value for this secret after installation.

  4. Select the encryption key as per your requirement.

  5. Click Next.

  6. Enter a name for the secret in the Secret name field: ranger-admin-pub-cert for the first secret and ranger-plugin-private-keypair for the second.

  7. Click Next.

    The Configure automatic rotation page is displayed.

  8. Click Next.

  9. On the Review page, check your secret settings, and then click Store.

    The Secret is stored successfully.
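
If you prefer the command line to the console, the two placeholder secrets can likely be created with the AWS CLI instead. The following is a minimal sketch and not part of the Privacera tooling: the run helper, the DRY_RUN switch, the PLACEHOLDER value, and the us-east-1 default region are assumptions made for illustration. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: create the two placeholder secrets with the AWS CLI.
# Assumptions (not from the original page): the AWS CLI is installed and
# configured, and the variable names below are illustrative only.
set -e

AWS_REGION="${AWS_REGION:-us-east-1}"      # assumed default region
PLACEHOLDER="${PLACEHOLDER:-placeholder}"  # real value is set after installation
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands, 0 = execute them

# Print the command in dry-run mode (for display only), otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create one secret whose name is given as the first argument.
create_placeholder_secret() {
  run aws --region "$AWS_REGION" secretsmanager create-secret \
    --name "$1" --secret-string "$PLACEHOLDER"
}

for name in ranger-admin-pub-cert ranger-plugin-private-keypair; do
  create_placeholder_secret "$name"
done
```

The placeholder value is replaced with the certificate contents later, once the certificates have been generated in the procedure.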

Procedure

To connect AWS EMR with Native Apache Ranger to Privacera Platform, follow these steps:

  1. SSH to the instance where Privacera is installed as USER.

  2. Copy the sample variables file to the custom-vars directory and open it for editing.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.emr.native.ranger.yml config/custom-vars/
    vi config/custom-vars/vars.emr.native.ranger.yml
    
  3. Edit the properties. For property details, see EMR Native Apache Ranger properties.

    Note

    You can also add custom properties that are not included by default. For more information, see EMR custom properties.

  4. Run the following commands.

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
    

    After the update completes, the CloudFormation JSON template files are available in ~/privacera/privacera-manager/output/emr-native-ranger.

  5. On the AWS instance where Privacera is installed, change to the output directory.

    cd ~/privacera/privacera-manager/output/emr-native-ranger
    
  6. Create the certificates that need to be added to AWS Secrets Manager.

    You are prompted several times for the keystore password. At each prompt, enter the value of RANGER_PLUGIN_SSL_KEYSTORE_PASSWORD set in ~/privacera/privacera-manager/config/custom-vars/vars.ssl.yml.

    1. Run the following command.

      ./emr-native-create-certs.sh
      

      The following files are created. Use their contents to update the secrets you created in Prerequisites.

      • ranger-admin-pub-cert.pem

      • ranger-plugin-keypair.pem

    2. Display the contents of the ranger-admin-pub-cert.pem file.

      cat ranger-admin-pub-cert.pem
      
    3. Select the file contents and then right-click in the terminal to copy the contents.

    4. Log in to the AWS console, navigate to Secrets Manager, and then click ranger-admin-pub-cert.

    5. In the Secret value section, go to Retrieve Secret Value > Edit > Plaintext.

    6. Replace the secret value with the contents you copied in step 3.

    7. Repeat steps 2 through 6 for ranger-plugin-keypair.pem, using its contents to replace the value of the ranger-plugin-private-keypair secret in AWS Secrets Manager.

  7. (Optional) Create IAM roles using the emr-native-role-creation-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-role-creation --template-body file://emr-native-role-creation-template.json --capabilities CAPABILITY_NAMED_IAM
    

    Note

    To give the Apache Hive and Apache Spark services access to data, navigate to IAM in your AWS console and add the required S3 policies to the role set in EMR_NATIVE_DATA_ACCESS_ROLE.

  8. (Optional) Create Security Configurations using the emr-native-sec-config-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-security-config-creation  --template-body file://emr-native-sec-config-template.json
    
  9. Create the EMR cluster using the emr-native-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-creation  --template-body file://emr-native-template.json
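
The copy-and-paste part of step 6 and the wait after step 9 can probably be scripted as well. The sketch below assumes it is run from ~/privacera/privacera-manager/output/emr-native-ranger after emr-native-create-certs.sh has produced the two .pem files. The helper names (run, put_secret_from_file, wait_for_stack), the DRY_RUN switch, and the us-east-1 default are illustrative assumptions, not Privacera or AWS names. It relies on the AWS CLI reading a parameter value from a file when the value starts with file://.

```shell
#!/bin/sh
# Sketch: load the generated certificates into the existing secrets, and
# wait for the EMR stack. Assumes the AWS CLI is installed and configured.
set -e

AWS_REGION="${AWS_REGION:-us-east-1}"  # assumed default region
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands, 0 = execute them

# Print the command in dry-run mode (for display only), otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = secret name, $2 = PEM file whose contents become the new secret value.
put_secret_from_file() {
  run aws --region "$AWS_REGION" secretsmanager put-secret-value \
    --secret-id "$1" --secret-string "file://$2"
}

# $1 = stack name. Blocks until CloudFormation reports the stack as created.
wait_for_stack() {
  run aws --region "$AWS_REGION" cloudformation wait stack-create-complete \
    --stack-name "$1"
}

# Step 6: replace the placeholder values with the certificate contents.
put_secret_from_file ranger-admin-pub-cert ranger-admin-pub-cert.pem
put_secret_from_file ranger-plugin-private-keypair ranger-plugin-keypair.pem

# After the create-stack command in step 9: wait for the cluster stack.
wait_for_stack privacera-emr-native-creation
```

With DRY_RUN left at 1, the script prints the three commands so you can review them before running them against your account.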
    

EMR Native Apache Ranger properties

Property

Description

Example

EMR_NATIVE_ENABLE

Enables EMR native Ranger integration.

EMR_NATIVE_ENABLE: "true"

Properties for EMR Specifications

EMR_NATIVE_CLUSTER_NAME

Name of the EMR Cluster.

EMR_NATIVE_CLUSTER_NAME: "Privacera-EMR-Native-Ranger"

EMR_NATIVE_AWS_REGION

AWS Region where the cluster will reside.

EMR_NATIVE_AWS_REGION: "{{AWS_REGION}}"

EMR_NATIVE_AWS_ACCT_ID

AWS Account ID where the EMR Cluster and its resources will reside.

EMR_NATIVE_AWS_ACCT_ID: "587946681758"

EMR_NATIVE_SUBNET_ID

Subnet ID where the EMR Cluster nodes will reside.

EMR_NATIVE_SUBNET_ID: ""

EMR_NATIVE_KEYPAIR

An existing EC2 key pair used to SSH into the cluster nodes.

EMR_NATIVE_KEYPAIR: "privacera-test-pair"

EMR_NATIVE_EC2_MARKET_TYPE

Market Type for the EMR Cluster nodes. For example, SPOT or ON_DEMAND.

EMR_NATIVE_EC2_MARKET_TYPE: "SPOT"

EMR_NATIVE_EC2_INSTANCE_TYPE

Instance Type for the EMR Cluster nodes.

EMR_NATIVE_EC2_INSTANCE_TYPE: "m5.2xlarge"

EMR_NATIVE_MASTER_NODE_COUNT

Node count for Master.

EMR_NATIVE_MASTER_NODE_COUNT: "1"

EMR_NATIVE_CORE_NODE_COUNT

Node count for Core.

EMR_NATIVE_CORE_NODE_COUNT: "1"

EMR_NATIVE_VERSION

EMR release version. EMR native Ranger integration is supported in version 5.32 and later.

EMR_NATIVE_VERSION: "emr-5.32.0"

EMR_NATIVE_TERMINATION_PROTECT

Enables termination protection, if set to true.

EMR_NATIVE_TERMINATION_PROTECT: "true"

EMR_NATIVE_LOGS_PATH

S3 location for EMR logs storage.

EMR_NATIVE_LOGS_PATH: "s3://privacera-emr/logs"

Properties to configure EMR Security Group

EMR_NATIVE_CREATE_SG

Set this to true if you don't have existing security groups and want Privacera Manager to add security groups in EMR CloudFormation Template.

EMR_NATIVE_CREATE_SG: "false"

If EMR_NATIVE_CREATE_SG is false, fill the following properties with the IDs of existing security groups:

EMR_NATIVE_MASTER_SG_ID

Security Group ID for EMR Master Node Group.

EMR_NATIVE_MASTER_SG_ID: "sg-xxxxxxx"

EMR_NATIVE_SLAVE_SG_ID

Security Group ID for EMR Slave Node Group.

EMR_NATIVE_SLAVE_SG_ID: "sg-xxxxxxx"

EMR_NATIVE_SERVICE_ACCESS_SG_ID

Security Group ID for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a private network.

EMR_NATIVE_SERVICE_ACCESS_SG_ID: "sg-xxxxxxx"

If EMR_NATIVE_CREATE_SG is true, fill the following properties to name the new security groups that will be added to emr-native-template.json:

EMR_NATIVE_SG_VPC_ID

VPC ID in which you want to create the EMR Cluster.

EMR_NATIVE_SG_VPC_ID: "vpc-xxxxxxxxxxx"

EMR_NATIVE_MASTER_SG_NAME

Security Group Name for EMR Master Node Group.

EMR_NATIVE_MASTER_SG_NAME: "priv-master-sg"

EMR_NATIVE_SLAVE_SG_NAME

Security Group Name for EMR Slave Node Group.

EMR_NATIVE_SLAVE_SG_NAME: "priv-slave-sg"

EMR_NATIVE_SERVICE_ACCESS_SG_NAME

Security Group Name for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a private network.

EMR_NATIVE_SERVICE_ACCESS_SG_NAME: "priv-private-sg"

EMR_NATIVE_SECURITY_CONFIG

Name of the security configuration used for EMR. This can be an existing configuration, or Privacera Manager can generate a template from which a new configuration can be created. The template is available at ~/privacera/privacera-manager/output/emr-native-ranger/emr-native-sec-config-template.json after you run the Privacera Manager update command.

EMR_NATIVE_SECURITY_CONFIG: ""

Properties for EMR Hive Metastore

EMR_NATIVE_HIVE_METASTORE

Metastore type. For example, internal, or hive for an external Hive metastore.

EMR_NATIVE_HIVE_METASTORE: "hive"

EMR_NATIVE_HIVE_METASTORE_WAREHOUSE_PATH

S3 location for Hive metastore warehouse

EMR_NATIVE_HIVE_METASTORE_WAREHOUSE_PATH: "s3://hive-warehouse"

Fill the following properties, if EMR_NATIVE_HIVE_METASTORE is hive:

EMR_NATIVE_METASTORE_CONNECTION_URL

JDBC Connection URL for connecting to Hive Metastore.

EMR_NATIVE_METASTORE_CONNECTION_URL: "jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true"

EMR_NATIVE_METASTORE_CONNECTION_DRIVER

JDBC Driver Name

EMR_NATIVE_METASTORE_CONNECTION_DRIVER: "org.mariadb.jdbc.Driver"

EMR_NATIVE_METASTORE_CONNECTION_USERNAME

JDBC UserName

EMR_NATIVE_METASTORE_CONNECTION_USERNAME: "hive"

EMR_NATIVE_METASTORE_CONNECTION_PASSWORD

JDBC Password

EMR_NATIVE_METASTORE_CONNECTION_PASSWORD: "StRong@PassWord"

Properties of Kerberos Server

EMR_NATIVE_KDC_ADMIN_PASSWORD

The password used within the cluster for the kadmin service.

EMR_NATIVE_KDC_ADMIN_PASSWORD: ""

EMR_NATIVE_CROSS_REALM_PASSWORD

The cross-realm trust principal password, which must be identical across realms.

EMR_NATIVE_CROSS_REALM_PASSWORD: ""

EMR_NATIVE_KERB_TICKET_LIFETIME

The period, in hours, for which a Kerberos ticket issued by the cluster’s KDC is valid. Cluster applications and services auto-renew tickets after they expire.

EMR_NATIVE_KERB_TICKET_LIFETIME: 24

EMR_NATIVE_KERB_REALM

The Kerberos realm name for the other realm in the trust relationship.

EMR_NATIVE_KERB_REALM: ""

EMR_NATIVE_KERB_DOMAIN

The domain name of the other realm in the trust relationship.

EMR_NATIVE_KERB_DOMAIN: ""

EMR_NATIVE_KERB_ADMIN_SERVER

The fully qualified domain name (FQDN) and optional port for the Kerberos admin server in the other realm. If a port is not specified, 749 is used.

EMR_NATIVE_KERB_ADMIN_SERVER: ""

EMR_NATIVE_KERB_KDC_SERVER

The fully qualified domain name (FQDN) and optional port for the KDC in the other realm. If a port is not specified, 88 is used.

EMR_NATIVE_KERB_KDC_SERVER: ""

Properties of Certificates Secrets

EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN

Full ARN of the AWS secret (stored in AWS Secrets Manager) for the Ranger plugin key pair. This is the secret created in the Prerequisites section above.

EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:99999999999:secret:ranger-plugin-key-pair-ixZbO2"

EMR_NATIVE_RANGER_ADMIN_SECRET_ARN

Full ARN of the AWS secret (stored in AWS Secrets Manager) for the Ranger Admin public certificate. This is the secret created in the Prerequisites section above.

EMR_NATIVE_RANGER_ADMIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:99999999999:secret:ranger-admin-public-cert-ixfCO5"

Properties of EMR application

EMR_NATIVE_APP_SPARK_ENABLE

Installs Spark application with EMR native Ranger plugin, if set to true.

EMR_NATIVE_APP_SPARK_ENABLE: "true"

EMR_NATIVE_APP_HIVE_ENABLE

Installs Hive application with EMR native Ranger plugin, if set to true.

EMR_NATIVE_APP_HIVE_ENABLE: "true"

EMR_NATIVE_APP_ZEPPELIN_ENABLE

Installs Zeppelin application, if set to true.

EMR_NATIVE_APP_ZEPPELIN_ENABLE: "true"

EMR_NATIVE_APP_LIVY_ENABLE

Installs Livy application, if set to true.

EMR_NATIVE_APP_LIVY_ENABLE: "true"

Properties of IAM Role Configuration

EMR_NATIVE_DEFAULT_ROLE

Default role attached to EMR cluster for performing cluster related activities. This should be an existing role.

EMR_NATIVE_DEFAULT_ROLE: "EMR_DefaultRole"

EMR_NATIVE_INSTANCE_ROLE

The IAM Role which will be attached to each node in the EMR Cluster. This should have only minimal permissions for basic EMR functionalities.

EMR_NATIVE_INSTANCE_ROLE: "restricted_instance_role"

EMR_NATIVE_DATA_ACCESS_ROLE

This role provides credentials for trusted execution engines, such as Apache Hive and the AWS EMR Record Server, to access AWS S3 data. Use this role only to access AWS S3 data, including any KMS keys, if you are using S3 SSE-KMS.

EMR_NATIVE_DATA_ACCESS_ROLE: "emr_native_data_access_role"

EMR_NATIVE_USER_ACCESS_ROLE

This role provides users who are not trusted execution engines with credentials to interact with AWS services, if needed. Do not use this IAM role to allow access to AWS S3 data, unless it is data that should be accessible to all users.

EMR_NATIVE_USER_ACCESS_ROLE: "emr_native_user_access_role"

Properties to send EMR Ranger Engines Audits to Solr

EMR_NATIVE_ENABLE_SOLR_AUDITS

Enable audits to Solr.

EMR_NATIVE_ENABLE_SOLR_AUDITS: "true"

AUDITSERVER_AUTH_TYPE

The EMR native Ranger audit framework does not support basic authentication, so authentication must be disabled. If this property already exists in vars.auditserver.yml, change it there.

AUDITSERVER_AUTH_TYPE: "none"

AUDITSERVER_SSL_ENABLE

With self-signed SSL certificates, EMR native Ranger does not support SSL for Solr audits, so AuditServer SSL should be disabled.

AUDITSERVER_SSL_ENABLE: "false"

EMR_NATIVE_CLOUDWATCH_GROUPNAME

CloudWatch log group to which Ranger audits are pushed. This must be an existing log group.

EMR_NATIVE_CLOUDWATCH_GROUPNAME: "emr_privacera_native_logs"
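
Several of the properties above are only required depending on the value of another property (EMR_NATIVE_CREATE_SG and EMR_NATIVE_HIVE_METASTORE). The following sketch shows one way to check a variables file for those conditional requirements before running the update. It is an illustration only: the get_var, require, and check_vars helpers are hypothetical, the parsing assumes one flat KEY: "value" pair per line, and any EMR_NATIVE_CREATE_SG value other than true is treated as false.

```shell
#!/bin/sh
# Sketch: check the conditional properties in a vars file.
# The helper names are hypothetical and not part of Privacera Manager.
set -e

# $1 = file, $2 = key. Prints the value with surrounding double quotes removed.
get_var() {
  sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1 \
    | sed -e 's/^"//' -e 's/"[[:space:]]*$//'
}

# $1 = file, remaining arguments = keys that must have a non-empty value.
require() {
  file="$1"
  shift
  for key in "$@"; do
    if [ -z "$(get_var "$file" "$key")" ]; then
      echo "missing: $key"
    fi
  done
}

# $1 = file. Applies the conditional rules described in the table above.
check_vars() {
  case "$(get_var "$1" EMR_NATIVE_CREATE_SG)" in
    true) require "$1" EMR_NATIVE_SG_VPC_ID \
            EMR_NATIVE_MASTER_SG_NAME EMR_NATIVE_SLAVE_SG_NAME ;;
    *)    require "$1" EMR_NATIVE_MASTER_SG_ID EMR_NATIVE_SLAVE_SG_ID ;;
  esac
  if [ "$(get_var "$1" EMR_NATIVE_HIVE_METASTORE)" = "hive" ]; then
    require "$1" EMR_NATIVE_METASTORE_CONNECTION_URL \
      EMR_NATIVE_METASTORE_CONNECTION_DRIVER \
      EMR_NATIVE_METASTORE_CONNECTION_USERNAME \
      EMR_NATIVE_METASTORE_CONNECTION_PASSWORD
  fi
}

# Demonstration against a temporary file with two deliberate gaps.
vars_file="$(mktemp)"
cat > "$vars_file" <<'EOF'
EMR_NATIVE_CREATE_SG: "false"
EMR_NATIVE_MASTER_SG_ID: "sg-xxxxxxx"
EMR_NATIVE_SLAVE_SG_ID: ""
EMR_NATIVE_HIVE_METASTORE: "hive"
EMR_NATIVE_METASTORE_CONNECTION_URL: "jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true"
EMR_NATIVE_METASTORE_CONNECTION_DRIVER: "org.mariadb.jdbc.Driver"
EMR_NATIVE_METASTORE_CONNECTION_USERNAME: "hive"
EOF
check_vars "$vars_file"
rm -f "$vars_file"
```

For the sample file, the check reports EMR_NATIVE_SLAVE_SG_ID and EMR_NATIVE_METASTORE_CONNECTION_PASSWORD as missing. To check your own file, call check_vars with the path config/custom-vars/vars.emr.native.ranger.yml.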