EMR Native Ranger Integration with PrivaceraCloud

AWS EMR provides native Apache Ranger integration through the open source Apache Ranger plug-ins for Apache Spark and Hive. Connecting EMR’s plug-ins to PrivaceraCloud’s Ranger-based data access governance has the following advantages:

  • Enterprises can sync their existing policies with EMR.
  • Organizations can extend Apache Ranger’s open source capabilities to take advantage of Privacera’s centralized enterprise-ready solution.

Prerequisite

  • Enable the "privacera_hive" and "privacera_emrfs_s3" services in your PrivaceraCloud tenant. For more information, see the How to enable Service Configs topic.

Configuration

Certificate Setup in Secrets Manager

AWS EMR Native Ranger requires mutual TLS between the Ranger plug-ins and the Privacera Ranger Admin. The TLS certificates must be stored in AWS Secrets Manager and referenced in an EMR Security Configuration. Perform the following steps:

Create two secrets in AWS Secrets Manager:

  1. Ranger Admin Public Cert

    1. Log in to the AWS Console, navigate to Secrets Manager, and click Store a new secret.

    2. Select Other type of secrets as the secret type, and then go to the Plaintext tab.

    3. In your PrivaceraCloud account, go to Settings > ApiKey > AWS EMR Native Ranger Plugin > Ranger Admin Public Cert > Download Certificate.

    4. Paste the contents of this certificate into the Plaintext tab.

    5. Select the encryption key as per your requirement.

    6. Click Next. Enter the Secret name. For example: ranger-admin-pub-cert

    7. Click Next. The Configure automatic rotation page is displayed. No action required.
      Click Next.
    8. Review Secret details and click Store.

      The Secret is stored successfully.

  2. Ranger Client KeyPair

    1. Log in to the AWS Console, navigate to Secrets Manager, and click Store a new secret.

    2. Select Other type of secrets as the secret type, and then go to the Plaintext tab.

    3. In your PrivaceraCloud account, go to Settings > ApiKey > AWS EMR Native Ranger Plugin > Ranger Client KeyPair > Download Certificate.

    4. Paste the contents of this certificate into the Plaintext tab.

    5. Select the encryption key as per your requirement.

    6. Click Next. Enter the Secret name. For example: ranger-plugin-key-cert
    7. Click Next. The Configure automatic rotation page is displayed. No action required.
      Click Next.
    8. Review Secret details and click Store.

      The Secret is stored successfully.
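The two secrets can also be created programmatically. The following is a minimal sketch, not a Privacera-provided tool. It assumes the downloaded files are PEM text (the EMR console refers to them as PEM secrets) and that `boto3` is installed with credentials allowed to call Secrets Manager. The helper names and the file path in the usage note are hypothetical.

```python
import re

# Matches the opening line of any PEM block, e.g. "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_labels(pem_text):
    """Return the labels of the PEM blocks in pem_text, in order of appearance."""
    return PEM_BEGIN.findall(pem_text)


def build_secret_request(name, pem_text):
    """Build the keyword arguments for the Secrets Manager CreateSecret call."""
    if not pem_block_labels(pem_text):
        raise ValueError(f"{name}: no PEM block found in the downloaded file")
    # The file content is stored as a plaintext secret string, as in the steps above.
    return {"Name": name, "SecretString": pem_text}


def store_secret(name, pem_path, region):
    """Store the file at pem_path as a plaintext secret and return its ARN."""
    import boto3  # imported lazily so the helpers above work without boto3

    with open(pem_path, encoding="utf-8") as handle:
        request = build_secret_request(name, handle.read())
    client = boto3.client("secretsmanager", region_name=region)
    return client.create_secret(**request)["ARN"]
```

For example, `store_secret("ranger-admin-pub-cert", "ranger-admin-pub-cert.pem", "us-east-1")` would return the secret ARN, which is the kind of value the CloudFormation templates later expect in RangerAdminPublicSecretArn. This call uses the account's default encryption key; pass a `KmsKeyId` to `create_secret` if you need a different one.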

IAM Roles Setup

The following three IAM roles must be created before launching the cluster:

  • A custom Amazon EC2 instance profile for Amazon EMR, which is attached to all the cluster nodes - [EmrNativePrivaceraInstanceRole]
  • An IAM role for the Apache Ranger engines, which is used for data access from S3 - [EmrNativePrivaceraDataAccessRole]
  • An IAM role for other AWS services, which is used to attach any other permissions that users need on the EMR cluster - [EmrNativePrivaceraUserAccessRole]

These roles can be created with the minimal required permissions using the following CloudFormation template. Modify the template as needed for your requirements.

Sample CloudFormation Template
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "Create roles and policies for use by Emr-Native Ranger with Privacera",
"Parameters": {
    "EmrNativePrivaceraInstanceRole": {
    "Description": "IAM Role which will be attached to all Instances in the Cluster. Should have minimal permissions. e.g. emr_native_privacera_restricted_instance_role",
    "Type": "String",
    "Default": "emr_native_privacera_restricted_instance_role"
    },
    "EmrNativePrivaceraDataAccessRole": {
    "Description": "IAM Role which will be used by EMR Applications for accessing actual S3 Data. e.g. emr_native_privacera_data_access_role",
    "Type": "String",
    "Default": "emr_native_privacera_data_access_role"
    },
    "EmrNativePrivaceraUserAccessRole": {
    "Description": "IAM Role which allows users to interact with AWS Services. Shouldn't be used to access S3 data. e.g. emr_native_privacera_user_access_role",
    "Type": "String",
    "Default": "emr_native_privacera_user_access_role"
    },
    "RangerPluginKeyPairSecretArn": {
    "Description": "Full ARN of secret [stored in AWS Secrets Manager] for ranger plugin key-pair. e.g. arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-plugin-cert-k4xsLM",
    "Type": "String",
    "Default": ""
    },
    "RangerAdminPublicSecretArn": {
    "Description": "Full ARN of secret [stored in AWS Secrets Manager] for ranger admin public cert. e.g. arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-admin-cert-3W5Zdt",
    "Type": "String",
    "Default": ""
    },
    "Region": {
    "Description": "AWS Region where cluster will be created. e.g. us-east-1",
    "Type": "String",
    "Default": "us-east-1"
    },
    "CloudwatchLogGroupName": {
    "Description": "CloudWatch Log group name which will be used to store RangerAudits. This should be an existing one e.g. emr_native_privacera_audits",
    "Type": "String",
    "Default": ""
    },
    "AwsAcctId": {
    "Description": "Account ID of your Amazon Account. e.g. 999999999999",
    "Type": "String",
    "Default": ""
    },
    "LogsBucketS3": {
    "Description": "S3 path to store emr logs (without the protocol). e.g. privacera-logs/emr-native-logs",
    "Type": "String",
    "Default": ""
    }
},
"Resources": {
    "EmrPrivaceraInstanceRole": {
    "Type": "AWS::IAM::Role",
    "Properties": {
        "RoleName": {
        "Fn::Sub": "${EmrNativePrivaceraInstanceRole}"
        },
        "AssumeRolePolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
            }
        ]
        },
        "Path": "/"
    }
    },
    "EmrPrivaceraInstancePolicy": {
    "Type": "AWS::IAM::Policy",
    "Properties": {
        "PolicyName": {
        "Fn::Join": [
            "",
            [
            "emr_native_privacera_instance_policy"
            ]
        ]
        },
        "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
            "Sid": "EmrServiceLimited",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
                "ec2:Describe*",
                "elasticmapreduce:Describe*",
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:ListInstanceGroups",
                "elasticmapreduce:ListInstances",
                "elasticmapreduce:ListSteps",
                "glue:CreateDatabase",
                "glue:UpdateDatabase",
                "glue:DeleteDatabase",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetTableVersions",
                "glue:CreatePartition",
                "glue:BatchCreatePartition",
                "glue:UpdatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition",
                "glue:CreateUserDefinedFunction",
                "glue:UpdateUserDefinedFunction",
                "glue:DeleteUserDefinedFunction",
                "glue:GetUserDefinedFunction",
                "glue:GetUserDefinedFunctions"
            ]
            },
            {
            "Sid": "EmrS3Limited",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::*.elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce",
                {
                "Fn::Sub": "arn:aws:s3:::${LogsBucketS3}"
                },
                {
                "Fn::Sub": "arn:aws:s3:::${LogsBucketS3}/*"
                }
            ]
            },
            {
            "Sid": "AllowAssumeOfRolesAndTagging",
            "Effect": "Allow",
            "Action": [
                "sts:TagSession",
                "sts:AssumeRole"
            ],
            "Resource": [
                {
                "Fn::Sub": "arn:aws:iam::${AwsAcctId}:role/${EmrNativePrivaceraDataAccessRole}"
                },
                {
                "Fn::Sub": "arn:aws:iam::${AwsAcctId}:role/${EmrNativePrivaceraUserAccessRole}"
                }
            ]
            },
            {
            "Sid": "AllowSecretsRetrieval",
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": [
                {
                "Fn::Sub": "${RangerPluginKeyPairSecretArn}"
                },
                {
                "Fn::Sub": "${RangerAdminPublicSecretArn}"
                }
            ]
            }
        ]
        },
        "Roles": [
        {
            "Ref": "EmrPrivaceraInstanceRole"
        }
        ]
    }
    },
    "EmrPrivaceraUserAccessRole": {
    "Type": "AWS::IAM::Role",
    "Properties": {
        "RoleName": {
        "Fn::Sub": "${EmrNativePrivaceraUserAccessRole}"
        },
        "AssumeRolePolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
            },
            {
            "Effect": "Allow",
            "Principal": {
                "AWS": {
                "Fn::GetAtt": [
                    "EmrPrivaceraInstanceRole",
                    "Arn"
                ]
                }
            },
            "Action": "sts:AssumeRole"
            }
        ]
        },
        "Path": "/"
    }
    },
    "EmrPrivaceraDataAccessRole": {
    "Type": "AWS::IAM::Role",
    "Properties": {
        "RoleName": {
        "Fn::Sub": "${EmrNativePrivaceraDataAccessRole}"
        },
        "AssumeRolePolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
            },
            {
            "Effect": "Allow",
            "Principal": {
                "AWS": {
                "Fn::GetAtt": [
                    "EmrPrivaceraInstanceRole",
                    "Arn"
                ]
                }
            },
            "Action": "sts:AssumeRole"
            }
        ]
        },
        "Path": "/"
    }
    },
    "DataAccessPolicy": {
    "Type": "AWS::IAM::Policy",
    "Properties": {
        "PolicyName": {
        "Fn::Join": [
            "",
            [
            "emr_native_privacera_data_access_policy"
            ]
        ]
        },
        "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
            "Sid": "CloudwatchLogsPermissions",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Effect": "Allow",
            "Resource": [
                {
                "Fn::Sub": "arn:aws:logs:${Region}:${AwsAcctId}:log-group:${CloudwatchLogGroupName}:*"
                }
            ]
            },
            {
            "Sid": "BucketPermissionsInS3Buckets",
            "Action": [
                "s3:CreateBucket",
                "s3:DeleteBucket",
                "s3:ListAllMyBuckets",
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::examplebucket"
            ]
            },
            {
            "Sid": "ObjectPermissionsInS3Objects",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::examplebucket/*"
            ]
            }
        ]
        },
        "Roles": [
        {
            "Ref": "EmrPrivaceraDataAccessRole"
        }
        ]
    }
    },
    "EmrPrivaceraInstanceProfile": {
    "Type": "AWS::IAM::InstanceProfile",
    "Properties": {
        "InstanceProfileName": {
        "Ref": "EmrNativePrivaceraInstanceRole"
        },
        "Roles": [
        {
            "Ref": "EmrPrivaceraInstanceRole"
        }
        ]
    }
    },
    "EmrNativePrivaceraDataAccessProfile": {
    "Type": "AWS::IAM::InstanceProfile",
    "Properties": {
        "InstanceProfileName": {
        "Ref": "EmrNativePrivaceraDataAccessRole"
        },
        "Roles": [
        {
            "Ref": "EmrPrivaceraDataAccessRole"
        }
        ]
    }
    },
    "EmrNativePrivaceraUserAccessProfile": {
    "Type": "AWS::IAM::InstanceProfile",
    "Properties": {
        "InstanceProfileName": {
        "Ref": "EmrNativePrivaceraUserAccessRole"
        },
        "Roles": [
        {
            "Ref": "EmrPrivaceraUserAccessRole"
        }
        ]
    }
    }
},
"Outputs": {
    "EmrNativePrivaceraInstanceRole": {
    "Value": {
        "Ref": "EmrPrivaceraInstanceRole"
    }
    },
    "EmrNativePrivaceraDataAccessRole": {
    "Value": {
        "Ref": "EmrPrivaceraDataAccessRole"
    }
    },
    "EmrNativePrivaceraUserAccessRole": {
    "Value": {
        "Ref": "EmrPrivaceraUserAccessRole"
    }
    }
}
}

Note

To learn how to create a stack from a CloudFormation template, refer to the Create CloudFormation stack topic.

Note

After the above stack is created successfully, you will have three IAM roles. Use the EmrNativePrivaceraDataAccessRole IAM role to give the Apache Ranger services access to S3 data.

For detailed information, see IAM roles for native integration with Apache Ranger.
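Several parameters in the IAM template default to an empty string, so a value has to be supplied for each of them when the stack is created. The sketch below shows one way to build the parameter list and create the stack with `boto3`. It is an illustration under assumptions: the function names and the stack name are hypothetical, and treating every empty default as "must be supplied" is a convention chosen here for the IAM template, not a CloudFormation rule.

```python
import json


def stack_parameters(template, overrides):
    """Build the Parameters list for CreateStack from a template and overrides."""
    declared = template.get("Parameters", {})
    unknown = sorted(set(overrides) - set(declared))
    if unknown:
        raise KeyError(f"not declared in the template: {unknown}")
    parameters = []
    missing = []
    for key, spec in declared.items():
        value = overrides.get(key, spec.get("Default", ""))
        if value == "":
            # Assumption: an empty default is a placeholder that must be filled in.
            missing.append(key)
        parameters.append({"ParameterKey": key, "ParameterValue": str(value)})
    if missing:
        raise ValueError(f"no value supplied for: {missing}")
    return parameters


def create_stack(stack_name, template_path, overrides, region):
    """Create the stack and return its stack ID."""
    import boto3  # imported lazily so stack_parameters works without boto3

    with open(template_path, encoding="utf-8") as handle:
        body = handle.read()
    parameters = stack_parameters(json.loads(body), overrides)
    client = boto3.client("cloudformation", region_name=region)
    response = client.create_stack(
        StackName=stack_name,
        TemplateBody=body,
        Parameters=parameters,
        # Required because the template creates IAM roles with custom names.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    return response["StackId"]
```

Calling `stack_parameters` before `create_stack` surfaces a forgotten value such as AwsAcctId locally, instead of as a failed stack.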

Create Security Configurations

  • A new Security Configuration containing the Kerberos server and Ranger integration details must be created and attached to the EMR cluster.

  • It can be created using the following CloudFormation template.

    Note

    This template assumes that you have a cluster-dedicated KDC with cross-realm trust enabled.

  • You can modify the CloudFormation template based on your requirements.

    Note

    Common variables from the previous setup steps should be kept the same.

Sample CloudFormation Template
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "Create Security Configuration for use by Privacera-Protected EMR Clusters",
"Parameters": {
    "EmrNativePrivaceraSecConfName": {
    "Description": "Name to be given for the Security Configuration. e.g. emr_native_privacera_sec_conf",
    "Type": "String",
    "Default": "emr_native_privacera_sec_conf"
    },
    "EmrNativePrivaceraDataAccessRole": {
    "Description": "IAM Role which will be used by EMR Applications for accessing actual S3 Data. e.g. emr_native_privacera_data_access_role",
    "Type": "String",
    "Default": "emr_native_privacera_data_access_role"
    },
    "EmrNativePrivaceraUserAccessRole": {
    "Description": "IAM Role which allows users to interact with AWS Services. Shouldn't be used to access S3 data. e.g. emr_native_privacera_user_access_role",
    "Type": "String",
    "Default": "emr_native_privacera_user_access_role"
    },
    "RangerPluginKeyPairSecretArn": {
    "Description": "Full ARN of secret [stored in AWS Secrets Manager] for ranger plugin key-pair. e.g. arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-plugin-cert-k4xsLM",
    "Type": "String",
    "Default": ""
    },
    "RangerAdminPublicSecretArn": {
    "Description": "Full ARN of secret [stored in AWS Secrets Manager] for ranger admin public cert. e.g. arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-admin-cert-3W5Zdt",
    "Type": "String",
    "Default": ""
    },
    "Region": {
    "Description": "AWS Region where cluster will be created. e.g. us-east-1",
    "Type": "String",
    "Default": ""
    },
    "CloudwatchLogGroupName": {
    "Description": "CloudWatch Log group name which will be used to store RangerAudits. This should be an existing one e.g. emr_native_privacera_audits",
    "Type": "String",
    "Default": ""
    },
    "AwsAcctId": {
    "Description": "Account ID of your Amazon Account. e.g. 999999999999",
    "Type": "String",
    "Default": ""
    },
    "EmrNativeRangerAdminUrl": {
    "Description": "Get from--> PCloud Portal >> Access Manager >> Settings >> ApiKey >> Click on Info Icon >> AWS EMR Native Ranger Plugin Section >> Ranger Admin mTLS URL >> Copy URL. e.g. https://api-mtls.privaceracloud.com/api/<api-key>",
    "Type": "String",
    "Default": "https://api-mtls.privaceracloud.com/api/<api-key>"
    },
    "HiveRepoName": {
    "Description": "Hive Repo Name in RangerAdmin",
    "Type": "String",
    "Default": "privacera_hive"
    },
    "EmrfsRepoName": {
    "Description": "EMRFS-S3 Repo Name in RangerAdmin",
    "Type": "String",
    "Default": "privacera_emrfs_s3"
    },
    "KerberosTicketLifetime": {
    "Description": "The period for which a Kerberos ticket issued by the cluster’s KDC is valid. Cluster applications and services auto-renew tickets after they expire",
    "Type": "Number",
    "Default": 24
    },
    "KerberosAdminServer": {
    "Description": "The fully qualified domain name (FQDN) and optional port for the Kerberos admin server in the other realm. If a port is not specified, 749 is used",
    "Type": "String",
    "Default": ""
    },
    "KerberosDomain": {
    "Description": "The domain name of the other realm in the trust relationship",
    "Type": "String",
    "Default": ""
    },
    "KDCServer": {
    "Description": "The fully qualified domain name (FQDN) and optional port for the KDC in the other realm. If a port is not specified, 88 is used",
    "Type": "String",
    "Default": ""
    },
    "KerberosRealm": {
    "Description": "The Kerberos realm name for the other realm in the trust relationship",
    "Type": "String",
    "Default": ""
    }
},
"Resources": {
    "SecurityConfiguration": {
    "Type": "AWS::EMR::SecurityConfiguration",
    "Properties": {
        "Name": {
        "Fn::Sub": "${EmrNativePrivaceraSecConfName}"
        },
        "SecurityConfiguration": {
        "AuthorizationConfiguration": {
            "RangerConfiguration": {
            "AdminServerURL": {
                "Fn::Sub": "${EmrNativeRangerAdminUrl}"
            },
            "RoleForRangerPluginsARN": {
                "Fn::Sub": "arn:aws:iam::${AwsAcctId}:role/${EmrNativePrivaceraDataAccessRole}"
            },
            "RoleForOtherAWSServicesARN": {
                "Fn::Sub": "arn:aws:iam::${AwsAcctId}:role/${EmrNativePrivaceraUserAccessRole}"
            },
            "AdminServerSecretARN": {
                "Fn::Sub": "${RangerAdminPublicSecretArn}"
            },
            "RangerPluginConfigurations": [
                {
                "App": "Spark",
                "ClientSecretARN": {
                    "Fn::Sub": "${RangerPluginKeyPairSecretArn}"
                },
                "PolicyRepositoryName": {
                    "Fn::Sub": "${HiveRepoName}"
                }
                },
                {
                "App": "Hive",
                "ClientSecretARN": {
                    "Fn::Sub": "${RangerPluginKeyPairSecretArn}"
                },
                "PolicyRepositoryName": {
                    "Fn::Sub": "${HiveRepoName}"
                }
                },
                {
                "App": "EMRFS-S3",
                "ClientSecretARN": {
                    "Fn::Sub": "${RangerPluginKeyPairSecretArn}"
                },
                "PolicyRepositoryName": {
                    "Fn::Sub": "${EmrfsRepoName}"
                }
                }
            ],
            "AuditConfiguration": {
                "Destinations": {
                "AmazonCloudWatchLogs": {
                    "CloudWatchLogGroup": {
                    "Fn::Sub": "arn:aws:logs:${Region}:${AwsAcctId}:log-group:${CloudwatchLogGroupName}:*"
                    }
                }
                }
            }
            }
        },
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {
                "TicketLifetimeInHours": {
                "Ref": "KerberosTicketLifetime"
                },
                "CrossRealmTrustConfiguration": {
                "AdminServer": {
                    "Fn::Sub": "${KerberosAdminServer}"
                },
                "Domain": {
                    "Fn::Sub": "${KerberosDomain}"
                },
                "KdcServer": {
                    "Fn::Sub": "${KDCServer}"
                },
                "Realm": {
                    "Fn::Sub": "${KerberosRealm}"
                }
                }
            }
            }
        }
        }
    }
    }
}
}

Note

To learn how to create a stack from a CloudFormation template, refer to the Create CloudFormation stack topic.

Alternatively, create the Security Configuration manually in the AWS Console:

  1. Log in to the AWS Console and navigate to EMR Console > Security Configuration (from the left panel) > Create New Security Configuration.
  2. Enter the Security Configuration name. For example: EMR_NATIVE_WITH_PLCOUD
  3. Navigate to the Authentication section, select the Enable Kerberos authentication checkbox, and enter the Kerberos environment details.

  4. Under the Authorization section, select Enable integration with Apache Ranger for fine-grained access control and enter the details as below:

    1. IAM role for Apache Ranger: the data access role created during IAM Roles setup (emr_native_privacera_data_access_role by default).

    2. IAM role for other AWS Services: the user access role created during IAM Roles setup (emr_native_privacera_user_access_role by default).

    3. Ranger Policy Manager: In your PrivaceraCloud account, go to Settings > ApiKey > AWS EMR Native Ranger > Ranger Admin mTLS URL, click Copy URL, and paste the URL in this field.

    4. Admin PEM secret: Choose ranger-admin-pub-cert from the drop-down.

    5. EMRFS client PEM secret: Choose ranger-plugin-key-cert from the drop-down.

    6. EMRFS policy repository: privacera_emrfs_s3

    7. Spark configurations: Select this option if you want to enable the Spark application.

    8. Spark client PEM secret: Choose ranger-plugin-key-cert from the drop-down.

    9. Spark policy repository: privacera_hive

    10. Hive configurations: Select this option if you want to enable the Hive application.

    11. Hive client PEM secret: Choose ranger-plugin-key-cert from the drop-down.

    12. Hive policy repository: privacera_hive

    13. CloudWatch Log Group: Select a CloudWatch log group for pushing audits, if required.

      Note: The IAM role for Apache Ranger selected above must have permissions to create log streams and call PutLogEvents in this log group (this was configured during IAM Roles setup).
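To see how the Authorization choices above map to the fields of the sample CloudFormation template, the sketch below assembles the same RangerConfiguration structure in Python. It only mirrors the field names that appear in the template. The function name is hypothetical, and the Kerberos AuthenticationConfiguration block is left out, so the result is not a complete Security Configuration on its own.

```python
import json


def ranger_authorization_configuration(
    account_id,
    region,
    admin_url,
    admin_secret_arn,
    client_secret_arn,
    data_access_role,
    user_access_role,
    log_group,
    hive_repo="privacera_hive",
    emrfs_repo="privacera_emrfs_s3",
):
    """Return the AuthorizationConfiguration part of the security configuration."""

    def role_arn(role_name):
        return f"arn:aws:iam::{account_id}:role/{role_name}"

    # Spark and Hive share the Hive policy repository; EMRFS-S3 has its own.
    plugins = [
        {"App": app, "ClientSecretARN": client_secret_arn, "PolicyRepositoryName": repo}
        for app, repo in (("Spark", hive_repo), ("Hive", hive_repo), ("EMRFS-S3", emrfs_repo))
    ]
    log_group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*"
    return {
        "AuthorizationConfiguration": {
            "RangerConfiguration": {
                "AdminServerURL": admin_url,
                "RoleForRangerPluginsARN": role_arn(data_access_role),
                "RoleForOtherAWSServicesARN": role_arn(user_access_role),
                "AdminServerSecretARN": admin_secret_arn,
                "RangerPluginConfigurations": plugins,
                "AuditConfiguration": {
                    "Destinations": {
                        "AmazonCloudWatchLogs": {"CloudWatchLogGroup": log_group_arn}
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values taken from the parameter descriptions in the template.
    config = ranger_authorization_configuration(
        account_id="999999999999",
        region="us-east-1",
        admin_url="https://api-mtls.privaceracloud.com/api/<api-key>",
        admin_secret_arn="arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-admin-cert-3W5Zdt",
        client_secret_arn="arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-plugin-cert-k4xsLM",
        data_access_role="emr_native_privacera_data_access_role",
        user_access_role="emr_native_privacera_user_access_role",
        log_group="emr_native_privacera_audits",
    )
    print(json.dumps(config, indent=2))
```

Printing the structure makes it easy to compare against what the console or the CloudFormation stack actually created.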

Create EMR Cluster

The following CloudFormation template can be used to create the EMR cluster. You can modify the template based on your requirements.

Note

Common variables from the previous setup steps should be kept the same.

Sample CloudFormation Template
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "Create EMR Cluster - Native Ranger Integration with Privacera",
"Parameters": {
    "ClusterName": {
    "Description": "Name of the emr cluster",
    "Type": "String",
    "Default": "Privacera-EMR-Native-Ranger"
    },
    "EMRVersion": {
    "Description": "EMR Native Ranger integration is supported from 5.32 onwards. e.g. emr-5.32.0, emr-5.33.0, etc.",
    "Type": "String",
    "Default": "emr-5.32.0"
    },
    "MasterSecurityGroup": {
    "Description": "Security Group ID for EMR Master Node Group. e.g. sg-xxxxxxx",
    "Type": "String",
    "Default": ""
    },
    "SlaveSecurityGroup": {
    "Description": "Security Group ID for EMR Slave Node Group. e.g. sg-xxxxxxx",
    "Type": "String",
    "Default": ""
    },
    "ServiceAccessSecurityGroup": {
    "Description": "Security Group ID for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a Private Network. e.g. sg-xxxxxxx",
    "Type": "String",
    "Default": ""
    },
    "NodeSubnetId": {
    "Description": "Subnet id for the cluster nodes. e.g. subnet-xxxx",
    "Type": "String",
    "Default": ""
    },
    "SecurityConfig": {
    "Description": "SecurityConfiguration name that will be attached to the EMR Cluster. Must match the name used when the Security Configuration was created. e.g. emr_native_privacera_sec_conf",
    "Type": "String",
    "Default": "emr_native_privacera_sec_conf"
    },
    "HiveMetaStoreWarehouseS3Path": {
    "Description": "Hive metastore warehouse s3 path. e.g. s3://hive-warehouse/data",
    "Type": "String",
    "Default": ""
    },
    "NodeKeyPair": {
    "Description": "An existing EC2 key pair to SSH into the node of cluster. e.g. privacera-test-pair",
    "Type": "String",
    "Default": ""
    },
    "NodeMarketType": {
    "Description": "Node Instance market type. e.g. SPOT, ON_DEMAND",
    "Type": "String",
    "Default": ""
    },
    "KdcAdminPassword": {
    "Description": "The password used within the cluster for the kadmin service.",
    "Type": "String",
    "Default": ""
    },
    "CrossRealmTrustPrincipalPassword": {
    "Description": "The cross-realm trust principal password, which must be identical across realms.",
    "Type": "String",
    "Default": ""
    },
    "RangerAuditsSetupScriptUrl": {
    "Description": "Get from--> PCloud Portal >> Access Manager >> Settings >> ApiKey >> Click on Info Icon >> AWS EMR Native Ranger Plugin Section >> Ranger Audit Setup Script >> Copy URL",
    "Type": "String",
    "Default": ""
    },
    "EmrMasterNodeCount": {
    "Description": "Node count for Master. e.g. 1",
    "Type": "Number",
    "Default": 1
    },
    "EmrCoreNodeCount": {
    "Description": "Node count for Core. e.g. 1",
    "Type": "Number",
    "Default": 1
    },
    "EmrNodeInstanceType": {
    "Description": "Instance type for the cluster nodes. e.g. m5.large, m5.2xlarge, r5.xlarge, etc.",
    "Type": "String",
    "Default": ""
    },
    "EmrTerminationProtection": {
    "Description": "To enable termination protection. Can be true/false",
    "Type": "String",
    "Default": "true"
    },
    "EmrLogsPath": {
    "Description": "S3 location for emr logs storage. e.g. s3://privacera-emr/logs",
    "Type": "String",
    "Default": ""
    },
    "EmrNativePrivaceraInstanceRole": {
    "Description": "IAM Role which will be attached to all Instances in the Cluster. Should have minimal permissions. e.g. emr_native_privacera_restricted_instance_role",
    "Type": "String",
    "Default": "emr_native_privacera_restricted_instance_role"
    },
    "EmrDefaultRole": {
    "Description": "Default role attached to EMR Cluster for performing cluster related activities. This should be a pre-created one. e.g. EMR_DefaultRole",
    "Type": "String",
    "Default": "EMR_DefaultRole"
    },
    "EmrHiveMetastoreConnectionUrl": {
    "Description": "JDBC Connection URL for connecting to hive. e.g. jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true",
    "Type": "String",
    "Default": ""
    },
    "EmrHiveMetastoreConnectionDriver": {
    "Description": "JDBC Driver Name. e.g. org.mariadb.jdbc.Driver",
    "Type": "String",
    "Default": ""
    },
    "EmrHiveMetastoreConnectionUsername": {
    "Description": "JDBC UserName",
    "Type": "String",
    "Default": ""
    },
    "EmrHiveMetastoreConnectionPassword": {
    "Description": "JDBC Password",
    "Type": "String",
    "Default": ""
    }
},
"Resources": {
    "EMRCLUSTER": {
    "Type": "AWS::EMR::Cluster",
    "Properties": {
        "Name": {
        "Ref": "ClusterName"
        },
        "KerberosAttributes": {
        "Realm": "EC2.INTERNAL",
        "KdcAdminPassword": {
            "Ref": "KdcAdminPassword"
        },
        "CrossRealmTrustPrincipalPassword": {
            "Ref": "CrossRealmTrustPrincipalPassword"
        }
        },
        "SecurityConfiguration": {
        "Ref": "SecurityConfig"
        },
        "VisibleToAllUsers": true,
        "EbsRootVolumeSize": 15,
        "Instances": {
        "MasterInstanceGroup": {
            "InstanceCount": {
            "Ref": "EmrMasterNodeCount"
            },
            "InstanceType": {
            "Fn::Sub": "${EmrNodeInstanceType}"
            },
            "Market": {
            "Fn::Sub": "${NodeMarketType}"
            },
            "Name": "Master Instance Group"
        },
        "CoreInstanceGroup": {
            "InstanceCount": {
            "Ref": "EmrCoreNodeCount"
            },
            "InstanceType": {
            "Fn::Sub": "${EmrNodeInstanceType}"
            },
            "Market": {
            "Fn::Sub": "${NodeMarketType}"
            },
            "Name": "Core Instance Group"
        },
        "Ec2KeyName": {
            "Ref": "NodeKeyPair"
        },
        "EmrManagedSlaveSecurityGroup": {
            "Fn::Sub": "${SlaveSecurityGroup}"
        },
        "EmrManagedMasterSecurityGroup": {
            "Fn::Sub": "${MasterSecurityGroup}"
        },
        "ServiceAccessSecurityGroup": {
            "Fn::Sub": "${ServiceAccessSecurityGroup}"
        },
        "Ec2SubnetId": {
            "Fn::Sub": "${NodeSubnetId}"
        },
        "TerminationProtected": {
            "Fn::Sub": "${EmrTerminationProtection}"
        }
        },
        "BootstrapActions": [
        {
            "Name": "Configure Ranger Audits for Master Node",
            "ScriptBootstrapAction": {
            "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
            "Args": [
                {
                "Fn::Sub": "instance.isMaster=true"
                },
                {
                "Fn::Sub": "wget ${RangerAuditsSetupScriptUrl}; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
                }
            ]
            }
        },
        {
            "Name": "Configure Ranger Audits for Worker Nodes",
            "ScriptBootstrapAction": {
            "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
            "Args": [
                {
                "Fn::Sub": "instance.isMaster=false"
                },
                {
                "Fn::Sub": "wget ${RangerAuditsSetupScriptUrl}; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
                }
            ]
            }
        }
        ],
        "Applications": [
        {
            "Name": "Hive"
        },
        {
            "Name": "Spark"
        },
        {
            "Name": "Zeppelin"
        },
        {
            "Name": "Livy"
        },
        {
            "Name": "Hue"
        }
        ],
        "Configurations": [
        {
            "Classification": "spark",
            "ConfigurationProperties": {
            "maximizeResourceAllocation": "true"
            },
            "Configurations": []
        },
        {
            "Classification": "spark-hive-site",
            "ConfigurationProperties": {
            "hive.metastore.warehouse.dir": {
                "Ref": "HiveMetaStoreWarehouseS3Path"
            }
            }
        },
        {
            "Classification": "hive-site",
            "ConfigurationProperties": {
            "javax.jdo.option.ConnectionURL": {
                "Fn::Sub": "${EmrHiveMetastoreConnectionUrl}"
            },
            "javax.jdo.option.ConnectionDriverName": {
                "Fn::Sub": "${EmrHiveMetastoreConnectionDriver}"
            },
            "javax.jdo.option.ConnectionUserName": {
                "Fn::Sub": "${EmrHiveMetastoreConnectionUsername}"
            },
            "javax.jdo.option.ConnectionPassword": {
                "Fn::Sub": "${EmrHiveMetastoreConnectionPassword}"
            },
            "hive.metastore.warehouse.dir": {
                "Ref": "HiveMetaStoreWarehouseS3Path"
            }
            }
        }
        ],
        "LogUri": {
        "Fn::Sub": "${EmrLogsPath}"
        },
        "JobFlowRole": {
        "Fn::Sub": "${EmrNativePrivaceraInstanceRole}"
        },
        "ServiceRole": {
        "Fn::Sub": "${EmrDefaultRole}"
        },
        "ReleaseLabel": {
        "Fn::Sub": "${EMRVersion}"
        }
    }
    }
}
}

Note

To learn how to create a stack from a CloudFormation template, refer to the Create CloudFormation stack topic.
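
For readers who prefer the command line, the sketch below prints an AWS CLI command that creates a stack from a local copy of the template. The stack name, template file name, and release label are placeholders, and it is an assumption that `EMRVersion` is declared as a parameter in the template. The function only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Print (do not run) an AWS CLI command that creates a stack from a local template.
# All three arguments are hypothetical placeholders:
#   $1 stack name, $2 template file, $3 value for the assumed EMRVersion parameter.
print_create_stack_cmd() {
    stack_name="$1"
    template_file="$2"
    emr_version="$3"
    printf 'aws cloudformation create-stack --stack-name %s --template-body file://%s --parameters ParameterKey=EMRVersion,ParameterValue=%s\n' \
        "$stack_name" "$template_file" "$emr_version"
}

# Example call with placeholder values.
print_create_stack_cmd privacera-emr-native emr-template.json emr-5.32.0
```

The remaining template parameters would need to be supplied in the same `ParameterKey=...,ParameterValue=...` form.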

  1. Log in to the AWS Console, navigate to the EMR service, and click Create Cluster.

  2. Click the Go to advanced options link.

  3. Under Software Configuration:

    1. Select Release Version.

    2. Select additional applications as per your environment.

    If you select the Hive or Spark application, you must also select the HCatalog option.

    3. Under Edit software settings, select Enter configuration and add the following text if you want to use an external Hive Metastore.

    Glue Metastore is not supported.

    ```json
    [
    {
        "Classification": "hive-site",
        "Properties": {
        "javax.jdo.option.ConnectionUserName": "${user-name}",
        "javax.jdo.option.ConnectionDriverName": "${jdbc-driver}",
        "javax.jdo.option.ConnectionURL": "${jdbc-url}",
        "javax.jdo.option.ConnectionPassword": "${jdbc-password}"
        }
    }
    ]
    ```
    
  4. Click Next.

  5. Under Hardware settings, select the Networking, Node, and Instance values appropriate for your environment.

  6. Go to General cluster settings.

    To enable audit logging for your applications in the Privacera Portal, add the two bootstrap actions described in the next step. They install the Ranger audit configuration on the master and worker nodes.

  7. Enter the Cluster name.

    1. Select the Logging, Debugging, and Termination protection checkboxes as appropriate for your environment.

    2. Configure Ranger Audits logging for Master Node:

      1. Under Additional Options, expand Bootstrap Actions, select the Run if bootstrap action, and click Configure and add.

        The Add Bootstrap Action dialog appears.

      2. In this dialog, enter the name Configure Ranger Audits for Master Node.

      3. Add the following script in the Optional arguments field, replacing <ranger-audit-setup-script-url> with your own script URL.

        To get the URL, go to PCloud Portal > Access Manager > Settings > ApiKey, click the Info icon, and under Ranger Audit Setup Script click Copy URL.

        instance.isMaster=true "wget <ranger-audit-setup-script-url>; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
        
      4. Click Add.

    3. Configure Ranger Audits logging for Worker Nodes:

      1. Under Additional Options, expand Bootstrap Actions, select the Run if bootstrap action, and click Configure and add.

        The Add Bootstrap Action dialog appears.

      2. In this dialog, enter the name Configure Ranger Audits for Worker Nodes.

      3. Add the following script in the Optional arguments field, replacing <ranger-audit-setup-script-url> with your own script URL.

        To get the URL, go to PCloud Portal > Access Manager > Settings > ApiKey, click the Info icon, and under Ranger Audit Setup Script click Copy URL.

        instance.isMaster=false "wget <ranger-audit-setup-script-url>; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
        
      4. Click Add.

  8. Under Security Options:

    1. Enter or select the Security Options appropriate for your environment.

    2. Under the Permissions section:

      • EMR role: Select the EMR_EC2_Default role.
      • EC2 instance profile: Select the “EMR_RS_INSTANCE_ROLE” role created during IAM Roles setup.
    3. Expand Security Configuration and select the configuration that you created earlier, for example "EMR_NATIVE_WITH_PLCOUD".

      • Set the Realm and enter a KDC admin password.
    4. Click Create cluster.
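
The two Run if bootstrap actions from the steps above can also be written as JSON, using the same names, path, and arguments as the CloudFormation template. The sketch below is a minimal illustration: the function name and the example URL are hypothetical, and it assumes the script URL contains no double quotes or backslashes, which would need JSON escaping.

```shell
#!/bin/sh
# Emit the master and worker "Run if" bootstrap actions as a JSON array.
# $1 is the Ranger Audit Setup Script URL copied from the PrivaceraCloud portal.
# bootstrap_actions_json is a hypothetical helper name used for illustration.
bootstrap_actions_json() {
    url="$1"
    # The same command runs on every node; only the run-if condition differs.
    cmd="wget ${url}; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
    cat <<EOF
[
  {
    "Name": "Configure Ranger Audits for Master Node",
    "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
    "Args": ["instance.isMaster=true", "${cmd}"]
  },
  {
    "Name": "Configure Ranger Audits for Worker Nodes",
    "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
    "Args": ["instance.isMaster=false", "${cmd}"]
  }
]
EOF
}

# Example with a placeholder URL; redirect the output to a file to reuse it.
bootstrap_actions_json "https://example.com/privacera_emr_native.sh"
```

If the output is saved to a file, it should be usable with the `--bootstrap-actions file://...` option of `aws emr create-cluster`, which accepts the same Name, Path, and Args fields.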

Usage#

  • In your PrivaceraCloud account, expand Access Manager and click Service Config in the left menu.

  • If the services are not already added, click Add Service at the top right and select Hive and emrfs_s3 from the drop-down.

  • Click Save.

  1. Spark-SQL use-case

    1. SSH to the EMR master node.

    2. Run kinit to authenticate as your user.

    3. Start the Spark SQL shell by running “spark-sql”.

    4. Run SQL queries with Spark.

      Policies are evaluated against the “privacera_hive” repository, and audits are visible under Access Manager > Audits in the left menu.

  2. Spark-Shell use-case

    1. SSH to the EMR master node.

    2. Run kinit to authenticate as your user.

    3. Start the Spark shell by running “spark-shell”.

    4. Run Scala queries with Spark.

      Policies are evaluated against the “privacera_emrfs_s3” repository for any S3 access, and audits are visible under Access Manager > Audits in the left menu.

  3. Hive use-case

    1. SSH to the EMR master node.

    2. Run kinit to authenticate as your user.

    3. Log in to the beeline shell using the command below:

      beeline -u "jdbc:hive2://`hostname -f`:10000/default;principal=hive/`hostname -f`@EC2.INTERNAL"

    4. Run Hive queries.

      Policies are evaluated against the “privacera_hive” repository, and audits are visible under Access Manager > Audits in the left menu.
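
The beeline command above hard-codes the EC2.INTERNAL realm, while the cluster's realm is whatever was set under Security Options. As a small sketch, the helper below builds the same JDBC URL from a host name and realm. The function name and example host are hypothetical; port 10000 and the hive service principal are taken from the command above.

```shell
#!/bin/sh
# Build the HiveServer2 JDBC URL used by beeline on a Kerberized cluster.
# $1: fully qualified host name of the master node, $2: Kerberos realm.
# hive_jdbc_url is a hypothetical helper name used for illustration.
hive_jdbc_url() {
    host="$1"
    realm="$2"
    printf 'jdbc:hive2://%s:10000/default;principal=hive/%s@%s\n' "$host" "$host" "$realm"
}

# On the master node, after kinit, the URL could be used like this:
#   beeline -u "$(hive_jdbc_url "$(hostname -f)" EC2.INTERNAL)" -e "SHOW DATABASES;"
# Example with a placeholder host name:
hive_jdbc_url ip-10-0-0-1.ec2.internal EC2.INTERNAL
```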

Last update: August 20, 2021