Connect Elastic MapReduce from Amazon application to PrivaceraCloud
This topic describes how to connect an EMR application to PrivaceraCloud for access control.
Connect EMR application
Go to Settings > Applications.
In the Applications screen, select EMR.
Enter the application Name and Description, and then click Save.
Click the toggle button to enable Access Management for your application.
EMR Spark access control types
EMR Spark supports two types of access control: Fine-Grained Access Control (FGAC) and Object Level Access Control (OLAC). Only one of them can be added during configuration.
EMR Spark OLAC
The advantages of EMR Spark OLAC:
It lets you access existing AWS S3 resource locations with Spark.
It uses the privacera_s3 service for resource-based access control and the privacera_tag service for tag-based access control.
It uses the signed-authorization implementation from Privacera.
EMR Spark FGAC
When FGAC is installed and enabled, each data user query is parsed by Spark and authenticated by the PrivaceraCloud Spark Plug-In. All resources referred to by the query must be accessible to the requesting user via authentication.
It supports database, table, and column policies, in addition to row filtering and column masking.
It uses the privacera_hive, privacera_s3, privacera_adls, and privacera_files services for resource-based access control, and the privacera_tag service for tag-based access control.
It uses the plugin implementation from Privacera.
PrivaceraCloud configuration
Obtain EMR script download URL
Obtain the <emr-script-download-url> that is unique to your account. The EMR cluster uses this URL to obtain additional scripts and setup from PrivaceraCloud:
In the PrivaceraCloud portal, go to Settings > Applications.
Use an existing active API key or create a new one. Set Expiry = Never Expires.
Click the API Key Info (i) button.
On the API Key Info page, click the COPY URL button next to AWS EMR Setup Script and store the value as emr-script-download-url.
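As a rough illustration of how the copied URL is used, the cluster templates later in this topic pass it to a bootstrap action that downloads and runs privacera_emr.sh. The Python sketch below assembles that command string. The function name and the lookup table are hypothetical helpers, not part of any Privacera tooling; the script arguments spark-fbac (used by the OLAC templates) and spark-fgac (used by the FGAC template) are copied from those templates.

```python
# Hypothetical helper, not part of any Privacera tooling.
# Script arguments as they appear in the cluster templates in this topic.
SCRIPT_ARG_BY_ACCESS_TYPE = {
    "OLAC": "spark-fbac",
    "FGAC": "spark-fgac",
}


def build_bootstrap_command(emr_script_download_url: str, access_type: str) -> str:
    """Return the shell command that downloads and runs privacera_emr.sh."""
    key = access_type.upper()
    if key not in SCRIPT_ARG_BY_ACCESS_TYPE:
        raise ValueError(f"Unknown access control type: {access_type!r}")
    # Avoid a double slash if the copied URL ends with "/".
    base_url = emr_script_download_url.rstrip("/")
    script_arg = SCRIPT_ARG_BY_ACCESS_TYPE[key]
    return (
        f"wget {base_url}/privacera_emr.sh ; "
        "chmod +x ./privacera_emr.sh ; "
        f"sudo ./privacera_emr.sh {script_arg}"
    )


# Placeholder URL; replace it with the value copied from the API Key Info page.
print(build_bootstrap_command("<emr-script-download-url>", "FGAC"))
```

Only one access control type can be configured per cluster, so the same argument should be used for the master and core node bootstrap actions.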
AWS IAM roles using CloudFormation setup
The following two IAM roles must be created before launching the cluster. You can create them with minimal permissions using the sample CloudFormation template for roles.
Node role: EmrPrivaceraNodeRole
App data access role: EmrPrivaceraDataAcessRole
If required, you can modify the template based on your requirements.
To create the roles, use the following AWS CLI CloudFormation command:
aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-role-creation --template-body file://emr-roles-creation-template.json --capabilities CAPABILITY_NAMED_IAM
For more information about how to create a stack using a CloudFormation template, see Create CloudFormation Stack.
{ "AWSTemplateFormatVersion": "2010-09-09", "Description": "Create roles and policies for use by Privacera-Protected EMR Clusters", "Resources": { "EmrRestrictedRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": { "Fn::Join": [ "", [ "EmrPrivaceraNodeRole" ] ] }, "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": [ "ec2.amazonaws.com" ] }, "Action": [ "sts:AssumeRole" ] } ] }, "Path": "/" } }, "EmrRestrictedPolicy": { "Type": "AWS::IAM::ManagedPolicy", "Properties": { "ManagedPolicyName": { "Fn::Join": [ "", [ "EMRPrivaceraNodePolicy" ] ] }, "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Sid": "EmrServiceLimited", "Effect": "Allow", "Action": [ "glue:CreateDatabase", "glue:UpdateDatabase", "glue:DeleteDatabase", "glue:GetDatabase", "glue:GetDatabases", "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable", "glue:GetTable", "glue:GetTables", "glue:GetTableVersions", "glue:CreatePartition", "glue:BatchCreatePartition", "glue:UpdatePartition", "glue:DeletePartition", "glue:BatchDeletePartition", "glue:GetPartition", "glue:GetPartitions", "glue:BatchGetPartition", "glue:CreateUserDefinedFunction", "glue:UpdateUserDefinedFunction", "glue:DeleteUserDefinedFunction", "glue:GetUserDefinedFunction", "glue:GetUserDefinedFunctions", "ec2:Describe*", "elasticmapreduce:Describe*", "elasticmapreduce:ListBootstrapActions", "elasticmapreduce:ListClusters", "elasticmapreduce:ListInstanceGroups", "elasticmapreduce:ListInstances", "elasticmapreduce:ListSteps" ], "Resource": "*" }, { "Sid": "EmrS3Limited", "Effect": "Allow", "Action": "s3:*", "Resource": [ "arn:aws:s3:::*.elasticmapreduce/*", "arn:aws:s3:::elasticmapreduce/*", "arn:aws:s3:::elasticmapreduce", "arn:aws:s3:::infraqa-test/user/suraj/dev/emr/fgac/privacera_cust_conf.zip" ] }, { "Sid": "EmrAssumeIAM", "Effect": "Allow", "Action": "sts:AssumeRole", "Resource": [ "arn:aws:iam::587946681758:role/infraQA_app_data_access_role" ] } ]
}, "Roles": [ { "Ref": "EmrRestrictedRole" } ] } }, "EmrRoleForApps": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": { "Fn::Join": [ "", [ "EmrPrivaceraDataAcessRole" ] ] }, "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": [ "ec2.amazonaws.com" ] }, "Action": [ "sts:AssumeRole" ] }, { "Effect": "Allow", "Principal": { "AWS": { "Fn::GetAtt": [ "EmrRestrictedRole", "Arn" ] } }, "Action": "sts:AssumeRole" } ] }, "Path": "/" } }, "DataAccessPolicy": { "Type": "AWS::IAM::ManagedPolicy", "Properties": { "ManagedPolicyName": { "Fn::Join": [ "", [ "EmrPrivaceraDataAcessPolicy" ] ] }, "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Sid": "S3DataAccess", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObjectAcl", "s3:GetObject", "s3:ListBucket", "s3:DeleteObject", "s3:DeleteBucket", "s3:ListBucketMultipartUploads", "s3:GetBucketAcl", "s3:GetBucketPolicy", "s3:ListMultipartUploadParts", "s3:AbortMultipartUpload", "s3:GetBucketLocation", "s3:PutObjectAcl" ], "Resource": [ "arn:aws:s3:::*.elasticmapreduce/*", "arn:aws:s3:::elasticmapreduce/*", "arn:aws:s3:::elasticmapreduce", "arn:aws:s3:::infraqa-test/user/suraj/dev/emr/fgac/privacera_cust_conf.zip" ] } ] }, "Roles": [ { "Ref": "EmrRoleForApps" } ] } }, "EmrRestrictedRoleProfile":{ "Type":"AWS::IAM::InstanceProfile", "Properties":{ "InstanceProfileName":{ "Ref":"EmrRestrictedRole" }, "Roles":[ { "Ref":"EmrRestrictedRole" } ] } }, "EmrRoleForAppsProfile":{ "Type":"AWS::IAM::InstanceProfile", "Properties":{ "InstanceProfileName":{ "Ref":"EmrRoleForApps" }, "Roles":[ { "Ref":"EmrRoleForApps" } ] } } }, "Outputs": { "EMRRestrictedRole": { "Value": { "Ref": "EmrRestrictedRole" } }, "EmrRoleForApps": { "Value": { "Ref": "EmrRoleForApps" } } } }
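Before creating the stack, it can be useful to confirm which IAM role names the template will produce. The following is a minimal sketch, assuming the template has already been loaded into a Python dictionary (for example with json.load). The helper names resolve_name and iam_role_names are hypothetical, and only plain strings and Fn::Join over literal strings are resolved; other intrinsic functions are ignored.

```python
def resolve_name(value):
    """Resolve a plain string or a simple Fn::Join of literal strings."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and "Fn::Join" in value:
        delimiter, parts = value["Fn::Join"]
        if all(isinstance(part, str) for part in parts):
            return delimiter.join(parts)
    return None  # other intrinsic functions are not handled in this sketch


def iam_role_names(template):
    """Return the RoleName of every AWS::IAM::Role resource in the template."""
    names = []
    for resource in template.get("Resources", {}).values():
        if resource.get("Type") == "AWS::IAM::Role":
            name = resolve_name(resource.get("Properties", {}).get("RoleName"))
            if name:
                names.append(name)
    return names


# Trimmed-down stand-in for emr-roles-creation-template.json (illustrative only).
sample_template = {
    "Resources": {
        "EmrRestrictedRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {"RoleName": {"Fn::Join": ["", ["EmrPrivaceraNodeRole"]]}},
        },
        "EmrRoleForApps": {
            "Type": "AWS::IAM::Role",
            "Properties": {"RoleName": {"Fn::Join": ["", ["EmrPrivaceraDataAcessRole"]]}},
        },
        "DataAccessPolicy": {"Type": "AWS::IAM::ManagedPolicy", "Properties": {}},
    }
}
print(iam_role_names(sample_template))
```

The role names returned here are the ones referenced later as the node role and the app data access role, so they should match the values used in the security configuration and cluster templates.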
Create a security configuration
You can create a security configuration using:
CloudFormation setup (Recommended)
AWS EMR console (Manually)
Create a security configuration using CloudFormation setup (Recommended)
A new security configuration must be created with the Kerberos server that will be connected to the EMR cluster. You can create it with minimal permissions using the sample EMR security configuration template.
If required, you can modify the template emr-security-config-template.json based on your requirements.
To create the security configuration, use the following AWS CLI CloudFormation command:
aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-security-config-creation --template-body file://emr-security-config-template.json
For more information about how to create a stack using a CloudFormation template, see Create CloudFormation Stack.
Note
This template assumes you have a cluster-specific KDC with Cross Realm Trust enabled.
{ "AWSTemplateFormatVersion":"2010-09-09", "Description":"Create Security Configuration for use by Privacera-Protected EMR Clusters", "Resources":{ "SecurityConfiguration":{ "Type":"AWS::EMR::SecurityConfiguration", "Properties":{ "Name":"emr_sec_config", "SecurityConfiguration":{ "AuthorizationConfiguration":{ "EmrFsConfiguration":{ "RoleMappings":[ { "Role":"<app_data_access_role arn>", "IdentifierType":"User", "Identifiers":[ "hadoop;hive;presto;trino" ] } ] } } , "AuthenticationConfiguration":{ "KerberosConfiguration":{ "Provider":"ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration":{ "TicketLifetimeInHours": 24, "CrossRealmTrustConfiguration":{ "AdminServer":"", "Domain":"", "KdcServer":"", "Realm":"" } } } } } } } } }
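The sample template leaves the role ARN and the cross-realm trust fields blank. The sketch below shows one way to fill them in programmatically before writing the template file. The function name build_security_configuration and the example values are assumptions for illustration (the account ID in the ARN is made up); the key names and the identifier string are copied from the sample template.

```python
import json


def build_security_configuration(role_arn, realm, domain, admin_server, kdc_server,
                                 ticket_lifetime_hours=24):
    """Return the SecurityConfiguration body with the blank template fields filled in."""
    return {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {
                "RoleMappings": [
                    {
                        "Role": role_arn,
                        "IdentifierType": "User",
                        # Identifier string copied as-is from the sample template.
                        "Identifiers": ["hadoop;hive;presto;trino"],
                    }
                ]
            }
        },
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours,
                    "CrossRealmTrustConfiguration": {
                        "AdminServer": admin_server,
                        "Domain": domain,
                        "KdcServer": kdc_server,
                        "Realm": realm,
                    },
                },
            }
        },
    }


# Example values are illustrative only; the account ID below is made up.
config = build_security_configuration(
    role_arn="arn:aws:iam::111122223333:role/EmrPrivaceraDataAcessRole",
    realm="EXAMPLE.COM",
    domain="example.com",
    admin_server="server.example.com",
    kdc_server="server.example.com",
)
print(json.dumps(config, indent=2))
```

The printed JSON corresponds to the inner SecurityConfiguration property of the AWS::EMR::SecurityConfiguration resource in the template.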
Manually create a security configuration using AWS EMR console
To create a security configuration using the AWS EMR console:
Log into the AWS EMR console.
In the left navigation, select Security Configuration > Create New Security Configuration.
Enter a Name for the security configuration. For example, emr_sec_config.
Navigate to the Authentication section, check Enable Kerberos authentication, and enter the Kerberos environment details as follows:
Provider: Cluster dedicated KDC
TicketLifetime: 24 hours
Cross-realm trust
Realm: EXAMPLE.COM
Domain: example.com
Admin server: server.admin.com
KDC server: server.example.com
Select Use IAM roles for EMRFS requests to Amazon S3.
IAM Role: select the App data access role created in AWS IAM roles using CloudFormation setup.
Under Basis for access, select an identifier type (User) from the list and enter the corresponding identifiers (hadoop;hive;presto;trino).
Create EMR cluster
You can create an EMR cluster using:
CloudFormation setup (Recommended)
Using CloudFormation EMR templates
Using CloudFormation AWS CLI
Using CloudFormation AWS Console
AWS EMR console (Manually)
Create EMR cluster using CloudFormation setup (Recommended)
Create EMR cluster using CloudFormation EMR templates
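To create an EMR cluster, use the CloudFormation templates listed below. You can modify the templates to meet your needs. We recommend keeping the common variable values consistent with those used in the previous setup steps. The four templates appear in this order: Spark OLAC with Hive and Trino (default release emr-6.4.0), Spark OLAC with Hive and PrestoSQL (emr-6.2.1), Spark OLAC with Hive and Presto (emr-5.33.1), and Spark FGAC with Hive and Trino (emr-6.4.0).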
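The templates in this section contain placeholder values such as <BUCKET_NAME> and masked IDs such as sg-0bdXXXXXXXdb that must be replaced before a stack is created. The following is a small sketch of a pre-flight check, assuming the template is available as a string. The function name find_unreplaced_values and the two regular expressions are illustrative choices, not part of any Privacera or AWS tooling.

```python
import re

# Angle-bracket placeholders such as <BUCKET_NAME> or <node_role_name>.
PLACEHOLDER_PATTERN = re.compile(r"<[A-Za-z][A-Za-z0-9_\- ]*>")
# Masked security group or subnet IDs such as sg-0bdXXXXXXXdb.
MASKED_ID_PATTERN = re.compile(r"\b(?:sg|subnet)-\w*X{3,}\w*")


def find_unreplaced_values(template_text):
    """Return a sorted list of placeholder and masked values found in the text."""
    found = set(PLACEHOLDER_PATTERN.findall(template_text))
    found.update(MASKED_ID_PATTERN.findall(template_text))
    return sorted(found)


# Short excerpt in the style of the cluster templates (illustrative only).
sample_text = (
    '{ "LogUri": "s3://<BUCKET_NAME>/emr/emr_logs/", '
    '"JobFlowRole": "<node_role_name>", '
    '"Default": "sg-0bdXXXXXXXdb" }'
)
for value in find_unreplaced_values(sample_text):
    print("Replace before use:", value)
```

An empty result suggests that every placeholder matched by these two patterns has been replaced; it does not validate that the replacement values are correct.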
{ "Parameters": { "CLUSTERNAME": { "Description": "Name of the emr cluster", "Type": "String", "Default": "PCloud-EMR-Spark-Hive-Trino-OLAC" }, "EMRRegion": { "Description": "aws region name", "Type": "String", "Default": "us-east-1" }, "EMRVersion": { "Description": "Emr version", "Type": "String", "Default": "emr-6.4.0" }, "MasterSecurityGroup": { "Description": "Emr master/edge node security group", "Type": "String", "Default": "sg-0bdXXXXXXXdb" }, "SlaveSecurityGroup": { "Description": "Emr worker/slave node security group", "Type": "String", "Default": "sg-010XXXXXXX1b" }, "ServiceAccessSecurityGroup": { "Description": "Emr service access security group", "Type": "String", "Default": "sg-068XXXXXXX02" }, "Ec2SubnetId": { "Description": "Ec2 subnet id", "Type": "String", "Default": "subnet-04aXXXXXXXcd" }, "KDCName": { "Description": "KDC Name", "Type": "String", "Default": "emr_sec_config" }, "HiveMetaStoreS3Path": { "Description": "Hive metastore s3 path", "Type": "String", "Default": "s3://<BUCKET_NAME>/emr/hive_warehouse" }, "Ec2KeyName": { "Description": "Ec2 keypair name", "Type": "String", "Default": "<EMR-ACCESS-KEY>" }, "Market": { "Description": "Ec2 Instance market type", "Type": "String", "Default": "ON_DEMAND" }, "KdcAdminPassword": { "Description": "KDC admin user password", "Type": "String", "Default": "<KDC-USER-PASSWORD>" }, "CrossRealmTrustPrincipalPassword": { "Description": "KDC Cross Realm Trust Principal password", "Type": "String", "Default": "<KDC-PRINCIPAL-PASSWORD>" }, "DataserverDomain": { "Description": "Privacera Dataserver Domain", "Type": "String", "Default": ".ec2.internal" }, "PrivaceraDownloadUrl": { "Description": "Privacera Base Download Url", "Type": "String", "Default": "https://privaceracloud.com/api/public/get/emr_script/<API-KEY>" } }, "Resources": { "EMRCLUSTER": { "Type": "AWS::EMR::Cluster", "Properties": { "Name": { "Ref": "CLUSTERNAME" }, "KerberosAttributes": { "Realm": "EC2.INTERNAL", "KdcAdminPassword": { "Ref":
"KdcAdminPassword" }, "CrossRealmTrustPrincipalPassword": { "Ref": "CrossRealmTrustPrincipalPassword" } }, "SecurityConfiguration": { "Ref": "KDCName" }, "VisibleToAllUsers": true, "EbsRootVolumeSize": 15, "Instances": { "MasterInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Master Instance Group" }, "CoreInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Core Instance Group" }, "Ec2KeyName": { "Ref": "Ec2KeyName" }, "EmrManagedSlaveSecurityGroup": { "Fn::Sub": "${SlaveSecurityGroup}" }, "EmrManagedMasterSecurityGroup": { "Fn::Sub": "${MasterSecurityGroup}" }, "ServiceAccessSecurityGroup": { "Fn::Sub": "${ServiceAccessSecurityGroup}" }, "Ec2SubnetId": { "Fn::Sub": "${Ec2SubnetId}" }, "TerminationProtected": true }, "BootstrapActions": [ { "Name": "Install Spark OLAC in Master Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=true" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac" } ] } }, { "Name": "Install Spark OLAC in Core Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=false" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac" } ] } } ], "Applications": [ { "Name": "Hive" }, { "Name": "Spark" }, { "Name": "Trino" }, { "Name": "Zeppelin" }, { "Name": "Livy" }, { "Name": "Hue" } ], "Configurations": [ { "Classification": "spark", "ConfigurationProperties": { "maximizeResourceAllocation": "true" }, "Configurations": [] }, { "Classification": "spark-hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
"hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", "hive.server2.enable.doAs": "false", "parquet.column.index.access": "true", "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem", "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "trino-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }, { "Classification": "livy-conf", "ConfigurationProperties": { "livy.impersonation.enabled": "true" } }, { "Classification": "core-site", "ConfigurationProperties": { "hadoop.proxyuser.livy.groups": "*", "hadoop.proxyuser.livy.hosts": "*" } } ], "LogUri": "s3://<BUCKET_NAME>/emr/emr_logs/", "JobFlowRole": "<node_role_name>", "ServiceRole": "EMR_DefaultRole", "ReleaseLabel": { "Fn::Sub": "${EMRVersion}" } } } } }
{ "Parameters": { "CLUSTERNAME": { "Description": "Name of the emr cluster", "Type": "String", "Default": "PCloud-EMR-Spark-Hive-PrestoSQL-OLAC" }, "EMRRegion": { "Description": "aws region name", "Type": "String", "Default": "us-east-1" }, "EMRVersion": { "Description": "Emr version", "Type": "String", "Default": "emr-6.2.1" }, "MasterSecurityGroup": { "Description": "Emr master/edge node security group", "Type": "String", "Default": "sg-0bdXXXXXXXdb" }, "SlaveSecurityGroup": { "Description": "Emr worker/slave node security group", "Type": "String", "Default": "sg-010XXXXXXX1b" }, "ServiceAccessSecurityGroup": { "Description": "Emr service access security group", "Type": "String", "Default": "sg-068XXXXXXX02" }, "Ec2SubnetId": { "Description": "Ec2 subnet id", "Type": "String", "Default": "subnet-04aXXXXXXXcd" }, "KDCName": { "Description": "KDC Name", "Type": "String", "Default": "emr_sec_config" }, "HiveMetaStoreS3Path": { "Description": "Hive metastore s3 path", "Type": "String", "Default": "s3://<BUCKET_NAME>/emr/hive_warehouse" }, "Ec2KeyName": { "Description": "Ec2 keypair name", "Type": "String", "Default": "<EMR-ACCESS-KEY>" }, "Market": { "Description": "Ec2 Instance market type", "Type": "String", "Default": "ON_DEMAND" }, "KdcAdminPassword": { "Description": "KDC admin user password", "Type": "String", "Default": "<KDC-USER-PASSWORD>" }, "CrossRealmTrustPrincipalPassword": { "Description": "KDC Cross Realm Trust Principal password", "Type": "String", "Default": "<KDC-PRINCIPAL-PASSWORD>" }, "DataserverDomain": { "Description": "Privacera Dataserver Domain", "Type": "String", "Default": ".ec2.internal" }, "PrivaceraDownloadUrl": { "Description": "Privacera Base Download Url", "Type": "String", "Default": "https://privaceracloud.com/api/public/get/emr_script/<API-KEY>" } }, "Resources": { "EMRCLUSTER": { "Type": "AWS::EMR::Cluster", "Properties": { "Name": { "Ref": "CLUSTERNAME" }, "KerberosAttributes": { "Realm": "EC2.INTERNAL", "KdcAdminPassword": { 
"Ref": "KdcAdminPassword" }, "CrossRealmTrustPrincipalPassword": { "Ref": "CrossRealmTrustPrincipalPassword" } }, "SecurityConfiguration": { "Ref": "KDCName" }, "VisibleToAllUsers": true, "EbsRootVolumeSize": 15, "Instances": { "MasterInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Master Instance Group" }, "CoreInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Core Instance Group" }, "Ec2KeyName": { "Ref": "Ec2KeyName" }, "EmrManagedSlaveSecurityGroup": { "Fn::Sub": "${SlaveSecurityGroup}" }, "EmrManagedMasterSecurityGroup": { "Fn::Sub": "${MasterSecurityGroup}" }, "ServiceAccessSecurityGroup": { "Fn::Sub": "${ServiceAccessSecurityGroup}" }, "Ec2SubnetId": { "Fn::Sub": "${Ec2SubnetId}" }, "TerminationProtected": true }, "BootstrapActions": [ { "Name": "Install Spark OLAC in Master Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=true" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac" } ] } }, { "Name": "Install Spark OLAC in Core Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=false" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac" } ] } } ], "Applications": [ { "Name": "Hive" }, { "Name": "Spark" }, { "Name": "PrestoSQL" }, { "Name": "Zeppelin" }, { "Name": "Livy" }, { "Name": "Hue" } ], "Configurations": [ { "Classification": "spark", "ConfigurationProperties": { "maximizeResourceAllocation": "true" }, "Configurations": [] }, { "Classification": "spark-hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", 
"hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", "hive.server2.enable.doAs": "false", "parquet.column.index.access": "true", "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem", "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "prestosql-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }, { "Classification": "livy-conf", "ConfigurationProperties": { "livy.impersonation.enabled": "true" } }, { "Classification": "core-site", "ConfigurationProperties": { "hadoop.proxyuser.livy.groups": "*", "hadoop.proxyuser.livy.hosts": "*" } } ], "LogUri": "s3://<BUCKET_NAME>/emr/emr_logs/", "JobFlowRole": "<node_role_name>", "ServiceRole": "EMR_DefaultRole", "ReleaseLabel": { "Fn::Sub": "${EMRVersion}" } } } } }
{ "Parameters": { "CLUSTERNAME": { "Description": "Name of the emr cluster", "Type": "String", "Default": "PCloud-EMR-Spark-Hive-Presto-OLAC" }, "EMRRegion": { "Description": "aws region name", "Type": "String", "Default": "us-east-1" }, "EMRVersion": { "Description": "Emr version", "Type": "String", "Default": "emr-5.33.1" }, "MasterSecurityGroup": { "Description": "Emr master/edge node security group", "Type": "String", "Default": "sg-0bdXXXXXXXdb" }, "SlaveSecurityGroup": { "Description": "Emr worker/slave node security group", "Type": "String", "Default": "sg-010XXXXXXX1b" }, "ServiceAccessSecurityGroup": { "Description": "Emr service access security group", "Type": "String", "Default": "sg-068XXXXXXX02" }, "Ec2SubnetId": { "Description": "Ec2 subnet id", "Type": "String", "Default": "subnet-04aXXXXXXXcd" }, "KDCName": { "Description": "KDC Name", "Type": "String", "Default": "emr_sec_config" }, "HiveMetaStoreS3Path": { "Description": "Hive metastore s3 path", "Type": "String", "Default": "s3://<BUCKET_NAME>/emr/hive_warehouse" }, "Ec2KeyName": { "Description": "Ec2 keypair name", "Type": "String", "Default": "<EMR-ACCESS-KEY>" }, "Market": { "Description": "Ec2 Instance market type", "Type": "String", "Default": "ON_DEMAND" }, "KdcAdminPassword": { "Description": "KDC admin user password", "Type": "String", "Default": "<KDC-USER-PASSWORD>" }, "CrossRealmTrustPrincipalPassword": { "Description": "KDC Cross Realm Trust Principal password", "Type": "String", "Default": "<KDC-PRINCIPAL-PASSWORD>" }, "DataserverDomain": { "Description": "Privacera Dataserver Domain", "Type": "String", "Default": ".ec2.internal" }, "PrivaceraDownloadUrl": { "Description": "Privacera Base Download Url", "Type": "String", "Default": "https://privaceracloud.com/api/public/get/emr_script/<API-KEY>" } }, "Resources": { "EMRCLUSTER": { "Type": "AWS::EMR::Cluster", "Properties": { "Name": { "Ref": "CLUSTERNAME" }, "KerberosAttributes": { "Realm": "EC2.INTERNAL", "KdcAdminPassword": { 
"Ref": "KdcAdminPassword" }, "CrossRealmTrustPrincipalPassword": { "Ref": "CrossRealmTrustPrincipalPassword" } }, "SecurityConfiguration": { "Ref": "KDCName" }, "VisibleToAllUsers": true, "EbsRootVolumeSize": 15, "Instances": { "MasterInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Master Instance Group" }, "CoreInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Core Instance Group" }, "Ec2KeyName": { "Ref": "Ec2KeyName" }, "EmrManagedSlaveSecurityGroup": { "Fn::Sub": "${SlaveSecurityGroup}" }, "EmrManagedMasterSecurityGroup": { "Fn::Sub": "${MasterSecurityGroup}" }, "ServiceAccessSecurityGroup": { "Fn::Sub": "${ServiceAccessSecurityGroup}" }, "Ec2SubnetId": { "Fn::Sub": "${Ec2SubnetId}" }, "TerminationProtected": true }, "BootstrapActions": [ { "Name": "Install Spark OLAC in Master Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=true" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac" } ] } }, { "Name": "Install Spark OLAC in Core Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=false" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac" } ] } } ], "Applications": [ { "Name": "Hive" }, { "Name": "Spark" }, { "Name": "Presto" }, { "Name": "Zeppelin" }, { "Name": "Livy" }, { "Name": "Hue" } ], "Configurations": [ { "Classification": "spark", "ConfigurationProperties": { "maximizeResourceAllocation": "true" }, "Configurations": [] }, { "Classification": "spark-hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", 
"hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", "hive.server2.enable.doAs": "false", "parquet.column.index.access": "true", "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem", "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "presto-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }, { "Classification": "livy-conf", "ConfigurationProperties": { "livy.impersonation.enabled": "true" } }, { "Classification": "core-site", "ConfigurationProperties": { "hadoop.proxyuser.livy.groups": "*", "hadoop.proxyuser.livy.hosts": "*" } } ], "LogUri": "s3://<BUCKET_NAME>/emr/emr_logs/", "JobFlowRole": "<node_role_name>", "ServiceRole": "EMR_DefaultRole", "ReleaseLabel": { "Fn::Sub": "${EMRVersion}" } } } } }
{ "Parameters": { "CLUSTERNAME": { "Description": "Name of the emr cluster", "Type": "String", "Default": "PCloud-EMR-Spark-Hive-Trino-FGAC" }, "EMRRegion": { "Description": "aws region name", "Type": "String", "Default": "us-east-1" }, "EMRVersion": { "Description": "Emr version", "Type": "String", "Default": "emr-6.4.0" }, "MasterSecurityGroup": { "Description": "Emr master/edge node security group", "Type": "String", "Default": "sg-0bdXXXXXXXdb" }, "SlaveSecurityGroup": { "Description": "Emr worker/slave node security group", "Type": "String", "Default": "sg-010XXXXXXX1b" }, "ServiceAccessSecurityGroup": { "Description": "Emr service access security group", "Type": "String", "Default": "sg-068XXXXXXX02" }, "Ec2SubnetId": { "Description": "Ec2 subnet id", "Type": "String", "Default": "subnet-04aXXXXXXXcd" }, "KDCName": { "Description": "KDC Name", "Type": "String", "Default": "emr_sec_config" }, "HiveMetaStoreS3Path": { "Description": "Hive metastore s3 path", "Type": "String", "Default": "s3://<BUCKET_NAME>/emr/hive_warehouse" }, "Ec2KeyName": { "Description": "Ec2 keypair name", "Type": "String", "Default": "<EMR-ACCESS-KEY>" }, "Market": { "Description": "Ec2 Instance market type", "Type": "String", "Default": "ON_DEMAND" }, "KdcAdminPassword": { "Description": "KDC admin user password", "Type": "String", "Default": "<KDC-USER-PASSWORD>" }, "CrossRealmTrustPrincipalPassword": { "Description": "KDC Cross Realm Trust Principal password", "Type": "String", "Default": "<KDC-PRINCIPAL-PASSWORD>" }, "DataserverDomain": { "Description": "Privacera Dataserver Domain", "Type": "String", "Default": ".ec2.internal" }, "PrivaceraDownloadUrl": { "Description": "Privacera Base Download Url", "Type": "String", "Default": "https://privaceracloud.com/api/public/get/emr_script/<API-KEY>" } }, "Resources": { "EMRCLUSTER": { "Type": "AWS::EMR::Cluster", "Properties": { "Name": { "Ref": "CLUSTERNAME" }, "KerberosAttributes": { "Realm": "EC2.INTERNAL", "KdcAdminPassword": { "Ref": 
"KdcAdminPassword" }, "CrossRealmTrustPrincipalPassword": { "Ref": "CrossRealmTrustPrincipalPassword" } }, "SecurityConfiguration": { "Ref": "KDCName" }, "VisibleToAllUsers": true, "EbsRootVolumeSize": 15, "Instances": { "MasterInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Master Instance Group" }, "CoreInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Core Instance Group" }, "Ec2KeyName": { "Ref": "Ec2KeyName" }, "EmrManagedSlaveSecurityGroup": { "Fn::Sub": "${SlaveSecurityGroup}" }, "EmrManagedMasterSecurityGroup": { "Fn::Sub": "${MasterSecurityGroup}" }, "ServiceAccessSecurityGroup": { "Fn::Sub": "${ServiceAccessSecurityGroup}" }, "Ec2SubnetId": { "Fn::Sub": "${Ec2SubnetId}" }, "TerminationProtected": true }, "BootstrapActions": [ { "Name": "Install Spark FGAC in Master Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=true" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac" } ] } }, { "Name": "Install Spark FGAC in Core Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=false" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac" } ] } } ], "Applications": [ { "Name": "Hive" }, { "Name": "Spark" }, { "Name": "Trino" }, { "Name": "Zeppelin" }, { "Name": "Livy" }, { "Name": "Hue" } ], "Configurations": [ { "Classification": "spark", "ConfigurationProperties": { "maximizeResourceAllocation": "true" }, "Configurations": [] }, { "Classification": "spark-hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", 
"hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", "hive.server2.enable.doAs": "false", "parquet.column.index.access": "true", "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem", "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "trino-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }, { "Classification": "livy-conf", "ConfigurationProperties": { "livy.impersonation.enabled": "true" } }, { "Classification": "core-site", "ConfigurationProperties": { "hadoop.proxyuser.livy.groups": "*", "hadoop.proxyuser.livy.hosts": "*" } } ], "LogUri": "s3://<BUCKET_NAME>/emr/emr_logs/", "JobFlowRole": "<node_role_name>", "ServiceRole": "EMR_DefaultRole", "ReleaseLabel": { "Fn::Sub": "${EMRVersion}" } } } } }
{ "Parameters": { "CLUSTERNAME": { "Description": "Name of the emr cluster", "Type": "String", "Default": "PCloud-EMR-Spark-Hive-PrestoSQL-FGAC" }, "EMRRegion": { "Description": "aws region name", "Type": "String", "Default": "us-east-1" }, "EMRVersion": { "Description": "Emr version", "Type": "String", "Default": "emr-6.2.1" }, "MasterSecurityGroup": { "Description": "Emr master/edge node security group", "Type": "String", "Default": "sg-0bdXXXXXXXdb" }, "SlaveSecurityGroup": { "Description": "Emr worker/slave node security group", "Type": "String", "Default": "sg-010XXXXXXX1b" }, "ServiceAccessSecurityGroup": { "Description": "Emr service access security group", "Type": "String", "Default": "sg-068XXXXXXX02" }, "Ec2SubnetId": { "Description": "Ec2 subnet id", "Type": "String", "Default": "subnet-04aXXXXXXXcd" }, "KDCName": { "Description": "KDC Name", "Type": "String", "Default": "emr_sec_config" }, "HiveMetaStoreS3Path": { "Description": "Hive metastore s3 path", "Type": "String", "Default": "s3://<BUCKET_NAME>/emr/hive_warehouse" }, "Ec2KeyName": { "Description": "Ec2 keypair name", "Type": "String", "Default": "<EMR-ACCESS-KEY>" }, "Market": { "Description": "Ec2 Instance market type", "Type": "String", "Default": "ON_DEMAND" }, "KdcAdminPassword": { "Description": "KDC admin user password", "Type": "String", "Default": "<KDC-USER-PASSWORD>" }, "CrossRealmTrustPrincipalPassword": { "Description": "KDC Cross Realm Trust Principal password", "Type": "String", "Default": "<KDC-PRINCIPAL-PASSWORD>" }, "DataserverDomain": { "Description": "Privacera Dataserver Domain", "Type": "String", "Default": ".ec2.internal" }, "PrivaceraDownloadUrl": { "Description": "Privacera Base Download Url", "Type": "String", "Default": "https://privaceracloud.com/api/public/get/emr_script/<API-KEY>" } }, "Resources": { "EMRCLUSTER": { "Type": "AWS::EMR::Cluster", "Properties": { "Name": { "Ref": "CLUSTERNAME" }, "KerberosAttributes": { "Realm": "EC2.INTERNAL", "KdcAdminPassword": { 
"Ref": "KdcAdminPassword" }, "CrossRealmTrustPrincipalPassword": { "Ref": "CrossRealmTrustPrincipalPassword" } }, "SecurityConfiguration": { "Ref": "KDCName" }, "VisibleToAllUsers": true, "EbsRootVolumeSize": 15, "Instances": { "MasterInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Master Instance Group" }, "CoreInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Core Instance Group" }, "Ec2KeyName": { "Ref": "Ec2KeyName" }, "EmrManagedSlaveSecurityGroup": { "Fn::Sub": "${SlaveSecurityGroup}" }, "EmrManagedMasterSecurityGroup": { "Fn::Sub": "${MasterSecurityGroup}" }, "ServiceAccessSecurityGroup": { "Fn::Sub": "${ServiceAccessSecurityGroup}" }, "Ec2SubnetId": { "Fn::Sub": "${Ec2SubnetId}" }, "TerminationProtected": true }, "BootstrapActions": [ { "Name": "Install Spark FGAC in Master Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=true" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac" } ] } }, { "Name": "Install Spark FGAC in Core Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=false" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac" } ] } } ], "Applications": [ { "Name": "Hive" }, { "Name": "Spark" }, { "Name": "PrestoSQL" }, { "Name": "Zeppelin" }, { "Name": "Livy" }, { "Name": "Hue" } ], "Configurations": [ { "Classification": "spark", "ConfigurationProperties": { "maximizeResourceAllocation": "true" }, "Configurations": [] }, { "Classification": "spark-hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", 
"hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", "hive.server2.enable.doAs": "false", "parquet.column.index.access": "true", "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem", "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "prestosql-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }, { "Classification": "livy-conf", "ConfigurationProperties": { "livy.impersonation.enabled": "true" } }, { "Classification": "core-site", "ConfigurationProperties": { "hadoop.proxyuser.livy.groups": "*", "hadoop.proxyuser.livy.hosts": "*" } } ], "LogUri": "s3://<BUCKET_NAME>/emr/emr_logs/", "JobFlowRole": "<node_role_name>", "ServiceRole": "EMR_DefaultRole", "ReleaseLabel": { "Fn::Sub": "${EMRVersion}" } } } } }
{ "Parameters": { "CLUSTERNAME": { "Description": "Name of the emr cluster", "Type": "String", "Default": "PCloud-EMR-Spark-Hive-Presto-FGAC" }, "EMRRegion": { "Description": "aws region name", "Type": "String", "Default": "us-east-1" }, "EMRVersion": { "Description": "Emr version", "Type": "String", "Default": "emr-5.33.1" }, "MasterSecurityGroup": { "Description": "Emr master/edge node security group", "Type": "String", "Default": "sg-0bdXXXXXXXdb" }, "SlaveSecurityGroup": { "Description": "Emr worker/slave node security group", "Type": "String", "Default": "sg-010XXXXXXX1b" }, "ServiceAccessSecurityGroup": { "Description": "Emr service access security group", "Type": "String", "Default": "sg-068XXXXXXX02" }, "Ec2SubnetId": { "Description": "Ec2 subnet id", "Type": "String", "Default": "subnet-04aXXXXXXXcd" }, "KDCName": { "Description": "KDC Name", "Type": "String", "Default": "emr_sec_config" }, "HiveMetaStoreS3Path": { "Description": "Hive metastore s3 path", "Type": "String", "Default": "s3://<BUCKET_NAME>/emr/hive_warehouse" }, "Ec2KeyName": { "Description": "Ec2 keypair name", "Type": "String", "Default": "<EMR-ACCESS-KEY>" }, "Market": { "Description": "Ec2 Instance market type", "Type": "String", "Default": "ON_DEMAND" }, "KdcAdminPassword": { "Description": "KDC admin user password", "Type": "String", "Default": "<KDC-USER-PASSWORD>" }, "CrossRealmTrustPrincipalPassword": { "Description": "KDC Cross Realm Trust Principal password", "Type": "String", "Default": "<KDC-PRINCIPAL-PASSWORD>" }, "DataserverDomain": { "Description": "Privacera Dataserver Domain", "Type": "String", "Default": ".ec2.internal" }, "PrivaceraDownloadUrl": { "Description": "Privacera Base Download Url", "Type": "String", "Default": "https://privaceracloud.com/api/public/get/emr_script/<API-KEY>" } }, "Resources": { "EMRCLUSTER": { "Type": "AWS::EMR::Cluster", "Properties": { "Name": { "Ref": "CLUSTERNAME" }, "KerberosAttributes": { "Realm": "EC2.INTERNAL", "KdcAdminPassword": { 
"Ref": "KdcAdminPassword" }, "CrossRealmTrustPrincipalPassword": { "Ref": "CrossRealmTrustPrincipalPassword" } }, "SecurityConfiguration": { "Ref": "KDCName" }, "VisibleToAllUsers": true, "EbsRootVolumeSize": 15, "Instances": { "MasterInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Master Instance Group" }, "CoreInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Core Instance Group" }, "Ec2KeyName": { "Ref": "Ec2KeyName" }, "EmrManagedSlaveSecurityGroup": { "Fn::Sub": "${SlaveSecurityGroup}" }, "EmrManagedMasterSecurityGroup": { "Fn::Sub": "${MasterSecurityGroup}" }, "ServiceAccessSecurityGroup": { "Fn::Sub": "${ServiceAccessSecurityGroup}" }, "Ec2SubnetId": { "Fn::Sub": "${Ec2SubnetId}" }, "TerminationProtected": true }, "BootstrapActions": [ { "Name": "Install Spark FGAC in Master Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=true" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac" } ] } }, { "Name": "Install Spark FGAC in Core Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=false" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac" } ] } } ], "Applications": [ { "Name": "Hive" }, { "Name": "Spark" }, { "Name": "Presto" }, { "Name": "Zeppelin" }, { "Name": "Livy" }, { "Name": "Hue" } ], "Configurations": [ { "Classification": "spark", "ConfigurationProperties": { "maximizeResourceAllocation": "true" }, "Configurations": [] }, { "Classification": "spark-hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", 
"hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", "hive.server2.enable.doAs": "false", "parquet.column.index.access": "true", "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem", "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { "Classification": "presto-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }, { "Classification": "livy-conf", "ConfigurationProperties": { "livy.impersonation.enabled": "true" } }, { "Classification": "core-site", "ConfigurationProperties": { "hadoop.proxyuser.livy.groups": "*", "hadoop.proxyuser.livy.hosts": "*" } } ], "LogUri": "s3://<BUCKET_NAME>/emr/emr_logs/", "JobFlowRole": "<node_role_name>", "ServiceRole": "EMR_DefaultRole", "ReleaseLabel": { "Fn::Sub": "${EMRVersion}" } } } } }
{ "Parameters": { "CLUSTERNAME": { "Description": "Name of the emr cluster", "Type": "String", "Default": "PCloud-EMR-Hive-Trino" }, "EMRRegion": { "Description": "aws region name", "Type": "String", "Default": "us-east-1" }, "EMRVersion": { "Description": "Emr version", "Type": "String", "Default": "emr-6.4.0" }, "MasterSecurityGroup": { "Description": "Emr master/edge node security group", "Type": "String", "Default": "sg-0bdXXXXXXXdb" }, "SlaveSecurityGroup": { "Description": "Emr worker/slave node security group", "Type": "String", "Default": "sg-010XXXXXXX1b" }, "ServiceAccessSecurityGroup": { "Description": "Emr service access security group", "Type": "String", "Default": "sg-068XXXXXXX02" }, "Ec2SubnetId": { "Description": "Ec2 subnet id", "Type": "String", "Default": "subnet-04aXXXXXXXcd" }, "KDCName": { "Description": "KDC Name", "Type": "String", "Default": "emr_sec_config" }, "HiveMetaStoreS3Path": { "Description": "Hive metastore s3 path", "Type": "String", "Default": "s3://<BUCKET_NAME>/emr/hive_warehouse" }, "Ec2KeyName": { "Description": "Ec2 keypair name", "Type": "String", "Default": "<EMR-ACCESS-KEY>" }, "Market": { "Description": "Ec2 Instance market type", "Type": "String", "Default": "ON_DEMAND" }, "KdcAdminPassword": { "Description": "KDC admin user password", "Type": "String", "Default": "<KDC-USER-PASSWORD>" }, "CrossRealmTrustPrincipalPassword": { "Description": "KDC Cross Realm Trust Principal password", "Type": "String", "Default": "<KDC-PRINCIPAL-PASSWORD>" }, "DataserverDomain": { "Description": "Privacera Dataserver Domain", "Type": "String", "Default": ".ec2.internal" }, "PrivaceraDownloadUrl": { "Description": "Privacera Base Download Url", "Type": "String", "Default": "https://privaceracloud.com/api/public/get/emr_script/<API-KEY>" } }, "Resources": { "EMRCLUSTER": { "Type": "AWS::EMR::Cluster", "Properties": { "Name": { "Ref": "CLUSTERNAME" }, "KerberosAttributes": { "Realm": "EC2.INTERNAL", "KdcAdminPassword": { "Ref": 
"KdcAdminPassword" }, "CrossRealmTrustPrincipalPassword": { "Ref": "CrossRealmTrustPrincipalPassword" } }, "SecurityConfiguration": { "Ref": "KDCName" }, "VisibleToAllUsers": true, "EbsRootVolumeSize": 15, "Instances": { "MasterInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Master Instance Group" }, "CoreInstanceGroup": { "InstanceCount": 1, "InstanceType": "m5.xlarge", "Market": { "Fn::Sub": "${Market}" }, "Name": "Core Instance Group" }, "Ec2KeyName": { "Ref": "Ec2KeyName" }, "EmrManagedSlaveSecurityGroup": { "Fn::Sub": "${SlaveSecurityGroup}" }, "EmrManagedMasterSecurityGroup": { "Fn::Sub": "${MasterSecurityGroup}" }, "ServiceAccessSecurityGroup": { "Fn::Sub": "${ServiceAccessSecurityGroup}" }, "Ec2SubnetId": { "Fn::Sub": "${Ec2SubnetId}" }, "TerminationProtected": true }, "BootstrapActions": [ { "Name": "Install Spark FGAC in Master Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=true" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac" } ] } }, { "Name": "Install Spark FGAC in Core Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=false" }, { "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac" } ] } } ], "Applications": [ { "Name": "Hive" }, { "Name": "Trino" } ], "Configurations": [ { "Classification": "hive-site", "ConfigurationProperties": { "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory", "hive.server2.enable.doAs": "false", "parquet.column.index.access": "true", "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem", "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreS3Path" } } }, { 
"Classification": "trino-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } } ], "LogUri": "s3://<BUCKET_NAME>/emr/emr_logs/", "JobFlowRole": "<node_role_name>", "ServiceRole": "EMR_DefaultRole", "ReleaseLabel": { "Fn::Sub": "${EMRVersion}" } } } } }
Note
For additional bootstrap action options, such as the Delta Lake option and the custom Hive policy service repository name, see Bootstrap actions.
Create EMR cluster using CloudFormation AWS CLI
Run the following command to create the AWS EMR cluster:
aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-creation --template-body file://<EMR-TEMPLATE-JSON-FILE-PATH>
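The template parameters all carry placeholder defaults, so one option is to override them when the stack is created instead of editing the file. The following is a minimal sketch under stated assumptions, not an official Privacera procedure: the helper names cf_param and create_emr_stack and the example values are hypothetical, while validate-template, create-stack, and wait stack-create-complete are standard aws cloudformation subcommands.

```shell
#!/usr/bin/env bash
# Sketch only: cf_param and create_emr_stack are hypothetical helper names.

# Format one override the way `create-stack --parameters` expects it.
cf_param() {
  printf 'ParameterKey=%s,ParameterValue=%s' "$1" "$2"
}

# Usage: create_emr_stack <region> <template-file> <stack-name> [overrides...]
create_emr_stack() {
  region="$1"; template="$2"; stack="$3"
  shift 3

  # Catch template syntax errors before any resources are created.
  aws --region "$region" cloudformation validate-template \
    --template-body "file://$template" > /dev/null || return 1

  # Only pass --parameters when at least one override was given.
  if [ "$#" -gt 0 ]; then set -- --parameters "$@"; fi

  aws --region "$region" cloudformation create-stack \
    --stack-name "$stack" \
    --template-body "file://$template" \
    "$@" || return 1

  # Block until the stack reaches CREATE_COMPLETE (non-zero exit on failure).
  aws --region "$region" cloudformation wait stack-create-complete \
    --stack-name "$stack"
}

# Example call (values are placeholders, not executed here):
# create_emr_stack us-east-1 ./emr-template.json privacera-emr-creation \
#   "$(cf_param CLUSTERNAME my-emr-cluster)" \
#   "$(cf_param Ec2KeyName my-keypair)"
```

Passing values such as the KDC passwords as overrides keeps them out of the template file, which may be preferable if the template is stored in version control.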
Create EMR cluster using CloudFormation AWS console
Follow these steps to create a stack on the AWS CloudFormation console:
Log in to the AWS Management Console and navigate to the CloudFormation console.
Click Create Stack on the Stacks page.
On the Specify template page, select the Template is ready option, and then select Upload a template file.
Click the Choose File button, and select your modified EMR template JSON file.
Click the Next button.
On the Specify stack details page, enter a stack name in the Stack name box.
Click the Next button.
Update Configure stack options as per your requirements.
Click the Next button.
Click the Create Stack button.
You will see the progress of your stack in the CloudFormation > Stacks section.
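The same progress can also be checked from the command line. The sketch below is illustrative: stack_status and stack_status_is_final are hypothetical helper names, and the list of final states covers only the stack creation statuses that CloudFormation reports; describe-stacks with --query and --output is standard AWS CLI usage.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers.

# Print the current status of a stack, for example CREATE_IN_PROGRESS.
# Usage: stack_status <region> <stack-name>
stack_status() {
  aws --region "$1" cloudformation describe-stacks \
    --stack-name "$2" \
    --query 'Stacks[0].StackStatus' --output text
}

# Return 0 when a status is final for a create operation, 1 otherwise.
stack_status_is_final() {
  case "$1" in
    CREATE_COMPLETE|CREATE_FAILED|ROLLBACK_COMPLETE|ROLLBACK_FAILED) return 0 ;;
    *) return 1 ;;
  esac
}

# Example polling loop (not executed here):
# until stack_status_is_final "$(stack_status us-east-1 privacera-emr-creation)"; do
#   sleep 30
# done
```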
Manually create EMR cluster using AWS EMR console
Follow these steps to manually create an AWS EMR cluster using the AWS EMR console:
Log in to the AWS Management Console and navigate to the EMR console.
Click the Create cluster button.
Click Go to advanced, and select a Release, for example, emr-6.4.0.
Configure applications for AWS EMR cluster
Follow these steps to configure individual applications for the AWS EMR cluster:
Select additional applications as per your environment, such as Spark, Hadoop, Hive, Trino, or Presto.
In Edit software settings, select Enter configuration and add the following configuration array entries for each application.
{ "Classification": "spark", "ConfigurationProperties": { "maximizeResourceAllocation": "true" }, "Configurations": [] }, { "Classification": "spark-hive-site", "ConfigurationProperties": { "hive.metastore.warehouse.dir": "s3://<bucket-name>/<path>" } }
{ "Classification": "hive-site", "ConfigurationProperties": { "javax.jdo.option.ConnectionURL": "", "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver", "javax.jdo.option.ConnectionUserName": "root", "javax.jdo.option.ConnectionPassword": "welcome1", "hive.server2.enable.doAs": "false", "parquet.column.index.access": "true", "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem", "hive.metastore.warehouse.dir": "s3://<bucket-name>/<path>" } }
Note
If the EMR version is 6.4.0 or above, use trino in place of <application> in the array. Use prestosql for older versions.
Presto and Trino/Presto-sql are incompatible. Only one at a time should be used.
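The version rule in this note can be expressed as a small helper. This is a sketch, assuming release labels of the form emr-MAJOR.MINOR.PATCH; the function name connector_classification is hypothetical. It encodes only the rule stated in the note (6.4.0 or later uses Trino, earlier versions use PrestoSQL) and does not cover the separate Presto application, whose classification is presto-connector-hive.

```shell
#!/usr/bin/env bash
# Sketch only: connector_classification is a hypothetical helper name.

# Map an EMR release label to the Hive connector classification name.
# Usage: connector_classification emr-6.4.0
connector_classification() {
  version="${1#emr-}"      # "emr-6.4.0" -> "6.4.0"
  major="${version%%.*}"   # "6"
  rest="${version#*.}"     # "4.0"
  minor="${rest%%.*}"      # "4"

  if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 4 ]; }; then
    echo "trino-connector-hive"
  else
    echo "prestosql-connector-hive"
  fi
}
```

The printed value is what would replace <application>-connector-hive in the Classification field of the configuration entries that follow.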
EHM configuration:
{ "Classification": "<application>-connector-hive", "ConfigurationProperties": { "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }
Glue configuration:
{ "Classification": "<application>-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }
Presto and Trino/Presto-sql are incompatible. Only one at a time should be used.
{ "Classification": "presto-connector-hive", "ConfigurationProperties": { "hive.metastore": "glue", "hive.allow-drop-table": "true", "hive.allow-add-column": "true", "hive.allow-rename-column": "true", "connector.name": "hive-hadoop2", "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml", "hive.s3-file-system-type": "EMRFS", "hive.hdfs.impersonation.enabled": "false", "hive.allow-drop-column": "true", "hive.allow-rename-table": "true" } }
Bootstrap actions
In Hardware settings, select Networking, Node, and Instance values as appropriate for your environment.
In General cluster settings, configure the cluster name, logging, debugging, and termination protection as needed for your environment.
Configure the General cluster settings by including two scripts that install the Privacera Signing Agent on both the master and worker nodes.
In Additional Options, expand Bootstrap Actions, select bootstrap action Run if, and then click Configure and add.
In the Bootstrap actions dialog, set the name of the master and core node to Privacera Signing Agent.
Copy the following script into Optional Arguments, using your own emr-script-download-url script URL. See Obtain EMR script download URL.
Click Add when finished.
Optional Arguments for Privacera installation script:
Master node
instance.isMaster=true "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"
Core node
instance.isMaster=false "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"
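Because the master and core node arguments differ only in the instance.isMaster condition, they can be generated from one place to keep them consistent. The following is a minimal sketch; bootstrap_args is a hypothetical helper name and the URL in the example is a placeholder, not a real Privacera endpoint.

```shell
#!/usr/bin/env bash
# Sketch only: bootstrap_args is a hypothetical helper name.

# Build the Optional Arguments string for the "Run if" bootstrap action.
#   $1: true for the master node, false for core nodes
#   $2: your EMR script download URL
#   $3: access control type passed to privacera_emr.sh (spark-fbac or spark-fgac)
bootstrap_args() {
  printf 'instance.isMaster=%s "wget %s; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh %s"' \
    "$1" "$2" "$3"
}

# Example (placeholder URL):
# bootstrap_args true  "https://example.invalid/privacera_emr.sh" spark-fbac
# bootstrap_args false "https://example.invalid/privacera_emr.sh" spark-fbac
```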
Optional Arguments for Privacera installation script with delta.
Export the following two additional variables in the Bootstrap actions to enable Delta Lake:
SPARK_DELTA_LAKE_ENABLE SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL
Master node
instance.isMaster=true "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fbac"
Core node
instance.isMaster=false "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fbac"
Note
Ensure the following:
The Delta Lake core jar depends on the Spark version. You must choose the correct version for your EMR release.
Obtain the appropriate download URL for the Delta Lake core jar and update the bootstrap arguments with it.
Check Delta Lake and Spark version compatibility. For more information about Delta Lake releases, see Compatibility with Apache Spark. For example, to download delta-core_2.12 version 1.0.1, go to the following URL: repo1.maven.org/delta/delta-core_2.12-1.0.1.jar.
Optional Arguments for Privacera installation script:
Master node
instance.isMaster=true "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
Core node
instance.isMaster=false "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
Optional Arguments for Privacera installation script with delta.
Export the following two additional variables in the Bootstrap actions to enable Delta Lake:
SPARK_DELTA_LAKE_ENABLE SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL
Master node
instance.isMaster=true "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"
Core node
instance.isMaster=false "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"
Note
Ensure the following:
The Delta Lake core jar depends on the Spark version. You must choose the correct version for your EMR release.
Obtain the appropriate download URL for the Delta Lake core jar and update the bootstrap arguments with it.
Check Delta Lake and Spark version compatibility. For more information about Delta Lake releases, see Compatibility with Apache Spark. For example, to download delta-core_2.12 version 1.0.1, go to the following URL: repo1.maven.org/delta/delta-core_2.12-1.0.1.jar.
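The Delta Lake arguments rely on sudo -E to pass both exported variables through to the installation script, so a missing or empty variable is easy to overlook. The check below is a sketch that could be run before the script; check_delta_env is a hypothetical helper name, and requiring the URL to end in .jar is an assumption based on the example above rather than a documented rule of privacera_emr.sh.

```shell
#!/usr/bin/env bash
# Sketch only: check_delta_env is a hypothetical helper name.

# Return 0 when both Delta Lake variables look usable, 1 otherwise.
check_delta_env() {
  if [ -z "${SPARK_DELTA_LAKE_ENABLE:-}" ]; then
    echo "SPARK_DELTA_LAKE_ENABLE is not set" >&2
    return 1
  fi
  # Assumption: the download URL should point at a .jar file.
  case "${SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL:-}" in
    *.jar) return 0 ;;
    *)
      echo "SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL should point to a .jar file" >&2
      return 1
      ;;
  esac
}

# Example (not executed here):
# check_delta_env && sudo -E ./privacera_emr.sh spark-fgac
```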
Optional Arguments for Privacera installation script with custom Hive repository.
Master node
instance.isMaster=true "export EMR_HIVE_SERVICE_NAME=<hive_repo_name>; export EMR_TRINO_HIVE_SERVICE_NAME=<trino_hive_repo_name>; export EMR_SPARK_HIVE_SERVICE_NAME=<spark_hive_repo_name>; wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"
Core node
instance.isMaster=false "export EMR_HIVE_SERVICE_NAME=<hive_repo_name>; export EMR_TRINO_HIVE_SERVICE_NAME=<trino_hive_repo_name>; export EMR_SPARK_HIVE_SERVICE_NAME=<spark_hive_repo_name>; wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"
Note
Ensure the following:
You can customize <hive_repo_name> for the Hive application in EMR.
You can customize <spark_hive_repo_name> for the Spark applications in EMR.
You can customize <trino_hive_repo_name> for the Trino application in EMR.
Configure security options in EMR cluster
Follow these steps to configure security options in EMR cluster:
In Security Options, select security options as per your environment.
Open Security Configuration, and select the configuration you created earlier, e.g., "PRIVACERA_KDC".
In the following fields, enter values:
Realm
KDC admin password
Click Create cluster to complete.