Privacera Documentation

Set up Discovery on AWS for Privacera Platform

This topic shows you how to set up the AWS configuration for installing Privacera Discovery in a Docker or Kubernetes (EKS) environment.

IAM policies for Discovery on AWS

To use the Privacera Discovery service, ensure that the following IAM policies are attached to the Privacera_PM_Role role so that it can access the required AWS services.

The policy to create AWS resources is required only during installation or when Discovery is updated through Privacera Manager. This policy grants Privacera Manager permission to create AWS resources such as DynamoDB tables, Kinesis streams, SQS queues, and S3 buckets using Terraform.

  • ${AWS_REGION}: AWS region where the resources are created.

    {
    "Version":"2012-10-17",
    "Statement":[
        {
            "Sid":"CreateDynamodb",
            "Effect":"Allow",
            "Action":[
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:ListTables",
                "dynamodb:TagResource",
                "dynamodb:UntagResource",
                "dynamodb:UpdateTable",
                "dynamodb:UpdateTableReplicaAutoScaling",
                "dynamodb:UpdateTimeToLive",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:ListTagsOfResource",
                "dynamodb:DescribeContinuousBackups"
            ],
            "Resource":"arn:aws:dynamodb:${AWS_REGION}:*:table/privacera*"
        },
        {
            "Sid":"CreateKinesis",
            "Effect":"Allow",
            "Action":[
                "kinesis:CreateStream",
                "kinesis:ListStreams",
                "kinesis:UpdateShardCount"
            ],
            "Resource":"arn:aws:kinesis:${AWS_REGION}:*:stream/privacera*"
        },
        {
            "Sid":"CreateS3Bucket",
            "Effect":"Allow",
            "Action":[
                "s3:CreateBucket",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Resource":[
                "arn:aws:s3:::*"
            ]
        },
        {
            "Sid":"CreateSQSMessages",
            "Effect":"Allow",
            "Action":[
                "sqs:CreateQueue",
                "sqs:ListQueues"
            ],
            "Resource":[
                "arn:aws:sqs:${AWS_REGION}:${ACCOUNT_ID}:privacera*"
            ]
        }
    ]
    }
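
The placeholders in the policy must be replaced with real values before the policy can be attached. The following is a minimal sketch of one way to do that, assuming the policy above has been saved to a local template file. The function name render_policy and the file names are hypothetical and are not part of Privacera Manager.

```shell
#!/usr/bin/env bash
# Substitute the region and account ID placeholders in a policy template.
# Reads the template on stdin and writes the rendered policy to stdout.
# Usage (file names are illustrative):
#   render_policy us-east-1 123456789012 < policy-template.json > policy.json
render_policy() {
    local region="$1" account_id="$2"
    # "\$" keeps the shell from expanding the placeholder, so sed sees the
    # literal text ${AWS_REGION}. The pattern "ACCOUN*T_ID" also matches a
    # doubled "N" in the account ID placeholder.
    sed -e "s/\${AWS_REGION}/${region}/g" \
        -e "s/\${ACCOUN*T_ID}/${account_id}/g"
}
```

One option for attaching the rendered file as an inline policy is the AWS CLI command `aws iam put-role-policy --role-name Privacera_PM_Role --policy-name <name> --policy-document file://policy.json`, where the policy name is your choice. Whether an inline or a managed policy is preferable depends on how your account manages IAM.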
 

CLI configuration for Discovery on AWS

  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.aws.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.aws.yml
    
  3. Edit the following properties. For property details and descriptions, see Configuration properties for Discovery on AWS below.

    DISCOVERY_BUCKET_NAME: "<PLEASE_CHANGE>"
    

    To configure a bucket, add the property as follows, where bucket-1 is the name of the bucket:

    DISCOVERY_BUCKET_NAME: "bucket-1"
    

    To configure a bucket containing a folder, add the property as follows:

    DISCOVERY_BUCKET_NAME: "bucket-1/folder1"
    
  4. Uncomment or add the following variable to enable autoscaling of executor pods:

    DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_ENABLED: "true"
    
  5. (Optional) To customize the Discovery configuration further, add custom Discovery properties. For more information, refer to Set custom Discovery properties on Privacera Platform.

    For example, by default, the username and password for the Discovery service are padmin/padmin. To change them, refer to Add custom properties using Privacera Manager on Privacera Platform.

  6. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
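
For unattended installs, the edit in step 3 can be scripted instead of made in vi. The following is a minimal sketch, assuming the copied vars file contains an uncommented line that starts with DISCOVERY_BUCKET_NAME:, as in the sample shown in step 3. The helper name set_discovery_bucket is hypothetical.

```shell
#!/usr/bin/env bash
# Set DISCOVERY_BUCKET_NAME in a vars file without opening an editor.
# Usage: set_discovery_bucket <vars-file> <bucket or bucket/folder>
set_discovery_bucket() {
    local vars_file="$1" bucket="$2"
    # "|" is the sed delimiter so the "/" in "bucket-1/folder1" needs no
    # escaping. -i.bak edits in place and keeps a backup copy; this form
    # works with both GNU and BSD sed.
    sed -i.bak \
        "s|^DISCOVERY_BUCKET_NAME:.*|DISCOVERY_BUCKET_NAME: \"${bucket}\"|" \
        "$vars_file"
}

# Example sequence mirroring the steps above (not run automatically):
#   cd ~/privacera/privacera-manager
#   cp config/sample-vars/vars.discovery.aws.yml config/custom-vars/
#   set_discovery_bucket config/custom-vars/vars.discovery.aws.yml "bucket-1/folder1"
#   ./privacera-manager.sh update
```

If the property is commented out or indented in your copy of the file, the pattern will not match and the file is left unchanged, so check the result before running the update.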
    

Configuration properties for Discovery on AWS

| Property | Description | Example |
| --- | --- | --- |
| DISCOVERY_BUCKET_NAME | Set the bucket name where Discovery will store its metadata files | container1 |
| [Properties of Topic and Table names](../pm-ig/customize_topic_and_tables_names.md) | Topic and Table names are assigned by default in Privacera Discovery. To customize any topic or table name, refer to the link. | |

Enable real-time scan

An AWS SQS queue is required to enable real-time scanning of an S3 bucket.

After you run the Privacera Manager update command, an SQS queue named privacera_bucket_sqs_{{DEPLOYMENT_ENV_NAME}} is created automatically, where {{DEPLOYMENT_ENV_NAME}} is the environment name you set in the vars.privacera.yml file. The queue appears in the list of queues in your AWS account.

To use an existing SQS queue instead, add the DISCOVERY_BUCKET_SQS_NAME property to the vars.discovery.aws.yml file and set it to the name of your queue.
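
To check which queue to expect after the update, the default name can be derived from the environment name. The following is a minimal sketch based on the naming pattern described above. The helper name default_sqs_queue_name and the environment name prod are hypothetical.

```shell
#!/usr/bin/env bash
# Print the SQS queue name that follows the default naming pattern
# privacera_bucket_sqs_{{DEPLOYMENT_ENV_NAME}}.
# Usage: default_sqs_queue_name <deployment-env-name>
default_sqs_queue_name() {
    local deployment_env_name="$1"
    printf 'privacera_bucket_sqs_%s\n' "$deployment_env_name"
}

# To confirm that the queue exists (requires configured AWS CLI credentials):
#   aws sqs get-queue-url --queue-name "$(default_sqs_queue_name prod)"
```

The `aws sqs get-queue-url` call returns an error if no queue with that name exists in the configured region and account.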

To enable real-time scanning on the bucket, see Configure S3 for real-time scanning on Privacera Platform.