
Privacera Documentation

Connect Spark standalone to Privacera Platform using the Privacera plugin

You can use Privacera Manager to generate the setup script and the Spark custom SSL/TLS configuration needed to install the Privacera plugin in an open-source Spark environment.

Note

This topic applies only to Spark version 3.x.

Prerequisites

Ensure the following prerequisites are met:

  • A working Spark environment.

  • Privacera services that are up and running.

Configuration

  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.spark-standalone.yml config/custom-vars/
    vi config/custom-vars/vars.spark-standalone.yml
  3. Edit the following properties. For property details and descriptions, see Configuration properties below.

    SPARK_STANDALONE_ENABLE: "true"
    SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
    SPARK_HOME: "<PLEASE_CHANGE>"
    SPARK_USER_HOME: "<PLEASE_CHANGE>"
    
  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    

    After the update completes, the setup scripts (privacera_setup.sh, standalone_spark_FGAC.sh, standalone_spark_OLAC.sh) and the Spark custom SSL configuration (spark_custom_conf.zip) are generated at ~/privacera/privacera-manager/output/spark-standalone.

  5. Enable either FGAC or OLAC in your Spark environment.

    Enable FGAC

    To enable Fine-grained access control (FGAC), do the following:

    1. Copy standalone_spark_FGAC.sh and spark_custom_conf.zip into the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_FGAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_FGAC.sh

    Enable OLAC

    To enable Object level access control (OLAC), do the following:

    1. Copy standalone_spark_OLAC.sh and spark_custom_conf.zip into the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_OLAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_OLAC.sh
      

Configuration properties

SPARK_STANDALONE_ENABLE
  Enables generation of the setup script and configuration files for the Spark standalone plugin installation.
  Example: true

SPARK_ENV_TYPE
  Sets the environment type. It can be any user-defined value; for example, local for an environment that runs locally, or prod for a production environment.
  Example: local

SPARK_HOME
  Home path of your Spark installation.
  Example: ~/privacera/spark/spark-3.1.1-bin-hadoop3.2

SPARK_USER_HOME
  User home directory of your Spark installation.
  Example: /home/ec2-user

SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED
  Enables or disables fallback to the privacera_files and privacera_hive services, which determine whether the user is allowed or denied access to the resource files. Set to true to enable the fallback, or false to disable it.
  Example: true
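Put together, a filled-in config/custom-vars/vars.spark-standalone.yml might look like the following sketch. The values are the example values listed under Configuration properties and must be adjusted for your environment.

```yaml
SPARK_STANDALONE_ENABLE: "true"
SPARK_ENV_TYPE: "local"
SPARK_HOME: "~/privacera/spark/spark-3.1.1-bin-hadoop3.2"
SPARK_USER_HOME: "/home/ec2-user"
SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED: "true"
```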

Validations

To verify that the Privacera plugin was installed successfully, do the following:

  1. Create an S3 bucket ${S3_BUCKET} for sample testing.

  2. Download the sample data with the following command and upload it to the bucket at s3://${S3_BUCKET}/customer_data.

    wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv
    
  3. (Optional) Add the AWS JARs to Spark. Download the JARs that match the Hadoop version of your Spark build.

    cd <SPARK_HOME>/jars
    

    For Spark 3.1.1 with Hadoop 3.2:

    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
    
  4. Run the following command.

    cd <SPARK_HOME>/bin
    
  5. Run spark-shell to execute Scala commands.

    ./spark-shell
    
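For the optional JAR step above, the hadoop-aws version must match the Hadoop build your Spark distribution was compiled against. One way to read that off the distribution name, sketched with the build name used in this topic's examples:

```shell
# Derive the hadoop-aws version from the Spark build name (illustrative).
SPARK_BUILD="spark-3.1.1-bin-hadoop3.2"   # example build from this topic
HADOOP_MINOR="${SPARK_BUILD##*hadoop}"    # strip everything through "hadoop" -> 3.2
HADOOP_AWS_VERSION="${HADOOP_MINOR}.0"    # hadoop-aws 3.2.0 pairs with Hadoop 3.2
echo "hadoop-aws-${HADOOP_AWS_VERSION}.jar"
```

The matching aws-java-sdk-bundle version (1.11.375 for hadoop-aws 3.2.0, as in the wget commands above) is fixed by the hadoop-aws release, so both JARs should always be updated together.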

Validations with JWT Token

  1. Run the following command.

    cd <SPARK_HOME>/bin
    
  2. Set the JWT_TOKEN.

    JWT_TOKEN="<JWT_TOKEN>"
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"  --conf "spark.hadoop.privacera.jwt.oauth.enable=true"

Validations with JWT token and public key

  1. If the JWT token is signed with a private/public key pair, save the public key to a local file.

  2. Set the following variables according to the payload of the JWT token.

    JWT_TOKEN="<JWT_TOKEN>"
    # The following variables are optional; set them only if the corresponding claims are present in the token, otherwise leave them empty.
    JWT_TOKEN_ISSUER="<JWT_TOKEN_ISSUER>"
    JWT_TOKEN_PUBLIC_KEY_FILE="<JWT_TOKEN_PUBLIC_KEY_FILE_PATH>"
    JWT_TOKEN_USER_KEY="<JWT_TOKEN_USER_KEY>"
    JWT_TOKEN_GROUP_KEY="<JWT_TOKEN_GROUP_KEY>"
    JWT_TOKEN_PARSER_TYPE="<JWT_TOKEN_PARSER_TYPE>"
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.privacera.jwt.oauth.enable=true" \
    --conf "spark.hadoop.privacera.jwt.token.publickey=${JWT_TOKEN_PUBLIC_KEY_FILE}" \
    --conf "spark.hadoop.privacera.jwt.token.issuer=${JWT_TOKEN_ISSUER}" \
    --conf "spark.hadoop.privacera.jwt.token.parser.type=${JWT_TOKEN_PARSER_TYPE}" \
    --conf "spark.hadoop.privacera.jwt.token.userKey=${JWT_TOKEN_USER_KEY}" \
    --conf "spark.hadoop.privacera.jwt.token.groupKey=${JWT_TOKEN_GROUP_KEY}"
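As a concrete sketch of how these variables relate to the token, suppose the JWT payload carries the username in a user claim and the groups in a groups claim. The claim names and issuer below are illustrative, not prescribed by Privacera; match them to your identity provider's actual payload.

```shell
# Illustrative JWT payload:
#   {"iss": "https://idp.example.com", "user": "data_analyst", "groups": ["analysts"]}
# The *_KEY variables name the claims that carry the user and group values:
JWT_TOKEN_ISSUER="https://idp.example.com"   # must match the token's "iss" claim
JWT_TOKEN_USER_KEY="user"                    # claim holding the username
JWT_TOKEN_GROUP_KEY="groups"                 # claim holding the group list
echo "${JWT_TOKEN_USER_KEY}/${JWT_TOKEN_GROUP_KEY}"
```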

Use cases

  1. Add a policy in Access Manager with read permission to ${S3_BUCKET}.

    val file_path = "s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv"
    val df = spark.read.csv(file_path)
    df.show(5)
    
  2. Add a policy in Access Manager with delete and write permission to ${S3_BUCKET}.

    df.write.format("csv").mode("overwrite").save("s3a://${S3_BUCKET}/csv/customer_data.csv")