
Privacera Plugin in Spark Standalone

This section describes how to use Privacera Manager to generate the setup script and the Spark custom configuration (with SSL/TLS) needed to install the Privacera plugin in an open-source Spark environment.

Prerequisites

Ensure the following prerequisites are met:

  • A working Spark environment.
  • Privacera services must be up and running.

Configuration

  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.spark-standalone.yml config/custom-vars/
    vi config/custom-vars/vars.spark-standalone.yml
    
  3. Edit the following properties. For property details and descriptions, refer to the Spark Standalone properties documentation.

    SPARK_STANDALONE_ENABLE: "true"
    SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
    SPARK_HOME: "<PLEASE_CHANGE>"
    SPARK_USER_HOME: "<PLEASE_CHANGE>"
    
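As an illustration, a filled-in vars.spark-standalone.yml might look like the following; the values shown are assumptions for a typical installation, so substitute the ones from your own environment:

```yaml
SPARK_STANDALONE_ENABLE: "true"
SPARK_ENV_TYPE: "spark-standalone"   # illustrative value; use the env type for your deployment
SPARK_HOME: "/opt/spark"             # illustrative path to the Spark installation
SPARK_USER_HOME: "/home/spark"       # illustrative home directory of the Spark service user
```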
  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    

    After the update completes, the setup scripts (privacera_setup.sh, standalone_spark_FGAC.sh, standalone_spark_OLAC.sh) and the Spark custom configuration (spark_custom_conf.zip) for SSL are generated under ~/privacera/privacera-manager/output/spark-standalone.
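As a quick sanity check, a short shell sketch can confirm that the generated files are present; the helper name check_artifacts is an assumption, not part of the Privacera tooling:

```shell
# Hypothetical helper: report which Privacera Manager artifacts exist in the
# output directory for Spark standalone.
check_artifacts() {
  dir="$1"
  for f in privacera_setup.sh standalone_spark_FGAC.sh standalone_spark_OLAC.sh spark_custom_conf.zip; do
    if [ -e "$dir/$f" ]; then
      echo "found: $f"
    else
      echo "missing: $f"
    fi
  done
}

check_artifacts "$HOME/privacera/privacera-manager/output/spark-standalone"
```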

  5. In your Spark environment, you can enable either fine-grained access control (FGAC) or object-level access control (OLAC). Use one of the following options:

    To enable fine-grained access control (FGAC), do the following:

    If the Spark Version is 2.x.x

    1. Copy privacera_setup.sh and spark_custom_conf.zip into the same folder.

    2. Add permissions to execute the script.

      chmod +x privacera_setup.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./privacera_setup.sh
      

    If the Spark Version is 3.x.x

    1. Copy standalone_spark_FGAC.sh and spark_custom_conf.zip into the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_FGAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_FGAC.sh
      

    To enable object-level access control (OLAC), do the following:

    If the Spark Version is 3.x.x

    1. Copy standalone_spark_OLAC.sh and spark_custom_conf.zip into the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_OLAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_OLAC.sh
      
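The choice of setup script by Spark version and access-control mode can be sketched as a small shell helper; the function name pick_setup_script is an assumption for illustration, not part of the Privacera tooling:

```shell
# Hypothetical helper: map a Spark version and access-control mode to the
# setup script generated by Privacera Manager.
pick_setup_script() {
  version="$1"   # e.g. "2.4.5" or "3.1.1"
  mode="$2"      # "FGAC" or "OLAC"
  case "$mode:$version" in
    FGAC:2.*) echo "privacera_setup.sh" ;;
    FGAC:3.*) echo "standalone_spark_FGAC.sh" ;;
    OLAC:3.*) echo "standalone_spark_OLAC.sh" ;;
    *)        echo "unsupported combination: $mode on Spark $version" >&2; return 1 ;;
  esac
}
```

For example, `pick_setup_script 3.1.1 FGAC` prints standalone_spark_FGAC.sh; note that OLAC is only documented here for Spark 3.x.x.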

Validations

To verify the successful installation of the Privacera plugin, do the following:

  1. Create an S3 bucket ${S3_BUCKET} for sample testing.

  2. Download the sample data using the following command and upload it to the bucket at s3://${S3_BUCKET}/customer_data.

    wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv
    
  3. (Optional) Add the AWS JARs to Spark. Download the JARs that match the Spark/Hadoop version in your environment.

    cd <SPARK_HOME>/jars
    

    For Spark 3.1.1 with Hadoop 3.2:

    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
    

    For Spark 2.4.5 with Hadoop 2.7:

    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
    
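The version-to-JAR mapping above can also be expressed as a small shell helper; the function name aws_jar_urls and the idea of scripting this step are assumptions, while the JAR versions and URLs are the ones listed above:

```shell
# Hypothetical helper: print the AWS JAR URLs matching a Spark/Hadoop line.
aws_jar_urls() {
  case "$1" in
    3.*)  # Spark 3.1.1 / Hadoop 3.2
      echo "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar"
      echo "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar"
      ;;
    2.*)  # Spark 2.4.5 / Hadoop 2.7
      echo "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar"
      echo "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar"
      ;;
  esac
}

# Example usage, assuming SPARK_VERSION and SPARK_HOME are set:
#   aws_jar_urls "$SPARK_VERSION" | xargs -n1 wget -P "$SPARK_HOME/jars"
```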
  4. Run the following command.

    cd <SPARK_HOME>/bin
    
  5. Run spark-shell to execute Scala commands.

    ./spark-shell
    

Validations with JWT Token

  1. Run the following command.

    cd <SPARK_HOME>/bin
    
  2. Set the JWT_TOKEN.

    JWT_TOKEN="<JWT_TOKEN>"
    
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"  --conf "spark.hadoop.privacera.jwt.oauth.enable=true"
    

Validations with JWT Token and Public Key

  1. If the JWT token is signed with a private/public key pair, create a local file containing the public key.

  2. Set the following variables according to the payload of the JWT token.

    JWT_TOKEN="<JWT_TOKEN>"
    # The following variables are optional; set each one only if the token contains the corresponding claim, otherwise leave it empty.
    JWT_TOKEN_ISSUER="<JWT_TOKEN_ISSUER>"
    JWT_TOKEN_PUBLIC_KEY_FILE="<JWT_TOKEN_PUBLIC_KEY_FILE_PATH>"
    JWT_TOKEN_USER_KEY="<JWT_TOKEN_USER_KEY>"
    JWT_TOKEN_GROUP_KEY="<JWT_TOKEN_GROUP_KEY>"
    JWT_TOKEN_PARSER_TYPE="<JWT_TOKEN_PARSER_TYPE>"
    
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
      --conf "spark.hadoop.privacera.jwt.oauth.enable=true" \
      --conf "spark.hadoop.privacera.jwt.token.publickey=${JWT_TOKEN_PUBLIC_KEY_FILE}" \
      --conf "spark.hadoop.privacera.jwt.token.issuer=${JWT_TOKEN_ISSUER}" \
      --conf "spark.hadoop.privacera.jwt.token.parser.type=${JWT_TOKEN_PARSER_TYPE}" \
      --conf "spark.hadoop.privacera.jwt.token.userKey=${JWT_TOKEN_USER_KEY}" \
      --conf "spark.hadoop.privacera.jwt.token.groupKey=${JWT_TOKEN_GROUP_KEY}"
    

Use Cases

  1. Add a policy in Access Manager with read permission on ${S3_BUCKET}, then read the sample data:

    val file_path = "s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv"
    val df = spark.read.csv(file_path)
    df.show(5)
    
  2. Add a policy in Access Manager with write and delete permissions on ${S3_BUCKET}, then write the data back:

    df.write.format("csv").mode("overwrite").save("s3a://${S3_BUCKET}/csv/customer_data.csv")
    

Last update: August 13, 2021