
Privacera Documentation

Connect Spark standalone to Privacera Platform using the Privacera plugin

You can use Privacera Manager to generate the setup script and the Spark custom SSL/TLS configuration needed to install the Privacera plugin in an open-source Spark environment.

Note

This topic applies only to Spark version 3.x.

Prerequisites

Ensure the following prerequisites are met:

  • A working Spark environment.

  • Privacera services that are up and running.

Configuration

  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.spark-standalone.yml config/custom-vars/
    vi config/custom-vars/vars.spark-standalone.yml
  3. Edit the following properties. For property details and descriptions, see Configuration properties below.

    SPARK_STANDALONE_ENABLE: "true"
    SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
    SPARK_HOME: "<PLEASE_CHANGE>"
    SPARK_USER_HOME: "<PLEASE_CHANGE>"
    
  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    

    After the update completes, the setup scripts (privacera_setup.sh, standalone_spark_FGAC.sh, standalone_spark_OLAC.sh) and the Spark custom SSL configuration (spark_custom_conf.zip) are generated at ~/privacera/privacera-manager/output/spark-standalone.

  5. Enable either FGAC or OLAC in your Spark environment.

    Enable FGAC

    To enable Fine-grained access control (FGAC), do the following:

    1. Copy standalone_spark_FGAC.sh and spark_custom_conf.zip into the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_FGAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_FGAC.sh

    Enable OLAC

    To enable Object level access control (OLAC), do the following:

    1. Copy standalone_spark_OLAC.sh and spark_custom_conf.zip into the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_OLAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_OLAC.sh
      

Configuration properties

SPARK_STANDALONE_ENABLE
  Enables generation of the setup script and configuration files for the Spark standalone plugin installation.
  Example: true

SPARK_ENV_TYPE
  Sets the environment type. It can be any user-defined value; for example, local for an environment that runs locally, or prod for a production environment.
  Example: local

SPARK_HOME
  Home path of your Spark installation.
  Example: ~/privacera/spark/spark-3.1.1-bin-hadoop3.2

SPARK_USER_HOME
  User home directory of your Spark installation.
  Example: /home/ec2-user

SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED
  Enables or disables fallback to the privacera_files and privacera_hive services, which determine whether the user is allowed or denied access to the resource files. Set to true to enable the fallback, or false to disable it.
  Example: true
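Put together, a filled-in config/custom-vars/vars.spark-standalone.yml might look like the following sketch. The values are the example values listed under Configuration properties and must be adjusted for your environment.

```yaml
SPARK_STANDALONE_ENABLE: "true"
SPARK_ENV_TYPE: "local"
SPARK_HOME: "~/privacera/spark/spark-3.1.1-bin-hadoop3.2"
SPARK_USER_HOME: "/home/ec2-user"
SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED: "true"
```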

Validations

To verify that the Privacera plugin was installed successfully, do the following:

  1. Create an S3 bucket ${S3_BUCKET} for sample testing.

  2. Download the sample data with the following command and upload it to the bucket at s3://${S3_BUCKET}/customer_data.

    wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv
    
  3. (Optional) Add the AWS JARs to Spark. Download the JARs that match the Hadoop version of your Spark build.

    cd <SPARK_HOME>/jars
    

    For Spark 3.1.1 with Hadoop 3.2:

    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
    
  4. Run the following command.

    cd <SPARK_HOME>/bin
    
  5. Run spark-shell to execute Scala commands.

    ./spark-shell
    
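For the optional JAR step above, the hadoop-aws version must match the Hadoop build your Spark distribution was compiled against. One way to read that off the distribution name, sketched with the build name used in this topic's examples:

```shell
# Derive the hadoop-aws version from the Spark build name (illustrative).
SPARK_BUILD="spark-3.1.1-bin-hadoop3.2"   # example build from this topic
HADOOP_MINOR="${SPARK_BUILD##*hadoop}"    # strip everything through "hadoop" -> 3.2
HADOOP_AWS_VERSION="${HADOOP_MINOR}.0"    # hadoop-aws 3.2.0 pairs with Hadoop 3.2
echo "hadoop-aws-${HADOOP_AWS_VERSION}.jar"
```

The matching aws-java-sdk-bundle version (1.11.375 for hadoop-aws 3.2.0, as in the wget commands above) is fixed by the hadoop-aws release, so both JARs should always be updated together.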

Validations with JWT Token

  1. Run the following command.

    cd <SPARK_HOME>/bin
    
  2. Set the JWT_TOKEN.

    JWT_TOKEN="<JWT_TOKEN>"
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"  --conf "spark.hadoop.privacera.jwt.oauth.enable=true"

Validations with JWT token and public key

  1. If the JWT token is signed with a private/public key pair, save the public key to a local file.

  2. Set the following variables according to the payload of the JWT token.

    JWT_TOKEN="<JWT_TOKEN>"
    # The following variables are optional; set them only if the corresponding claims are present in the token, otherwise leave them empty.
    JWT_TOKEN_ISSUER="<JWT_TOKEN_ISSUER>"
    JWT_TOKEN_PUBLIC_KEY_FILE="<JWT_TOKEN_PUBLIC_KEY_FILE_PATH>"
    JWT_TOKEN_USER_KEY="<JWT_TOKEN_USER_KEY>"
    JWT_TOKEN_GROUP_KEY="<JWT_TOKEN_GROUP_KEY>"
    JWT_TOKEN_PARSER_TYPE="<JWT_TOKEN_PARSER_TYPE>"
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.privacera.jwt.oauth.enable=true" \
    --conf "spark.hadoop.privacera.jwt.token.publickey=${JWT_TOKEN_PUBLIC_KEY_FILE}" \
    --conf "spark.hadoop.privacera.jwt.token.issuer=${JWT_TOKEN_ISSUER}" \
    --conf "spark.hadoop.privacera.jwt.token.parser.type=${JWT_TOKEN_PARSER_TYPE}" \
    --conf "spark.hadoop.privacera.jwt.token.userKey=${JWT_TOKEN_USER_KEY}" \
    --conf "spark.hadoop.privacera.jwt.token.groupKey=${JWT_TOKEN_GROUP_KEY}"
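As a concrete sketch of how these variables relate to the token, suppose the JWT payload carries the username in a user claim and the groups in a groups claim. The claim names and issuer below are illustrative, not prescribed by Privacera; match them to your identity provider's actual payload.

```shell
# Illustrative JWT payload:
#   {"iss": "https://idp.example.com", "user": "data_analyst", "groups": ["analysts"]}
# The *_KEY variables name the claims that carry the user and group values:
JWT_TOKEN_ISSUER="https://idp.example.com"   # must match the token's "iss" claim
JWT_TOKEN_USER_KEY="user"                    # claim holding the username
JWT_TOKEN_GROUP_KEY="groups"                 # claim holding the group list
echo "${JWT_TOKEN_USER_KEY}/${JWT_TOKEN_GROUP_KEY}"
```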

Use cases

  1. Add a policy in Access Manager with read permission to ${S3_BUCKET}.

    val file_path = "s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv"
    val df = spark.read.csv(file_path)
    df.show(5)
    
  2. Add a policy in Access Manager with delete and write permission to ${S3_BUCKET}.

    df.write.format("csv").mode("overwrite").save("s3a://${S3_BUCKET}/csv/customer_data.csv")