Connect Amazon EKS to Privacera Platform using Privacera plugin

You can use Privacera Manager to generate the setup script and the Spark custom configuration for SSL, and then use them to install the Privacera plugin in Spark on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

Prerequisites
Procedure
  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.spark-standalone.yml config/custom-vars/
    vi config/custom-vars/vars.spark-standalone.yml
    
  3. Edit the following properties. For property details and descriptions, see the Spark configuration properties section below.

    SPARK_STANDALONE_ENABLE: "true"
    SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
    SPARK_HOME: "<PLEASE_CHANGE>"
    SPARK_USER_HOME: "<PLEASE_CHANGE>"
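
    For reference, a filled-in vars.spark-standalone.yml might look like the following. The environment type and paths are illustrative; use values that match your own installation (see the Spark configuration properties section below).

    SPARK_STANDALONE_ENABLE: "true"
    SPARK_ENV_TYPE: "prod"
    SPARK_HOME: "~/privacera/spark/spark-3.1.1-bin-hadoop3.2"
    SPARK_USER_HOME: "/home/ec2-user"
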
  4. Run the following commands:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    

    After the update is complete, the Spark custom configuration for SSL (spark_custom_conf.zip) is generated at ~/privacera/privacera-manager/output/spark-standalone.
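
    You can confirm that the archive was generated before continuing:

    ls ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf.zip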

  5. Create the Spark Docker image.

    1. Run the following command to export PRIVACERA_BASE_DOWNLOAD_URL:

      export PRIVACERA_BASE_DOWNLOAD_URL=<PRIVACERA_BASE_DOWNLOAD_URL>
      
    2. Create a folder.

      mkdir -p ~/privacera-spark-plugin
      cd ~/privacera-spark-plugin
      
    3. Download and extract the package using wget.

      wget ${PRIVACERA_BASE_DOWNLOAD_URL}/spark-plugin/k8s-spark-pkg.tar.gz -O k8s-spark-pkg.tar.gz
      tar xzf k8s-spark-pkg.tar.gz
      rm -r k8s-spark-pkg.tar.gz
      
    4. Copy the spark_custom_conf.zip file from the Privacera Manager output folder into the files folder of the extracted package.

      cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf.zip files/spark_custom_conf.zip
      
    5. Run the following command to export SPARK_VERSION:

      export SPARK_VERSION=<SPARK_VERSION>

      Note

      Specify the Apache Spark version, such as 3.1.2, 3.2.2, or 3.3.0.

      The default version is 3.1.2.

    6. You can build either the OLAC Docker image or the FGAC Docker image.

      OLAC

      To build the OLAC Docker image, run the following command:

      ./build_image.sh ${PRIVACERA_BASE_DOWNLOAD_URL} OLAC
      

      FGAC

      To build the FGAC Docker image, run the following command:

      ./build_image.sh ${PRIVACERA_BASE_DOWNLOAD_URL} FGAC
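
      After building either image, you can confirm that it exists locally. The repository name privacera-spark-plugin and the latest tag match what the test step below uses:

      docker images privacera-spark-plugin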
      
  6. Test the Spark Docker image.

    1. Create an S3 bucket, referred to below as ${S3_BUCKET}, for sample testing.

    2. Download the sample data using the following command and upload it to the bucket at s3://${S3_BUCKET}/customer_data/ (an upload example follows the command).

      wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv
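
      The wget command only downloads the file locally. One way to upload it to the bucket location above is with the AWS CLI, assuming the CLI is configured with credentials that can write to the bucket:

      aws s3 cp customer_data_without_header.csv s3://${S3_BUCKET}/customer_data/customer_data_without_header.csv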
      
    3. Start the Docker container in interactive mode.

      IMAGE=privacera-spark-plugin:latest
      docker run --rm -i -t ${IMAGE} bash
      
    4. Start spark-shell inside the Docker container.

      JWT_TOKEN="<PLEASE_CHANGE>"
      cd /opt/privacera/spark/bin
      ./spark-shell \
      --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
      --conf "spark.hadoop.privacera.jwt.oauth.enable=true"
    5. Run the following command to read the S3 file (replace ${S3_BUCKET} with your bucket name; spark-shell does not expand shell-style variables):

      val df = spark.read.csv("s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv")
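
      If the plugin and your access policies are set up correctly, you can display a few rows to confirm the read succeeded:

      df.show(5)
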
    6. Exit the Docker shell.

      exit
  7. Publish the Spark Docker image to your Docker registry.

    • For HUB, HUB_USERNAME, and HUB_PASSWORD, use your Docker registry URL and login credentials.

    • For ENV_TAG, use a user-defined value that reflects your deployment environment, such as development, production, or test. For example, ENV_TAG=dev can be used for a development environment.

    HUB=<PLEASE_CHANGE>
    HUB_USERNAME=<PLEASE_CHANGE>
    HUB_PASSWORD=<PLEASE_CHANGE>
    ENV_TAG=<PLEASE_CHANGE>
    DEST_IMAGE=${HUB}/privacera-spark-plugin:${ENV_TAG}
    SOURCE_IMAGE=privacera-spark-plugin:latest
    docker login -u ${HUB_USERNAME} -p ${HUB_PASSWORD} ${HUB}
    docker tag ${SOURCE_IMAGE} ${DEST_IMAGE}
    docker push ${DEST_IMAGE}
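
    For example, with the hypothetical values below, the pushed image reference is myhub.docker.com/privacera-spark-plugin:dev; this is the value you will later use for SPARK_PLUGIN_IMAGE in penv.sh:

    # Hypothetical values for illustration only
    HUB=myhub.docker.com
    ENV_TAG=dev
    # Resulting image reference:
    #   myhub.docker.com/privacera-spark-plugin:dev
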
  8. Deploy the Spark Plugin on your EKS cluster.

    1. SSH to the EKS cluster where you want to deploy Spark.

    2. Run the following command to export PRIVACERA_BASE_DOWNLOAD_URL:

      export PRIVACERA_BASE_DOWNLOAD_URL=<PRIVACERA_BASE_DOWNLOAD_URL>
      
    3. Create a folder.

      mkdir ~/privacera-spark-plugin
      cd ~/privacera-spark-plugin
      
    4. Download and extract the package using wget.

      wget ${PRIVACERA_BASE_DOWNLOAD_URL}/spark-plugin/k8s-spark-deploy.tar.gz -O k8s-spark-deploy.tar.gz
      tar xzf k8s-spark-deploy.tar.gz
      rm -r k8s-spark-deploy.tar.gz
      cd k8s-spark-deploy/
      
    5. Open penv.sh and substitute the values of the following properties:

      SPARK_NAME_SPACE
        Kubernetes namespace.
        Example: privacera-spark-plugin-test

      SPARK_PLUGIN_ROLE_BINDING
        Spark role binding.
        Example: privacera-sa-spark-plugin-role-binding

      SPARK_PLUGIN_SERVICE_ACCOUNT
        Spark service account.
        Example: privacera-sa-spark-plugin

      SPARK_PLUGN_ROLE
        Spark service account role.
        Example: privacera-sa-spark-plugin-role

      SPARK_PLUGIN_APP_NAME
        Spark application name.
        Example: privacera-spark-plugin

      SPARK_PLUGIN_IMAGE
        Docker image, including the registry prefix.
        Example: myhub.docker.com/privacera-spark-plugin:prod-olac

      SPARK_DOCKER_PULL_SECRET
        Secret for docker-registry.
        Example: spark-plugin-docker-hub
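
      For reference, a penv.sh populated with the example values above might look like the following. This assumes plain shell-style assignments, and the values are illustrative; substitute your own namespace, image, and secret names:

      SPARK_NAME_SPACE=privacera-spark-plugin-test
      SPARK_PLUGIN_ROLE_BINDING=privacera-sa-spark-plugin-role-binding
      SPARK_PLUGIN_SERVICE_ACCOUNT=privacera-sa-spark-plugin
      SPARK_PLUGN_ROLE=privacera-sa-spark-plugin-role
      SPARK_PLUGIN_APP_NAME=privacera-spark-plugin
      SPARK_PLUGIN_IMAGE=myhub.docker.com/privacera-spark-plugin:prod-olac
      SPARK_DOCKER_PULL_SECRET=spark-plugin-docker-hub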

    6. Run the following commands to substitute the property values into the Kubernetes deployment .yml files:

      mkdir -p backup
      cp *.yml backup/
      ./replace.sh
      
    7. Run the following command to create the Kubernetes resources:

      kubectl apply -f namespace.yml
      kubectl apply -f service-account.yml
      kubectl apply -f role.yml
      kubectl apply -f role-binding.yml
      
    8. Run the following command to create the secret for docker-registry:

      kubectl create secret docker-registry spark-plugin-docker-hub --docker-server=<PLEASE_CHANGE> --docker-username=<PLEASE_CHANGE> --docker-password='<PLEASE_CHANGE>' --namespace=<PLEASE_CHANGE>
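
      You can verify that the secret was created in the target namespace:

      kubectl get secret spark-plugin-docker-hub -n <PLEASE_CHANGE>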
      
    9. Run the following command to export SPARK_NAME_SPACE:

      export SPARK_NAME_SPACE=<PLEASE_CHANGE>
    10. Run the following command to deploy a sample Spark application:

      Note

      This is a sample deployment file. Depending on your use case, you can create your own Spark deployment file and deploy your Docker image.

      kubectl apply -f privacera-spark-examples.yml -n ${SPARK_NAME_SPACE}

      This deploys the Spark application with the Privacera plugin in a Kubernetes pod and keeps the pod running so that you can use it in interactive mode.

Spark configuration properties

SPARK_STANDALONE_ENABLE
  Enables generating the setup script and configuration files for the Spark standalone plugin installation.
  Example: true

SPARK_ENV_TYPE
  Set the environment type. It can be any user-defined type. For example, if you're working in an environment that runs locally, you can set the type as local; for a production environment, set it as prod.
  Example: local

SPARK_HOME
  Home path of your Spark installation.
  Example: ~/privacera/spark/spark-3.1.1-bin-hadoop3.2

SPARK_USER_HOME
  User home directory of your Spark installation.
  Example: /home/ec2-user

SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED
  Enables or disables fallback to the privacera_files and privacera_hive services, which determines whether the user is allowed or denied access to resource files. Set to true to enable the fallback and false to disable it.
  Example: true
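
For example, to disable the fallback behavior, add the property to config/custom-vars/vars.spark-standalone.yml before running the Privacera Manager update:

  SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED: "false"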

Spark validation

  1. Get all the resources.

    kubectl get all -n ${SPARK_NAME_SPACE}
  2. Copy the pod ID from the output; you will need it to connect to the Spark master pod.

  3. Get the cluster info.

    kubectl cluster-info
    
  4. Copy the Kubernetes control plane URL from the output; you will need it for the spark-shell command, for example https://xxxxxxxxxxxxxxxxxxxxxxx.yl4.us-east-1.eks.amazonaws.com.

    Note

    When using the URL for EKS_SERVER, prefix the property value with k8s://. The following is an example of the property:

    EKS_SERVER="k8s://https://xxxxxxxxxxxxxxxxxxxxxxx.yl4.us-east-1.eks.amazonaws.com"
  5. Connect to the deployed Spark plugin pod.

    kubectl -n ${SPARK_NAME_SPACE} exec -it <POD_ID> -- bash
    
  6. Set the following properties:

    SPARK_NAME_SPACE="<PLEASE_CHANGE>"
    SPARK_PLUGIN_SERVICE_ACCOUNT="<PLEASE_CHANGE>"
    SPARK_PLUGIN_IMAGE="<PLEASE_CHANGE>"
    SPARK_DOCKER_PULL_SECRET="spark-plugin-docker-hub"
    EKS_SERVER="<PLEASE_CHANGE>"
    JWT_TOKEN="<PLEASE_CHANGE>"
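
    The spark-shell and spark-submit commands below also reference SPARK_PLUGIN_POD_IP and SPARK_PLUGIN_POD_NAME. If your deployment does not already export these in the pod environment, a minimal way to set them inside the pod is shown below; this assumes the default Kubernetes behavior where the container hostname is the pod name:

    export SPARK_PLUGIN_POD_NAME=$(hostname)
    export SPARK_PLUGIN_POD_IP=$(hostname -i)
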
  7. Run the following command to open spark-shell; it contains all the configuration required to launch the shell on the EKS cluster.

    cd /opt/privacera/spark/bin
    ./spark-shell --master ${EKS_SERVER} \
    --deploy-mode client \
    --conf spark.kubernetes.authenticate.serviceAccountName=${SPARK_PLUGIN_SERVICE_ACCOUNT} \
    --conf spark.kubernetes.namespace=${SPARK_NAME_SPACE} \
    --conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=${SPARK_PLUGIN_SERVICE_ACCOUNT} \
    --conf spark.kubernetes.container.image=${SPARK_PLUGIN_IMAGE} \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=${SPARK_DOCKER_PULL_SECRET} \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.privacera.jwt.oauth.enable=true" \
    --conf spark.driver.bindAddress='0.0.0.0' \
    --conf spark.driver.host=$SPARK_PLUGIN_POD_IP \
    --conf spark.port.maxRetries=4 \
    --conf spark.kubernetes.driver.pod.name=$SPARK_PLUGIN_POD_NAME
  8. Alternatively, run the following spark-submit command with JWT authentication:

    ./spark-submit \
    --master ${EKS_SERVER} \
    --name spark-cloud-new \
    --deploy-mode cluster \
    --conf spark.kubernetes.authenticate.serviceAccountName=${SPARK_PLUGIN_SERVICE_ACCOUNT} \
    --conf spark.kubernetes.namespace=${SPARK_NAME_SPACE} \
    --conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=${SPARK_PLUGIN_SERVICE_ACCOUNT} \
    --conf spark.kubernetes.container.image=${SPARK_PLUGIN_IMAGE} \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=${SPARK_DOCKER_PULL_SECRET} \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf spark.driver.bindAddress='0.0.0.0' \
    --conf spark.driver.host=$SPARK_PLUGIN_POD_IP \
    --conf spark.port.maxRetries=4 \
    --conf spark.kubernetes.driver.pod.name=$SPARK_PLUGIN_POD_NAME \
    --class com.privacera.spark.poc.SparkSample \
    <your-code-jar/file>
    
  9. To check read access to the S3 file, run the following command in the open spark-shell:

    val df = spark.read.csv("s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv")
    df.show()
  10. To check write access to the S3 file, run the following command in the open spark-shell:

    df.write.format("csv").mode("overwrite").save("s3a://${S3_BUCKET}/output/k8s/sample/csv")
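
    To confirm the write succeeded, you can read the output path back in the same spark-shell:

    spark.read.csv("s3a://${S3_BUCKET}/output/k8s/sample/csv").show()
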
  11. Check the audit logs in the Privacera Portal.

  12. To verify the spark-shell setup, open another SSH connection to the Kubernetes cluster and run the following command to check the running pods:

    kubectl get pods -n ${SPARK_NAME_SPACE}

    You will see the Spark executor pods, whose names end with -exec-<N>; for example, spark-shell-xxxxxxxxxxxxxxxx-exec-1 and spark-shell-xxxxxxxxxxxxxxxx-exec-2.