Open Source Spark#

First obtain an account-specific script from your PrivaceraCloud account, then add a startup step to open source Spark.

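Three configurations are available, depending on your requirements: installation on a local or virtual machine, installation through an existing Docker file, and installation using Privacera scripts. Fine-Grained Access Control (FGAC) and Object-Level Access Control (OLAC) are supported in each configuration.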

Obtain Installation Script#

Obtain the account-specific <privacera-plugin-script-download-url>. You run this script, along with other commands, in your Spark command shell to complete the PrivaceraCloud installation.

Steps:

  1. Go to Settings > API Key.

  2. Use an existing active API Key or generate a new one.

    Note

    Make sure the Expiry column is set to "Never Expires".

  3. Click the i icon to get the scripts.

  4. On the Plugins Setup Script, click the COPY URL button. Save this value on your Spark server. It is needed as the <privacera-plugin-script-download-url> in the next step.

Configure Privacera Plugin on Local/Virtual Machine#

OLAC Setup#

  1. OLAC is supported only with JWT token authentication.

    Your Dataserver application should be configured with JWT Token support. Create a new Dataserver, if it does not exist. See Data Access Server.

  2. Add the following properties in your Dataserver application to enable JWT authorization.

    privacera.jwt.oauth.enable=true
    privacera.jwt.token.issuer=<PLEASE_CHANGE>
    privacera.jwt.token.publickey=<PLEASE_CHANGE>
    privacera.jwt.token.secret=<PLEASE_CHANGE>
    privacera.jwt.token.subject=<PLEASE_CHANGE>
    privacera.jwt.token.userKey=<PLEASE_CHANGE>
    privacera.jwt.token.groupKey=<PLEASE_CHANGE>
    privacera.jwt.token.parser.type=<PLEASE_CHANGE>
    

    | Property | Description | Example |
    |---|---|---|
    | privacera.jwt.oauth.enable | Enables JWT authentication in Privacera services. | privacera.jwt.oauth.enable=true |
    | privacera.jwt.token.issuer | URL of the identity provider. | privacera.jwt.token.issuer=https://your-idp-domain.com |
    | privacera.jwt.token.publickey | The JWT public key as a single-line string (delete all newlines). | -----BEGIN PUBLIC KEY-----MIIBIjANB-----END PUBLIC KEY----- |
    | privacera.jwt.token.secret | [Optional] Set this if the JWT token has been encrypted using a secret. | privacera.jwt.token.secret=privacera-api |
    | privacera.jwt.token.subject | [Optional] Set this if the JWT token has a subject. | privacera.jwt.token.subject=api-token |
    | privacera.jwt.token.userKey | Defines the unique userKey whose value is used as the user in Ranger policies. | client-id |
    | privacera.jwt.token.groupKey | Defines the unique groupKey whose value is used as the group in Ranger policies. | scope |
    | privacera.jwt.token.parser.type | JWT parser type. Use PING_IDENTITY when groupKey is an array; use KEYCLOAKS when groupKey is a space-separated string. | privacera.jwt.token.parser.type=KEYCLOAKS |

    After adding the properties, run the Dataserver, and then proceed to the next step.

  3. SSH to the instance where Spark is installed and you want to install Privacera Plugin.

  4. Create the directory ~/privacera/spark-plugin-install and download the script into it. Replace <privacera-plugin-script-download-url> with the Privacera Plugin download URL.

    mkdir -p ~/privacera/spark-plugin-install
    cd ~/privacera/spark-plugin-install
    wget <privacera-plugin-script-download-url> -O privacera_plugin.sh
    
  5. Create a file privacera_env.sh which will contain the parameters required for your plugin installation.

    vi privacera_env.sh
    

    Add the following properties:

    PLUGIN_TYPE="spark"
    SPARK_PLUGIN_TYPE="OLAC"
    SPARK_HOME="<PLEASE_CHANGE>"
    SPARK_CLUSTER_NAME="privacera-spark"
    
    | Property | Description |
    |---|---|
    | PLUGIN_TYPE | Type of Privacera Plugin to install. |
    | SPARK_PLUGIN_TYPE | Spark Plugin type: OLAC. JWT authentication is enabled by default. |
    | SPARK_HOME | Home directory of your Spark installation, for example /home/user/spark. |
    | SPARK_CLUSTER_NAME | Cluster name shown on the Privacera Ranger Audits page. |
  6. Run the script.

    chmod +x privacera_plugin.sh
    ./privacera_plugin.sh
    

    The script will set up the Privacera Plugin in the OLAC mode.
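The property table in step 2 notes that the public key must be supplied as a single string with all newlines deleted. A minimal sketch of one way to produce that string from a PEM file is shown here; the function name flatten_pem and the file name in the usage comment are hypothetical, and the sketch assumes the key is stored in a regular PEM text file.

```shell
# Flatten a PEM public key to one line by deleting every newline
# (and carriage returns, in case the file has Windows line endings).
flatten_pem() {
  tr -d '\r\n' < "$1"
}

# Example usage (hypothetical file name):
#   flatten_pem idp_public_key.pem
```

The output can then be pasted as the value of privacera.jwt.token.publickey.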

FGAC Setup#

  1. We recommend using FGAC with JWT authentication enabled.

    Note

    If JWT authentication is disabled, access control is applied to the system user or proxy user.

  2. SSH to the instance where Spark is installed and you want to install Privacera Plugin.

  3. Create the directory ~/privacera/spark-plugin-install and download the script into it. Replace <privacera-plugin-script-download-url> with the Privacera Plugin download URL.

    mkdir -p ~/privacera/spark-plugin-install
    cd ~/privacera/spark-plugin-install
    wget <privacera-plugin-script-download-url> -O privacera_plugin.sh
    
  4. Create a file privacera_env.sh which will contain the parameters required for your plugin installation.

    vi privacera_env.sh
    

    Add the following properties:

    PLUGIN_TYPE="spark"
    SPARK_PLUGIN_TYPE="FGAC"
    SPARK_HOME="<PLEASE_CHANGE>"
    SPARK_CLUSTER_NAME="privacera-spark"
    
    | Property | Description |
    |---|---|
    | PLUGIN_TYPE | Type of Privacera Plugin to install. |
    | SPARK_PLUGIN_TYPE | Spark Plugin type: FGAC. |
    | SPARK_HOME | Home directory of your Spark installation, for example /home/user/spark. |
    | SPARK_CLUSTER_NAME | Cluster name shown on the Privacera Ranger Audits page. |

    Add the following properties when JWT auth is enabled:

    JWT_OAUTH_ENABLE="true"
    JWT_ISSUER="<PLEASE_CHANGE>"
    JWT_PUBLIC_KEY="<PLEASE_CHANGE>"
    #JWT_SECRET="<PLEASE_CHANGE>"
    #JWT_SUBJECT="<PLEASE_CHANGE>"
    JWT_USERKEY="<PLEASE_CHANGE>"
    JWT_GROUPKEY="<PLEASE_CHANGE>"
    JWT_PARSER_TYPE="<PLEASE_CHANGE>"
    
    | Property | Description | Example |
    |---|---|---|
    | JWT_OAUTH_ENABLE | Enables JWT authentication. | JWT_OAUTH_ENABLE="true" |
    | JWT_ISSUER | URL of the identity provider. | JWT_ISSUER="https://your-idp-domain.com" |
    | JWT_PUBLIC_KEY | The JWT public key in string format. | |
    | JWT_SECRET | Uncomment and set a value if the JWT token has been encrypted using a secret. | JWT_SECRET="privacera-secret" |
    | JWT_SUBJECT | Uncomment and set a value if the JWT token has a subject. | JWT_SUBJECT="api-token" |
    | JWT_USERKEY | Defines the unique userKey whose value is used as the user in Ranger policies. | JWT_USERKEY="client_id" |
    | JWT_GROUPKEY | Defines the unique groupKey whose value is used as the group in Ranger policies. | JWT_GROUPKEY="scope" |
    | JWT_PARSER_TYPE | JWT parser type. Values can be PING_IDENTITY or KEYCLOAKS. | JWT_PARSER_TYPE="KEYCLOAKS" |
  5. Run the script.

    chmod +x privacera_plugin.sh
    ./privacera_plugin.sh
    

    The script will set up the Privacera Plugin in the FGAC mode.
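The JWT_PARSER_TYPE value depends on how your identity provider encodes the group claim: PING_IDENTITY when the groupKey value is an array, KEYCLOAKS when it is a space-separated string. The following is a rough sketch for inspecting a sample token to see which form it uses. The function names are hypothetical, the JSON check is a simple pattern match rather than a real parser, and the sketch assumes an unencrypted token and a base64 command that accepts -d (as GNU coreutils does).

```shell
# Print the decoded JSON payload of a JWT passed as $1.
jwt_payload() {
  # The payload is the second dot-separated segment, base64url-encoded.
  segment=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the '=' padding; restore it before decoding.
  case $(( ${#segment} % 4 )) in
    2) segment="${segment}==" ;;
    3) segment="${segment}=" ;;
  esac
  printf '%s' "$segment" | base64 -d
}

# Suggest a parser type from the payload JSON ($1) and the group claim name ($2).
# Heuristic only: an opening '[' right after the claim name means an array.
suggest_parser_type() {
  if printf '%s' "$1" | grep -Eq "\"$2\"[[:space:]]*:[[:space:]]*\["; then
    echo "PING_IDENTITY"
  else
    echo "KEYCLOAKS"
  fi
}

# Example usage (TOKEN holds a sample token from your identity provider):
#   suggest_parser_type "$(jwt_payload "$TOKEN")" scope
```

Treat the result as a hint and confirm it against your identity provider's token format before setting JWT_PARSER_TYPE.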

Configure Privacera Plugin in an Existing Docker File#

If you have an existing open source Spark setup running on Kubernetes, you can add the Privacera Plugin installation steps to the Docker file that builds your Spark image.

OLAC Setup#

  1. OLAC is supported only with JWT token authentication.

    Your Dataserver application should be configured with JWT Token support. Create a new Dataserver, if it does not exist. See Data Access Server.

  2. Add the following properties in your Dataserver application to enable JWT authorization.

    privacera.jwt.oauth.enable=true
    privacera.jwt.token.issuer=<PLEASE_CHANGE>
    privacera.jwt.token.publickey=<PLEASE_CHANGE>
    privacera.jwt.token.secret=<PLEASE_CHANGE>
    privacera.jwt.token.subject=<PLEASE_CHANGE>
    privacera.jwt.token.userKey=<PLEASE_CHANGE>
    privacera.jwt.token.groupKey=<PLEASE_CHANGE>
    privacera.jwt.token.parser.type=<PLEASE_CHANGE>
    

    | Property | Description | Example |
    |---|---|---|
    | privacera.jwt.oauth.enable | Enables JWT authentication in Privacera services. | privacera.jwt.oauth.enable=true |
    | privacera.jwt.token.issuer | URL of the identity provider. | privacera.jwt.token.issuer=https://your-idp-domain.com |
    | privacera.jwt.token.publickey | The JWT public key as a single-line string (delete all newlines). | -----BEGIN PUBLIC KEY-----MIIBIjANB-----END PUBLIC KEY----- |
    | privacera.jwt.token.secret | [Optional] Set this if the JWT token has been encrypted using a secret. | privacera.jwt.token.secret=privacera-api |
    | privacera.jwt.token.subject | [Optional] Set this if the JWT token has a subject. | privacera.jwt.token.subject=api-token |
    | privacera.jwt.token.userKey | Defines the unique userKey whose value is used as the user in Ranger policies. | client-id |
    | privacera.jwt.token.groupKey | Defines the unique groupKey whose value is used as the group in Ranger policies. | scope |
    | privacera.jwt.token.parser.type | JWT parser type. Use PING_IDENTITY when groupKey is an array; use KEYCLOAKS when groupKey is a space-separated string. | privacera.jwt.token.parser.type=KEYCLOAKS |

    After adding the properties, run the Dataserver, and then proceed to the next step.

  3. SSH to the instance where Spark is installed and you want to install Privacera Plugin.

  4. Copy the following to your Docker file and set the PCLOUD_PLUGIN_SCRIPT_DOWNLOAD_URL property. To get the Privacera Plugin download URL, see Obtain Installation Script.

    ######## Install Privacera Spark Plugin Start ###########
    
    # ENV SPARK_HOME /opt/apache/spark
    RUN apt-get update && apt-get -y install zip unzip wget
    ENV PCLOUD_PLUGIN_SCRIPT_DOWNLOAD_URL="<PLEASE_CHANGE>"
    ENV PLUGIN_TYPE="spark"
    ENV SPARK_PLUGIN_TYPE="OLAC"
    ENV SPARK_CLUSTER_NAME="privacera-spark"
    RUN echo "Downloading Script from $PCLOUD_PLUGIN_SCRIPT_DOWNLOAD_URL"
    RUN wget ${PCLOUD_PLUGIN_SCRIPT_DOWNLOAD_URL} -O privacera_plugin.sh
    RUN chmod +x privacera_plugin.sh
    RUN ./privacera_plugin.sh
    
    ######## Install Privacera Spark Plugin End ###########
    
  5. Save the Docker file and build the image. You now have a Docker image for open source Spark with the Privacera Plugin enabled.
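Before building, it can help to confirm that no active line in the Docker file still carries the <PLEASE_CHANGE> placeholder. The following is a minimal sketch of such a check; the function name no_placeholders_left is hypothetical, and lines that start with # are treated as comments and ignored.

```shell
# Succeed only if no active (non-comment) line in file $1 still
# contains the <PLEASE_CHANGE> placeholder. Offending lines are printed
# with their line numbers.
no_placeholders_left() {
  if grep -n '<PLEASE_CHANGE>' "$1" | grep -v '^[0-9]*:[[:space:]]*#'; then
    echo "Unresolved placeholders found in $1" >&2
    return 1
  fi
  return 0
}

# Example usage (image name is a placeholder you choose):
#   no_placeholders_left Dockerfile && docker build -t <your-image-name> .
```

The same check works on a privacera_env.sh file, since it uses the same placeholder text.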

FGAC Setup#

  1. We recommend using FGAC with JWT authentication enabled.

    Note

    If JWT authentication is disabled, access control is applied to the system user or proxy user.

  2. SSH to the instance where Spark is installed and you want to install Privacera Plugin.

  3. Copy the following to your Docker file and set the PCLOUD_PLUGIN_SCRIPT_DOWNLOAD_URL property. To get the Privacera Plugin download URL, see Obtain Installation Script. For the JWT properties, refer to the table below.

    ######## Install Privacera Spark Plugin Start ###########
    
    # ENV SPARK_HOME /opt/apache/spark
    RUN apt-get update && apt-get -y install zip unzip wget
    ENV PCLOUD_PLUGIN_SCRIPT_DOWNLOAD_URL="<PLEASE_CHANGE>"
    ENV PLUGIN_TYPE="spark"
    ENV SPARK_PLUGIN_TYPE="FGAC"
    ENV SPARK_CLUSTER_NAME="privacera-spark"
    ENV JWT_OAUTH_ENABLE="true"
    ENV JWT_ISSUER="<PLEASE_CHANGE>"
    ENV JWT_PUBLIC_KEY="<PLEASE_CHANGE>"
    # ENV JWT_SECRET="<PLEASE_CHANGE>"
    # ENV JWT_SUBJECT="<PLEASE_CHANGE>"
    ENV JWT_USERKEY="<PLEASE_CHANGE>"
    ENV JWT_GROUPKEY="<PLEASE_CHANGE>"
    ENV JWT_PARSER_TYPE="<PLEASE_CHANGE>"
    RUN echo "Downloading Script from $PCLOUD_PLUGIN_SCRIPT_DOWNLOAD_URL"
    RUN wget ${PCLOUD_PLUGIN_SCRIPT_DOWNLOAD_URL} -O privacera_plugin.sh
    RUN chmod +x privacera_plugin.sh
    RUN ./privacera_plugin.sh
    
    ######## Install Privacera Spark Plugin End ###########
    
    | Property | Description | Example |
    |---|---|---|
    | JWT_OAUTH_ENABLE | Enables JWT authentication. | JWT_OAUTH_ENABLE="true" |
    | JWT_ISSUER | URL of the identity provider. | JWT_ISSUER="https://your-idp-domain.com" |
    | JWT_PUBLIC_KEY | The JWT public key in string format. | |
    | JWT_SECRET | Uncomment and set a value if the JWT token has been encrypted using a secret. | JWT_SECRET="privacera-secret" |
    | JWT_SUBJECT | Uncomment and set a value if the JWT token has a subject. | JWT_SUBJECT="api-token" |
    | JWT_USERKEY | Defines the unique userKey whose value is used as the user in Ranger policies. | JWT_USERKEY="client_id" |
    | JWT_GROUPKEY | Defines the unique groupKey whose value is used as the group in Ranger policies. | JWT_GROUPKEY="scope" |
    | JWT_PARSER_TYPE | JWT parser type. Values can be PING_IDENTITY or KEYCLOAKS. | JWT_PARSER_TYPE="KEYCLOAKS" |
  4. Save the Docker file and build the image. You now have a Docker image for open source Spark with the Privacera Plugin enabled.

Configure Privacera Plugin using Privacera Scripts#

The scripts help you create an open source Spark image with the Privacera Plugin and push it to the Docker Hub you specify. You can then use that image to run Spark with Privacera.

OLAC Setup#

  1. OLAC is supported only with JWT token authentication.

    Your Dataserver application should be configured with JWT Token support. Create a new Dataserver, if it does not exist. See Data Access Server.

  2. Add the following properties in your Dataserver application to enable JWT authorization.

    privacera.jwt.oauth.enable=true
    privacera.jwt.token.issuer=<PLEASE_CHANGE>
    privacera.jwt.token.publickey=<PLEASE_CHANGE>
    privacera.jwt.token.secret=<PLEASE_CHANGE>
    privacera.jwt.token.subject=<PLEASE_CHANGE>
    privacera.jwt.token.userKey=<PLEASE_CHANGE>
    privacera.jwt.token.groupKey=<PLEASE_CHANGE>
    privacera.jwt.token.parser.type=<PLEASE_CHANGE>
    

    | Property | Description | Example |
    |---|---|---|
    | privacera.jwt.oauth.enable | Enables JWT authentication in Privacera services. | privacera.jwt.oauth.enable=true |
    | privacera.jwt.token.issuer | URL of the identity provider. | privacera.jwt.token.issuer=https://your-idp-domain.com |
    | privacera.jwt.token.publickey | The JWT public key as a single-line string (delete all newlines). | -----BEGIN PUBLIC KEY-----MIIBIjANB-----END PUBLIC KEY----- |
    | privacera.jwt.token.secret | [Optional] Set this if the JWT token has been encrypted using a secret. | privacera.jwt.token.secret=privacera-api |
    | privacera.jwt.token.subject | [Optional] Set this if the JWT token has a subject. | privacera.jwt.token.subject=api-token |
    | privacera.jwt.token.userKey | Defines the unique userKey whose value is used as the user in Ranger policies. | client-id |
    | privacera.jwt.token.groupKey | Defines the unique groupKey whose value is used as the group in Ranger policies. | scope |
    | privacera.jwt.token.parser.type | JWT parser type. Use PING_IDENTITY when groupKey is an array; use KEYCLOAKS when groupKey is a space-separated string. | privacera.jwt.token.parser.type=KEYCLOAKS |

    After adding the properties, run the Dataserver, and then proceed to the next step.

  3. SSH to the instance where you want to install Privacera Plugin.

  4. Create the directory ~/privacera/spark-plugin-install and download the script into it. Replace <privacera-plugin-script-download-url> with the Privacera Plugin download URL.

    mkdir -p ~/privacera/spark-plugin-install
    cd ~/privacera/spark-plugin-install
    wget <privacera-plugin-script-download-url> -O privacera_plugin.sh
    
  5. Create a file privacera_env.sh which will contain the parameters required for your plugin installation.

    vi privacera_env.sh
    

    Add the following properties:

    PLUGIN_TYPE="spark_k8s"
    SPARK_PLUGIN_TYPE="OLAC"
    HUB="<PLEASE_CHANGE>"
    HUB_USERNAME="<PLEASE_CHANGE>"
    HUB_PASSWORD="<PLEASE_CHANGE>"
    ENV_TAG="<PLEASE_CHANGE>"
    
    | Property | Description |
    |---|---|
    | PLUGIN_TYPE | Type of Privacera Plugin to install. |
    | SPARK_PLUGIN_TYPE | Spark Plugin type: OLAC. JWT authentication is enabled by default. |
    | HUB | Docker Hub URL where the image is pushed. |
    | HUB_USERNAME | Docker Hub username. |
    | HUB_PASSWORD | Docker Hub password. |
    | ENV_TAG | Docker image tag. |
  6. Run the script.

    chmod +x privacera_plugin.sh
    ./privacera_plugin.sh
    

    The script will build the Spark image with Privacera Spark plugin and publish it to the Docker hub.
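If you prefer not to edit privacera_env.sh by hand in step 5, the file can be generated from variables already set in your shell, for example by a CI job. The following is a minimal sketch under that assumption; the function name write_privacera_env is hypothetical, and the sketch only reproduces the property lines listed in step 5 without validating the values themselves.

```shell
# Write privacera_env.sh for the spark_k8s OLAC case.
# HUB, HUB_USERNAME, HUB_PASSWORD and ENV_TAG are read from the current
# shell; the function fails if any of them is empty or unset.
write_privacera_env() {
  out_file="${1:-privacera_env.sh}"
  for name in HUB HUB_USERNAME HUB_PASSWORD ENV_TAG; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      return 1
    fi
  done
  (
    # Restrict permissions on the new file, since it holds a password.
    umask 077
    cat > "$out_file" <<EOF
PLUGIN_TYPE="spark_k8s"
SPARK_PLUGIN_TYPE="OLAC"
HUB="${HUB}"
HUB_USERNAME="${HUB_USERNAME}"
HUB_PASSWORD="${HUB_PASSWORD}"
ENV_TAG="${ENV_TAG}"
EOF
  )
}

# Example usage, run from ~/privacera/spark-plugin-install:
#   write_privacera_env privacera_env.sh
```

Values containing double quotes would need escaping before being written this way.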

FGAC Setup#

  1. We recommend using FGAC with JWT authentication enabled.

    Note

    If JWT authentication is disabled, access control is applied to the system user or proxy user.

  2. SSH to the instance where you want to install Privacera Plugin.

  3. Create the directory ~/privacera/spark-plugin-install and download the script into it. Replace <privacera-plugin-script-download-url> with the Privacera Plugin download URL.

    mkdir -p ~/privacera/spark-plugin-install
    cd ~/privacera/spark-plugin-install
    wget <privacera-plugin-script-download-url> -O privacera_plugin.sh
    
  4. Create a file privacera_env.sh which will contain the parameters required for your plugin installation.

    vi privacera_env.sh
    

    Add the following properties:

    PLUGIN_TYPE="spark_k8s"
    SPARK_PLUGIN_TYPE="FGAC"
    SPARK_HOME="<PLEASE_CHANGE>"
    SPARK_CLUSTER_NAME="privacera-spark"
    
    | Property | Description |
    |---|---|
    | PLUGIN_TYPE | Type of Privacera Plugin to install. |
    | SPARK_PLUGIN_TYPE | Spark Plugin type: FGAC. |
    | SPARK_HOME | Home directory of your Spark installation, for example /home/user/spark. |
    | SPARK_CLUSTER_NAME | Cluster name shown on the Privacera Ranger Audits page. |

    Add the following properties when JWT auth is enabled:

    JWT_OAUTH_ENABLE="true"
    JWT_ISSUER="<PLEASE_CHANGE>"
    JWT_PUBLIC_KEY="<PLEASE_CHANGE>"
    #JWT_SECRET="<PLEASE_CHANGE>"
    #JWT_SUBJECT="<PLEASE_CHANGE>"
    JWT_USERKEY="<PLEASE_CHANGE>"
    JWT_GROUPKEY="<PLEASE_CHANGE>"
    JWT_PARSER_TYPE="<PLEASE_CHANGE>"
    
    | Property | Description | Example |
    |---|---|---|
    | JWT_OAUTH_ENABLE | Enables JWT authentication. | JWT_OAUTH_ENABLE="true" |
    | JWT_ISSUER | URL of the identity provider. | JWT_ISSUER="https://your-idp-domain.com" |
    | JWT_PUBLIC_KEY | The JWT public key in string format. | |
    | JWT_SECRET | Uncomment and set a value if the JWT token has been encrypted using a secret. | JWT_SECRET="privacera-secret" |
    | JWT_SUBJECT | Uncomment and set a value if the JWT token has a subject. | JWT_SUBJECT="api-token" |
    | JWT_USERKEY | Defines the unique userKey whose value is used as the user in Ranger policies. | JWT_USERKEY="client_id" |
    | JWT_GROUPKEY | Defines the unique groupKey whose value is used as the group in Ranger policies. | JWT_GROUPKEY="scope" |
    | JWT_PARSER_TYPE | JWT parser type. Values can be PING_IDENTITY or KEYCLOAKS. | JWT_PARSER_TYPE="KEYCLOAKS" |

    Add the following Docker Hub properties:

    HUB="<PLEASE_CHANGE>"
    HUB_USERNAME="<PLEASE_CHANGE>"
    HUB_PASSWORD="<PLEASE_CHANGE>"
    ENV_TAG="<PLEASE_CHANGE>"
    
    | Property | Description |
    |---|---|
    | HUB | Docker Hub URL where the image is pushed. |
    | HUB_USERNAME | Docker Hub username. |
    | HUB_PASSWORD | Docker Hub password. |
    | ENV_TAG | Docker image tag. |
  5. Run the script.

    chmod +x privacera_plugin.sh
    ./privacera_plugin.sh
    

    The script will build the Spark image with Privacera Spark plugin and publish it to the Docker hub.

Deploy Spark on EKS Cluster#

  1. SSH to the instance where you want to deploy Spark on the EKS cluster.

  2. Get the Privacera Plugin download URL and set it in the following property. See Obtain Installation Script.

    export PRIVACERA_DOWNLOAD_URL="<PLEASE_CHANGE>"
    
  3. Create the spark-k8s-artifacts folder.

    mkdir -p ~/privacera/spark-k8s-artifacts
    cd ~/privacera/spark-k8s-artifacts
    
  4. Download and extract packages.

    wget ${PRIVACERA_DOWNLOAD_URL}/plugin/spark/k8s-spark-deploy.tar.gz -O k8s-spark-deploy.tar.gz
    tar xzf k8s-spark-deploy.tar.gz
    rm -r k8s-spark-deploy.tar.gz
    cd k8s-spark-deploy/
    
  5. Open the penv.sh file and set the values of the following properties:

    | Property | Description | Example |
    |---|---|---|
    | SPARK_NAME_SPACE | Kubernetes namespace | privacera-spark-plugin-test |
    | SPARK_PLUGIN_IMAGE | Docker image, including the hub | ${HUB}/privacera-spark-plugin:${ENV_TAG} |
    | SPARK_DOCKER_PULL_SECRET | Secret for the Docker registry | spark-plugin-docker-hub |
    | SPARK_PLUGIN_ROLE_BINDING | Spark role binding | privacera-sa-spark-plugin-role-binding |
    | SPARK_PLUGIN_SERVICE_ACCOUNT | Spark service account | privacera-sa-spark-plugin |
    | SPARK_PLUGN_ROLE | Spark service account role | privacera-sa-spark-plugin-role |
    | SPARK_PLUGIN_APP_NAME | Spark plugin application name | privacera-spark-examples |
  6. Run the following commands to back up the EKS deployment YAML files and replace the property values in them.

    mkdir -p backup
    cp *.yml backup/
    ./replace.sh
    
  7. Run the following commands to create the EKS resources.

    kubectl apply -f namespace.yml 
    kubectl apply -f service-account.yml 
    kubectl apply -f role.yml
    kubectl apply -f role-binding.yml
    
  8. Run the following command to create the secret for the Docker registry.

    kubectl create secret docker-registry spark-plugin-docker-hub --docker-server=<PLEASE_CHANGE> --docker-username=<PLEASE_CHANGE> --docker-password='<PLEASE_CHANGE>' --namespace=<PLEASE_CHANGE>
    
  9. Run the following command to deploy a sample Spark application. Replace ${SPARK_NAME_SPACE} with the Kubernetes namespace.

    kubectl apply -f privacera-spark-examples.yml -n ${SPARK_NAME_SPACE}
    

    Note

    This is a sample file used for deployment. Depending on your use case, you can create your own Spark deployment file and deploy a Docker image.

    This deploys a Spark application with the Privacera plugin in an EKS pod and keeps the pod running so that you can use it in interactive mode.
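The SPARK_PLUGIN_IMAGE example in step 5 combines the HUB and ENV_TAG values used when the image was built. The following is a small sketch for composing that value consistently; the function name spark_plugin_image is hypothetical, and the sketch assumes your image follows the ${HUB}/privacera-spark-plugin:${ENV_TAG} pattern shown in the example column.

```shell
# Print the image reference for SPARK_PLUGIN_IMAGE, built from the HUB
# and ENV_TAG values set in the current shell.
spark_plugin_image() {
  if [ -z "${HUB:-}" ] || [ -z "${ENV_TAG:-}" ]; then
    echo "HUB and ENV_TAG must both be set" >&2
    return 1
  fi
  printf '%s/privacera-spark-plugin:%s\n' "$HUB" "$ENV_TAG"
}

# Example usage:
#   SPARK_PLUGIN_IMAGE="$(spark_plugin_image)"
```

Reusing the same HUB and ENV_TAG values here avoids a mismatch between the image that was pushed and the image the deployment tries to pull.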

Validation#

To validate your Spark deployment, refer to Privacera Plugin in Spark on EKS - Validation.


Last update: February 22, 2022