Privacera Platform master publication

StreamSets Data Collector (SDC) and Privacera Encryption

This topic explains how to install and configure the Privacera StreamSets plugin for Ranger and Privacera Encryption.

Enable Encryption for SDC

To enable Privacera Encryption for the StreamSets Data Collector (SDC), do the following:

  1. Copy the sample StreamSets encryption variables file into the custom-vars directory:

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.crypto.streamset.yml custom-vars/vars.crypto.streamset.yml
    
  2. Update Privacera Manager:

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
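
The update only picks up the StreamSets settings if the variables file was actually copied in step 1. As a minimal sketch, assuming the paths shown above, the following check could be run before the update. The function name `check_vars_file` is hypothetical and is not part of Privacera Manager.

```shell
#!/bin/sh
# Sketch: confirm the custom vars file exists and is non-empty.
# check_vars_file is a hypothetical helper, not a Privacera Manager command.
check_vars_file() {
    config_dir="$1"
    vars_file="${config_dir}/custom-vars/vars.crypto.streamset.yml"
    if [ -s "${vars_file}" ]; then
        echo "found: ${vars_file}"
        return 0
    fi
    echo "missing or empty: ${vars_file}" >&2
    return 1
}

# Example call, using the directory from step 1:
# check_vars_file ~/privacera/privacera-manager/config
```

If the check reports a missing file, repeat step 1 before running the update.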
    
Configure Encryption for SDC
  1. Copy the StreamSets Privacera package.

    1. If you have StreamSets and Privacera Manager running on different systems, copy the following two files from ~/privacera/privacera-manager/output/streamset/ on the Privacera Manager host machine:

      • privacera-streamset.tar.gz

      • crypto-config

      If you have JCEKS enabled, also copy the following file from ~/privacera/privacera-manager/config/keystores/ on the Privacera Manager host machine:

      • cryptoprop.jceks

    2. If you have StreamSets and Privacera Manager running on the same system, do the following:

      cp ~/privacera/privacera-manager/output/streamset/privacera-streamset.tar.gz ~/privacera/downloads
      cp -r ~/privacera/privacera-manager/output/streamset/crypto-config ~/privacera/downloads/crypto-config
      

      If you have JCEKS enabled, do the following:

      cp ~/privacera/privacera-manager/config/keystores/cryptoprop.jceks ~/privacera/downloads/crypto-config/
      
  2. Extract the StreamSets Privacera package.

    cd ~/privacera/downloads
    mkdir streamsets
    tar xfz ~/privacera/downloads/privacera-streamset.tar.gz -C streamsets
    
  3. Switch to the root user to work in the StreamSets installation directory.

    sudo su
    
  4. Set the StreamSets installation directory.

    export STREAMSET_HOME=/opt/streamset/streamsets-datacollector-3.13.0
    
  5. Copy the Privacera library into the StreamSets data collector user-libs directory:

    cp -r streamsets/privacera-streamset/ ${STREAMSET_HOME}/user-libs/
    
  6. Copy the configuration files.

    cp -r crypto-config ${STREAMSET_HOME}/../crypto-config
    
  7. Define a security policy.

    cat << EOF >> ${STREAMSET_HOME}/etc/sdc-security.policy
    grant {
    permission java.io.FilePermission "/opt/privacera/-", "read";
    permission java.io.FilePermission "/opt/streamset/-", "read,write";
    permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
    };
    EOF
  8. Stop StreamSets.

    kill -9 $(ps aux | grep 'sdc' | awk '{print $2}')
  9. Restart StreamSets.

    ulimit -n 32768
    nohup ${STREAMSET_HOME}/bin/streamsets dc &
    
  10. Check the logs to confirm that StreamSets is running.

    tail -f ${STREAMSET_HOME}/log/sdc.log
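
Before relying on the logs alone, it can help to confirm that the files from steps 5 through 7 landed where StreamSets expects them. The following is a minimal sketch under stated assumptions: `verify_sdc_install` is a hypothetical helper, and it assumes GNU `cp` semantics, so that step 5 produced a `privacera-streamset` directory inside `user-libs`.

```shell
#!/bin/sh
# Sketch: check that the files copied in steps 5 to 7 are in place.
# verify_sdc_install is a hypothetical helper; paths mirror the steps above.
verify_sdc_install() {
    sdc_home="$1"
    status=0
    # Step 5: Privacera library copied into user-libs
    [ -d "${sdc_home}/user-libs/privacera-streamset" ] \
        || { echo "missing: user-libs/privacera-streamset" >&2; status=1; }
    # Step 6: crypto-config copied next to the installation directory
    [ -d "${sdc_home}/../crypto-config" ] \
        || { echo "missing: ../crypto-config" >&2; status=1; }
    # Step 7: security policy contains the Privacera read permission
    grep -qF '"/opt/privacera/-", "read"' \
        "${sdc_home}/etc/sdc-security.policy" 2>/dev/null \
        || { echo "missing: Privacera grant in etc/sdc-security.policy" >&2; status=1; }
    return "${status}"
}

# Example call, using the variable set in step 4:
# verify_sdc_install "${STREAMSET_HOME}"
```

The function returns a non-zero status and names each missing item, so it can be run again after repeating the affected step.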
    
Verify StreamSets setup

To verify that Privacera Encryption is now working with the StreamSets Data Collector (SDC), follow these steps:

  1. Configure a sample pipeline that encrypts a local file. To use the provided sample, import it into StreamSets. For more information, see Sample pipeline.

  2. Switch to the root user to work in the StreamSets installation directory.

    sudo su
    
  3. Create data directories.

    DATA_DIR=/opt/streamset/
    cd ${DATA_DIR}
    mkdir -p customer_data/input 
    mkdir -p customer_data/output
    mkdir -p customer_data/input_error
    mkdir -p customer_data/output/encrypted_error
    
  4. Create a sample data file:

    cat << EOF > customer_data/input/customer_data_with_header.csv 
    id,name,ssn,email_address,amount
    1,Tamara,898453744,aphillips@vang.info,162454.67
    2,Richard,65511350,vreynolds@gmail.com,602.89
    3,Tanya,634090950,harringtonwilliam@diaz-king.com,48712.67
    4,Richard,829439881,martinvalerie@yahoo.com,5122.02
    5,Raymond,227804351,sarachavez@yahoo.com,97963.857
    6,Melissa,553465892,kevinwillis@gmail.com,36654.806
    7,Deborah,782539839,brittney24@yahoo.com,19.231
    8,Rodney,515337130,jenniferkelly@davis-bond.biz,65083.651
    9,Katherine,137057143,jperkins@gmail.com,4822.343
    10,David,432941241,wmccann@hotmail.com,4069.34
    EOF
  5. Create a metadata file that maps the input dataset columns to Privacera Encryption schemes. Columns with an empty scheme name are left unencrypted:

    cat << EOF > customer_data/customer_data.meta
    COLUMN_NAME|SCHEME_NAME
    id|
    name|SYSTEM_PERSON_NAME
    ssn|SYSTEM_SSN
    email_address|SYSTEM_EMAIL
    amount|
    EOF

    To run the sample pipeline, make sure that the Privacera user exists in Ranger and has permissions on the KMS keys whose names start with pmsk.
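
A typo in the metadata file can leave a column unmapped, so it may be worth checking that every column named in the metadata file appears in the CSV header. The following is a minimal sketch: `check_meta_columns` is a hypothetical helper, not part of Privacera or StreamSets, and it assumes column names contain no spaces or commas.

```shell
#!/bin/sh
# Sketch: report metadata columns that do not appear in the CSV header.
# check_meta_columns is a hypothetical helper for illustration only.
check_meta_columns() {
    csv_file="$1"
    meta_file="$2"
    # First CSV line, with any carriage return removed
    header=$(head -n 1 "${csv_file}" | tr -d '\r')
    status=0
    # Skip the COLUMN_NAME|SCHEME_NAME line, then take the first field
    for column in $(tail -n +2 "${meta_file}" | cut -d'|' -f1); do
        case ",${header}," in
            *",${column},"*) ;;
            *) echo "not in CSV header: ${column}" >&2; status=1 ;;
        esac
    done
    return "${status}"
}

# Example call, run from the data directory created in step 3:
# check_meta_columns customer_data/input/customer_data_with_header.csv \
#     customer_data/customer_data.meta
```

The check only compares column names. It does not confirm that the scheme names exist in Privacera Encryption.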

Add permission for keys in Ranger
  1. Log in to the Ranger UI as an administrator and create the Privacera user. You can grant permissions to the Privacera user on keys.

  2. Log in to Ranger with keyadmin credentials and click on privacera_kms.

  3. Create or update a policy for the Privacera user.

  4. Run the StreamSets pipeline preview and verify that the encrypted values appear on the right side of the table.