Streamsets#

This section describes how to install and configure the Streamsets plugin for Privacera Encryption and Ranger.

Prerequisites#

You should already have a working Streamsets installation.

Privacera Encryption in Streamsets Data Collector (SDC)#

Enable Encryption for SDC#

  1. Copy the sample Streamsets encryption variables file into the custom-vars directory:

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.crypto.streamset.yml custom-vars/vars.crypto.streamset.yml
    
  2. Run the update.

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
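
Before running the update, it can help to confirm that the variables file from step 1 is actually in place. The following is a minimal sketch, not part of Privacera Manager: the function name `check_streamset_vars` is hypothetical, and the default directory is simply the path used in the steps above.

```shell
# Hypothetical helper: confirm the custom vars file exists before running the update.
check_streamset_vars() {
    # $1: Privacera Manager config directory (defaults to the path used in the steps above)
    config_dir="${1:-$HOME/privacera/privacera-manager/config}"
    vars_file="${config_dir}/custom-vars/vars.crypto.streamset.yml"
    if [ -f "${vars_file}" ]; then
        echo "found: ${vars_file}"
    else
        echo "missing: ${vars_file}" >&2
        return 1
    fi
}
```

Calling `check_streamset_vars` with no arguments checks the default location; a non-zero return code means the copy in step 1 has not been done yet.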
    

Configure Encryption for SDC#

  1. Copy the Streamsets Privacera package.

    1. If Streamsets and Privacera Manager run on different systems, copy the following two files from ~/privacera/privacera-manager/output/streamset/ on the Privacera Manager host into ~/privacera/downloads on the Streamsets host, which is the directory the later steps use:

      • privacera-streamset.tar.gz
      • crypto-config

      If you have JCEKS enabled, also copy the following file from ~/privacera/privacera-manager/config/keystores/ on the Privacera Manager host:

      • cryptoprop.jceks
    2. If Streamsets and Privacera Manager run on the same system, do the following:

      cp ~/privacera/privacera-manager/output/streamset/privacera-streamset.tar.gz ~/privacera/downloads
      cp -r ~/privacera/privacera-manager/output/streamset/crypto-config ~/privacera/downloads/crypto-config
      

      If you have JCEKS enabled, do the following:

      cp ~/privacera/privacera-manager/config/keystores/cryptoprop.jceks ~/privacera/downloads/crypto-config/
      
  2. Extract the Streamsets Privacera package.

    cd ~/privacera/downloads
    mkdir streamsets
    tar xfz ~/privacera/downloads/privacera-streamset.tar.gz -C streamsets
    
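  3. Switch to the root user so that you can write to the Streamsets installation directory. Stay in ~/privacera/downloads, because steps 5 and 6 use paths relative to it.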

    sudo su
    
  4. Set the Streamsets installation directory. Adjust the path to match your installation.

    export STREAMSET_HOME=/opt/streamset/streamsets-datacollector-3.13.0
    
  5. Copy the Privacera library into the Streamsets data collector user-libs directory:

    cp -r streamsets/privacera-streamset/ ${STREAMSET_HOME}/user-libs/
    
  6. Copy the configuration files.

    cp -r crypto-config ${STREAMSET_HOME}/../crypto-config
    
  7. Define security policy.

    cat << EOF >> ${STREAMSET_HOME}/etc/sdc-security.policy 
    grant {
    permission java.io.FilePermission "/opt/privacera/-", "read";
    permission java.io.FilePermission "/opt/streamset/-", "read,write";
    permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
    };
    EOF
    
  8. Stop Streamsets.

    # The [s]dc pattern keeps grep from matching its own process
    kill -9 $(ps aux | grep '[s]dc' | awk '{print $2}')
    
  9. Restart Streamsets.

    ulimit -n 32768
    nohup ${STREAMSET_HOME}/bin/streamsets dc &
    
  10. Check the logs to verify that Streamsets is running.

    tail -f ${STREAMSET_HOME}/log/sdc.log
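
Tailing the log in step 10 means watching the output by hand. As a complementary check, the sketch below polls the Data Collector web UI until it answers. The function name `wait_for_sdc` is hypothetical. The default URL assumes SDC listens on its default HTTP port 18630 without TLS, so adjust it if your sdc.properties differs. The sketch also assumes `curl` is installed.

```shell
# Hypothetical helper: poll the SDC web UI until it answers or the attempts run out.
wait_for_sdc() {
    url="${1:-http://localhost:18630}"   # assumed default SDC HTTP port; change if needed
    retries="${2:-30}"                   # number of attempts before giving up
    i=0
    while [ "${i}" -lt "${retries}" ]; do
        # Any HTTP response (even an error page) means the server is up
        if curl -s -o /dev/null --max-time 2 "${url}"; then
            echo "Streamsets is responding at ${url}"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "No response from ${url} after ${retries} attempts" >&2
    return 1
}
```

For example, `wait_for_sdc http://localhost:18630 60` makes up to 60 attempts and returns a non-zero code if the UI never responds, which makes it usable in a deployment script.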
    

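Step 7 appends to sdc-security.policy with `>>`, so running it twice leaves two identical grant blocks in the file. If you expect to repeat the procedure, a guard such as the following sketch avoids the duplicate. The function name `append_policy_once` is hypothetical; the grant block is copied unchanged from step 7.

```shell
# Hypothetical helper: append the Privacera grant block only if it is not already present.
append_policy_once() {
    policy_file="$1"
    # One line of the grant block, used to detect an earlier append
    marker='permission java.io.FilePermission "/opt/privacera/-", "read";'
    if grep -qF "${marker}" "${policy_file}" 2>/dev/null; then
        echo "grant block already present in ${policy_file}"
        return 0
    fi
    # Quoted delimiter: the block is written literally, with no shell expansion
    cat << 'EOF' >> "${policy_file}"
grant {
permission java.io.FilePermission "/opt/privacera/-", "read";
permission java.io.FilePermission "/opt/streamset/-", "read,write";
permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
};
EOF
    echo "grant block appended to ${policy_file}"
}
```

It would be called as `append_policy_once "${STREAMSET_HOME}/etc/sdc-security.policy"` in place of the `cat` command in step 7.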
Verification#

  1. Configure a sample pipeline to encrypt a local file. You can import the following sample pipeline into Streamsets:

    Sample pipeline

  2. Switch to the root user so that you can write to the Streamsets installation directory.

    sudo su
    
  3. Create data directories.

    DATA_DIR=/opt/streamset/
    cd ${DATA_DIR}
    mkdir -p customer_data/input 
    mkdir -p customer_data/output
    mkdir -p customer_data/input_error
    mkdir -p customer_data/output/encrypted_error
    
  4. Create a sample data file:

    cat << EOF > customer_data/input/customer_data_with_header.csv 
    id,name,ssn,email_address,amount
    1,Tamara,898453744,aphillips@vang.info,162454.67
    2,Richard,65511350,vreynolds@gmail.com,602.89
    3,Tanya,634090950,harringtonwilliam@diaz-king.com,48712.67
    4,Richard,829439881,martinvalerie@yahoo.com,5122.02
    5,Raymond,227804351,sarachavez@yahoo.com,97963.857
    6,Melissa,553465892,kevinwillis@gmail.com,36654.806
    7,Deborah,782539839,brittney24@yahoo.com,19.231
    8,Rodney,515337130,jenniferkelly@davis-bond.biz,65083.651
    9,Katherine,137057143,jperkins@gmail.com,4822.343
    10,David,432941241,wmccann@hotmail.com,4069.34
    11,Joshua,321606633,woodcrystal@parker.org,250357.06
    12,Edward,647791349,darlenecross@robinson.com,42653.59
    13,Mark,716038074,warrenlynn@yahoo.com,306.67
    14,Zachary,498128789,sheltonrobert@davidson-gray.com,35210.45
    15,Kevin,202146696,sruiz@yahoo.com,77743.523
    16,Michele,820733453,melissa22@melendez.biz,7423.56
    17,Cheryl,90449548,domingueztracy@hotmail.com,7520.45
    18,Nicholas,536554960,arobles@russell.com,4802.56
    19,Tina,259977164,aparks@gmail.com,59092.573
    20,Katherine,816643380,chelsea38@kennedy.com,10587.207
    21,Daniel,584629448,adrian91@hotmail.com,4945.78
    22,Jennifer,573263022,susansmith@kennedy.com,23389.602
    23,Natalie,244502868,mitchellaaron@yahoo.com,77547.97
    24,Charles,155538632,suarezharold@rocha.info,6036.093
    25,John,174339159,nsilva@contreras.com,48.45
    26,Nancy,558410411,brettthomas@adams.com,580.469
    27,Robert,817383397,sean82@hotmail.com,4473.48
    28,Charles,783127079,umerritt@davis-harris.com,3744.67
    EOF
    
  5. Create a metadata file that maps the input dataset columns to Privacera Encryption scheme names:

    cat << EOF > customer_data/customer_data.meta
    COLUMN_NAME|SCHEME_NAME
    id|
    name|SYSTEM_PERSON_NAME
    ssn|SYSTEM_SSN
    email_address|SYSTEM_EMAIL
    amount|
    EOF
    

    Before you run the sample pipeline, make sure that the Privacera user exists in Ranger and has permissions on the KMS keys whose names start with pmsk*.
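
A mismatch between the CSV header and the metadata file is easy to introduce when adapting the sample to your own data. The sketch below compares the two. It assumes that the metadata file is meant to list every CSV column in the same order, as the sample in steps 4 and 5 does; this is not confirmed by the product documentation. The function name `check_meta_columns` is hypothetical.

```shell
# Hypothetical helper: compare the CSV header with the COLUMN_NAME entries of the metadata file.
check_meta_columns() {
    csv_file="$1"
    meta_file="$2"
    # Column names from the first CSV line, one per line
    csv_cols="$(head -n 1 "${csv_file}" | tr ',' '\n')"
    # First field of each metadata row, skipping the COLUMN_NAME|SCHEME_NAME header
    meta_cols="$(tail -n +2 "${meta_file}" | cut -d'|' -f1)"
    if [ "${csv_cols}" = "${meta_cols}" ]; then
        echo "columns match"
    else
        echo "column mismatch between ${csv_file} and ${meta_file}" >&2
        return 1
    fi
}
```

With the sample files above, it would be run from the data directory as `check_meta_columns customer_data/input/customer_data_with_header.csv customer_data/customer_data.meta`.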

Ranger Configuration: Add Permission for Keys#

  1. Log in to the Ranger UI as an administrator and create the Privacera user. You can then grant this user permissions on keys.

    img src="assets/create_privacera_user.png"

  2. Log in to Ranger with keyadmin credentials and click privacera_kms.

    img src="assets/privacera_kms.png"

  3. Create or update the policy for the Privacera user.

    img src="assets/policy_privacera_user.png"

  4. Run the Streamsets pipeline preview and verify that the encrypted values appear on the right side of the table, as shown in the screenshot below.

    img src="assets/preview_streamsets.png"


Last update: July 23, 2021