
Hive UDFs for encryption on Privacera Platform

This topic provides instructions on how to enable encryption and decryption using Hive user-defined functions (UDFs).

Add Privacera UDF in Hive

To add a Privacera user-defined function (UDF) in Hive, follow these steps:

  1. Log in to the Privacera Portal.

  2. From the navigation menu, select Encryption & Masking > Encryption.

  3. Under the Diagnostics tab, click Encryption.

  4. Click Create for UDF - Protect, UnProtect.

  5. Click Yes.

    The UDF is created.

Confirm Privacera UDF in Hive

  1. SSH to the instance.

  2. Run kinit as <user>.
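
    For example (a sketch; the exact principal, and whether a keytab is used, depend on your Kerberos setup):

    kinit <user>@<REALM>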

  3. Connect to beeline.

    # Example
    beeline -u "jdbc:hive2://<hostname>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n $<user>
    
  4. Enter the following two commands to describe the functions created by Privacera:

    • DESCRIBE FUNCTION EXTENDED privacera.protect;

    • DESCRIBE FUNCTION EXTENDED privacera.unprotect;

    Note

    If the UDF has not been created successfully, you will see the error message “Function ‘privacera.protect’ does not exist”.
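
    If the functions do exist, the output includes the backing class. The exact output format varies by Hive version; an illustrative sketch (class names taken from the installation steps later in this topic):

    Function class:com.privacera.crypto.PrivaceraEncryptUDF
    Function type:PERSISTENT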

Test Privacera UDF in Hive

  1. Log in to Privacera Portal.

  2. From the navigation menu, select Encryption & Masking > Schemes.

  3. Click Add.

  4. Create a scheme by entering the following details:

    • Scheme Name: SWIFT_ID

    • Format type: FPE_ALPHA_NUMERIC

    • Scope: All

    • Algorithm: FPE

  5. Click Save.

  6. Check if the KMS key is generated for the scheme:

    1. Log in to Ranger at http://<hostname>:6080 with the username keyadmin.

    2. Click Encryption > Key Manager.

  7. SSH to the instance.

  8. Run kinit as <user>.

  9. Connect to beeline.

    # Example
    beeline -u "jdbc:hive2://<hostname>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n $<user>
    
  10. After connecting to beeline, enter the following commands:

    select privacera.protect("TEXT" ,"SWIFT_ID");
    select privacera.unprotect(privacera.protect("TEXT" ,"SWIFT_ID"), "SWIFT_ID");
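
    If the setup is working, the second query returns the original input ("TEXT"), because unprotect reverses protect when given the same scheme.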
    
  11. Check the KMS audits in Ranger.

    • Log in to Ranger at http://<hostname>:6080 with the username keyadmin.

    • Click Audit > Access.

    • If the UDF fails, the message "denied access" is recorded in Ranger KMS audits.

      If access is denied, you need to give permission to $<user>.

    • If the UDF result is successful, the audits are shown as Allowed.

    • To check crypto UDF logs, run this command:

      sudo su
      tail -f /var/log/hive/hiveserver2.log
      

Give users access to UDFs

Users need access to the UDFs before they can call privacera.protect or privacera.unprotect. One way to grant this access is with a Ranger policy, as sketched below.
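
A minimal sketch, assuming the Ranger Hive plug-in and Ranger's public v2 REST API; the service name, admin credentials, and policy shape are illustrative placeholders, not Privacera-supplied values:

# Sketch: create a Ranger Hive policy allowing <user> to run the privacera UDFs.
# <hive_service_name>, <password>, and <user> are placeholders for your environment.
curl -u admin:<password> -H "Content-Type: application/json" \
  -X POST http://<hostname>:6080/service/public/v2/api/policy \
  -d '{
        "service": "<hive_service_name>",
        "name": "privacera-udf-access",
        "resources": {
          "database": {"values": ["privacera"]},
          "udf": {"values": ["protect", "unprotect"]}
        },
        "policyItems": [{
          "users": ["<user>"],
          "accesses": [{"type": "select", "isAllowed": true}]
        }]
      }'

Users may also need permission on the scheme's key in Ranger KMS, as noted in the audit steps above.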

Set up Privacera Encryption

Check if the shaded JAR is needed

You might need to use a shaded JAR depending on certain conditions described below.

To check if the shaded JAR is needed, go to the Hive libraries directory:

cd /opt/cloudera/parcels/CDH/lib/hive/lib/
ls | grep http
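
For example, if the listing shows an older client like the following (version numbers here are illustrative), the client predates 4.5.6 and the shaded JAR is required:

httpclient-4.2.5.jar
httpcore-4.2.5.jar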

You need to install the shaded JAR if:

  • The version of the http client is less than 4.5.6.

  • The UDF throws a NoClassDefFoundError for the class org/apache/http/config/Lookup.

To install the shaded JAR, see Install the shaded JAR.

Create Privacera UDF in CDH/CDP

To install the Privacera Encryption JAR in a CDH/CDP cluster, do the following:

  1. Download the JAR to the cluster node.

    sudo su - privacera
    export PRIVACERA_BASE_DOWNLOAD_URL=$<PRIVACERA_BASE_DOWNLOAD_URL>
    wget $<PRIVACERA_BASE_DOWNLOAD_URL>/privacera-crypto-jar-with-dependencies.jar -O privacera-crypto-jar-with-dependencies.jar
    
  2. Upload the JAR into an HDFS location where Hive can access it. Use the following Hadoop CLI commands to upload the JAR into HDFS:

    kinit -kt /opt/privacera/keytab/privacera.headless.keytab privacera
    hadoop fs -ls
    hadoop fs -mkdir -p /privacera/crypto/jars
    hadoop fs -put privacera-crypto-jar-with-dependencies.jar /privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar
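
    To confirm the JAR landed where Hive expects it, you can list the target directory (an optional check):

    hadoop fs -ls /privacera/crypto/jars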
    
  3. Create configuration files.

    hadoop fs -mkdir -p /privacera/crypto/configs
    
    vi crypto.properties
    
    privacera.portal.base.url=http://$<PRIVACERA_SERVER>:6868
    privacera.portal.username=$<USER_NAME>
    privacera.portal.password=$<PASSWORD>
    #Mode of encryption/decryption rpc/native
    privacera.crypto.mode=native
    
    cp crypto.properties  crypto_default.properties
    
  4. Upload the configuration files to HDFS.

    hadoop fs -put crypto.properties /privacera/crypto/configs/crypto.properties
    hadoop fs -put crypto_default.properties /privacera/crypto/configs/crypto_default.properties
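
    Because crypto.properties contains portal credentials, you may also want to tighten its permissions in HDFS (a precaution; adjust the mode to your environment):

    hadoop fs -chmod 640 /privacera/crypto/configs/crypto.properties
    hadoop fs -chmod 640 /privacera/crypto/configs/crypto_default.properties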
    
  5. Install the Crypto JAR into the cluster.

    1. SSH to the cluster node.

    2. Run kinit as $<user> (if Kerberos is enabled).

    3. Connect to beeline.

    4. If you have a Kerberos-enabled cluster, use the following command to log in to beeline:

      Update the values of the variables <HIVE_SERVER2_HOSTNAME> and <REALM> below.

      HIVE_SERVER2_HOSTNAME=<<10.1.1.2>>
      REALM=<<EXAMPLE.PRIVACERA.US>>
      beeline -u "jdbc:hive2://$<HIVE_SERVER2_HOSTNAME>:10000/default;principal=hive/$<HIVE_SERVER2_HOSTNAME>@$<REALM>"
  6. Add the Privacera crypto UDF JAR. You need to run several SQL queries in beeline to create the Privacera encryption functions.

    • SQL query to add the JAR to the session:

      add jar hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar;
      
  7. Create the Privacera encryption UDFs. You need to run the following SQL queries in beeline:

    • SQL query to create Privacera unprotect UDF:

      create database if not exists privacera;
      drop function if exists privacera.unprotect;
      create function privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar';
      
    • SQL query to create Privacera protect UDF:

      drop function if exists privacera.protect;
      create function privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar';
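
    After both functions are created, you can verify them with the same DESCRIBE commands shown earlier in this topic:

    DESCRIBE FUNCTION EXTENDED privacera.protect;
    DESCRIBE FUNCTION EXTENDED privacera.unprotect;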
      

Install the shaded JAR

To install the shaded Privacera Encryption JAR in a CDH/CDP cluster, follow these steps:

  1. Download the JAR to the cluster node.

    sudo su - privacera
    export PRIVACERA_BASE_DOWNLOAD_URL=$<PRIVACERA_BASE_DOWNLOAD_URL>
    wget $<PRIVACERA_BASE_DOWNLOAD_URL>/privacera-crypto-jar-with-dependencies-shaded.jar -O privacera-crypto-jar-with-dependencies-shaded.jar
    
  2. Upload the JAR into an HDFS location where Hive can access it. Use the following Hadoop CLI commands:

    kinit -kt /opt/privacera/keytab/privacera.headless.keytab privacera
    hadoop fs -ls /
    hadoop fs -mkdir -p /privacera/crypto/jars
    hadoop fs -put privacera-crypto-jar-with-dependencies-shaded.jar /privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar
    

  3. Create configuration files.

    hadoop fs -mkdir -p /privacera/crypto/configs
    vi crypto.properties
    privacera.portal.base.url=http://$<PRIVACERA_SERVER>:6868
    privacera.portal.username=$<USER_NAME>
    privacera.portal.password=$<PASSWORD>
    
    #Mode of encryption/decryption rpc/native
    privacera.crypto.mode=native
    cp crypto.properties  crypto_default.properties
    
  4. Upload the configuration files to HDFS.

    hadoop fs -put crypto.properties /privacera/crypto/configs/crypto.properties
    hadoop fs -put crypto_default.properties /privacera/crypto/configs/crypto_default.properties
    
  5. Install the Crypto JAR into the cluster.

    1. SSH to the cluster node.

    2. Run kinit as <user> (if Kerberos is enabled).

    3. Connect to beeline.

    4. If you have a Kerberos-enabled cluster, use the following command to log in to beeline:

      Update the values of <HIVE_SERVER2_HOSTNAME> and <REALM> as per your environment.

      HIVE_SERVER2_HOSTNAME=<<10.1.1.2>>
      REALM=<<EXAMPLE.PRIVACERA.US>>
      beeline -u "jdbc:hive2://$<HIVE_SERVER2_HOSTNAME>:10000/default;principal=hive/$<HIVE_SERVER2_HOSTNAME>@$<REALM>"
      
  6. Add the Privacera Crypto UDF JAR. You need to run several SQL queries in beeline to create the Privacera encryption functions.

    • SQL query to add the shaded JAR to the session:

      add jar hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar;
      
  7. Create the Privacera encryption UDFs. You need to run the following SQL queries in beeline:

    • SQL query to create the Privacera unprotect UDF:

      create database if not exists privacera;
      drop function if exists privacera.unprotect;
      CREATE FUNCTION privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar';
    • SQL query to create Privacera protect UDF.

      drop function if exists privacera.protect;
      CREATE FUNCTION privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar';

Hive TEZ: Add properties file in beeline

If you are using Hive TEZ, you must add the crypto.properties file in beeline before executing the UDFs, using the following command:

add file hdfs:///privacera/crypto/configs/crypto.properties;
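
To confirm the properties file was added to the session, you can list the session's resources (a standard Hive resource command; output format varies by version):

list files;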

Sample queries to verify setup

  • Sample query to run encryption:

    select privacera.protect("test_data","$<SCHEME_NAME>") limit 10;
  • Sample query to run encryption and decryption in one query to verify setup:

    select privacera.unprotect(privacera.protect("test_data","$<SCHEME_NAME>"),"$<SCHEME_NAME>") limit 10;
  • For authorization and to leverage Hive masking policies, you need to install the Hive plug-in in HiveServer2.

If you do not want to install the Hive plug-in, you can authorize use of the keys based on KMS.

  • Create a view on top of your raw table:

    create view secure_view as select col1, privacera.protect(col2, 'SSN') as col2 from db.table;
    select * from secure_view;
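
    Conversely, if a table already stores protected values, a view can decrypt them for users authorized on the key (the table and column names here are hypothetical):

    create view clear_view as select col1, privacera.unprotect(col2, 'SSN') as col2 from db.protected_table;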
    
  • If a user is not present in Privacera, they can still access the protect function. We recommend using the Hive plug-in because it controls access to resources through Ranger policies, which are easier to manage with the UI.