TagSync using Apache Ranger on Privacera Platform

Privacera Discovery allows you to classify your data using tags. Tags can be used in access policies to manage access to sensitive data.

Apache Ranger needs tag information when it applies a tag-based policy. This topic describes how to propagate tag details from Discovery to Apache Ranger.

Enable TagSync

Configure the following properties in the Application Properties UI to enable TagSync in the Privacera Portal:

Go to Settings > Data Source Registration.
Select the required application and click the edit icon.
Go to APPLICATION PROPERTIES.
Enable the toggle for Enable Ranger TagSync.
Under Send Inherited Table Tags To Ranger, type true.

Properties to add based on service type

In addition to the properties above, add the following properties for your service type in the Application Properties UI. These properties are used when you verify TagSync in Apache Ranger with the Ranger tag utility script.

Go to Add Custom Properties. Under Custom Properties, add properties for service_name and cluster_name as shown in the following examples.

service_name=privacera_s3
cluster_name=privacera

The value of service_name depends on the application that you want to apply TagSync to. The following is a list of services and values for each application:

S3

service_name=privacera_s3
cluster_name=privacera

Redshift

service_name=privacera_redshift
cluster_name=privacera

PostgreSQL

service_name=privacera_postgres
cluster_name=privacera

Snowflake

service_name=privacera_snowflake
cluster_name=privacera

DynamoDB

service_name=privacera_dynamodb
cluster_name=privacera

MSSQL/Synapse

service_name=privacera_mssql
cluster_name=privacera

MySQL/MariaDB/AuroraDB/Databricks Spark SQL/EMR Hive

service_name=privacera_hive
cluster_name=privacera

Databricks Unity Catalog

service_name=privacera_databricks_unity_catalog
cluster_name=privacera
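If you configure several applications, the mapping above can be kept in a small lookup table. The following Python sketch is illustrative only: the dictionary keys and the `custom_properties` helper are hypothetical names, and the values are copied from the list above. The properties themselves are still entered under Custom Properties in the Application Properties UI.

```python
# Hypothetical lookup built from the service list above. The keys are
# illustrative labels; the values are the documented service names.
SERVICE_NAMES = {
    "S3": "privacera_s3",
    "Redshift": "privacera_redshift",
    "PostgreSQL": "privacera_postgres",
    "Snowflake": "privacera_snowflake",
    "DynamoDB": "privacera_dynamodb",
    "MSSQL/Synapse": "privacera_mssql",
    "MySQL": "privacera_hive",
    "MariaDB": "privacera_hive",
    "AuroraDB": "privacera_hive",
    "Databricks Spark SQL": "privacera_hive",
    "EMR Hive": "privacera_hive",
    "Databricks Unity Catalog": "privacera_databricks_unity_catalog",
}


def custom_properties(application: str, cluster_name: str = "privacera") -> str:
    """Return the service_name/cluster_name lines to enter under Custom Properties."""
    try:
        service_name = SERVICE_NAMES[application]
    except KeyError:
        raise ValueError(f"no documented service_name for {application!r}") from None
    return f"service_name={service_name}\ncluster_name={cluster_name}"


print(custom_properties("Redshift"))
```

Note that several database applications share `privacera_hive`, so the lookup is many-to-one.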

TagSync validation scenarios

TagSync can be validated in the following scenarios:

Note

Allowed and rejected tags will not be synced to Apache Ranger.

Auto scanning

On the Classifications page, files are classified with system classified tags. After classification, all system-classified and manually accepted tags are synced to Apache Ranger.
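The sync rule above, together with the earlier note that allowed and rejected tags are not synced, can be sketched as a status filter. This is a simplified illustration, not Privacera's implementation; the status strings and the `tags_to_sync` function are hypothetical.

```python
# Hypothetical status labels; Privacera's internal values may differ.
# Only these accepted statuses are synced to Apache Ranger.
SYNCED_STATUSES = {"SYSTEM_CLASSIFIED", "MANUALLY_CLASSIFIED"}
# Tags with status ALLOWED or REJECTED are never synced.


def tags_to_sync(tags: dict[str, str]) -> list[str]:
    """Given {tag_name: status}, return the tag names that would be synced, sorted."""
    return sorted(name for name, status in tags.items() if status in SYNCED_STATUSES)


resource_tags = {
    "SSN": "SYSTEM_CLASSIFIED",
    "PERSON_NAME": "MANUALLY_CLASSIFIED",
    "US_PHONE_NUMBER": "REJECTED",
    "AU_BAN": "ALLOWED",
}
print(tags_to_sync(resource_tags))  # ['PERSON_NAME', 'SSN']
```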

Parent-Child Level TagSync in Apache Ranger

The criteria for syncing parent and child tags depend on whether the resource belongs to a database application or a file system:

Database applications

Example 1. Scenario

If the resource is a database, then the database gets classified as:

Database, tag1, tag2, etc.

In Ranger, child entries are created as below:

(Database): tag1, tag2, etc.

Example 2. Scenario

If the resource is a table, the classification is shown as below:

(Database, table), tag1, tag2, etc.

In Ranger, the child-level entry is created as below:

(Database, table): tag1, tag2, etc.

Example 3. Scenario

If the resource is a column, the classification is shown in the UI as below:

(Database, table, column), tag1, tag2, etc.

In Ranger, only column-level tags are synced:

(Database, table, column): tag1, tag2, etc.
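Each of these Ranger entries corresponds to a service resource whose `resourceElements` map resource names to values. The sketch below builds such a map for the database, table, and column cases, assuming a service whose resource hierarchy is `database`/`table`/`column` as in the examples above. The helper name and the sample values are hypothetical, and a real Privacera service definition may use different or additional levels (for example, a schema), so check it against your Ranger service definition.

```python
import json
from typing import Optional


def ranger_resource_elements(database: str, table: Optional[str] = None,
                             column: Optional[str] = None) -> dict:
    """Build a Ranger-style resourceElements map for one classified resource.

    Levels are added from the top down and stop at the first missing one, so
    (database), (database, table) and (database, table, column) entries map
    to one, two and three resource elements respectively.
    """
    elements = {}
    for name, value in (("database", database), ("table", table), ("column", column)):
        if value is None:
            break  # a column without a table is not a valid entry, so stop here
        elements[name] = {"values": [value], "isExcludes": False, "isRecursive": False}
    return elements


# Column-level classification, as in Example 3 (hypothetical names).
print(json.dumps(ranger_resource_elements("sales", "customers", "ssn"), indent=2))
```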

File System

For a folder or file, all the tag levels are allowed.
For a field, only the same tag level is allowed.

Meta tagging

Meta tags are applied at the table, file or folder level. They are also synced to Apache Ranger at the table, file or folder level. Only system classified and manually classified tags are synced to Apache Ranger.

Folder tagging

Folder tagging is disabled by default. To enable it, turn on the Folder name tagging toggle in the application settings. When folder tagging is enabled, folder names are included during scanning, and folders are tagged based on dictionary values.

Example 4. Scenario

Create a new dictionary with the following fields:

Name: Enter the dictionary name.
Type: Select the tagging type from the dropdown menu.
Apply For: Select metaname.
Tags: Add existing or new tag names.

Save the dictionary and add the folder names that you want to tag. The names should match a folder, file, or field name in the scanned files.

Add S3 resources for any file or folder. The system tags each folder in the path whose name matches a value in the dictionary.

On the Classifications page, you can see the folder resources along with their tags.

Open the scan summary. The Tagged Resource tab lists all tagged folders with the scan reason Resource is folder.

Check for the tags in Apache Ranger using the Ranger tag utility. For this, all the required fields must be added in the S3 application settings to enable Ranger TagSync.
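The folder matching in this example can be illustrated with a simplified sketch. It assumes exact, case-sensitive matching of folder names against the dictionary values; Privacera's actual matching rules may differ. The `folder_tags` function and the `TEST_FOLDER_TAG` tag are hypothetical.

```python
from pathlib import PurePosixPath


def folder_tags(file_path: str, values: set[str], tags: list[str]) -> dict[str, list[str]]:
    """Return {folder: tags} for each folder in file_path whose name is a dictionary value."""
    tagged = {}
    # .parents walks from the immediate parent folder up to the root "/",
    # so the file name itself is never matched.
    for folder in PurePosixPath(file_path).parents:
        if folder.name in values:  # exact, case-sensitive match (assumption)
            tagged[str(folder)] = list(tags)
    return tagged


hits = folder_tags(
    "/testdir/sample_files/file_format/avro/test.avro",
    values={"sample_files", "avro"},
    tags=["TEST_FOLDER_TAG"],
)
print(hits)
```

Both `avro` and `sample_files` appear in the path, so both folders receive the dictionary's tags, while `file_format` and `testdir` do not.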

Post-processing tags

System classified and manually classified tags that are applied using post processing rules are synced to Apache Ranger.

Re-evaluate

During re-evaluation, system classified and manually classified datazone tags are synced to Apache Ranger. Resources that are removed through datazone policies are also removed from Apache Ranger.

Add or edit tags

You can add or edit tags manually on the originally classified resources from the following pages:

Classifications: From the navigation menu, select Data Inventory > Classifications.
Resource Detail: From the navigation menu, select Data Inventory > Classifications. Select a resource and click Resource Detail.
Data Explorer: From the navigation menu, select Data Inventory > Data Explorer.
Data Zone Dashboard: From the navigation menu, select Compliance Workflow > Data Zone Dashboard.

When you add tags manually from any of the pages listed above, the tag status is set by default to “Accepted : Manually classified”, and the tag is synced to Apache Ranger.

Add a resource

You can manually add tags to unclassified resources. When you add such resources and add a tag to them, the tag status is set by default to “Accepted : Manually classified” and it will be synced to Apache Ranger.

To add a resource, select Data Inventory > Classifications from the navigation menu and click Add Resource.

Tag status changes

Tag status changes affect TagSync. Only system classified and manually accepted tags are synced to Apache Ranger. The following are a few scenarios for tag status changes:

If the status of a tag is changed from system classified to rejected or allowed, then the tag will be removed from Apache Ranger.
If the status of the tag is changed from manually accepted to allowed or rejected, then the tag will be removed from Apache Ranger.
If the tag status resets to system classified from rejected or allowed, then the tag will be synced to Apache Ranger.
If the tag status is changed to manually classified from rejected or allowed, then the tag will be synced to Apache Ranger.
If the tag status is changed from system classified to manually classified, then the synced tags in Apache Ranger will remain unchanged.
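The scenarios above follow one rule: a tag is present in Apache Ranger exactly when its status is one of the synced statuses. The sketch below encodes that rule as a transition function. The status identifiers and the `ranger_action` name are hypothetical, chosen only to illustrate the behavior described in the list.

```python
# Hypothetical status identifiers; only the two accepted statuses are synced.
SYNCED = {"SYSTEM_CLASSIFIED", "MANUALLY_CLASSIFIED"}
NOT_SYNCED = {"ALLOWED", "REJECTED"}


def ranger_action(old_status: str, new_status: str) -> str:
    """Return what happens to the tag in Apache Ranger after a status change."""
    for status in (old_status, new_status):
        if status not in SYNCED | NOT_SYNCED:
            raise ValueError(f"unknown status: {status}")
    was_synced = old_status in SYNCED
    is_synced = new_status in SYNCED
    if was_synced and not is_synced:
        return "remove"
    if is_synced and not was_synced:
        return "sync"
    return "unchanged"  # e.g. system classified -> manually classified


print(ranger_action("SYSTEM_CLASSIFIED", "REJECTED"))  # remove
```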

Remove tags

You can manually remove tags that you added after you have rejected them. If you remove a tag from a resource using the Add/Edit option, the tag is removed from Apache Ranger as soon as you reject it.

Remove resources

If a resource was added manually and has only manually classified tags, the resource is removed from Apache Ranger after you reject its last tag.

If a resource has system classified tags and you reject the last one, the resource is removed from Apache Ranger because its last synced tag is removed.

Rescan of same file

If you rescan a resource that is already synced with Apache Ranger and no changes were made to rules or datazone policies, then TagSync will remain unchanged.
If post-processing rules are disabled, then rescanning a file will remove post-processing tags.
If a datazone tag is disabled or a resource removed from a datazone, then the datazone tag will be removed from Apache Ranger upon rescan.
If a meta tag rule or a meta tag is disabled, then the meta tag will be removed from Apache Ranger upon rescan.
If a tag status change is applied before a file is rescanned, TagSync reflects that status change.

Validate TagSync in Apache Ranger

You can view the tags pushed to Apache Ranger using either a curl command or the Ranger tag utility script.

Validate TagSync using curl command

curl -i -L -k -u "admin:${PRIVACERA_PASSWORD}" \
  -H "Content-type: application/json" \
  -X GET \
  "https://${PRIVACERA_HOST}:6182/service/tags/resources/service/privacera_postgres"

This command returns the list of resources synced to Apache Ranger, but the response is not in an easily readable format. Therefore, it is recommended to use the Ranger tag utility to check TagSync.
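If you prefer to script the same check without the utility, the curl call can be reproduced with the Python standard library and the response condensed to one line per resource. This is a minimal sketch, assuming the endpoint returns a JSON array of service-resource objects that each carry an `id` and a `resourceElements` map (the shape used by Ranger's tag REST API). The function names are hypothetical, and certificate verification is disabled only to mirror the `-k` flag.

```python
import json
import ssl
import urllib.request
from base64 import b64encode


def fetch_service_resources(host: str, service: str, user: str, password: str,
                            port: int = 6182) -> list:
    """GET the tag-store resources for one service, like the curl command above."""
    url = f"https://{host}:{port}/service/tags/resources/service/{service}"
    token = b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",  # same as curl -u user:password
    })
    # Mirrors curl -k: certificate checks are disabled. Use only on test setups.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def summarize(resources: list) -> list[str]:
    """Render each resource as 'id: name=value[,value...], ...' on one line."""
    lines = []
    for resource in resources:
        elements = resource.get("resourceElements", {})
        parts = [f"{name}={','.join(element.get('values', []))}"
                 for name, element in elements.items()]
        lines.append(f"{resource.get('id')}: {', '.join(parts)}")
    return lines
```

For example, `summarize(fetch_service_resources(host, "privacera_postgres", "admin", password))` returns one readable line per synced resource. Tags are linked to resources separately in Ranger, so this view lists resources only, as the curl command does.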

Validate TagSync using the Ranger Tag Utility

The Ranger tag utility is a Python script that communicates with the Ranger API methods and returns the response in a readable format.

Run the following command to download the script:

wget https://privacera.s3.amazonaws.com/public/pm-demo-data/ranger_tag_utility.py -O ranger_tag_utility.py

After downloading the file to your local system, run one of the following commands to view the TagSync response.

SSL instance

python3 ranger_tag_utility.py \
  --operation list_tags \
  --host ${PRIVACERA_HOST} \
  --port 6182 \
  --username ${RANGER_USERNAME} \
  --password ${RANGER_PASSWORD} \
  --servicename privacera_redshift \
  --ssl True \
  --verifyssl False

Non-SSL instance

python3 ranger_tag_utility.py \
  --operation list_tags \
  --host ${PRIVACERA_HOST} \
  --port 6080 \
  --username ${RANGER_USERNAME} \
  --password ${RANGER_PASSWORD} \
  --servicename privacera_maprfs \
  --ssl False \
  --verifyssl False

(Optional) Change the --servicename value to match your application.
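When you check several services, it can help to build the utility's command line programmatically. This sketch only assembles the arguments shown in the examples; it assumes port 6182 with `--ssl True` for an SSL instance and port 6080 with `--ssl False` for a non-SSL instance. The `tag_utility_command` helper and the sample host and credentials are hypothetical.

```python
import shlex


def tag_utility_command(host: str, service: str, username: str, password: str,
                        use_ssl: bool = True) -> list[str]:
    """Build the ranger_tag_utility.py list_tags command as an argument list."""
    return [
        "python3", "ranger_tag_utility.py",
        "--operation", "list_tags",
        "--host", host,
        "--port", "6182" if use_ssl else "6080",  # ports from the examples
        "--username", username,
        "--password", password,
        "--servicename", service,
        "--ssl", str(use_ssl),  # "True" or "False"
        "--verifyssl", "False",
    ]


print(shlex.join(tag_utility_command("ranger.example.com", "privacera_redshift",
                                     "admin", "secret")))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting issues; keep in mind that, as in the shell examples, the password is visible in the process list.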

Output

Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test.avro'] => tags :: ['SSN', 'PERSON_NAME', 'AU_BAN', 'TEST_DATAZONE', 'POST_PROCESS']
Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test.snappy.avro'] => tags :: ['US_ADDRESS', 'SSN', 'US_PHONE_NUMBER', 'AU_BAN', 'PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test1.avro'] => tags :: ['SSN', 'US_PHONE_NUMBER', 'PERSON_NAME', 'US_ADDRESS', 'AU_BAN', 'TEST_DATAZONE', 'POST_PROCESS']
Received Tag Data for path : ['/testdir/sample_files/file_format/avro/twitter.avro'] => tags :: ['PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
Received Tag Data for path : ['/testdir/sample_files/file_format/avro/twitter.snappy.avro'] => tags :: ['PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
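To validate TagSync in a script rather than by eye, the output can be parsed into a path-to-tags mapping. This sketch assumes the output lines keep exactly the format shown above; `parse_tag_output` and `missing_tags` are hypothetical helpers, not part of ranger_tag_utility.py.

```python
import ast
import re

# Matches one line of list_tags output, for example:
# Received Tag Data for path : ['/a/b.avro'] => tags :: ['SSN', 'PERSON_NAME']
LINE_RE = re.compile(r"Received Tag Data for path : (\[.*?\]) => tags :: (\[.*\])")


def parse_tag_output(text: str) -> dict[str, list[str]]:
    """Map each reported path to its list of synced tags."""
    synced = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            paths = ast.literal_eval(match.group(1))  # Python-style list literal
            tags = ast.literal_eval(match.group(2))
            for path in paths:
                synced[path] = tags
    return synced


def missing_tags(synced: dict[str, list[str]], path: str, expected: set[str]) -> set[str]:
    """Return the expected tags that are not present for `path` in Ranger."""
    return expected - set(synced.get(path, []))
```

For example, after a datazone or post-processing change, `missing_tags(parsed, path, {"TEST_DATAZONE", "POST_PROCESS"})` returns an empty set once both tags have been synced for that path.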
