Disallowed Tags policy

This policy monitors data and raises an alert if any of the specified PII tags are identified. You can add multiple tags by pressing Enter after each value.

The Disallowed Tags policy has the following fields (an example configuration follows the list):

  • Name: The name of the Disallowed Tags policy.

  • Type: The type of policy.

  • Alert Level: The alert level: high, medium, or low.

  • Description: The description of the Disallowed Tags policy.

  • Disallowed Tags: Allows you to add multiple tags to be disallowed.
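For example, a policy of this type might be configured as follows. The name, description, and tag values here are illustrative only; the tags must match tag names that exist in your environment:

    Name: Alert-on-PII-Tags
    Type: Disallowed Tags
    Alert Level: high
    Description: Alert when PII-tagged data lands in this data zone
    Disallowed Tags: PII, PERSON_NAME, EMAIL_ADDRESS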

Add Disallowed Tags policy

If you create both Disallowed Movement and Disallowed Tags policies, you can capture data zone movement performed using Spark. Data zone movement can be captured from HDFS to S3.

To capture Data Zone movement using Spark, follow these steps:

Note

These data zones are examples. You should create your own.

  1. Create the directories in HDFS and add a sample file to one of them:

    hdfs dfs -mkdir -p /colour/purple
    hdfs dfs -mkdir -p /colour/pink
    hdfs dfs -put /finance_us.csv /colour/purple/
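
    To confirm the layout before running the Spark job, you can list both directories (an optional sanity check, not required by the policies):

    hdfs dfs -ls /colour/purple/
    hdfs dfs -ls /colour/pink/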
    
  2. Add both of the created directories to the Include resources for HDFS.

  3. Create two Data Zones and add one folder to each Data Zone's Resources:

    • SourceDz: Should contain the source resource, for example /colour/purple/, and the Data Zone tag.

    • DestinationDz: Should contain the destination resource, for example /colour/pink/, and the policies configured for disallowed movement and disallowed tags.

  4. Set the Application property as follows:

    Generate Alert All Part Files = false

    Note

    If you set Generate Alert All Part Files to false, the system generates an alert only for the first two part files. If you set this property to true, the system generates an alert for all part files. For example, the repartition(4) write in step 5 produces four part files, so setting this property to true raises an alert for each of the four.

  5. Open a terminal and start the Spark shell as follows:

    spark-shell --packages com.databricks:spark-csv_2.10:1.5.0

    scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/colour/purple/finance_us.csv")

    scala> df.coalesce(1).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/colour/pink/finance_us_11")

    scala> df.repartition(4).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/colour/pink/finance_us_100")
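
    The commands above target Spark 1.x, where CSV support comes from the external spark-csv package. On Spark 2.x and later the CSV source is built in, so the --packages flag can be dropped; a minimal equivalent sketch using the same paths:

    spark-shell

    scala> val df = spark.read.option("header", "true").csv("/colour/purple/finance_us.csv")

    scala> df.coalesce(1).write.mode("overwrite").option("header", "true").csv("/colour/pink/finance_us_11")

    scala> df.repartition(4).write.mode("overwrite").option("header", "true").csv("/colour/pink/finance_us_100")

    The repartition(4) write produces multiple part files in DestinationDz, which exercises the per-part-file alerting controlled by the Generate Alert All Part Files property set in step 4.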
    

    After the Spark jobs complete, verify the results in the following locations:

    • Kafka Topics: Check audit consumption on the Kafka topics for Alerts and Lineage.

    • Alerts Details: Check the Alerts Details tab in the details view of this resource.

    • Lineage: Check the Lineage for this resource.

    • Alerts Generated for Part Files: Check the Data Zone Graph for the alerts generated for the part files in DestinationDz.