
Disallowed Tags Policy

This policy monitors data and raises an alert if any PII tags are identified. You can add multiple tags by pressing Enter after each value.

Add Disallowed Tags Policy

The Disallowed Tags Policy has the following fields:

  • Name: The name of the Disallowed Tags policy.

  • Type: The type of the policy.

  • Alert Level: The alert level: High, Medium, or Low.

  • Description: A description of the Disallowed Tags policy.

  • Disallowed Tags: The tags to be disallowed; you can add multiple tags.
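
For reference, the policy is just a record of the fields above. The following Scala sketch is purely illustrative; the type and field names are assumptions for this example, not the product's actual API:

    // Illustrative model of the Disallowed Tags policy fields described above.
    // All names here are assumptions, not the product's actual API.
    object AlertLevel extends Enumeration {
      val High, Medium, Low = Value
    }

    case class DisallowedTagsPolicy(
      name: String,                 // Name of the Disallowed Tags policy
      policyType: String,           // Type of the policy
      alertLevel: AlertLevel.Value, // High, Medium, or Low
      description: String,          // Description of the policy
      disallowedTags: Seq[String]   // One or more tags to be disallowed, e.g. Seq("PII")
    )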

If you create both a Disallowed Movement and a Disallowed Tags policy, you can capture Data Zone movement using Spark. Data Zone movement can be captured within HDFS or from HDFS to S3.

Follow these instructions to capture Data Zone movement using Spark:

Note

These data zone examples are for reference. You should create your own.

  1. Create directories in HDFS and add the file to one of the HDFS locations.

    hdfs dfs -mkdir -p /colour/purple
    hdfs dfs -mkdir -p /colour/pink
    hdfs dfs -put /finance_us.csv /colour/purple/
    
  2. Add both created directories to the Include resources of HDFS.

  3. Create two Data Zones and add the two folders to those Data Zones' Resources.

  4. SourceDz: This Data Zone should have a resource, e.g., /colour/purple/, and the Data Zone tag.

  5. DestinationDz: This Data Zone should have a resource, e.g., /colour/pink/, and the Disallowed Movement and Disallowed Tags policies configured.

  6. Set the Application property as follows:

    Generate Alert All Part Files = false

    Note

    If you set Generate Alert All Part Files to false, the system generates an alert only for the first two part files. If you set it to true, the system generates an alert for all part files.

  7. Go to the terminal and start the Spark shell as follows:

    spark-shell --packages com.databricks:spark-csv_2.10:1.5.0

    scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/colour/purple/finance_us.csv")
    scala> df.coalesce(1).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/colour/pink/finance_us_11")
    scala> df.repartition(4).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").save("/colour/pink/finance_us_100")
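
    The spark-csv package above targets Spark 1.x shells. On Spark 2.x and later, the CSV data source is built in, so an equivalent session (a sketch assuming the same paths and the shell's built-in spark session) needs no --packages flag:

      // Spark 2.x+ equivalent; `spark` is the SparkSession provided by spark-shell.
      val df = spark.read.option("header", "true").csv("/colour/purple/finance_us.csv")

      // coalesce(1) writes a single part file; repartition(4) writes four part files,
      // which exercises the Generate Alert All Part Files behavior described above.
      df.coalesce(1).write.mode("overwrite").option("header", "true").csv("/colour/pink/finance_us_11")
      df.repartition(4).write.mode("overwrite").option("header", "true").csv("/colour/pink/finance_us_100")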
    
  8. Check the output in the following places:

    • Kafka Topics: Check the Kafka topics' audit consumption for Alerts and Lineage.

    • Alerts Details: Check the Alerts Details tab on the resource details page for this resource.

    • Lineage: Check the Lineage for this resource.

    • Alerts Generated for Part Files: Check the Data Zone Graph for the alerts generated for the part files in DestinationDz.
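
    To see the part files the alerts refer to, you can list the written output from the same Spark shell. This is a sketch using the standard Hadoop FileSystem API; the path matches the repartition(4) example above:

      import org.apache.hadoop.fs.{FileSystem, Path}

      // List the part files written to DestinationDz; these are the files
      // the Generate Alert All Part Files property controls alerting for.
      val fs = FileSystem.get(sc.hadoopConfiguration)
      fs.listStatus(new Path("/colour/pink/finance_us_100")).foreach(status => println(status.getPath))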


Last update: August 24, 2021