PrivaceraCloud Documentation

Preview: Scan Generic Records with NER Model

Note

Contact Privacera Support to request enabling this feature.

For background, see Generic Models.

Based on Natural Language Processing (NLP), the Generic Named Entity Recognition (NER) model detects named entities such as person names, organizations, and locations. The model is intended only for unstructured text files or unstructured fields in structured files, because it relies on the contextual information in the text surrounding the target tags.

If a structured file has fields containing long sentences that need NLP-based prediction, set the UNSTRUCTURED_FIELD_IN_STRUCTURED_FILE_ENABLED parameter to true. Setting this parameter to true might slow down classification: the time required depends on the number of unstructured field records that contain five or more words.
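To get a rough sense of how many field values would be affected before enabling the parameter, you could count the values with five or more words in a sample of your data. The sketch below is illustrative only: the helper name `count_unstructured_values`, the whitespace-based word splitting, and the sample rows are assumptions, not Privacera's implementation.

```python
import csv
import io

MIN_WORDS = 5  # threshold taken from the text: records with five or more words


def count_unstructured_values(rows, min_words=MIN_WORDS):
    """Count field values with at least `min_words` whitespace-separated words.

    Hypothetical helper; the real classifier may tokenize differently.
    """
    count = 0
    for row in rows:
        for value in row.values():
            if value and len(value.split()) >= min_words:
                count += 1
    return count


# Made-up sample data standing in for a structured file.
sample_csv = io.StringIO(
    "id,name,notes\n"
    "1,Jane Doe,Customer called from Toronto about her Acme Corp invoice\n"
    "2,John Roe,OK\n"
)
rows = list(csv.DictReader(sample_csv))
qualifying = count_unstructured_values(rows)
print(qualifying)
```

Only the first `notes` value has five or more words, so it is the only value counted in this sample.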

Supported tags

Generic_NER_ML_MODEL supports the following tags:

  • PERSON_NAME

  • ORGANIZATION

  • LOCATION

  • ACCOUNT

  • ZipCode

  • Credit Card

  • EMAIL

  • US_DLICENSE

  • UK_US_PASSPORT

  • VIN

  • MEXICAN_CURP_NUMBER

  • MEXICAN_PASSPORT_NUMBER

  • SPAIN_SSN

  • SPAIN_PASSPORT

  • SPAIN_DRIVERS_LICENSE

  • SPAIN_DNI

  • CANADA_DRIVERS_LICENSE

  • CANADA_PASSPORT

  • CANADA_SIN

Tags

By default, the tags supported by Generic_NER_ML_MODEL are not present in the portal UI. For scans to detect and display these tags in the user portal, add them explicitly under the Tags tab.

  1. CANADA_DRIVERS_LICENSE

  2. CANADA_PASSPORT

  3. CANADA_SIN

  4. MEXICAN_CURP_NUMBER

  5. MEXICAN_PASSPORT_NUMBER

  6. SPAIN_SSN

  7. SPAIN_PASSPORT

  8. SPAIN_DRIVERS_LICENSE

  9. SPAIN_DNI

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| UNSTRUCTURED_FIELD_IN_STRUCTURED_FILE_ENABLED | Boolean | False | Set to true to enable scanning of unstructured fields or columns within structured files. |
| NLP_WORD_PROXIMITY_LENGTH | Integer | 10 | Sets the total number of words considered as contextual information around PII. |
| NLP_LOG_LEVEL | String | INFO | Sets the log level of the background process used for NLP. |
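To illustrate what a word-proximity window means, the sketch below collects the words surrounding a target token. It assumes the window is split evenly before and after the target; the documentation states only that NLP_WORD_PROXIMITY_LENGTH sets the total number of words considered, so the actual split may differ. The helper `context_window` is hypothetical and not part of any Privacera API.

```python
def context_window(words, target_index, proximity_length=10):
    """Return up to `proximity_length` words around words[target_index].

    Assumption: half of the window comes before the target and half after.
    The target word itself is excluded from the result.
    """
    half = proximity_length // 2
    start = max(0, target_index - half)
    end = min(len(words), target_index + half + 1)
    return words[start:target_index] + words[target_index + 1:end]


text = "Please send the invoice to Jane Doe at Acme Corp in Toronto before Friday"
words = text.split()

# Context around "Jane" (index 5) with a total window of 4 words.
context = context_window(words, words.index("Jane"), proximity_length=4)
print(context)
```

Near the start or end of the text the window is truncated, so fewer words than `proximity_length` may be returned.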