Privacera Documentation

Types of models

Privacera supports several types of models. You can filter the list of models using the search model option. This tab also displays the current record count.

Generic models

Generic models accept the following general parameters, which you can use to tailor how data is matched.

Parameter

Data Type

Default

Description

INCLUDE_PATTERN_<#>

String

None

Patterns to be matched.

Can contain more than one pattern by changing the value of the <#> variable. For example: INCLUDE_PATTERN_1, INCLUDE_PATTERN_2, INCLUDE_PATTERN_3.

EXCLUDE_PATTERN_<#>

String

None

Patterns to be excluded from matching.

Can contain more than one pattern by changing the value of the <#> variable. For example, EXCLUDE_PATTERN_1, EXCLUDE_PATTERN_2, EXCLUDE_PATTERN_3.

ONLY_DIGITS

Boolean

FALSE

Indicates whether matching should use only the digits. Setting this parameter TRUE removes all non-numeric characters in the string before matching. For example, 1234-5 is treated as 12345.

CHECK_DIGIT_CODE_VALIDATE

String

None

Indicates whether to evaluate a checksum digit based on the last digit. Valid values:

  • LUHN

  • ABA

  • CUSIP

  • DIHEDRAL

  • IBAN

  • UK_NHS

  • MOD11

  • ISBN10

DO_LOOKUP

Boolean

FALSE

Indicates whether to use patterns specified by the LOOKUP_PATTERN parameter. If this parameter is set to TRUE, the patterns specified in LOOKUP_PATTERN are used.

LOOKUP_DICT

String

None

A dictionary name or key. See Dictionaries.

LOOKUP_PATTERN

String

None

Pattern for matching. See Patterns.

ISO3166_CC_VALIDATE_FLAG

Boolean

FALSE

Indicates whether to use Privacera-defined matching to validate an ISO two-character country code. If this parameter is set to TRUE, ISO3166_CC_PATTERN is used.

ISO3166_CC_PATTERN

None

A valid pattern for matching country codes. See Patterns.

ISO3166_CC_LOOKUP_KEY

None

Name of a defined dictionary. See Dictionaries.
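As a rough illustration of how some of these parameters could interact, the following Python sketch applies ONLY_DIGITS normalization and then tests the numbered include and exclude patterns. The helper name generic_match, the dict-based configuration, the use of full-string matching, and the rule that an exclude pattern overrides an include pattern are all assumptions made for this example; they do not describe Privacera's internal implementation.

```python
import re

def generic_match(value: str, params: dict) -> bool:
    """Hypothetical helper: True if value passes the include/exclude patterns."""
    # ONLY_DIGITS=TRUE strips every non-numeric character, e.g. "1234-5" -> "12345".
    if params.get("ONLY_DIGITS", False):
        value = re.sub(r"\D", "", value)

    # Collect numbered parameters such as INCLUDE_PATTERN_1, INCLUDE_PATTERN_2, ...
    includes = [v for k, v in params.items() if k.startswith("INCLUDE_PATTERN_")]
    excludes = [v for k, v in params.items() if k.startswith("EXCLUDE_PATTERN_")]

    # Assumption: an exclude pattern always wins over an include pattern.
    if any(re.fullmatch(p, value) for p in excludes):
        return False
    return any(re.fullmatch(p, value) for p in includes)

params = {
    "ONLY_DIGITS": True,
    "INCLUDE_PATTERN_1": r"\d{5}",
    "INCLUDE_PATTERN_2": r"\d{9}",
    "EXCLUDE_PATTERN_1": r"0{5}",
}
print(generic_match("1234-5", params))  # True: normalized to 12345
print(generic_match("00-000", params))  # False: normalized to 00000, which is excluded
```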

Credit card model

The credit card model detects credit card numbers. It validates numbers based on the issuing network, length, and Luhn checksum.

Parameter

Type

Default

Meaning

CC_PATTERN

String

Privacera-supplied pattern for credit card numbers with range of digits, space or hyphen separated.

Credit card pattern, if you want to override the supplied pattern.

DEFAULT_TYPES

Boolean

True

Validate against known issuing network prefixes.

LUHN_CHECK

Boolean

True

Validate the Luhn checksum on the credit card number.
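For reference, the Luhn checksum that LUHN_CHECK refers to can be sketched in a few lines of Python. This is the standard public algorithm, shown only to make the validation step concrete; the function name luhn_valid is chosen for this example, and ignoring separators before the check is an assumption.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    # Keep ASCII digits only, so spaces and hyphens are ignored (assumption).
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0

print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```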

Supported credit card types

Credit Card Type

Description

Examples

American Express (AMEX) Card

Starts with 34 or 37 and has 15 digits.

34xxxxxxxxxxxxx

37xxxxxxxxxxxxx

JCB

  • Starts with 2131 or 1800, followed by 11 digits.

  • Starts with 35, followed by 14 digits.

2131xxxxxxxxxxx

35xxxxxxxxxxxxxx

Maestro

Starts with 5018, 5020, 5038, 6304, 6759, 6761, or 6763, followed by 8 to 15 digits.

6761xxxxxxxx

Master Card

  • Starts with 51 to 55 and has 14 digits.

  • Starts with 2221 and has 12 digits.

  • Starts with 27 and has 13 digits.

51xxxxxxxxxxxx

2221xxxxxxxx

27xxxxxxxxxxx

Visa Card

Starts with 4 and has 13 or 16 digits.

4xxxxxxxxxxxx

4xxxxxxxxxxxxxxx

Diners Club Card

Starts with 300 to 305, 3095, 36, 38, or 39 and has 14 digits.

300xxxxxxxxxxx

3095xxxxxxxxxx

VPay (Visa) Card

Starts with 4 and has 13 or 19 digits.

4xxxxxxxxxxxx

4xxxxxxxxxxxxxxxxxx

Regular expressions to match credit card numbers

Models for credit cards can define additional custom regular expressions to match against credit card types and numbers not explicitly supported by this model. Data that matches these regexes and passes the Luhn check is tagged as CC.

These additional regular expressions are entered into the Properties field when you create your model, as described in Create models.

Some examples of regexes for credit cards:

  • Match JCB credit card numbers:

    ADDITIONAL_REGEX_JCB: ^((?:2131|1800|35\d{3})\d{11})$

  • Match Maestro credit card numbers:

    ADDITIONAL_REGEX_MAESTRO: ^((?:5018|5020|5038|6304|6759|6761|6763)\d{8,15})$

Regex property name

The property name in Privacera must have the following prefix:

ADDITIONAL_REGEX.

This can be followed by some identifying string for your needs.

Regex property value

  • The regex value must indicate the beginning and end of the regexes by following this structure, as shown in the examples:

    ^(your_regexes_here)$

  • You should thoroughly test your_regexes_here before you put them into a Privacera Discovery model to verify that they return the desired results.
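One way to test a regex before adding it to a model is a small script that runs it against known good and bad samples. The Python sketch below uses the two example regexes from this section. The dictionary ADDITIONAL_REGEXES and the helper matching_properties are names invented for this example, and it assumes that Python's re module treats these simple patterns the same way as the regex engine used by Privacera Discovery, which you should confirm for more complex expressions.

```python
import re

# Property names and values copied from the examples in this section.
ADDITIONAL_REGEXES = {
    "ADDITIONAL_REGEX_JCB": r"^((?:2131|1800|35\d{3})\d{11})$",
    "ADDITIONAL_REGEX_MAESTRO": r"^((?:5018|5020|5038|6304|6759|6761|6763)\d{8,15})$",
}

def matching_properties(candidate: str) -> list[str]:
    """Return the names of all properties whose regex matches candidate."""
    return [name for name, rx in ADDITIONAL_REGEXES.items() if re.match(rx, candidate)]

samples = [
    "3530111333300000",   # 35 + 14 digits: expected to match JCB
    "676112345678",       # 6761 + 8 digits: expected to match Maestro
    "4111111111111111",   # Visa-style number: expected to match neither
    "35301113333000001",  # one digit too long: expected to match neither
]
for sample in samples:
    print(sample, matching_properties(sample))
```

A regex match alone does not guarantee tagging; with the default settings the number must also pass the Luhn check.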

Interaction of regexes and Luhn checksum

If a regex matches but the Luhn checksum fails, the matched credit card number might not be tagged as CC. Luhn checksum verification is enabled by default. If data is not tagged as CC as expected, you can disable the verification by setting the following property:

LUHN_CHECK:false

Note

Disabling the Luhn checksum is not recommended, because credit card numbers should be checked for compliance with the number formats and algorithms.

Date of birth model

The Date of Birth model detects various date formats.

Parameter

Type

Default

Meaning

MIN_AGE_YEARS

Integer

5

Age lower threshold.

MAX_AGE_YEARS

Integer

100

Age upper threshold.

USE_ALGO

Boolean

True

Tagging is done based on an algorithm to detect random distribution.

DATE_REGEX_var1

String

Pattern that matches a custom date format var1.

DATE_FORMAT_var1

String

Date Format that matches the pattern for var1.

Pre-configured date formats are:

  • International YMD format with 4-digit year

  • US MDY with 4 digit or 2 digit year

  • Month abbreviated MDY

Additional formats can be configured. For example, configure a regex and a Java date format:

Parameter

Value

DATE_REGEX_1

\d{4} \d{2} \d{2}

DATE_FORMAT_1

yyyy MM dd
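To make the pairing of a date regex and a date format concrete, here is a minimal Python sketch that mirrors the DATE_REGEX_1 and DATE_FORMAT_1 example and then applies the MIN_AGE_YEARS and MAX_AGE_YEARS thresholds. It is an approximation under stated assumptions: the Java format yyyy MM dd is translated to Python's %Y %m %d, the age bounds are treated as inclusive, and the function name plausible_birth_date is invented for this example.

```python
import re
from datetime import date, datetime

DATE_REGEX_1 = r"\d{4} \d{2} \d{2}"
DATE_FORMAT_1 = "%Y %m %d"  # Python equivalent of the Java format "yyyy MM dd"
MIN_AGE_YEARS = 5
MAX_AGE_YEARS = 100

def plausible_birth_date(text: str, today: date) -> bool:
    """True if text matches the regex, parses as a date, and implies an age in range."""
    if not re.fullmatch(DATE_REGEX_1, text):
        return False
    try:
        born = datetime.strptime(text, DATE_FORMAT_1).date()
    except ValueError:
        return False  # matched the regex but is not a real date, e.g. month 13
    # Age in whole years; subtract one if the birthday has not occurred yet this year.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return MIN_AGE_YEARS <= age <= MAX_AGE_YEARS  # inclusive bounds are an assumption

today = date(2024, 1, 1)
print(plausible_birth_date("1990 06 15", today))  # True: age 33
print(plausible_birth_date("2022 06 15", today))  # False: age 1 is below MIN_AGE_YEARS
print(plausible_birth_date("1990 13 15", today))  # False: not a valid date
```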

EIN model

The EIN model detects Employer Identification Numbers (EINs) using patterns and digit validation.

Parameter

Type

Default

Meaning

EIN_PATTERN

String

Default

EIN digit pattern if you want to override the default pattern.

VALIDATIONS

Boolean

True

Validate the digits of the EIN.

STRICT_PATTERN

Boolean

True

Allow match only if EIN has exact format.

Geo latitude and longitude model

The geo model detects latitude and longitude coordinates. It can validate these values based on a geographical area.

Parameter

Type

Default

Meaning

MIN_LAT

Double

US min latitude

Lower limit (southern) on latitude.

MAX_LAT

Double

US max latitude

Upper limit (northern) on latitude.

MIN_LONG

Double

US min longitude

Lower limit (west) on longitude.

MAX_LONG

Double

US max longitude

Upper limit (east) on longitude.

MIN_FRACTIONAL_DIGITS

Integer

3

Minimum number of digits after the decimal point.
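The bounding-box and precision checks described by these parameters can be sketched as follows. The numeric bounds below are illustrative values roughly covering the contiguous United States; they are assumptions for this example and are not Privacera's actual defaults. The function name in_bounds is also hypothetical.

```python
# Illustrative parameter values (assumed, not Privacera's defaults).
GEO_PARAMS = {
    "MIN_LAT": 24.5,
    "MAX_LAT": 49.4,
    "MIN_LONG": -125.0,
    "MAX_LONG": -66.9,
    "MIN_FRACTIONAL_DIGITS": 3,
}

def in_bounds(lat_text: str, lon_text: str, p: dict) -> bool:
    """True if both values are precise enough and fall inside the bounding box."""
    for text in (lat_text, lon_text):
        # Count the digits after the decimal point in the original string.
        _, _, fraction = text.partition(".")
        if len(fraction) < p["MIN_FRACTIONAL_DIGITS"]:
            return False
    try:
        lat, lon = float(lat_text), float(lon_text)
    except ValueError:
        return False
    return (p["MIN_LAT"] <= lat <= p["MAX_LAT"]
            and p["MIN_LONG"] <= lon <= p["MAX_LONG"])

print(in_bounds("40.7128", "-74.0060", GEO_PARAMS))  # True: New York City
print(in_bounds("40.7", "-74.0060", GEO_PARAMS))     # False: too few fractional digits
print(in_bounds("51.5074", "-0.1278", GEO_PARAMS))   # False: London is outside the box
```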

IMEI model

The IMEI model detects International Mobile Equipment Identity numbers that are used to identify mobile phones. It validates the Luhn checksum and the length of the IMEI.

ITIN model

The ITIN model detects Individual Taxpayer Identification Numbers (ITINs). It validates the format and digits of the ITIN.

Parameter

Type

Default

Meaning

ITIN_PATTERN

String

Default

ITIN digit pattern if you want to override the default pattern.

STRICT_PATTERN

Boolean

True

Allow match only if ITIN has exact format.

MIME model

The MIME model detects a file based on its Multipurpose Internet Mail Extensions type. The MIME type is detected using a combination of file extension and magic bytes in the header of the file. The detected MIME type is then looked up in a dictionary of MIME types.

Parameter

Type

Default

Meaning

LOOKUP_DICT

String

Identifier of dictionary of MIME types.

There are two pre-configured MIME model instances.

  • For detecting executable files: LOOKUP_DICT=EXEC_MIME_KEYWORD.

  • For detecting image files: LOOKUP_DICT=IMAGE_MIME_KEYWORD.
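The two-step flow described above (detect the MIME type, then look it up in a dictionary) might look like the following Python sketch. The dictionary contents, the short list of file signatures, the preference for magic bytes over the file extension, and the function names detect_mime and matches_model are all assumptions for illustration; the real EXEC_MIME_KEYWORD and IMAGE_MIME_KEYWORD dictionaries are defined in Privacera.

```python
import mimetypes

# Hypothetical dictionary contents, keyed by the LOOKUP_DICT value.
DICTIONARIES = {
    "IMAGE_MIME_KEYWORD": {"image/png", "image/jpeg", "image/gif"},
    "EXEC_MIME_KEYWORD": {"application/x-executable"},
}

# A few well-known file signatures (magic bytes) and the MIME type they imply.
MAGIC_BYTES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF8", "image/gif"),
    (b"\x7fELF", "application/x-executable"),
]

def detect_mime(filename: str, header: bytes):
    """Guess a MIME type from the file header first, then from the extension."""
    for signature, mime in MAGIC_BYTES:
        if header.startswith(signature):
            return mime
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed  # may be None if the extension is unknown

def matches_model(filename: str, header: bytes, lookup_dict: str) -> bool:
    """True if the detected MIME type is listed in the chosen dictionary."""
    return detect_mime(filename, header) in DICTIONARIES[lookup_dict]

print(matches_model("upload.bin", b"\x89PNG\r\n\x1a\n\x00", "IMAGE_MIME_KEYWORD"))
print(matches_model("tool", b"\x7fELF\x02\x01", "EXEC_MIME_KEYWORD"))
```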

Phone number model

The Phone Number model detects phone numbers. It validates the format of the phone numbers based on the country for which it is configured.

Parameter

Type

Default

Meaning

COUNTRY_CODE

String

US

Two-character country code.

SSN model

The SSN model detects US Social Security Numbers (SSNs). It validates the format and checks each number against a blacklist of SSNs.

Parameter

Type

Default

Meaning

SSN_PATTERN

String

Default

Override the default SSN pattern.

VALIDATIONS

Boolean

True

Validate against known blacklist of SSNs.

STRICT_PATTERN

Boolean

False

Allow match only if SSN has exact format.

USE_9_DIGIT_PATTERN

Boolean

False

Match against any nine digit number without format.

USE_4_DIGIT_PATTERN

Boolean

False

Match against any four-digit number without format. Disables validation against the SSN blacklist.

STRICT_EXT_PATTERN

Boolean

True

Allow match only if SSN has exact format that is hyphen-, dot-, or space-separated.

Examples of Invalid SSNs

The SSN model would determine that the following SSNs are invalid.

  • SSN starting with 9 or 666 or 000 or 98765432.

  • SSN with 00 as the 4th and 5th digits.

  • SSN with 0000 as the sixth through ninth digits.

  • Any SSN like these:

    • 123456789

    • 111111111

    • 222222222

    • 333333333

    • 444444444

    • 555555555

    • 666666666

    • 777777777

    • 888888888

    • 999999999
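The rules in this list can be expressed as a short Python check. This is a sketch of the listed rules only, not the model's actual blacklist logic. The function name ssn_is_invalid is invented for this example, and treating any value without exactly nine digits as invalid is an added assumption.

```python
import re

def ssn_is_invalid(ssn: str) -> bool:
    """True if ssn breaks one of the rules listed above."""
    digits = re.sub(r"\D", "", ssn)  # drop hyphens, dots, and spaces
    if len(digits) != 9:
        return True  # assumption: only nine-digit values are considered
    # Starts with 9, 666, 000, or 98765432.
    if digits.startswith(("9", "666", "000", "98765432")):
        return True
    # 00 as the fourth and fifth digits.
    if digits[3:5] == "00":
        return True
    # 0000 as the sixth through ninth digits.
    if digits[5:] == "0000":
        return True
    # Sequential or single-repeated-digit numbers.
    if digits == "123456789" or len(set(digits)) == 1:
        return True
    return False

print(ssn_is_invalid("666-12-3456"))  # True: starts with 666
print(ssn_is_invalid("123-00-4567"))  # True: 00 in the middle group
print(ssn_is_invalid("123-45-6780"))  # False: none of the listed rules apply
```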

VIN model

The VIN model detects Vehicle Identification Numbers. It validates the length and the VIN checksum.
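As an illustration of a VIN checksum, the sketch below implements the check-digit scheme used for North American VINs (letters are transliterated to numbers, each position is weighted, and the sum modulo 11 must equal the ninth character). Whether the VIN model uses exactly this scheme is an assumption; the function name vin_valid is chosen for this example.

```python
# Letter-to-number transliteration; I, O, and Q are not allowed in a VIN.
TRANSLIT = dict(zip("ABCDEFGH", range(1, 9)))
TRANSLIT.update(zip("JKLMN", range(1, 6)))
TRANSLIT.update({"P": 7, "R": 9})
TRANSLIT.update(zip("STUVWXYZ", range(2, 10)))

# Position weights; the ninth position (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def vin_valid(vin: str) -> bool:
    """True if vin has 17 allowed characters and a correct check digit."""
    vin = vin.upper()
    if len(vin) != 17:
        return False
    total = 0
    for ch, weight in zip(vin, WEIGHTS):
        if ch in "0123456789":
            value = int(ch)
        elif ch in TRANSLIT:
            value = TRANSLIT[ch]
        else:
            return False  # I, O, Q, or a non-alphanumeric character
        total += value * weight
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected

print(vin_valid("1M8GDM9AXKP042788"))  # True
print(vin_valid("1M8GDM9A1KP042788"))  # False: wrong check digit
```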

Zip model

The Zip model detects US Zip codes. It detects both 5-digit and 5+4-digit variations and validates them against a dictionary of US Zip codes.

Parameter

Type

Default

Meaning

ZIP_DICT_KEY

String

US_ZIP_LOOKUP

Key of the US Zip dictionary.

ZIP_PATTERN

String

Default

Regular expression used to validate content as Zip codes.

STRICT_PATTERN

Boolean

False

Allow match only if Zip code has exact format. If set to true, only nine-digit values that start with five digits and contain a hyphen (-) are considered Zip codes.
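The effect of STRICT_PATTERN combined with the dictionary lookup can be sketched in Python as follows. The two regexes are assumptions standing in for the default ZIP_PATTERN, the three-entry set is a tiny hypothetical stand-in for the US_ZIP_LOOKUP dictionary, and the function name is_zip is invented for this example.

```python
import re

# Tiny hypothetical stand-in for the US_ZIP_LOOKUP dictionary.
US_ZIP_LOOKUP = {"10001", "94105", "60601"}

# Assumed patterns: 5 digits with an optional +4 part, or the strict 5-4 form.
ZIP_LOOSE = re.compile(r"(\d{5})(?:-?(\d{4}))?")
ZIP_STRICT = re.compile(r"(\d{5})-(\d{4})")

def is_zip(text: str, strict_pattern: bool = False) -> bool:
    """True if text has an accepted shape and its first five digits are in the dictionary."""
    pattern = ZIP_STRICT if strict_pattern else ZIP_LOOSE
    match = pattern.fullmatch(text)
    return bool(match) and match.group(1) in US_ZIP_LOOKUP

print(is_zip("10001"))                            # True
print(is_zip("10001", strict_pattern=True))       # False: strict mode requires 5+4 with a hyphen
print(is_zip("10001-1234", strict_pattern=True))  # True
print(is_zip("99999"))                            # False: not in the stand-in dictionary
```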