Data Masking of Personally Identifiable Information

Personally Identifiable Information (PII) and Sensitive Personal Information (SPI) can be used to identify, contact, or locate an individual, so it is essential to protect this data from unauthorized access and misuse whenever it is processed. Examples include email and home addresses, Social Security numbers, passport IDs, credit card numbers, and medical information.

Currently, masking is available only for virtual agents that use Syntphony NLP.

How to protect users' information

When data masking is enabled, users can toggle a switch on entity and answer cells to select which data to mask.

At runtime, the platform uses the unmasked data for internal processing but displays only masked text. The "text" field is masked, while the unmasked value is stored in the "entity" field, so sensitive information stays protected. The same masking behavior applies to generative AI services, where all data is masked by default.
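To make the split between the two fields concrete, here is a minimal sketch assuming a per-turn record with a masked `text` field and an unmasked `entity` mapping. `TurnRecord`, `to_log_line`, `card_number_is_complete`, and the entity name `CREDIT_CARD` are hypothetical illustrations, not part of the Syntphony CAI API.

```python
from dataclasses import dataclass, field


@dataclass
class TurnRecord:
    """Hypothetical shape of one user turn at runtime (names are illustrative)."""

    text: str  # masked copy: the only version written to logs and dashboards
    entity: dict[str, str] = field(default_factory=dict)  # unmasked values


def to_log_line(record: TurnRecord) -> str:
    # Anything persisted or displayed reads only the masked "text" field.
    return f"user: {record.text}"


def card_number_is_complete(record: TurnRecord) -> bool:
    # Internal processing can still read the unmasked value from "entity".
    digits = record.entity.get("CREDIT_CARD", "").replace(" ", "")
    return len(digits) == 16 and digits.isdigit()


record = TurnRecord(
    text="My card is CREDIT_CARD",
    entity={"CREDIT_CARD": "4111 1111 1111 1111"},
)
print(to_log_line(record))              # user: My card is CREDIT_CARD
print(card_number_is_complete(record))  # True
```

The design point is that display and storage code never touch `entity`, so a logging path cannot leak the raw value by accident.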

Entity cells

When you mask an entity, all of its values are masked. Masking is applied to the cognitive engine, Syntphony CAI's database, logs, dashboards, and the dialogue simulator. Each entity value found in the user's message is replaced with the entity's name; the rest of the user's message is preserved and is not masked. Masking is available for synonym and pattern entities.
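The replacement rule for entity cells can be sketched as follows. This is a minimal illustration, assuming synonym entities are matched as whole words (case-insensitively) and pattern entities as regular expressions; `Entity`, `mask_message`, and the sample entities are hypothetical and do not reflect how the cognitive engine actually detects values.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    name: str
    synonyms: tuple[str, ...] = ()  # synonym entity: fixed list of values
    pattern: Optional[str] = None   # pattern entity: regular expression
    masked: bool = False            # the masking toggle, off by default


def mask_message(message: str, entities: list[Entity]) -> tuple[str, dict[str, list[str]]]:
    """Replace values of masked entities with the entity name (hypothetical helper).

    Returns the masked text and the unmasked values found per entity.
    """
    text = message
    found: dict[str, list[str]] = {}
    for entity in entities:
        if not entity.masked:
            continue  # entities with the toggle off are left untouched
        if entity.pattern is not None:
            regex = re.compile(entity.pattern)
        elif entity.synonyms:
            alternatives = "|".join(re.escape(s) for s in entity.synonyms)
            regex = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
        else:
            continue
        values = [m.group(0) for m in regex.finditer(text)]
        if values:
            found[entity.name] = values
            # Only the matched value is replaced; the rest of the message is kept.
            text = regex.sub(lambda _m: entity.name, text)
    return text, found


entities = [
    Entity("CAR", synonyms=("sedan", "hatchback"), masked=True),
    Entity("EMAIL", pattern=r"[\w.+-]+@[\w-]+\.[\w.]+", masked=True),
    Entity("CITY", synonyms=("Lisbon",)),  # toggle left off: not masked
]
text, found = mask_message("Send the sedan quote to ana@example.com from Lisbon", entities)
print(text)   # Send the CAR quote to EMAIL from Lisbon
print(found)  # {'CAR': ['sedan'], 'EMAIL': ['ana@example.com']}
```

Returning the found values alongside the masked text mirrors the split between the masked "text" field and the unmasked entity values.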

To mask an entity:

  1. Open the modal to create or edit an entity

  2. Turn on the masking toggle, which is disabled by default.

Answer cells

For virtual agent responses, you can use the technical field to flag answers that need to be masked. Additionally, a button will be available to mask transactional answers, with specific provisions for Voice Gateway interactions. For audio inputs that require masking, a prior indication is needed to inform users that the next input must be masked, ensuring that no record of user input is left unprotected.
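The "prior indication" for audio inputs can be pictured as a one-turn flag carried from an answer to the next user input. This is a minimal sketch, assuming the flag lives on the answer and covers exactly one following input; `Answer`, `mask_next_input`, `Conversation`, and the `***` placeholder are hypothetical, not the platform's technical-field format.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    mask_next_input: bool = False  # hypothetical technical-field flag


class Conversation:
    """Toy dialog loop: an answer can ask for the next user input to be masked."""

    def __init__(self, placeholder: str = "***") -> None:
        self.placeholder = placeholder
        self.log: list[str] = []
        self._mask_next = False

    def send_answer(self, answer: Answer) -> None:
        self.log.append(f"agent: {answer.text}")
        # Remember the flag so that it applies to the following input only.
        self._mask_next = answer.mask_next_input

    def receive_input(self, transcript: str) -> str:
        # The log gets the masked form; the caller still gets the transcript
        # for internal processing.
        if self._mask_next:
            self.log.append(f"user: {self.placeholder}")
            self._mask_next = False
        else:
            self.log.append(f"user: {transcript}")
        return transcript


conv = Conversation()
conv.send_answer(Answer("Please say your card number.", mask_next_input=True))
conv.receive_input("4111 1111 1111 1111")
conv.send_answer(Answer("Thanks! Anything else?"))
conv.receive_input("No")
print(conv.log)
# ['agent: Please say your card number.', 'user: ***', 'agent: Thanks! Anything else?', 'user: No']
```

The flag must be set before the input arrives because a spoken answer cannot be matched against entities until it has been transcribed.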

Gen AI cell

If $text is used in the Gen AI cell, or in Rephrasing within the answer, code, rule, and service (webhook and REST connector) cells, its value is masked.

On the other hand, if $entities['CAR'][0].originalValue is used in those same cells, its value is not masked.

Masking is applied to the text field throughout the Dialog Manager, while the entity keeps its original, unmasked value.

Examples:

  1. Using $text:

Gen AI Cell:

{
  "prompt": "How do I use the $text feature?"
}

Answer/Code/Rule/Service Cell:

{
  "message": "You can use the $text feature by following these steps..."
}

Result: The value of $text will be masked, appearing as *** in the answer.

  2. Using $entities['CAR'][0].originalValue:

Gen AI Cell:

{
  "prompt": "How do I use the $entities['CAR'][0].originalValue feature?"
}

Answer/Code/Rule/Service Cell:

{
  "message": "You can use the $entities['CAR'][0].originalValue feature by following these steps..."
}

Result: The value of $entities['CAR'][0].originalValue will not be masked, appearing as the original value in the answer.

Summary:

  • $text: When using $text, the platform will mask the value to protect sensitive information.

  • $entities['CAR'][0].originalValue: When using $entities['CAR'][0].originalValue, the platform will not mask the value, allowing it to appear as the original value.

With these rules, the platform masks or displays sensitive information according to where it is used. This maintains data privacy while leaving flexibility in how information is handled across different cells.
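The difference between the two variables can be sketched as a template resolver. This is a minimal illustration, assuming the masked text replaces values with the entity name (as described for entity cells) and that each entity entry carries an `originalValue` key; `render` and `VARIABLE` are hypothetical helpers, not how the Dialog Manager is implemented.

```python
import re

# Matches either $text or $entities['NAME'][INDEX].originalValue, in one pass.
VARIABLE = re.compile(r"\$text\b|\$entities\['(\w+)'\]\[(\d+)\]\.originalValue")


def render(template: str, masked_text: str, entities: dict[str, list[dict[str, str]]]) -> str:
    """Resolve variables in a cell template (hypothetical helper)."""

    def resolve(match: re.Match) -> str:
        if match.group(1) is None:
            # $text resolves to the masked message.
            return masked_text
        # $entities[...].originalValue resolves to the unmasked value.
        name, index = match.group(1), int(match.group(2))
        return entities[name][index]["originalValue"]

    return VARIABLE.sub(resolve, template)


masked = "I want to insure my CAR"
entities = {"CAR": [{"originalValue": "sedan"}]}
print(render("The user said: $text", masked, entities))
# The user said: I want to insure my CAR
print(render("Quote for a $entities['CAR'][0].originalValue", masked, entities))
# Quote for a sedan
```

Resolving both variables in a single substitution pass avoids re-expanding a value that happens to contain variable syntax.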

Analytics

When masking is activated, the original values of masked data are not stored in Syntphony CAI and are not available in the Dashboards.

In analytics dashboards, masked entities are shown by name rather than by value to maintain data confidentiality. Once masking is activated, selected values are replaced with *asterisks* at the beginning and end, concealing the actual data.

Masking applies to all entities except system entities.

Masked answers are displayed with their respective IDs to maintain confidentiality. Again, selected values are replaced with *asterisks*, effectively concealing the actual data.

Logs are masked in the text field to ensure that sensitive information is not exposed. For external services, all generative AI interactions will use masked data by default. In service cells, while text is masked, entities remain unmasked to allow for necessary processing. Zero-shot and few-shot masking follow the same protocols as NLP, as zero-shot classification uses NLP for entity detection.
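The rules for logs and external services can be summarized as a payload builder. This is a minimal sketch, assuming payloads are plain dictionaries with `text` and `entities` keys; `build_payload` and the target names `genai`, `service`, and `log` are hypothetical and do not describe the actual webhook or REST connector contract.

```python
from typing import Any


def build_payload(target: str, masked_text: str, entities: dict[str, list[str]]) -> dict[str, Any]:
    """Choose what leaves the platform for a given target (hypothetical helper)."""
    if target in ("genai", "log"):
        # Generative AI services and logs only ever see masked text.
        return {"text": masked_text}
    if target == "service":
        # Service cells (webhook / REST connector): masked text,
        # but unmasked entities for the processing they need.
        return {"text": masked_text, "entities": entities}
    raise ValueError(f"unknown target: {target!r}")


masked = "Send the quote to EMAIL"
entities = {"EMAIL": ["ana@example.com"]}
print(build_payload("genai", masked, entities))
# {'text': 'Send the quote to EMAIL'}
print(build_payload("service", masked, entities))
# {'text': 'Send the quote to EMAIL', 'entities': {'EMAIL': ['ana@example.com']}}
```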

By implementing these comprehensive masking features, the platform ensures robust protection of PII and SPI, safeguarding user data while maintaining the functionality and usability of the virtual agent and associated services.
