Deploy and Development Guide


3. JSON Contracts

The eva NLP training, training status, and inference JSON contracts are described below.

3.1 Training

URL:

Headers:

Request body:

name: bot name

version: bot version

lang: bot language

slot: deploy environment (homologation, production or test)

dataset: dataset for Eva NLP training

intents: list of intents that make up the training dataset

label: intent name

examples: list of examples by intent

entities: list of entities (synonym or pattern)

name: entity name

type: entity type

values: list of entity values
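
A minimal request-body sketch, assuming the field nesting implied by the descriptions above and the /api/trainTextModel endpoint path shown in section 5.1.2 (all values are illustrative):

{
  "name": "my-bot",
  "version": "1",
  "lang": "en-us",
  "slot": "homologation",
  "dataset": {
    "intents": [
      {
        "label": "greeting",
        "examples": ["hi", "hello", "good morning", "hey there", "hi there"]
      }
    ],
    "entities": [
      {
        "name": "color",
        "type": "synonym",
        "values": ["red", "blue", "green"]
      }
    ]
  }
}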

Request response:

bot: bot name

in_queue_for_slot: deploy environment in the training queue

lang: bot language

version: bot version
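
An illustrative response shape matching the fields above (values are examples only):

{
  "bot": "my-bot",
  "in_queue_for_slot": "homologation",
  "lang": "en-us",
  "version": "1"
}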

3.2 Training status

URL:

Headers:

Request body:

name: bot name

slot: deploy environment (homologation, production or test)
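
A minimal request-body sketch for checking training status, assuming only the fields described above are required (values are illustrative):

{
  "name": "my-bot",
  "slot": "homologation"
}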

Request response:

dataset_version: version of the dataset used in the training

model_metrics: metrics of the trained intent classifier (acc and loss below)

acc: intent classifier accuracy value

loss: intent classifier loss value

ready: shows if the intent classifier is ready to answer requests

training_progress: training progress

training_state: shows if the training is done

3.3 Inference

URL:

Headers:

Request body:

Request response:

entities: list of entities extracted by the entity extractor

text: text where the entity was found

entity: type of the found entity

intent: message intent

confidence: confidence score of the intent prediction
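
An illustrative response shape assuming the fields above (values are examples only):

{
  "entities": [
    {
      "text": "tomorrow",
      "entity": "date"
    }
  ],
  "intent": "book_flight",
  "confidence": 0.93
}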

4. Deploy

eva NLP can be installed in a Kubernetes cluster automatically using the Jenkins pipeline at https://jenkins.eva.bot/job/eva NLP/job/eva NLP-deploy/. In theory, eva NLP can be installed in any cloud environment using this guide; this deploy was homologated on Google Cloud Platform.

Make sure that you have a reserved static IP to use for the eva NLP internet API.

As the pipeline has many possible configurations, it is described in parts.

4.1 Essential settings:

cloud_provider

Name of the provider that manages the K8s cluster.

cluster

K8s cluster name.

zone

Zone where the K8s cluster is found.

gce_proj

Name of the project where the cluster is found.

k8s_api_server

K8s cluster IP address.

native_nfs

Enables the creation of an NFS (Network File System) for Eva NLP. Only uncheck it if an NFS already exists.

services_ip_range

Range of IP addresses that can be used to create services in K8s.

public_ip

Enables the creation of an Ingress. Only uncheck it if you don't want Eva NLP accessible from the internet.

external_ip_name

Static IP reserved for Ingress

4.2 Essential Eva NLP settings:

enable_predict

Enables Eva NLP prediction service and its dependencies.

enable_masking

Enables Eva NLP masking service and its dependencies.

enable_cassandra

Enables Cassandra as Eva NLP trained models storage option.

4.3 eva NLP extra settings:

engine_tag

EVA NLP ENGINE image repository tag

daemon_tag

EVA NLP DAEMON image repository tag

max_bots

Max number of bots in Eva NLP

num_replicas

Number of replicas for each Eva NLP module

parallel_trainings

Number of allowed parallel trainings

4.4 Additional settings:

docker_registry

Registry address where Eva NLP images are found.

registry_secret

Secret that has to be used in the registry authentication. (This is an optional parameter)

namespace

Namespace that will be used in Kubernetes

nfs_ip

Determines a static address for the NFS, overriding the services_ip_range option. (This is an optional parameter)

purge_noronha

Enables the purge of all Noronha modules that control Eva NLP’s defined and trained models.

purge_eva NLP

Enables the purge of all Eva NLP’s configuration and modules.

4.5 Cassandra settings:

cassandra_replicas

Initial number of Cassandra pods (and the writing replication factor)

cassandra_cpu

Absolute number of requested CPUs for each Cassandra pod.

cassandra_memory

RAM allocated to each Cassandra pod, expressed in GB

cassandra_heap_size

Memory heap allocated to each Cassandra pod, expressed in MB.

cassandra_storage_size

Storage space to each Cassandra pod, expressed in GB.

4.6 Cluster Settings in GCP

To get the GCP values for the cluster configuration, follow the step-by-step procedure below. The images are for illustration purposes; use your own values.

Cluster

To find your cluster, click the hamburger menu, then Kubernetes Engine, then Clusters.

Zone

Your cloud zone is shown on the same page as your cluster.

gce_proj

To find your gce_proj, click on your project name; its ID will be shown to its right.

k8s_api_server

Your k8s_api_server is shown on the same page as your cluster.

services_ip_range

Your services_ip_range is shown on the same page as your cluster.

external_ip_name

To get your external_ip_name, click the hamburger menu, then VPC network, then External IP addresses. Select one that is not being used by other applications.

eva NLP manual deploy

The following steps will provide pertinent information about the procedures to install EVA NLP SaaS manually on a Kubernetes cluster.

Important: change the {CURRENT_VERSION} placeholder to the current eva NLP version.

Whenever any other placeholder appears, change it according to the information from your cluster provider.

Important:

  • All yaml files should be applied in the exact order in which they are presented in this document

  • Each field that should be edited has a comment on the side, indicating that you should fill the parameters with the information from your cluster provider.

Minimum Requirements

Server Requirements
  • Kubernetes Cluster 1.14.10
  • Number of nodes: 4
  • CPU node pool: 3 - vCPU: 8 - Memory: 30GB
  • GPU node pool: 1 - vCPU: 8 - Memory: 30GB - GPU board: preferably NVIDIA Tesla T4

Software Requirements
  • Python >= 3.6
  • TensorFlow 1.15.1
  • PyTorch >= 1.5.0

Configuring Eva NLP

Clone the EVA NLP yaml files from the following repository: https://gitlab.eva.bot/asseteva/eva-eva NLP-k8s.git

By using the following command:

git clone https://{git_username}:{git_password}@gitlab.eva.bot/asseteva/eva-eva NLP-k8s

To make it easier to configure EVA NLP, we provide an auxiliary shell script that sets up all necessary configurations in each yaml file.

To run our script:

Usage: ./prepare-yaml.sh [ARGS]

This script prepares YAML files for installing Eva NLP in your K8s cluster.

Arguments | Explanation
cloud-provider | Name of cloud provider (GCP, AWS, Azure)
k8s-api-server | Address to the K8s cluster's API server
docker-registry | Docker registry used by the K8s cluster
engine-tag | Docker tag that will be used for the eva NLP-engine image
services-ip-range | Internal IP range for services in your K8s cluster (overwritten by argument 'nfs-ip')
external-ip-name | Name of the static IP reservation for setting up Eva NLP ingress
max-bots | Maximum number of coexisting bots
num-replicas | Number of containers for each productive deployment
parallel-trainings | Maximum number of concurrent trainings
registry-secret | (Optional) Name of the K8s secret that holds your docker registry's credentials
nfs-ip | (Optional) Internal IP for a new NFS service in your K8s cluster
help | Show this message and exit
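
A hypothetical invocation, assuming each argument in the table above is passed as a long option (run ./prepare-yaml.sh help to confirm the exact syntax; all values are illustrative placeholders):

./prepare-yaml.sh \
  --cloud-provider GCP \
  --k8s-api-server https://34.70.0.10 \
  --docker-registry gcr.io/my-project \
  --engine-tag {CURRENT_VERSION} \
  --services-ip-range 10.0.32.0/20 \
  --external-ip-name eva-nlp-static-ip \
  --max-bots 20 \
  --num-replicas 2 \
  --parallel-trainings 1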

After running the script, all necessary YAML files will be edited with the defined parameters, saved in an "install" directory, and ready to be applied.

The next section explains each YAML file's specifications, in the order in which each should be applied.

Setting up EVA NLP core configuration

This yaml file provides all API objects needed to set up EVA NLP in Kubernetes.

File: core.yaml

---
apiVersion: v1
kind: Namespace
metadata:
  name: {namespace} # if you change the namespace, all the other yaml files should be modified as well.

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: eva NLP-account
  namespace: {namespace}

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: eva NLP-role
  namespace: {namespace}
rules:
- apiGroups: ["", "extensions", "apps", "autoscaling"]
  resources: ["pods", "services", "deployments", "secrets", "pods/exec", "pods/status", "pods/log", "persistentvolumeclaims", "namespaces", "horizontalpodautoscalers"]
  verbs: ["get", "create", "delete", "list", "update", "watch", "patch"]

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: eva NLP-role
  namespace: {namespace}
subjects:
- kind: ServiceAccount
  name: eva NLP-account
  namespace: {namespace}
roleRef:
  kind: ClusterRole
  name: eva NLP-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: Secret
metadata:
  name: eva NLP-account
  namespace: {namespace}
  annotations:
    kubernetes.io/service-account.name: eva NLP-account
type: kubernetes.io/service-account-token

To apply, use this command:

kubectl apply -f core.yaml
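
To confirm the core objects were created, a quick check (using the namespace you defined above):

kubectl get namespace {namespace}
kubectl -n {namespace} get serviceaccounts,secrets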

Setting up NFS

EVA NLP uses an NFS filesystem to keep all EVA NLP related logs and to provide shared volumes for containers within the cluster.

File: nfs.yaml

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: eva NLP-nfs
  namespace: {namespace}
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 128Gi
  storageClassName: {storage_class} # edit the storage class for provisioning disk on demand (Azure: default | Others: standard)

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: eva NLP-nfs
  namespace: {namespace}
spec:
  selector:
    matchLabels:
      role: eva NLP-nfs
  template:
    metadata:
      labels:
        role: eva NLP-nfs
    spec:
      containers:
      - name: eva NLP-nfs
        image: gcr.io/google_containers/volume-nfs:0.8
        args:
        - /nfs
        ports:
        - name: nfs
          containerPort: 2049
        - name: mountd
          containerPort: 20048
        - name: rpcbind
          containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /nfs
          name: mypvc
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: eva NLP-nfs

---
apiVersion: v1
kind: Service
metadata:
  name: eva NLP-nfs
  namespace: {namespace}
spec:
  clusterIP: {nfs_server} # edit the nfs internal ip (if this one is already taken)
  ports:
  - name: nfs
    port: 2049
  - name: mountd
    port: 20048
  - name: rpcbind
    port: 111
  selector:
    role: eva NLP-nfs

To apply, use the command:

kubectl apply -f nfs.yaml

Wait for at least 10 minutes for stabilization of the deployment
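
One way to follow the NFS rollout before moving on (namespace as defined above):

kubectl -n {namespace} get pods -w
kubectl -n {namespace} get pvc,svc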

Register the eva NLP-conf.yaml and nha.yaml files as Kubernetes configmaps

There are two configuration files that must be applied as configmaps:

eva NLP-conf.yaml: EVA NLP related configuration attributes.

---
eva NLP:
  max_bots: {n_bots} # edit the maximum number of coexisting bots
  n_tasks_hml: {n_tasks} # edit the number of containers for each productive deployment
  max_concurrent_trainings: {n_trains} # edit the maximum number of concurrent trainings
  daemon_delay: 5
  deploy_timeout_seconds: 900

nha.yaml: Noronha related configuration attributes.

web_server:
  type: gunicorn
  threads:
    enabled: true
    high_cpu: true

router:
  port: 30081

mongo:
  port: 30018

lightweight_store:
  enabled: {enable_cassandra} # true to enable lightweight storage, false otherwise.
  native: false
  port: 9042
  type: cass
  keyspace: nha_db
  hosts: ['cassandra']
  replication_factor: 3 # this is used for write/read consistency. Edit if you need, but it cannot be higher than the number of Cassandra pods available. See cass-statefulset.yaml

logger:
  level: DEBUG
  pretty: true
  directory: /logs
  file_name: eva NLP.log

docker:
  target_registry: {docker_registry} # edit the docker registry used by the k8s cluster
  registry_secret: {registry_secret} # edit the name of the k8s secret that holds your docker registry's credentials

container_manager:
  type: kube
  namespace: eva NLP
  api_timeout: 600
  healthcheck:
    enabled: true
    start_period: 120
    interval: 60
    retries: 12
  storage_class: {storage_class} # edit the storage class for provisioning disk on demand (Azure: default | Others: standard)
  nfs:
    server: {nfs_server} # edit the nfs server ip address (same as in nfs.yaml)
    path: /nfs/nha-vols
  resource_profiles:
    nha-train:
      requests:
        memory: 5120
        cpu: 2
      limits:
        memory: 8192
        cpu: 4
    nha-bert:
      auto_scale: true
      minReplicas: 2
      targetCPUUtilizationPercentage: 80
      requests:
        memory: 10240
        cpu: 4
      limits:
        memory: 12288
        cpu: 4
    nha-w2v:
      requests:
        memory: 3072
        cpu: 1
      limits:
        memory: 12288
        cpu: 1
    nha-predict:
      auto_scale: true
      targetCPUUtilizationPercentage: 80
      requests:
        memory: 1024
        cpu: 1
      limits:
        memory: 2048
        cpu: 1
    nha-ner:
      enable_gpu: {use_gpu} # true to enable gpu utilization, false otherwise.
      auto_scale: true
      minReplicas: 2
      targetCPUUtilizationPercentage: 80
      requests:
        memory: 6144
        cpu: 2
      limits:
        memory: 24576
        cpu: 4

To apply, use the commands:

kubectl -n eva NLP create configmap eva NLP-conf --from-file=eva NLP-conf.yaml
kubectl -n eva NLP create configmap eva NLP-nha-conf --from-file=nha.yaml
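
To confirm both configmaps were registered in the namespace used above:

kubectl -n eva NLP get configmaps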

Setup EVA NLP modules

Applying this yaml will start a job to create all modules EVA NLP needs for inference.

File: deploy_saas.yaml

---
apiVersion: batch/v1
kind: Job
metadata:
  name: eva NLP-saas
  namespace: eva NLP
spec:
  template:
    metadata:
      labels:
        app: eva NLP-saas
    spec:
      imagePullSecrets:
      - name: {registry_secret} # edit the name of the k8s secret that holds the docker registry credentials.
      serviceAccountName: eva NLP-account
      restartPolicy: OnFailure
      volumes:
      - name: eva NLP-nfs
        nfs:
          server: {nfs_server} # edit the nfs server ip address (same as in nfs.yaml)
          path: /nfs
      - name: eva NLP-conf
        configMap:
          name: eva NLP-conf
      - name: eva NLP-nha-conf
        configMap:
          name: eva NLP-nha-conf
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
      containers:
      - name: eva NLP-saas
        image: {docker_registry}/eva NLP-daemon:{daemon_version} # edit the docker registry used by the k8s cluster
        imagePullPolicy: Always
        command: ["./sh/entrypoint.sh", "--component", "saas", "--setup", "<predictor>", "<masker>"] # edit which inference components will be deployed. Options are --predictor and --masker
        env:
        - name: DOCKER_TAG
          value: {engine_version} # edit the docker tag used for eva NLP-engine image
        - name: DOCKER_REGISTRY
          value: {docker_registry} # edit the docker registry used by the k8s cluster
        - name: K8S_API_SERVER
          value: {k8s_api_server} # edit the address to the k8s cluster's api server
        volumeMounts:
        - name: eva NLP-nfs
          mountPath: /nfs
        - name: eva NLP-conf
          mountPath: /app/eva NLP-conf.yaml
          subPath: eva NLP-conf.yaml
        - name: eva NLP-nha-conf
          mountPath: /app/nha.yaml
          subPath: nha.yaml
        - name: dockersock
          mountPath: "/var/run/docker.sock"

To apply, use the following command:

kubectl apply -f deploy_saas.yaml

Wait at least 10 minutes for stabilization of the deployment
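
A quick way to follow the setup job while waiting (namespace as in the yaml above):

kubectl -n eva NLP get jobs
kubectl -n eva NLP get pods -w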

Setup EVA NLP endpoints

File: deploy_endpoints.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eva NLP-endpoints
  namespace: eva NLP
spec:
  selector:
    matchLabels:
      app: eva NLP-endpoints
  template:
    metadata:
      labels:
        app: eva NLP-endpoints
    spec:
      imagePullSecrets:
      - name: {registry_secret} # edit the name of the k8s secret that holds your docker registry's credentials
      serviceAccountName: eva NLP-account
      volumes:
      - name: eva NLP-conf
        configMap:
          name: eva NLP-conf
      - name: eva NLP-nha-conf
        configMap:
          name: eva NLP-nha-conf
      containers:
      - name: eva NLP-endpoints
        image: {docker_registry}/eva NLP-daemon:{daemon_tag} # edit the docker registry used by the k8s cluster
        imagePullPolicy: Always
        command: ["./sh/entrypoint.sh", "--component", "endpoints"]
        env:
        - name: DOCKER_TAG
          value: {engine_tag} # edit the docker tag used for eva NLP-engine image
        - name: K8S_API_SERVER
          value: {k8s_api_server} # edit the address to the k8s cluster's api server
        volumeMounts:
        - name: eva NLP-conf
          mountPath: /app/eva NLP-conf.yaml
          subPath: eva NLP-conf.yaml
        - name: eva NLP-nha-conf
          mountPath: /app/nha.yaml
          subPath: nha.yaml
        ports:
        - containerPort: 9993

---
apiVersion: v1
kind: Service
metadata:
  name: eva NLP-endpoints
  namespace: eva NLP
spec:
  type: NodePort
  ports:
  - port: 9993
    targetPort: 9993
    name: "9993"
  selector:
    app: eva NLP-endpoints

To apply, use the command:

kubectl apply -f deploy_endpoints.yaml

Setup EVA NLP trainer daemon

File: deploy_trainer.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eva NLP-trainer
  namespace: eva NLP
spec:
  selector:
    matchLabels:
      app: eva NLP-trainer
  template:
    metadata:
      labels:
        app: eva NLP-trainer
    spec:
      imagePullSecrets:
      - name: {registry_secret} # edit the name of the k8s secret that holds the docker registry credentials
      serviceAccountName: eva NLP-account
      volumes:
      - name: eva NLP-nfs
        nfs:
          server: {nfs_server} # edit the nfs server ip address (same as in nfs.yaml)
          path: /nfs
      - name: eva NLP-conf
        configMap:
          name: eva NLP-conf
      - name: eva NLP-nha-conf
        configMap:
          name: eva NLP-nha-conf
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
      containers:
      - name: eva NLP-trainer
        image: {docker_registry}/eva NLP-daemon:{daemon_version} # edit the docker registry used by the k8s cluster
        imagePullPolicy: Always
        command: ["./sh/entrypoint.sh", "--component", "trainer"]
        env:
        - name: DOCKER_TAG
          value: {engine_version} # edit the docker tag used for eva NLP-engine image
        - name: DOCKER_REGISTRY
          value: {docker_registry} # edit the docker registry used by the k8s cluster
        - name: K8S_API_SERVER
          value: {k8s_api_server} # edit the address to the k8s cluster's api server
        volumeMounts:
        - name: eva NLP-nfs
          mountPath: /nfs
        - name: eva NLP-conf
          mountPath: /app/eva NLP-conf.yaml
          subPath: eva NLP-conf.yaml
        - name: eva NLP-nha-conf
          mountPath: /app/nha.yaml
          subPath: nha.yaml
        - name: dockersock
          mountPath: "/var/run/docker.sock"

To apply, use the command:

kubectl apply -f deploy_trainer.yaml

Setup Noronha Lightweight Storage (Cassandra)

EVA NLP uses Cassandra as a database for models and other binaries, providing fast model loading and inference.

File: cass-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
  namespace: eva NLP
spec:
  ports:
  - port: 9042
  selector:
    app: cassandra

To apply, use the command:

kubectl apply -f cass-service.yaml

File: cass-peer-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra-peers
  namespace: eva NLP
spec:
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra

To apply, use the command:

kubectl apply -f cass-peer-service.yaml

File: cass-storageclass.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cass-hdd
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Retain
parameters:
  type: pd-standard

To apply, use the command:

kubectl apply -f cass-storageclass.yaml

File: cass-statefulset.yaml

Default values are provided, but can be modified as needed.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  namespace: eva NLP
  labels:
    app: cassandra
spec:
  serviceName: cassandra-peers
  replicas: 3 # desired number of Cassandra pods. It can be edited, but cannot be lower than the replication factor used by Noronha. See nha.yaml
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v13
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: 1 # number of cpus; edit this if needed (the default is 1)
            memory: 4Gi # edit this if needed (the default is 4Gi)
          requests:
            cpu: 1 # number of cpus; edit this if needed (the default is 1)
            memory: 4Gi # edit this if needed (the default is 4Gi)
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - nodetool drain
        env:
        - name: MAX_HEAP_SIZE
          value: 256M # edit this if needed (the default is 256M)
        - name: HEAP_NEWSIZE
          value: 100M
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra-peers.eva NLP.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "Eva NLPCass"
        - name: CASSANDRA_DC
          value: "DC1-Eva NLPCass"
        - name: CASSANDRA_RACK
          value: "Rack1-Eva NLPCass"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: cass-hdd
      resources:
        requests:
          storage: 30Gi # edit this if needed (the default is 30Gi)

To apply, use the command:

kubectl apply -f cass-statefulset.yaml
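
Cassandra pods take several minutes to become ready; one way to watch them and confirm the ring has formed (pod name follows the statefulset above):

kubectl -n eva NLP get pods -l app=cassandra -w
kubectl -n eva NLP exec cassandra-0 -- nodetool status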

5. EVA NLP - Errors

This is a guide to common EVA NLP errors.

5.1 EVA NLP Training

5.1.1 Submission

When submitting a new training, HTTP errors may occur:

Status code | Message | Caused by | Action
400 | Intents in Eva NLP must have at least 5 examples to be trained. | One of the dataset's intents has fewer than 5 examples | Enrich the dataset
400 | Bots in Eva NLP must have at least 5 intents to be trained. | The dataset has fewer than 5 intents | Enrich the dataset
400 | The bot {bot_name} is being trained. Please wait. | Training already in progress | Wait for the training to finish
400 | The bot {bot_name} is waiting to be trained. Please wait. | Bot already in queue for training | Wait for the training to finish
400 | (What is the name of your bot?) | Missing bot name | Add name to the request
400 | (What language your bot speaks?) | Missing bot language | Add language to the request
400 | (What is the environment of your bot?) | Missing bot slot | Add slot (hml, rev, prd) to the request
400 | You have reached your bot limit. Delete a bot or increase your cap. | Bot limit reached | Contact the EVA NLP team
500 | Any | Unexpected error | Contact the EVA NLP team
502 | Any | Endpoint is unavailable | Contact the EVA NLP team

5.1.2 JSON

Contract valid for status codes 400 and 500:

Endpoint: /api/trainTextModel

{ "exception": "<message>" }

5.1.3 Ongoing

When there is a training in progress, only unexpected errors may occur. If this occurs, train again. If the error persists, contact Eva NLP team.

Trainings now have their status code and message added to the /api/getStatus endpoint response:

Code | Message
0 | Success
70 | Failed to load training resources
70 | Failed to prepare dataset (Training failed. Check your intents and entities, then try again)
70 | Failed to generate domain vocabulary
70 | Failed to save files that do not require training
70 | Failed to train neural network
70 | Failed to publish model version
70 | Training failed. Please try again

5.1.4 JSON

Contract of endpoint /api/getStatus with training code and message.

Successful training example:

{ "dataset_version": 3, "model_metrics": { "acc": 0.7307692170143127, "loss": 0.4958965480327606 }, "ready": true, "training_code": 0, "training_message": "Succeeded", "training_progress": 1.0, "training_state": "finished" }

Failed training example:

{ "dataset_version": 2, "model_metrics": {}, "ready": false, "training_code": 70, "training_message": "Failed to load training resources", "training_progress": 0.55, "training_state": "failed" }

5.2 EVA NLP Inference

HTTP Errors might occur when sending a request to EVA NLP inference service.

Status code | Message | Caused by | Action
500 | No ModelVersions found with query {'name': '<bot_name>', 'model': 'bot-engine'} | Requested bot doesn't exist | Train a bot with the desired name
500 | Error while making an inference : <error_details> | Unexpected EVA NLP error | Try again. If it persists, contact EVA NLP
501 | Any | Unexpected error | Try again in a few minutes. If it persists, contact EVA NLP

5.2.1 JSON

Contract for status code 500.

Endpoint: /predict?project=eva NLP&deploy=predictor&model_version=<bot_name>-<bot_slot>

{ "err":{ "error":"<error_type>", "message":"<error_message>" }, "metadata":{ "datetime":"(yyyy-MM-dd HH:mm:SS)" } }

6. eva NLP masking

eva NLP masking is the eva NLP module that hides its users' sensitive data by anonymizing different kinds of data that cannot be shared with third-party services, such as the cognitive engines Watson, Dialogflow, and Luis.

To carry out the "masking" process, eva NLP masking relies on named entity recognition, a subtask of information extraction that locates and classifies named entities in text documents. It scans a text looking for snippets that fit predefined categories such as persons, numbers, addresses, phone numbers, ID numbers, companies, places, etc.

The following figure shows how entities are extracted from a text. Each text snippet that represents a class is highlighted with a different color.

In the example, the entities found were:

  1. Entity: iPhone, Category: PRODUCT

  2. Entity: Apple, Category: ORG (Company)

  3. Entity: Boston, Category: GPE (Place)

  4. Entity: last Monday, Category: DATE

6.1 General characteristics

  • eva NLP masking identifies confidential content and replaces it with generic information.

  • It can be deployed in any cloud environment.

  • All modules are implemented in a DataOps architecture, with horizontal scaling for inference, so the deploy process is safe and fast.

  • Eva NLP masking works as an independent module from Eva NLP and it doesn’t need training.

  • It supports English, Spanish and Portuguese.

6.2 eva NLP Masking design

The following image shows a typical inference flow between an eva chatbot and the NLP agents with an active Eva NLP masking module.

  • The user sends a message to the chatbot.

  • The chatbot sends the input sentence to the Eva NLP masking API.

  • The Eva NLP masking module locates entities of interest using the entities extractor model.

  • The input sentence has its sensitive content replaced by its respective alias.

  • The masked sentence is sent back to eva and then eva sends it to a NLP agent.

6.3 Masking definition

Once the entities are found, the masking process is executed. This means that the text content of those entities is replaced by fictitious text, preserving the original text semantics. So, if a "PERSON" entity is found, such as John Kendrick, this text snippet will be replaced by equivalent text of the same class, or alias, such as John Smith. How this process occurs is shown in the next section.

Currently, Eva NLP masking has three types of predefined entities for masking:

  • PERSON: entities representing real persons or fictitious characters.

  • GEO-LOCATION: entities representing cities, subnational divisions and countries.

  • ADDRESS: entities representing addresses, public spaces and tourist spots.

Eva NLP masking supports three languages: English, Spanish and Portuguese. Aliases must be provided for each language and entity class.

The following aliases are preset for English:

  • Ryan Smith – PERSON

  • Ohio – GEO-LOCATION

  • Connecticut Avenue – ADDRESS

The following aliases are preset for Spanish:

  • Alejandro Ortega – PERSON

  • Salamanca – GEO-LOCATION

  • calle de las Hortensias – ADDRESS

The following aliases are preset for Portuguese:

  • Elisa Santos – PERSON

  • Acre – GEO-LOCATION

  • avenida Independência – ADDRESS

6.4 Masking procedure

This section shows how the masking works. A request in JSON format is sent to the Eva NLP masking API, containing in its body the text to be analyzed and its language.

When the request is received, Eva NLP masking executes the masking process and returns an answer in JSON format.

In the answer returned by Eva NLP, some fields of interest can be seen, such as entities, anonymized_text and original_text.

In entities there is a list of identified entities. For each found element there is the original text snippet (text), the type of entity (entity) and the alias that replaced the original text (alias).

The field original_text shows the original text sent in the request to Eva NLP masking API, which was “My name is John Kendrick and I live at 2080 Earnhardt Drive, Louisville”.

The anonymized_text field shows the text after the masking process. The text became “My name is Ryan Smith and I live at Connecticut Avenue, Ohio”.

6.5 System entities offered by the NLPs

In the tables below there is a comparison between system entities offered by Eva NLP masking, Watson, Dialogflow and Luis.

7. JSON contracts

The Eva NLP masking inference process JSON contracts are described below:

7.1 URL

http://eva NLP-nlp.eva.bot/predict?project=eva NLP&deploy=masking-hml

project: Eva NLP project name (internal metadata configuration, this value has to be set).

deploy: Eva NLP deploy name (internal metadata configuration, this value has to be set).

7.2 Header

{
  "Content-Type": "application/json"
}

7.3 Request body

{
  "lang": "en-us",
  "message": "My name is John Kendrick and I live at 2080 Earnhardt Drive, Louisville"
}

lang: the language of the text that the masking module will handle.

message: input message that will be masked.
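
A minimal sketch of the full masking call, combining the URL from 7.1, the header from 7.2 and the body from 7.3 (host and slot values are illustrative):

curl -X POST "http://eva NLP-nlp.eva.bot/predict?project=eva NLP&deploy=masking-hml" \
  -H "Content-Type: application/json" \
  -d '{"lang": "en-us", "message": "My name is John Kendrick and I live at 2080 Earnhardt Drive, Louisville"}'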

7.4 Request response

{
  "result": {
    "entities": [
      {
        "text": "John Kendrick",
        "entity": "PERSON",
        "alias": "Ryan Smith"
      },
      {
        "text": "2080 Earnhardt Drive",
        "entity": "ADDRESS",
        "alias": "Connecticut Avenue"
      },
      {
        "text": "Louisville",
        "entity": "GEO-LOCATION",
        "alias": "Ohio"
      }
    ],
    "anonymized_text": "My name is Ryan Smith and I live at Connecticut Avenue, Ohio",
    "original_text": "My name is John Kendrick and I live at 2080 Earnhardt Drive, Louisville"
  },
  "metadata": {
    "datetime": "2020-03-25 17:23:09",
    "model_version": [
      "bert:multi:2020-03-05 18:14:29"
    ]
  }
}

result: object containing all identified and extracted entities.

text: text containing the original entity.

entities: entity list.

entity: entity’s class.

alias: defined alias for this entity class.

anonymized_text: input text with the found entities replaced by their preset aliases.

original_text: original input text.

metadata: Eva NLP metadata information.

model_version: list of models used for the inference service.
