eva 3.3.0.0 NLP
1. Introduction
A bot, or virtual assistant, is software that manages message exchanges between a person and a computer, simulating human conversation. The simulation becomes more realistic through artificial intelligence, here natural language processing and machine learning techniques that enable a computer to identify the conversational context.
Natural language processing (NLP) is a subfield of artificial intelligence and linguistics that studies how computers generate and understand natural human language. Its main techniques are information extraction, summarization, spelling correction, text normalization, morphological, syntactic and semantic analysis, machine translation, meaning interpretation and sentiment analysis.
Machine learning is the ability of computers to learn without being explicitly programmed. In a bot, it is the ability to give a correct answer to a question that wasn't programmed. The machine learning algorithms most used in bots are supervised learning algorithms. Their objective is, after training, to find a mathematical function that maps inputs to the correct outputs. An example is a facial recognition system that, after training on multiple pictures of a person, can recognize that person and tell them apart from other people, provided pictures of other people were also used during training.
A conversation between a person and a bot follows a scenario flowchart. After a user sends a message, the bot should be able to answer it properly, whether or not the answer leads into a dialog. For example, a user can say “hello” to a bot, which would answer “hi, what can I do for you?”. If the user wants to check an account, the bot must ask for the account data to fulfill its task. Here, the bot understood the user message and redirected the user to the pre-defined flow.
To understand how a bot works, there are three central concepts: intent, entity and dialog.
Intent: the objective of a sentence. The computer maps an input to the correct output through an intent, meaning that what the user expects to happen is inferred from the input sentence. For example, for the sentence “I want to deposit $500 in my account”, the intent would be “deposit”.
Entity: a text snippet that complements the meaning of the intent. Entities can be people names, places, numbers, data, products and services that are part of the client's business context. In the sentence “I want to deposit $500 in my account”, “500” would be a money entity representing the value to be deposited. Entities are useful because they make the user message more specific and can disambiguate sentences that could be attributed to more than one intent.
Dialog: the synthesis of intents and entities. With this information it is possible to build a conversational flow and train it.
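The three concepts can be illustrated with a minimal Python sketch. It is not eva's implementation: the keyword table, the money expression and the dialog table (INTENT_KEYWORDS, DIALOG_FLOW and the step names) are hypothetical stand-ins for a trained classifier and a real conversational flow.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str   # entity type, e.g. "money"
    value: str  # text snippet taken from the message


@dataclass
class Understanding:
    intent: str
    entities: list = field(default_factory=list)


# Hypothetical keyword rules standing in for a trained intent classifier.
INTENT_KEYWORDS = {"deposit": "deposit", "balance": "check_balance"}

# Hypothetical dialog table: intent -> next step of the conversational flow.
DIALOG_FLOW = {
    "deposit": "ask_for_account_data",
    "check_balance": "show_balance",
}


def understand(message: str) -> Understanding:
    """Toy analysis: keyword lookup for the intent, a regex for money entities."""
    lowered = message.lower()
    intent = next(
        (name for word, name in INTENT_KEYWORDS.items() if word in lowered),
        "unknown",
    )
    amounts = re.findall(r"\$(\d+(?:\.\d+)?)", message)
    return Understanding(intent, [Entity("money", a) for a in amounts])


result = understand("I want to deposit $500 in my account")
next_step = DIALOG_FLOW.get(result.intent, "fallback")
print(result.intent, [e.value for e in result.entities], next_step)
```

The intent selects the branch of the flow, while the entity carries the detail (the amount) that the branch needs.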
After receiving a message from a user, the bot activates its cognitive engine, which is responsible for preliminary processing, entity extraction and intent classification. Preliminary processing aims to make the message as formal as possible: spelling is corrected, shortened words are expanded, and lower and upper case are identified. After that, machine learning is applied to classify the intent, and natural language processing finds and extracts the entities.
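A rough sketch of the preliminary processing step, assuming simple lookup tables, might look like the following. SPELLING_FIXES and ABBREVIATIONS are hypothetical; the actual correction and expansion resources used by the cognitive engine are not described here.

```python
import re

# Hypothetical lookup tables; a real engine would use far larger resources.
SPELLING_FIXES = {"acount": "account", "ballance": "balance"}
ABBREVIATIONS = {"pls": "please", "acct": "account", "u": "you"}


def preprocess(message: str) -> dict:
    """Normalize a message and record which tokens were capitalized."""
    tokens = re.findall(r"\w+|[^\w\s]", message)
    # Remember the original casing before lowercasing, since capitalized
    # words can be a hint for later entity extraction (e.g. names).
    capitalized = [t for t in tokens if t[0].isupper()]
    normalized = []
    for token in tokens:
        word = token.lower()
        word = ABBREVIATIONS.get(word, word)   # expand shortened words
        word = SPELLING_FIXES.get(word, word)  # correct known misspellings
        normalized.append(word)
    return {"tokens": normalized, "capitalized": capitalized}


print(preprocess("Pls check my acount ballance, John"))
```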
With the intent defined and the entities found by its cognitive engine, the bot can send the correct message to the user and redirect them to the next flow scenario.
2. Cognitive Language by everis - Eva NLP
One of eva NLP's features is its scalability, as the solution runs on the Noronha platform, a DataOps framework also developed by everis. Noronha manages the computational resources needed for eva NLP to operate safely, increasing or decreasing them according to request demand, which allows the solution to work as SaaS (Software as a Service).
2.1 eva NLP workings
A bot needs a cognitive engine to understand human language. This engine must learn through training to be able to infer.
eva NLP training is performed when the bot is created in the cockpit, eva’s web interface. When creating a bot, the editor must supply a dataset with at least 5 (five) intents, each with 5 (five) example sentences. The data must belong to the client's business domain and shouldn't contain sentences that might cause ambiguity when classifying intents.
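Before uploading a dataset, these requirements can be checked locally. The sketch below is only an illustration: the dataset layout (a dictionary from intent name to example sentences) and the validate_dataset helper are assumptions, the five examples per intent are read as a minimum, and the duplicate check covers only the most obvious kind of ambiguity.

```python
MIN_INTENTS = 5    # minimum number of intents, as stated above
MIN_EXAMPLES = 5   # example sentences per intent, read here as a minimum


def validate_dataset(dataset: dict) -> list:
    """Return a list of problems; an empty list means the dataset passes."""
    problems = []
    if len(dataset) < MIN_INTENTS:
        problems.append(f"need at least {MIN_INTENTS} intents, got {len(dataset)}")
    seen = {}
    for intent, examples in dataset.items():
        if len(examples) < MIN_EXAMPLES:
            problems.append(
                f"intent '{intent}' has {len(examples)} examples, need {MIN_EXAMPLES}"
            )
        for sentence in examples:
            key = sentence.strip().lower()
            # The same sentence under two intents is an obvious ambiguity.
            if key in seen and seen[key] != intent:
                problems.append(
                    f"'{sentence}' appears in both '{seen[key]}' and '{intent}'"
                )
            seen[key] = intent
    return problems


good = {f"intent_{i}": [f"example {i}-{j}" for j in range(5)] for i in range(5)}
bad = {
    "deposit": ["I want to deposit money"],
    "greeting": ["I want to deposit money"],
}
print(validate_dataset(good))
for problem in validate_dataset(bad):
    print(problem)
```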
The editor can also create custom entities that can be of two types: pattern and synonym.
Synonym entities are words that the client wants eva NLP to always identify as an entity. The word “happiness”, for example, can be an entity of type emotion. Pattern entities are defined by regular expressions, a language for describing the textual pattern expected during analysis, so that the format of the data is specified rather than the data itself. A regular expression that recognizes an email would be:
([\w\-\.]+)@(([\w+.-]*)+)([a-zA-Z]{2,4})
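The two custom entity types can be sketched in Python as follows. This is an illustration, not eva NLP's extraction code: SYNONYM_ENTITIES, PATTERN_ENTITIES and extract_custom_entities are hypothetical names, and the email expression is a simplified one written for this example rather than the expression shown above.

```python
import re

# Hypothetical synonym entity: entity type -> words that must be recognized.
SYNONYM_ENTITIES = {"emotion": {"happiness", "sadness", "anger"}}

# Hypothetical pattern entity, using a simplified email expression
# (not the exact expression from the documentation).
PATTERN_ENTITIES = {
    "email": re.compile(r"[\w.-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,4}"),
}


def extract_custom_entities(message: str) -> list:
    """Return (entity type, matched text) pairs found in the message."""
    found = []
    # Synonym entities: exact word lookup, case-insensitive.
    for word in re.findall(r"\w+", message.lower()):
        for kind, words in SYNONYM_ENTITIES.items():
            if word in words:
                found.append((kind, word))
    # Pattern entities: any substring matching the regular expression.
    for kind, pattern in PATTERN_ENTITIES.items():
        found.extend((kind, m.group(0)) for m in pattern.finditer(message))
    return found


print(extract_custom_entities("Share your happiness with john.doe@example.com"))
```

A synonym entity lists the values themselves, while a pattern entity only describes their shape, so it also matches addresses that were never listed.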
There is a third type of entity, the system entity, which is pre-defined in eva NLP. Currently, eva NLP supports person and company names, addresses, countries, subnational divisions, numbers, monetary values and dates.
Bot training is done in a specific language, as it requires a Word2Vec base in the same language as the intent dataset. During training, the numeric representations of the words in the Word2Vec base are adjusted to suit the client data, in a process known as extension. Words of the intent set that do not exist in the standard Word2Vec base are stored in a separate file called domain_vocab. An LSTM neural network is then used to generate the intent prediction model.
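How the domain vocabulary could be collected is sketched below, under simplifying assumptions. The base_vectors dictionary is a tiny hypothetical stand-in for a pretrained Word2Vec base, "evacard" is an invented domain word, and the sketch covers only the detection of missing words, not the adjustment of the vectors or the format of the domain_vocab file.

```python
import re

# Hypothetical stand-in for a pretrained Word2Vec base: word -> vector.
base_vectors = {
    "i": [0.1, 0.3], "want": [0.4, 0.2], "to": [0.0, 0.1],
    "deposit": [0.9, 0.5], "money": [0.8, 0.6],
}

# Hypothetical intent dataset; "evacard" is an invented domain word.
intent_examples = {
    "deposit": ["I want to deposit money", "deposit to my evacard"],
}


def collect_domain_vocab(examples: dict, known_words) -> list:
    """Return dataset words missing from the base, sorted for stable output."""
    missing = set()
    for sentences in examples.values():
        for sentence in sentences:
            for word in re.findall(r"\w+", sentence.lower()):
                if word not in known_words:
                    missing.add(word)
    return sorted(missing)


domain_vocab = collect_domain_vocab(intent_examples, base_vectors)
print(domain_vocab)
```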
Inference is performed by converting the sentence to be analyzed into its Word2Vec representation and sending it, as a numeric vector, to the classifier, which indicates the intent the sentence belongs to.
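The inference path (sentence, numeric vector, classifier, intent) can be illustrated with a toy example. Everything in it is an assumption made for illustration: the two-dimensional WORD_VECTORS replace a real Word2Vec base, and a cosine comparison against hypothetical INTENT_VECTORS replaces the LSTM model, so averaging the word vectors discards the word order that an LSTM would take into account.

```python
import math

# Hypothetical 2-dimensional word vectors standing in for a Word2Vec base.
WORD_VECTORS = {
    "deposit": [1.0, 0.0], "money": [0.9, 0.1],
    "hello": [0.0, 1.0], "hi": [0.1, 0.9],
}

# Hypothetical per-intent reference vectors standing in for a trained model.
INTENT_VECTORS = {"deposit": [1.0, 0.0], "greeting": [0.0, 1.0]}


def sentence_to_vector(sentence: str) -> list:
    """Average the vectors of the known words in the sentence."""
    vectors = [
        WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS
    ]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: list, b: list) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def classify(sentence: str) -> str:
    """Pick the intent whose reference vector is closest to the sentence."""
    vector = sentence_to_vector(sentence)
    return max(
        INTENT_VECTORS,
        key=lambda intent: cosine(vector, INTENT_VECTORS[intent]),
    )


print(classify("I want to deposit money"))
print(classify("hi there"))
```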
2.2 eva NLP architecture
To orchestrate containers in a cloud environment, the Noronha framework uses Kubernetes. eva NLP training is performed in a container that communicates with a Cassandra database container. The assets generated by training, such as the intent classification model, domain vocabulary, custom entities and bot ID, are stored in Cassandra.
Cassandra is part of the eva NLP architecture because the bot assets created during training must be present in the cluster memory for the NLP to predict intents and extract entities. However, keeping them in memory indefinitely is not desirable, so when a bot is idle its assets are no longer held in cluster memory and are preserved only in storage. When the bot becomes active again, the prediction system detects which bot resources are not in memory, retrieves them from Cassandra and stores them in memory again. It is also possible to check the training status through one of eva NLP's endpoints.
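The load-on-demand behaviour described above resembles a cache in front of a database. The sketch below shows that general idea only: AssetCache is a hypothetical class, a plain dictionary stands in for Cassandra, and the idle timeout is an assumption, since the way eva NLP decides that a bot is idle is not described here.

```python
import time


class AssetCache:
    """Keeps bot assets in memory and reloads them from a store on demand."""

    def __init__(self, store: dict, idle_seconds: float):
        self.store = store              # stand-in for the Cassandra database
        self.idle_seconds = idle_seconds
        self.memory = {}                # bot_id -> (assets, last_used)

    def get(self, bot_id: str, now: float = None) -> dict:
        now = time.monotonic() if now is None else now
        if bot_id not in self.memory:
            # Assets are not in memory: fetch them from the database again.
            self.memory[bot_id] = (self.store[bot_id], now)
        assets, _ = self.memory[bot_id]
        self.memory[bot_id] = (assets, now)  # refresh the last-used time
        return assets

    def evict_idle(self, now: float = None) -> list:
        """Drop assets of bots not used for more than idle_seconds."""
        now = time.monotonic() if now is None else now
        idle = [
            bot_id for bot_id, (_, used) in self.memory.items()
            if now - used > self.idle_seconds
        ]
        for bot_id in idle:
            del self.memory[bot_id]
        return idle


store = {"bot-1": {"model": "intent-model", "domain_vocab": ["evacard"]}}
cache = AssetCache(store, idle_seconds=60)
cache.get("bot-1", now=0)            # loaded from the store
print(cache.evict_idle(now=120))     # bot-1 was idle, so it is evicted
print(cache.get("bot-1", now=121))   # reloaded from the store
```

The explicit now argument exists only to make the example deterministic; without it the cache falls back to the monotonic clock.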
The image below shows eva NLP’s inference process architecture: