Nevod entity recognition is 30 times faster than Microsoft Recognizers Text cover image

Ivan Shimko - May 12, 2021

Nevod entity recognition is 30 times faster than Microsoft Recognizers Text

Nevod pattern-based entity recognition with zero coding is 30 times faster than Microsoft Recognizers Text, while having 98% compatibility according to unit tests.

Microsoft Recognizers Text

Microsoft Recognizers Text is a library for recognition and resolution of text entities like numbers, units, date/time, etc. The library is based on regular expressions and hand-coded algorithmic postprocessing. It is able to recognize date/time, numbers, URLs, emails, phone numbers and more. For date/time entities, it provides resolution - conversion of matched phrases in input text into programming language “date-time” object. Microsoft Recognizers Text is used in such services as LUIS: Language Understanding Intelligent Service, Microsoft Bot Framework and Text Analytics Cognitive Service.

While having good accuracy and multiple languages support, Microsoft Recognizers Text has poor performance on text articles with more than 100-200 words, for example in our test set of 20 articles with the total size of 230k characters it takes 0.5 second to process a single file in a single language (English) and recognize common entities in it (see Nevod section for comparison details). The reason of such performance is that Microsoft Recognizers Text uses sets of regular expressions which are sequentially applied to input text. In some cases, results produced by regular expressions are then processed by other regular expressions in combination with algorithmic postprocessing. This pipeline acts as a black box and, while producing high quality results, it is difficult to use it as part of a custom solutions due to poor performance and lack of tuning.

Nevod

We have taken test cases specs, provided in Microsoft Recognizers Text library, to develop Nevod patterns that combine the results of Microsoft Recognizers Text with the advantage of Nevod declarative patterns (that require no program coding) and high performance. Nevod pattern recognition was compared solely to text extraction functionality of Microsoft Recognizers Text. The resolution of entities (conversion of text fragments into programming language objects) was excluded from comparison.

In total, 98% pass rate was achieved for the following entities: date, time, date and time, URL, email, IP address, phone number, GUID, #hashtag and @mention:

Performance was measured against test set of 20 news articles with the total size of 230k characters:

You can try Nevod patterns for recognition of text entities yourself in the Nevod playground or via running the free negrep utility on your local computer.

Nevod entity recognition is 30 times faster than Microsoft Recognizers Text

Microsoft Recognizers Text

Nevod

Зачем нужен язык программирования с русской лексикой?

Nevod is now open-source

Nevod entity recognition is 30 times faster than Microsoft Recognizers Text

Nevod is now patented

Negrep is a free Nevod based grep

Nezaboodka™ is now an official trademark

Using Nevod for Categorization of Job Posts

The Single Biggest Problem in Big Data Analytics

Basic Big Data Terminology

The Future of Big Data Analytics is an Excel-Like Spreadsheet

Reactronic - The Concept of Transactionally-Reactive State Management

Nevod Technology is in the Top of Hacker News at Y Combinator!

Modern database should natively support Transactionally-Reactive Programming

Using Nevod for Text Analytics in Healthcare

Nevod: Technology for Pattern-Based Natural Language Processing

Nezaboodka has become a resident of Belarus High-Tech Park

Nezaboodka has applied for an NLP patent

Nezaboodka DBMS is now patented