Using Nevod for Categorization of Job Posts cover image
Ivan Shimko - Oct 22, 2019

Using Nevod for Categorization of Job Posts

There are millions of vacancies posted on job boards daily. Effective processing of vacancies requires automatic and real-time categorization of job posts. Nowadays, job boards use NLP and hyping ML techniques to categorize vacancies collected from multiple sources.

As per our analysis that was performed over a data set of 1.5 million records collected from open sources, the quality of categorization is not good enough, both from accuracy and performance perspective.

Let us take nurse vacancies as an example. The first noticeable thing in categorization done by job boards is that all the management and HR jobs very often are mixed up.

Here is how categorization fine tuning can be done with Nevod. Let’s start with a simple Nevod pattern:

#Target = {"nurse", "nursing"};

The pattern Target here defines a variation of two strings: “nurse” and “nursing”.

Let’s update the pattern a bit to extract not only the keyword but also the whole headline this keyword belongs to:

#Target = Headline @having {"nurse", "nursing"};
Headline = Start ... {End, LineBreak};

Now the pattern Target references the pattern Headline representing the first line and limits it to have any of the keywords we have defined. The pattern Headline defines a sequence of tokens starting at the beginning of text and ending with an end of text or a line break.

In addition to nurse vacancies it will also match such posts as “Executive Nurse Recruiter”, “Director of Nursing Services”, “Chief Nursing Officer” etc.

Although these posts more likely fall into “Human Resources” and “Management/Executive”, there is no easy and reliable way to “prohibit” ML-driven classifiers to include such vacancies, albeit it is so obvious for humans.

Let’s see how easy it is to exclude “Executive Nurse Recruiter” and “Director of Nursing Services” from the result with the help of the Nevod pattern language:

#Target = (Headline @having Nursing) @outside Management;
Headline = Start ... {End, LineBreak};
Nursing = {"nurse", "nursing"};
Management = {"director", "recruiter"};

Now the pattern Target will match only those headlines, which contain nursing keywords, but do not contain management position keywords, defined as the separate variation of strings named Management.

As you can see, Nevod patterns make all rules of classification explicit and easy to write, read and maintain. For example, in order to treat “Chief Nursing Officer” as an additional exception for nursing, all what is needed is to add an item to the variation:

Management = {"director", "recruiter", CXO};
CXO = "chief" .. [0-3] .. "officer";

The pattern CXO is the sequence of “chief” and “officer” having no more than 3 words in between.

The ability of such a fine tuning allows reaching 95% accuracy with Nevod versus common 70% with AI/ML-based approaches. Accuracy is achieved without any compromise on the speed: 1.5 million of records can be processed in 20 seconds on a single CPU core, which gives a sub-millisecond speed that is about 0.13 milliseconds per one job post. Performance doesn’t degrade regardless the number of categories and categorization rules.

You can test patterns from the article yourself in the Nevod Playground.

Copyright (С) 2011-2022
Nezaboodka Software LLC. All rights reserved.