Resume Parsing

Resume Parsing is conversion of a free-form resume document into a structured set of information suitable for storage, reporting, and manipulation by software. Resume parsing helps recruiters to efficiently manage electronic resume documents sent electronically.

The most common CV/Resume format is MS Word. Despite being easy for humans to read and understand, is quite difficult for a computer to interpret. Unlike our brains which gain or disseminate context through understanding the situation along with taking into consideration the words around it, to a computer a resume is just a long sequence of letters, numbers and punctuation. A CV parser is a program that can analyze a document, and extract from it the elements of what the writer actually meant to say. In the case of a CV the information is all about skills, work experience, education, contact details and achievements.

Natural Language Processing

Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language. NLP draws from many disciplines, including computer science and computational linguistics, in its pursuit to fill the gap between human communication and computer understanding.

While natural language processing isn’t a new science, the technology is rapidly advancing thanks to an increased interest in human-to-machine communications, plus an availability of big data, powerful computing and enhanced algorithms.

As a human, you may speak and write in English, Spanish or Chinese. But a computer’s native language – known as machine code or machine language – is largely incomprehensible to most people. At your device’s lowest levels, communication occurs not with words but through millions of zeros and ones that produce logical actions.


Suppose you wanted to digitize a magazine article or a printed contract. You could spend hours retyping and then correcting misprints. Or you could convert all the required materials into digital format in seconds using an Optical Character Recognition software.

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast)