What is OCR (Optical Character Recognition) and what does it mean?

Learn what OCR (Optical Character Recognition) is, how it works, and its practical applications in modern technology.

Marta Morrás

Global Marketing Director

Article
December 8, 2022

Optical Character Recognition (OCR) is a technology that converts scanned documents or images into editable text. This technology has revolutionized how information is processed and stored, making it easier and more efficient to digitize and organize data. OCR technology improves the accuracy and speed of data entry, reducing the risk of errors and saving time and resources.

What are the origins of OCR?

The history of this technology dates back to the late 19th century, when several inventors and researchers began exploring the idea of using machines to recognize and interpret written text. One of the earliest attempts at OCR was by David Harris, who patented an “electric pen” in 1892. This device was designed to trace over written text and convert it into a format that a machine could read.

In the early 20th century, other inventors and researchers began to develop their own OCR systems. These included Emanuel Goldberg, who developed a machine that could read and interpret handwritten text, and George Davis, who created a system that could recognize typed text.

Despite these early efforts, Optical Character Recognition technology did not become widespread until the 1950s, when the first commercially successful OCR systems were developed. These systems had limited capabilities, but they marked the beginning of the general use of this technology in various industries.

Today, Optical Character Recognition is used in many applications, such as document scanning, digitizing, and automatic data entry. It has become an essential tool for companies and organizations looking to quickly and accurately process large amounts of written information.

How does OCR work?

OCR technology works by analyzing an image’s pixels and identifying its individual characters. To locate text in a document, it often looks for horizontal or vertical lines of text.
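One simple way to look for horizontal lines of text is to scan the image row by row and group consecutive rows that contain ink. The sketch below assumes the image has already been binarized into a grid of 0 (background) and 1 (ink); the function name `find_text_rows` and the sample grid are illustrative only, and real engines use more robust layout analysis.

```python
def find_text_rows(image):
    """Return (start, end) row-index pairs for runs of rows that contain ink.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    """
    runs = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                     # a new text line begins here
        elif not has_ink and start is not None:
            runs.append((start, y - 1))   # blank row closes the current line
            start = None
    if start is not None:                 # text runs to the bottom edge
        runs.append((start, len(image) - 1))
    return runs


# Tiny hypothetical page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

print(find_text_rows(page))  # [(1, 2), (4, 4)]
```

The same idea applied column by column inside each detected line gives a rough split into individual characters.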

It then breaks the text into individual characters and uses pattern recognition algorithms to compare the visual characteristics of each character with those stored in a reference database. Once recognized, these characters are converted into a machine-readable format, such as plain text or a searchable PDF.
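The comparison step can be illustrated with basic template matching: each segmented character is compared pixel by pixel with a small set of stored glyphs, and the closest one wins. This is a minimal sketch under stated assumptions: the 3x5 bitmaps in `TEMPLATES` are invented for the example, and modern OCR engines generally rely on trained models rather than fixed templates.

```python
# Hypothetical reference "database": 3x5 bitmaps, '1' = ink, '0' = background.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def mismatch(glyph, template):
    """Count the pixels that differ between two bitmaps of the same size."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(glyph, template)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: mismatch(glyph, TEMPLATES[ch]))


# A "T" with one noisy pixel is still closest to the T template.
noisy_t = ["111", "010", "010", "011", "010"]
print(recognize(noisy_t))  # T

# Recognizing a sequence of segmented glyphs yields machine-readable text.
word = [TEMPLATES["L"], TEMPLATES["I"], noisy_t]
print("".join(recognize(g) for g in word))  # LIT
```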

What is it used for?

In identity document verification, OCR is a critical technology that allows multiple information checks to be performed to validate the authenticity of documents. This technology can accurately recognize text in a wide range of fonts, sizes, and styles, which is essential when verifying documents from different countries and languages.

The Veridas document verification engine can extract all the information written on an ID document and examine different security measures, such as correlating the visual data with that contained in the MRZ or machine-readable zone found on the back of many documents.
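As an illustration of this kind of cross-check, MRZ fields carry a check digit defined by ICAO Doc 9303: each character is given a value (digits as themselves, A to Z as 10 to 35, the filler `<` as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below is not the Veridas implementation; the helper `number_matches_mrz` and its normalization rules are assumptions made for the example.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def number_matches_mrz(visual_number: str, mrz_number: str, mrz_digit: str) -> bool:
    """Hypothetical check: the MRZ field is internally valid and equals the
    document number read from the visual zone."""
    if str(mrz_check_digit(mrz_number)) != mrz_digit:
        return False
    cleaned = visual_number.replace(" ", "").upper()
    return cleaned == mrz_number.rstrip("<")


# Document number from the ICAO specimen passport.
print(mrz_check_digit("L898902C3"))                       # 6
print(number_matches_mrz("L898902C3", "L898902C3", "6"))  # True
print(number_matches_mrz("L898902C8", "L898902C3", "6"))  # False
```

A mismatch between the two zones does not prove tampering on its own, since it can also come from a misread character, but it is a useful signal to flag the document for further checks.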

Thanks to this information, it is also possible to read the person’s date of birth and detect minors in registration processes where age may be an impediment. The document’s issue and expiration dates are also read automatically, and companies using these technologies can apply them as configurable entry requirements.
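A registration flow could apply such rules along the following lines. This is a minimal sketch, assuming the dates have already been extracted and parsed; the function names, the default minimum age of 18, and the `min_validity_days` parameter are hypothetical choices for the example.

```python
from datetime import date


def age_on(birth: date, today: date) -> int:
    """Full years elapsed between `birth` and `today`."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1  # birthday has not happened yet this year
    return years


def check_document(birth: date, expiry: date, today: date,
                   min_age: int = 18, min_validity_days: int = 0) -> list:
    """Return the reasons a registration should be blocked (empty if none)."""
    reasons = []
    if age_on(birth, today) < min_age:
        reasons.append("holder is under the minimum age")
    if (expiry - today).days < min_validity_days:
        reasons.append("document is expired or expires too soon")
    return reasons


today = date(2022, 12, 8)
print(check_document(date(2005, 12, 9), date(2023, 1, 1), today))
# ['holder is under the minimum age']
print(check_document(date(1990, 5, 17), date(2022, 11, 30), today))
# ['document is expired or expires too soon']
```

Raising `min_validity_days` lets a business require, for instance, that a document remain valid for some period after registration.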

In addition to document processing, OCR technology has other everyday uses, such as reading text from images or photographs, vehicle license plate recognition, or even transcribing handwritten notes.

Overall, OCR is a valuable technology that has dramatically improved the efficiency and accuracy of data processing. As technology evolves, we expect to see more applications and advances.

What are the types of OCR?

There are several types of OCR technology, each designed to recognize a different kind of text or document. Some common types of OCR include:

Handwritten: recognizes and interprets handwritten text.
Printed: recognizes and interprets printed text on a page.
Structured: recognizes and interprets text arranged in a specific format, such as a table or form.
Scene text: recognizes and interprets text that appears in an image or video of a scene.
Industrial: recognizes and interprets text appearing on industrial documents, such as labels or barcodes.

Each type of OCR technology uses different algorithms and techniques to recognize and interpret text, and some may be more effective than others depending on the specific use case.

What are the benefits of OCR?

Increased productivity and efficiency: OCR technology allows you to quickly and accurately convert scanned documents into editable text, reducing the time and effort required for manual data entry.
Increased data accuracy: OCR technology uses advanced algorithms and machine learning techniques to accurately extract text from scanned documents, reducing the possibility of errors and ensuring data integrity.
Increased searchability and organization: OCR technology allows scanned documents to be searched and retrieved quickly and efficiently, enabling more efficient document management and organization.
Increased accessibility: OCR technology enables the creation of accessible digital versions of scanned documents, making them easier to access and use by people with disabilities.
Improved collaboration and sharing: OCR technology facilitates sharing and collaboration on scanned documents, enabling teams to work more efficiently and effectively.

Why is OCR important?

OCR technology makes it possible to extract all the information present in an identity document and digitize it instantly. In a registration process, this means your users don’t need to fill in their personal information manually, because the fields are auto-completed when the document is scanned.

This speeds up the registration process and avoids possible typing errors that users could make by filling in the information themselves. Unlike other technologies available in the market that only extract the information contained in the MRZ, the Veridas OCR API can read all the data present in the document, from the name and surname to the address.
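The auto-completion described above can be sketched as a simple mapping from extracted fields to form fields. The field names and the shape of `ocr_fields` are hypothetical and do not represent the Veridas OCR API response format.

```python
# Hypothetical form fields for a registration page.
FORM_FIELDS = ("first_name", "surname", "date_of_birth", "address")


def prefill_form(ocr_fields: dict) -> dict:
    """Copy extracted values into the form; missing fields stay empty."""
    return {name: ocr_fields.get(name, "").strip() for name in FORM_FIELDS}


def fields_to_complete(form: dict) -> list:
    """List the fields the user still has to type in manually."""
    return [name for name, value in form.items() if not value]


# Example OCR output (invented values) with no address on the document.
ocr_fields = {
    "first_name": " ANA ",
    "surname": "GARCIA",
    "date_of_birth": "1990-05-17",
}

form = prefill_form(ocr_fields)
print(form["first_name"])        # ANA
print(fields_to_complete(form))  # ['address']
```

Only the fields that could not be read are left for the user, which is where the time savings and the reduction in typing errors come from.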

At Veridas, we have a wide document coverage that allows us to verify identity documents from more than 190 countries.

Optical character recognition software

The Veridas OCR engine is 100% proprietary and has been specially trained to read all the fields of identity documents. Once the capture is made, the client can obtain all the visible information on the captured document, from both the front and the back. The engine achieves accuracy greater than 99% in most fields.

Veridas can perform OCR reading in any language, and in the Latin, Arabic, Chinese and Cyrillic scripts. It also reads the special characters listed below:

ÁàâäĂåąæéèëęėíìîïįóòôöúùûüūųßçċćœďłļňñņŕșțťšýÿ.

Veridas reads all fields printed on the document. In particular, it reads the following fields for documents from each of these countries.

Best Optical character recognition software

Spain

The solution reads the DNI number, support number, name, surname (together and separately), date of birth, expiration date, sex, nationality, CAN, municipality of birth, province of birth, address, municipality of residence, province of residence and names of parents.

Mexico

The solution reads the name, surname, CURP, Voter Code, Credential Identification Code (CIC), FUAR, issue number, section, OCR number, date of birth, expiration date, date of issue, state of birth, state of address, municipality of address, district of address, address, gender and nationality.

USA

The solution reads the name, surname, identification number, document number, date of birth, expiration date, date of issue, address, gender, eye color, weight, height, the type of permit, restrictions and annotations.

Italy

The solution reads the name, surname, identification number, document number, Codice Fiscale, date of birth, expiration date, date of issue, place of issue, address, gender, height, the CAN, the nationality, the country of issue, the birth certificate code and the name of the parents.