A new layer to search

Unlocking the universe of handwriting

Search any handwriting

Today almost every media format is searchable; handwriting is the notable exception. We are opening up this last frontier of searchability.

Connecting content islands

What determines a document’s value is not only the message it conveys but also its relation to information contained in other resources. Unlocked handwriting will bring new context to the data stored in institutions around the globe.

The Technology

Our Handwritten Text Recognition (HTR) technology converts handwritten text into a machine-readable format. It turns hard-to-read handwriting into words and letters that computer programs can recognize and process.

A New Approach

The crucial asset of digital content is its searchability: the ability to find relevant words and information anywhere in a document. This is also what distinguishes SearchInk from existing HTR solutions. Instead of focusing on text transcription, we focus on search results, which allows us to reach an unprecedented word recognition rate.

Learning How To Read

We believe that understanding a document as a whole is a crucial part of the HTR process. Therefore, the core of our technology is teaching the algorithm how to understand and analyze documents the way a person would, thus allowing it to find relevant content quickly and accurately. Our aim is not just plain word extraction; we are teaching the machine how to read.

How we do it

Our technology relies on artificial intelligence and machine learning algorithms, including Computer Vision and Deep Neural Networks, that complement each other and together form a distinctive Handwritten Text Recognition (HTR) solution.

Computer Vision

Computer Vision is the field concerned with designing algorithms that understand the contents of an input image and extract its relevant information. Using current Computer Vision algorithms, we automatically detect the type and structure of the document being processed, recognize its handwritten text, understand its content, and extract the semantic information that it conveys.

Deep Neural Networks

Our second stage of development relies on Deep Learning. These algorithms model higher-level abstractions of the processed data through a combination of linear and nonlinear transformations, which the system learns and infers automatically. At the initial stage, our neural network architecture is trained on transcribed and labeled datasets and learns how to adjust the weights within the network so that it can assess the semantic meaning of various handwritten words. Because the network learns these representations from data, human experts no longer need to write an explicit list of rules by hand.
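To make the idea of "adjusting the weights" concrete, here is a minimal sketch in Python. It is not SearchInk's architecture: it trains a single unit (one linear transformation followed by one nonlinear transformation) by gradient descent on a tiny labeled dataset. The two-number feature vectors, the labels, and all function names are hypothetical and chosen only for illustration.

```python
import math

def predict(weights, bias, features):
    """Linear transformation (weighted sum) followed by a nonlinear sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(weights, bias, samples):
    """Average cross-entropy between predictions and the provided labels."""
    total = 0.0
    for features, label in samples:
        p = predict(weights, bias, features)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(samples)

def train(samples, steps=200, learning_rate=0.5):
    """Adjust weights and bias by plain gradient descent on the labeled samples."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in samples:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias

# Hypothetical descriptors of word images; label 1 = "matches the target word".
SAMPLES = [([0.9, 0.1], 1), ([0.8, 0.3], 1), ([0.2, 0.7], 0), ([0.1, 0.9], 0)]

weights, bias = train(SAMPLES)
print("loss before:", mean_loss([0.0, 0.0], 0.0, SAMPLES))
print("loss after: ", mean_loss(weights, bias, SAMPLES))
```

A deep network stacks many such linear and nonlinear steps, but the principle is the same: the labeled examples drive the weight updates, and no recognition rule is written by hand.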

Constant Self-Improvement

In the later stages of development, we plan to transition to completely unsupervised learning. This will involve exposing the network to a huge number of documents, but without explicitly telling it the correct transcription. Instead, the network will independently learn to recognize features and to cluster similar text snippets, thus revealing hidden groups, links, or patterns within the data.
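As a rough illustration of clustering without transcriptions, the sketch below groups unlabeled feature vectors with k-means. K-means is a stand-in chosen for brevity, not necessarily the method SearchInk will use, and the two-dimensional snippet descriptors are invented for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Group points into k clusters; no labels or transcriptions are used."""
    # Deterministic start: first point, then repeatedly the point farthest
    # from the centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centers),
        )
        centers.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Move every center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical descriptors of five text snippets (no transcription attached).
SNIPPETS = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.90], [0.12, 0.18]]

labels, centers = kmeans(SNIPPETS, k=2)
print(labels)
```

Snippets that receive the same label are candidates for being the same word or phrase, which is the kind of hidden grouping described above.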

Reading process

Creating Ground-Truth

In order to validate our technology and provide a baseline for the learning algorithms, we label a representative sample of different types of documents. This gold standard lets us build a generic and scalable solution that performs equally well on heterogeneous collections with different writing styles, languages, layouts, etc.
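One common way to use ground truth for validation is to compare system output with the labeled text and report a character error rate. The sketch below assumes that metric purely for illustration; the source does not state which measure SearchInk uses, and the sample strings are invented.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]

def character_error_rate(pairs):
    """Total edit distance divided by total ground-truth length."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    length = sum(len(ref) for ref, _ in pairs)
    return errors / length

# Hypothetical (ground truth, system output) pairs.
PAIRS = [("baptism", "baptisn"), ("1871", "1871")]
print(character_error_rate(PAIRS))
```

Computing the rate separately per document type, language, or layout would show whether performance really is uniform across a heterogeneous collection.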

Layout Analysis

SearchInk automatically recognizes both the physical and the logical layout structure of the documents it processes. It handles tabular patterns by identifying headers, columns, and cells, and it segments handwritten text into lines and individual words for further processing.
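A classical baseline for text line segmentation is the horizontal projection profile: count the ink pixels in each row and split wherever the count drops to zero. The sketch below shows that baseline only; it is not SearchInk's layout analysis, and the tiny binary "page" and the `min_ink` parameter are assumptions made for the example.

```python
def segment_lines(binary_image, min_ink=1):
    """Return (first_row, last_row) for each run of rows that contain ink.

    binary_image is a list of rows; each pixel is 1 for ink and 0 for background.
    """
    profile = [sum(row) for row in binary_image]  # ink pixels per row
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the text line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines

# Hypothetical 5-row page: blank, two rows of ink, blank, one row of ink.
PAGE = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
print(segment_lines(PAGE))
```

Applying the same idea to columns within a detected line gives a first approximation of word boundaries. Real handwriting, with slanted or touching lines, needs more robust methods than this.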

Word Spotting and Semantic Analysis

SearchInk detects recurrent text patterns across collections independently of their explicit text transcription. These recurrent text blocks are analyzed, and their physical location is used to detect anchor words that support semantic analysis of the different records.
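Matching text patterns without transcribing them can be illustrated with query-by-example word spotting based on dynamic time warping (DTW), a well-known classical technique. The sketch below is an assumption-laden illustration rather than SearchInk's algorithm: each word image is reduced to a hypothetical one-dimensional profile (for example, ink height per column), and the names, data, and threshold are invented.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances alone
                                    cost[i][j - 1],      # b advances alone
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def spot(query, candidates, threshold):
    """Names of candidates whose length-normalized DTW cost is within threshold."""
    return [
        name
        for name, profile in candidates.items()
        if dtw_distance(query, profile) / (len(query) + len(profile)) <= threshold
    ]

# Hypothetical column profiles of word images.
QUERY = [0, 2, 5, 2, 0]
CANDIDATES = {
    "snippet_a": [0, 2, 2, 5, 5, 2, 0],  # same shape, written wider
    "snippet_b": [5, 5, 0, 0, 5, 5],     # different shape
}
print(spot(QUERY, CANDIDATES, threshold=0.2))
```

Because DTW tolerates stretching, the wider rendering of the same shape still matches, and no transcription of either image is needed.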

Partners

In order to create a unique, groundbreaking HTR tool, we are cooperating with the Pattern Recognition and Human Language Technology (PRHLT) research center of the Universitat Politècnica de València (UPV) and the Computer Vision Center (CVC) in Barcelona, two internationally recognized institutions in the application of Hidden Markov Models and in layout analysis. Each stage of development of SearchInk technology is closely supervised not only by internal machine learning specialists but also by our academic partners.
