Unstructured Data Is A Digital Transformation Disabler 


Even as the drumbeat of digitalisation reverberates across all sectors, legacy companies riddled with manual processes and unstructured data struggle to transform, despite their investment of time, effort, and money.

Of course, a robust strategy, a true north star, strong leadership, and an innovative culture are other essential ingredients for success. Still, companies that don’t have a solid plan for processing unstructured data within their business processes will find it difficult to leapfrog and achieve the full potential of digital transformation.

In its report Fueling Digital Operations with Analog Data, McKinsey states: “Despite operating in an increasingly digital world, many businesses still use paper in their processes and need to extract information and insights from these documents. Companies that find a better way to do so can improve performance and capture value over the near and long term.”

As McKinsey points out, the problems of unstructured documents and data, and of companies drowning in paper, are not new. Although trillions have been spent on digitisation, paper and other unstructured documents remain relevant and valuable. In 2020, at the peak of COVID and the most recent year for which data is available, an estimated 2.8 trillion pages were still printed. Analogue data is, and will inevitably continue to be, produced, and it will likely remain a core part of the operating model of virtually all organisations. The problem is massive, and so are the benefits of solving it.

Companies that solve the unstructured data challenge will reduce friction in their business processes and eliminate the redundancy and data replication that come from multiple manual entries. Yet many companies struggle with this challenge. They have tried many avenues, often with limited success: outsourcing manual data entry, applying more advanced techniques such as optical character recognition (OCR), and making extensive use of forms that shift the onus of data entry onto customers.

McKinsey rightly observes: “Today, advances in artificial intelligence (AI) – especially machine learning (ML) applications such as optical character recognition – enable more efficient data processing, including faster and more accurate information retrieval from paper documents. The combined application of classification, extraction, and other algorithms – known collectively as “intelligent document processing” (IDP) – erases the boundary between the analogue and digital worlds. The important distinction IDP offers is that the technology not only digitises analogue documents (by scanning them into digital format) but also allows computers to understand the data in documents.”
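To make the "classification plus extraction" idea concrete, here is a deliberately simplified sketch. It assumes the document text has already been produced by OCR. The document types, keyword lists, field patterns, and function names (`DOC_TYPES`, `FIELD_PATTERNS`, `classify`, `extract`, `process_document`) are hypothetical illustrations, not McKinsey's or nROAD's method, and the keyword classifier and regular expressions stand in for the trained ML models a real IDP system would use.

```python
import re
from dataclasses import dataclass, field

# Hypothetical document types and the keywords used to recognise them.
# A real IDP system would use a trained classifier instead of keyword counts.
DOC_TYPES: dict[str, list[str]] = {
    "invoice": ["invoice", "amount due", "bill to"],
    "bank_statement": ["statement period", "opening balance", "closing balance"],
}

# Hypothetical extraction patterns, one set per document type.
FIELD_PATTERNS: dict[str, dict[str, str]] = {
    "invoice": {
        "invoice_number": r"invoice\s*(?:no\.?|number)[:\s]*([A-Z0-9-]+)",
        "amount_due": r"amount due[:\s]*\$?([\d,]+\.\d{2})",
    },
    "bank_statement": {
        "closing_balance": r"closing balance[:\s]*\$?([\d,]+\.\d{2})",
    },
}


@dataclass
class ProcessedDocument:
    doc_type: str
    fields: dict[str, str] = field(default_factory=dict)


def classify(text: str) -> str:
    """Step 1: pick the document type whose keywords appear most often."""
    lowered = text.lower()
    scores = {
        doc_type: sum(kw in lowered for kw in keywords)
        for doc_type, keywords in DOC_TYPES.items()
    }
    best = max(scores, key=scores.get)
    # If no keyword matched at all, we cannot tell what the document is.
    return best if scores[best] > 0 else "unknown"


def extract(text: str, doc_type: str) -> dict[str, str]:
    """Step 2: pull structured fields using the patterns for that type."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            fields[name] = match.group(1)
    return fields


def process_document(text: str) -> ProcessedDocument:
    """Classify first, then extract: OCR text in, structured record out."""
    doc_type = classify(text)
    return ProcessedDocument(doc_type, extract(text, doc_type))


if __name__ == "__main__":
    sample = "INVOICE No. INV-1042\nBill to: Acme Ltd\nAmount due: $1,250.00"
    print(process_document(sample))
```

Hand-written keyword rules and regular expressions like these break down quickly on real-world variety in layouts and wording, which is where trained models and domain context become essential.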

Indeed, techniques such as enhanced OCR, computer vision, and deep learning are advancing the art of natural language processing (NLP) and text analytics. However, general-purpose unstructured data processing solutions deliver only marginal value and often miss the mark.

The reason is that the volume, variety, variability, and complexity of data and documents, together with the domain context of their words and phrases, pose significant barriers to machine intelligence. Without domain awareness, knowledge graphs, and language models, the meaning of specific terms, their relative importance, and their implications are often lost in unstructured data processing. Furthermore, digitising only newer data is not enough. In insurance, for example, actuaries and underwriters need data going back years to correct for recency bias.

Intelligent document processing using NLP algorithms is not just about cost savings. It is a foundational technology: once business data is digitised, it becomes available to upstream and downstream applications, and the benefits compound. It is also an effective way to democratise information and knowledge across the enterprise.

McKinsey further states: “Beyond productivity gains, companies can greatly enhance the effectiveness of operations and decision making. IDP is at the top of the funnel for taking advantage of emerging automation technologies that rely on digitised, structured data to function – such as RPA. When enterprise data is ingested through an IDP solution, it becomes useful for many applications – machine learning modules, predictive analytics, data clustering, and AI-enabled cognitive agents – that can yield substantial competitive advantages in an organization’s tech-enabled operating model.”

While we agree with McKinsey’s rationale for IDP and the value it delivers, we diverge on some points. For example, we don’t believe that deploying IDP requires significant investment in the human workforce, provided that companies choose the right solution(s) and are ready to execute AI projects.

At nROAD, we configured and deployed a complex financial document processing solution in production for a large investment bank. In just eight weeks, and with limited involvement from the client, we achieved significant results and strong numerical accuracy.

It’s not surprising that the digitisation of analogue data and documents is at the top of nearly every CXO’s agenda across industries and sectors, particularly in regulated industries where data is often the lifeblood of an enterprise. As technology evolves and domain-centric solutions emerge, the mammoth task of handling the ever-growing volume of data could become more manageable, finally allowing businesses to stop drowning in paper and take a much-needed breath.
