How do you solve a Document Processing Bottleneck?

Document processing is usually the lifeblood of an enterprise workflow. Whether documents are physical or digital, standardised, structured or unstructured, bottlenecks will emerge. These can occur around manual document processing, scanning, document classification and data extraction. In a way, the data is being constrained by the very medium which contains it. Every interaction with a document is a point of friction which increases processing time and creates an opportunity for error. The statistics on how much this affects productivity are truly dire. New developments in Machine Learning, however, offer ways to rectify some of these problems.


  1. What is the Document Disconnect?
  2. The Document Processing Bottleneck Challenge
  3. Can AI Perform Document Processing?

What is the Document Disconnect?

Adobe-sponsored research looked into the operations of over 1,500 top-performing companies and came up with some dire statistics about how documents are affecting organisational productivity.

  • 46% of business leaders surveyed directly attributed problems in planning and forecasting to poor document processes.
  • Poor document processes made staff devote more than a third of their work time to administrative issues rather than their main productive duties.
  • Half of those surveyed said that they have had productivity-related problems caused by documents simply going missing and not being found.
  • An astounding 77% of business leaders claimed that issues around automation and poor implementation of document processing policies were directly and negatively affecting customer experience.

The main issue causing these problems was identified as something called "document disconnect". Several factors contribute to this including siloing of systems, problems during document handover and an overall lack of visibility and lineage tracking around documents.

Expensive Software and Digitisation Aren't Always a Solution

An interesting point brought up in the report is that the organisations were often using high-quality, top-of-the-line enterprise applications, tools and management systems. The issues arose during the interaction of one system with another, where integration was improvised or handled on the fly using ad-hoc solutions. Even if one of these solutions worked initially, it would lack documentation and would typically degrade quickly over time.

Another white paper found that over 60% of finance departments specifically pointed to cumbersome manual data entry and slow document processes as significant workflow impediments. This led to invoices being processed outside of SLA an astounding 76% of the time. Another observation of the report was that a majority of departments were already using electronic document management systems but were still reporting problems. This shows that digitisation will not necessarily remove document gridlock. What is required is an effective automation and classification system.

The Document Processing Bottleneck Challenge

Although the exact number is contested, the proportion of enterprise data which is unstructured is estimated to be somewhere between 40% and 80%, and is expected to go up over time.

This category of data presents the greatest challenge for document classification and automation, since its context and purpose have conventionally only been defined by metadata tags added via manual human processing. This has been the conventional option for handling unstructured data offered by Content Management Systems and, as such, is often the horizon of many enterprises' understanding of solutions.

Variations on metadata tagging exist, such as the folksonomy approach, which opens up tagging to anyone within the organisation who is using the documents. This creates informal, non-hierarchical taxonomies which are also subjective and inconsistent, additionally wasting the time of multiple document users.

However, the approach of building up a rich metadata layer around a document has a potentially beneficial side-effect. Manually processing documents to aid categorisation and discovery can also set them up for easier processing by AI/Machine Learning tools.

Can AI Perform Document Processing?

Advances in the capability as well as accessibility of machine learning models have heralded a transformation in automated processing of unstructured documents. These AI Document Processing systems can eliminate a significant part of the manual processing and categorisation involved in the document pipeline while offering accuracy that can match or exceed that of humans. Lack of structure is not a problem, since machine learning algorithms are in their element when processing messy and fuzzy data. It doesn't matter what form a set of business documents takes. However chaotic they look, as long as they are consistent within each category, a machine learning document classification algorithm can cope. Given a suitable training dataset, it learns which features distinguish each category.
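To make the idea of learning categories from a training dataset concrete, here is a minimal sketch of a text classifier. It is an illustration only, not the system described in this article: it assumes a multinomial Naive Bayes model with add-one smoothing, written with the Python standard library, and the training documents, category names and the `NaiveBayesClassifier` class are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()

    def train(self, labelled_docs):
        """Count words per category from (text, category) pairs."""
        for text, category in labelled_docs:
            self.doc_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        """Return the category with the highest log-probability score."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                # Add-one smoothing avoids log(0) for words unseen in a category.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up training set: a real one would need far more examples per category.
TRAINING_DOCS = [
    ("Invoice number 1042 total amount due within 30 days", "invoice"),
    ("Payment due please remit the invoice amount to our account", "invoice"),
    ("This agreement is made between the parties and governed by law", "contract"),
    ("The parties agree to the terms and conditions of this contract", "contract"),
]

classifier = NaiveBayesClassifier()
classifier.train(TRAINING_DOCS)
predicted = classifier.classify("Amount due on invoice 2001")
print(predicted)
```

With only four training documents this is a toy, but it shows the principle: the model is never told what an invoice is, it only sees which words tend to appear in documents that people have already labelled.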

Managing Documents as a Step Towards Good Data Governance

The fact that unstructured and irregular data is less of an issue for automated systems goes some way towards reducing the "document disconnect" mentioned above. When less human input is needed in categorising and classifying, there are fewer delays in the document pipeline and less opportunity for human error when performing these very tedious tasks.

As already mentioned, implementation of software and document digitisation are not solutions in themselves. The aim should ultimately be to establish good organisational data governance, which involves a 360-degree assessment of an organisation's tools, resources, processes and, most importantly, people. This assessment should look critically at how data is being produced, how it is consumed and whether it is fit for purpose.

Machine Learning for Document Tagging

Solutions that preserve the original format of a document while building a rich layer of metadata tags around it are perfect fodder for machine learning. The process of tagging a document is functionally similar to that of building up a labelled training set for a machine learning model. Feeding the tagged document metadata into a tool like Azure Machine Learning studio can create a model which can then be used to process and automatically categorise future documents.
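As a rough sketch of how manually tagged documents can drive automatic tagging, the example below suggests tags for a new document by looking at the most similar documents that already carry tags. It does not use Azure Machine Learning studio; it swaps in a simple nearest-neighbour vote over bag-of-words cosine similarity, using only the Python standard library. The sample documents, tag names, company names and the `suggest_tags` function are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic word tokens in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_tags(new_text, tagged_docs, k=2):
    """Suggest tags carried by more than half of the k most similar tagged documents."""
    new_vec = bag_of_words(new_text)
    ranked = sorted(
        tagged_docs,
        key=lambda doc: cosine_similarity(new_vec, bag_of_words(doc["text"])),
        reverse=True,
    )
    votes = Counter()
    for doc in ranked[:k]:
        votes.update(doc["tags"])
    return sorted(tag for tag, n in votes.items() if n > k / 2)


# Made-up examples standing in for documents that staff have already tagged.
TAGGED_DOCS = [
    {"text": "Purchase order for office chairs from supplier Acme",
     "tags": ["finance", "purchase-order"]},
    {"text": "Purchase order for laptops from supplier Globex",
     "tags": ["finance", "purchase-order"]},
    {"text": "Employee leave request form for annual leave",
     "tags": ["hr", "leave"]},
    {"text": "Employee onboarding checklist and leave policy",
     "tags": ["hr", "policy"]},
]

suggested = suggest_tags(
    "Purchase order for monitors from supplier Initech", TAGGED_DOCS
)
print(suggested)
```

The majority threshold is a deliberate design choice in this sketch: a tag is only suggested when most of the nearest neighbours agree on it, which keeps one-off or inconsistent tags from spreading to new documents.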

With the application of machine learning algorithms to enterprise document tagging and classification, the final step in removing the friction and bottlenecks around manual document processing could be within reach for even modestly sized businesses.

To find out more about how AI is transforming document pipelines, watch the free BizData AI For Documents Webinar hosted by one of our Directors, Maurice Bernardo.