Documents are usually the lifeblood of an enterprise workflow. Whether they’re physical, digital, standardised, structured, or unstructured, there will still inevitably be bottlenecks that emerge around manual document processing, document classification and data extraction. In a way, the data is being constrained by the very medium which contains it. Every interaction with a document is a point of friction which increases processing time and creates an opportunity for error. So, how bad is the problem really and what are the options for overcoming the bottleneck?
The Surprising Research
Adobe-sponsored research quantified the productivity that document friction costs. The study found that almost $20k per worker, per year, was being lost purely to document-related handling and processing issues. It also highlighted that many of the collaboration tools implemented to reduce downtime in data-sharing situations failed to alleviate the problems created by document form and medium. Specifically, the issues identified in the report centre on inefficient processes and a lack of enthusiasm for implementing automation.
Additionally, a white paper published by IThound found that over 60% of finance departments cited cumbersome manual data entry and slow document processes as significant workflow impediments. This led to invoices being processed outside of SLA an astounding 76% of the time. The report also observed that a majority of departments were already using electronic document management systems but were still reporting problems. This shows that digitisation alone will not necessarily remove document gridlock. What is required is an effective automation and classification system.
The Document Bottleneck Challenge
Although the exact number is contested, the amount of enterprise data which is unstructured is estimated to be somewhere between 40% and 80%, and is expected to go up over time.
This category of data presents the greatest challenge for document classification and automation, since its context and purpose have conventionally been defined only by metadata tags added through manual human processing. This is the conventional option for handling unstructured data offered by Content Management Systems and, as such, is often the horizon of many enterprises' understanding of solutions.
Variations on metadata tagging exist, such as the folksonomy approach, which opens up tagging to anyone within the organisation who is using the documents. This creates informal, non-hierarchical taxonomies which are also subjective and inconsistent, and it additionally takes up the time of multiple document users.
The Impact of AI on Automatic Document Processing
Advances in the capability and accessibility of machine learning models have heralded a transformation in the automated processing of unstructured documents. These systems can eliminate a significant part of the manual processing and categorisation involved in the document pipeline while offering accuracy that can be equivalent to or better than that of humans. Lack of structure is not a problem, since machine learning algorithms are in their element when processing messy and fuzzy data. It doesn’t matter what form a set of business documents takes. No matter how uniquely chaotic they are, as long as they are consistent, a machine learning document classification algorithm will manage. Given a suitable training dataset, it will learn what distinguishes each category.
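To make the idea of learning categories from a training dataset concrete, here is a minimal sketch of a multinomial naive Bayes text classifier written with only the Python standard library. Naive Bayes is just one simple choice of algorithm, picked here for brevity. The class name, the category labels and the handful of training sentences are all hypothetical and hand-written for illustration, so a real system would need far more data and a more careful pipeline.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDocumentClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Words never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior for the category plus log likelihood of each word.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written training examples (not real business data).
TRAINING_DOCUMENTS = [
    ("Invoice number 1042, amount due 30 days, payment terms net", "invoice"),
    ("Please remit payment for invoice, total amount due", "invoice"),
    ("This agreement is entered into between the parties, governing law", "contract"),
    ("The parties agree to the terms and termination clause of this agreement", "contract"),
    ("Curriculum vitae: work experience, education, skills", "resume"),
    ("Experienced engineer seeking role, education and skills listed", "resume"),
]

texts, labels = zip(*TRAINING_DOCUMENTS)
classifier = NaiveBayesDocumentClassifier().fit(texts, labels)

# Classify a document the model has not seen before.
print(classifier.predict("Amount due on invoice 2077, payment within 30 days"))
```

The model never sees a rule such as "invoices mention an amount due". It only counts which words tend to appear in which category, which is why consistent documents matter more than tidy ones.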
With the application of machine learning algorithms to enterprise document classification, the final step in removing the friction and bottlenecks around manual document processing could be within reach for even modestly sized businesses.
To find out more about how AI is transforming document pipelines, tune in for the BizData AI For Documents Webinar:
Also, click here to read the follow-up to this blog in which we go under the hood to look into the types of machine learning classification algorithms and how they function in relation to automatic document tagging.