How document processing automation with iCdocs helps speed up document workflows, improve data accuracy, and reduce operating costs
8.5.2020
Using the example of an intelligent system for the digitization, OCR, classification, and reconciliation of accounting documents, we show how kt.team solves complex integration challenges. Reading time: 5 min.
One kt.team client, a large logistics company, handles thousands of shipments every day.
Each shipment is accompanied by a set of documents: a master agreement, a goods invoice, a goods and transport waybill (TTN), invoices, passes for driver-couriers, handover-acceptance acts, and so on.
The document package for a shipment is usually large, and individual documents often run to several pages.
Most of these documents arrive on paper: the driver-courier brings them in and hands them over to a dedicated accounting unit for validation.
Documents are checked manually for completeness, required attributes, and correctness of completion.
After scanning, accounting staff manually enter in Admin Tool the contract number the document should be linked to, and manually link the individual pages of the document together.
The verified documents are sent to the accounting system (BluJay) and the electronic repository (Magellan).
In practice, dozens of employees spend their working days "shuffling paper" to meet processing targets.
Thousands of person-hours are spent on mechanical work every month.
Processing slows down with each subsequent document package, and even experienced employees start to lose focus.
Errors caused by human factors are inevitable.
The client had long wanted to automate routine document tasks so employee time could be used more efficiently.
But it could not find an off-the-shelf software product or product suite on the market that would fit smoothly into the company's existing business process and IT infrastructure.
The available products each covered only one or two steps of this complex business process, which would not have solved the problem.
After analyzing the situation, we proposed developing a custom solution for validating and digitizing paper documents.
Our team studied the business process and identified the stages that could be automated: scanning, determining the document type, checking document completeness, assembling the document package, integration with the accounting system, and integration with the electronic document repository.
Our solution was not to build a separate product for each stage, but to combine them into a single product, iCdocs (short for intelligent compiler of documents).
The most difficult stage to automate was determining the document type.
To solve this task, we tested several hypotheses.
The first hypothesis was to work with images.
We planned to train a neural network on a specific set of patterns that correspond to document forms.
By comparing a scan of a specific document with reference patterns stored in memory, the neural network was supposed to determine the document type and the counterparty named in it.
In practice, this approach proved ineffective.
For many documents, such as waybills, there is no single widely accepted format.
The number of fields, the relative placement of elements, and which required fields are actually filled in all vary.
Even lengthy, resource-intensive training would not deliver an acceptable result, and identifying each document would take longer than processing it manually.
Such a solution would not have been cost-effective from the customer's business perspective.
So instead of images, we decided to work with text.
Regardless of the format a counterparty uses, a goods and transport waybill always contains the document title, the TTN number, the shipment and contract numbers, and other text that makes correct processing possible.
To determine the document type, iCdocs combines a random forest machine learning model with metric-based vector analysis of word positions.
This approach proved to be more effective.
By checking for the presence of the "right" words, we reached a 78% recognition rate from the start: iCdocs identified the document type on its own, and the operator only had to confirm the result.
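The article does not publish the iCdocs model or its vocabulary, so the following is only a minimal sketch of the text-based idea under stated assumptions. Each document is reduced to a vector over a hypothetical keyword list. Each keyword is weighted by how close to the top of the text it first appears, and the vector is compared with per-type averages using cosine similarity. This nearest-centroid classifier is a deliberately simpler stand-in for the random forest mentioned above, chosen so the example runs with the Python standard library alone. `KEYWORDS`, `features`, `train`, and `classify` are illustrative names, not part of iCdocs.

```python
import math
import re

# Hypothetical vocabulary; a real system would curate or learn its own word list.
KEYWORDS = ["waybill", "ttn", "shipment", "contract", "invoice",
            "vat", "pass", "driver", "act", "acceptance"]


def features(text: str) -> list[float]:
    """Map recognized text to one weight per keyword.

    A keyword first seen near the start of the text (where titles usually are)
    gets a weight close to 1.0, one seen only near the end gets a small weight,
    and an absent keyword gets 0.0.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens)
    return [1.0 - tokens.index(kw) / n if kw in tokens else 0.0 for kw in KEYWORDS]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train(samples: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Average the feature vectors of operator-labeled samples per document type."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for text, label in samples:
        acc = sums.setdefault(label, [0.0] * len(KEYWORDS))
        for i, x in enumerate(features(text)):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def classify(text: str, centroids: dict[str, list[float]]) -> tuple[str, float]:
    """Return the most similar document type and its similarity score.

    The suggestion is shown to the operator, who confirms or corrects it.
    """
    vec = features(text)
    return max(((label, cosine(vec, c)) for label, c in centroids.items()),
               key=lambda pair: pair[1])
```

In a setup like this, every type the operator confirms or corrects could be appended to the training samples, so the averages improve as more documents pass through.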
In addition to document type recognition, we implemented several other functions in iCdocs.
Scanning
Before iCdocs was implemented, document scanning was manual or semi-automated.
The operator started the scan manually, then manually retrieved the resulting files from the scanning software and processed them.
We wrote a scanner driver that starts scanning a document package and sends the scanned images to iCdocs.
The operator only has to place the paper documents in the scanner.
Checking that documents are filled out correctly
After determining the document type, iCdocs checks whether it is filled out correctly: whether the required fields are completed and whether the information they contain matches the standard.
To perform this function reliably, the system has to be trained, so at first, completion is verified with operator involvement.
By confirming or rejecting individual fields and whole documents, the operator trains iCdocs to recognize correct documents and send incorrect ones back for revision.
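The article does not describe how iCdocs represents its field rules. As a minimal sketch, assume each document type has a set of required fields, each with a format expressed as a regular expression. `TTN_RULES` and its patterns are invented for illustration. A check of recognized fields could then report problems as shown below. The operator's confirmation or rejection of each result would serve as the training label.

```python
import re

# Hypothetical rules for a goods and transport waybill (TTN).
# Real required fields and formats come from the customer's document standards.
TTN_RULES: dict[str, re.Pattern] = {
    "ttn_number": re.compile(r"\d{1,10}"),
    "date": re.compile(r"\d{2}\.\d{2}\.\d{4}"),         # e.g. 05.08.2020
    "contract_number": re.compile(r"[A-Z]{2}-\d{4,}"),  # e.g. LG-20417
    "consignee": re.compile(r"\S.*"),                    # any non-blank text
}


def validate(fields: dict[str, str], rules: dict[str, re.Pattern]) -> list[str]:
    """Check recognized fields against the rules.

    Returns a list of human-readable problems; an empty list means the
    document passes and can move on to the next stage.
    """
    problems = []
    for name, pattern in rules.items():
        value = fields.get(name, "").strip()
        if not value:
            problems.append(f"{name}: missing")
        elif not pattern.fullmatch(value):
            problems.append(f"{name}: unexpected format {value!r}")
    return problems
```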
Sorting documents by counterparties and packages
To sort documents by counterparty and package, iCdocs was integrated with the standard BluJay accounting system.
The system retrieves the current contract and shipment numbers, compares them with the data in the document package, and links the document package to the corresponding counterparty and contract.
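The article does not describe the BluJay interface, so the sketch below does not call it. It assumes the current contracts have already been fetched into a list of hypothetical `Contract` records. It shows one possible way the comparison could work: normalize the numbers read from the documents, look up the contract, and accept the link only if the shipment number belongs to that contract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contract:
    number: str                # contract number as stored in the accounting system
    counterparty: str
    shipments: frozenset[str]  # shipment numbers currently open under this contract


def normalize(number: str) -> str:
    """Drop whitespace and upper-case, so 'lg 20417' and 'LG20417' compare equal."""
    return "".join(number.split()).upper()


def link_package(contract_no: str, shipment_no: str,
                 contracts: list[Contract]) -> Optional[Contract]:
    """Return the contract a document package belongs to, or None if nothing matches.

    In this sketch, None means the package cannot be linked automatically
    and would be routed to the operator.
    """
    by_number = {normalize(c.number): c for c in contracts}
    contract = by_number.get(normalize(contract_no))
    if contract is None:
        return None
    open_shipments = {normalize(s) for s in contract.shipments}
    return contract if normalize(shipment_no) in open_shipments else None
```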
Checking document package completeness
Document package completeness is checked within iCdocs.
The system checks that all required documents are present and compares each document's stated page count with the number of pages actually scanned.
If a package is missing a document or a document is missing pages, iCdocs notifies the operator of the issue.
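A minimal sketch of this completeness check follows. It assumes the recognition stage has already produced, for each document, its type, the page count stated on the document, and the number of pages actually scanned. `ScannedDocument` and `REQUIRED_TYPES` are hypothetical; in practice the required set would depend on the contract and shipment. The returned list stands in for what the operator would be notified about.

```python
from dataclasses import dataclass


@dataclass
class ScannedDocument:
    doc_type: str
    stated_pages: int   # page count printed on the document, e.g. "page 1 of 3"
    scanned_pages: int  # pages actually received from the scanner


# Hypothetical required set for one kind of shipment.
REQUIRED_TYPES = frozenset({"ttn", "invoice", "acceptance_act"})


def check_completeness(package: list[ScannedDocument],
                       required: frozenset[str] = REQUIRED_TYPES) -> list[str]:
    """Return issues for the operator; an empty list means the package is complete."""
    issues = []
    present = {d.doc_type for d in package}
    for doc_type in sorted(required - present):
        issues.append(f"missing document: {doc_type}")
    for d in package:
        if d.scanned_pages < d.stated_pages:
            issues.append(f"{d.doc_type}: {d.stated_pages - d.scanned_pages} page(s) missing")
        elif d.scanned_pages > d.stated_pages:
            # A surplus page is also a mismatch between stated and actual counts.
            issues.append(f"{d.doc_type}: {d.scanned_pages - d.stated_pages} extra page(s)")
    return issues
```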
Exporting data to information systems
After the document package is verified, iCdocs automatically exports the data to the company's information systems: accounting, CRM, and document archives.
When transferred, the documents are already assembled into a package and associated with the counterparty, contract, and shipment.
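The article does not say what format iCdocs uses when handing packages to BluJay, Magellan, or the CRM. As a hedged illustration only, the sketch below serializes an already assembled package into a JSON payload. The payload carries its counterparty, contract, and shipment. It is then passed to placeholder sender functions. The field names and the `targets` mapping are assumptions, not the product's actual interface.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Page:
    number: int
    image_path: str  # hypothetical location of the scanned image


@dataclass
class Document:
    doc_type: str
    pages: list[Page]


@dataclass
class PackageExport:
    counterparty: str
    contract_number: str
    shipment_number: str
    documents: list[Document]


def to_payload(pkg: PackageExport) -> str:
    """Serialize the whole package, including nested documents and pages, to JSON."""
    return json.dumps(asdict(pkg), ensure_ascii=False)


def export(pkg: PackageExport, targets: dict[str, Callable[[str], None]]) -> list[str]:
    """Send the same payload to every target system and return their names in order.

    Each callable is a placeholder for a real integration (accounting, CRM, archive).
    """
    payload = to_payload(pkg)
    delivered = []
    for name, send in targets.items():
        send(payload)
        delivered.append(name)
    return delivered
```

Because the package already carries its counterparty, contract, and shipment, each receiving system gets the links ready-made instead of reconstructing them.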
Backup
In addition to the standard repository, iCdocs stores backup copies of documents.
Thus, the entire processing of document packages, from digitization to handoff for storage, is implemented in a single product.
iCdocs functionality can be applied not only in logistics but also in other industries.
Let's look at a few cases where it can be useful.
The organization receives and processes a large number of paper documents every day, and incoming document packages must be promptly checked for correctness, sorted, routed for processing, and archived.
Each partner has several contracts with different terms, for example different carriers, payment methods, or payment terms, and each request or transaction must be matched to the relevant contract.
Partners operate through several legal entities with different legal forms, different OKVED (Russian economic activity classification) codes, and separate contracts for each entity.
The document package must be checked for completeness, a matching contract number, and correctly completed fields, and the legal entity, signature, and legal details must match those specified in the contract.
At the same time, iCdocs is not, strictly speaking, a universal product.
The process of handling paper documents is organized differently in each company, with different roles and information systems involved.
The first step in integrating iCdocs is always to study the relevant business process and adapt the product to the customer's needs and infrastructure.