Case study: how to optimize paper processing using machine learning

8.5.2020
Content

Using the development of an intelligent system for digitizing, recognizing (OCR), classifying and reconciling accounting documents as an example, we show how kt.team solves complex integration problems.

1. “Paper transfer” as a separate business process

2. Objective: automate document validation

3. A solution for a difficult task

4. One product for the entire process

5. Not just logistics

Reading time: 5 minutes

“Paper transfer” as a separate business process

One of kt.team's clients, a large logistics company, sends thousands of shipments every day. Each shipment is accompanied by a framework agreement, a bill of lading, a waybill, invoices, passes for forwarding drivers, acceptance and transfer certificates, and more. The document package for each shipment is large, and most of the documents in it run to several pages.

Most of these documents are received in paper form: a forwarding driver brings them and submits them to a special accounting department for validation.

In a very simplified form, the business process for processing incoming paper documents is shown in Figure 1.

Figure 1 — Business process for processing incoming paper documents

Documents are manually checked for completeness, for the presence of mandatory attributes, and for whether they are filled in correctly. After scanning, the accounting officer manually enters into the Admin Tool the contract number to which the document should be linked, and manually “links” the individual pages of the document together. Verified documents are sent to the accounting program (BluJay) and electronic storage (Magellan).

In effect, dozens of employees spend every working day “shifting papers”, even if they do it for a good reason.

Thousands of man-hours are spent on mechanical work every month. The processing speed of each successive batch of documents inevitably drops, because even experienced employees lose focus. Mistakes caused by the human factor are unavoidable.

Objective: automate document validation

The client had long wanted to automate routine document management tasks in order to use employees' time more efficiently. But it could not find a ready-made product, or a set of products, on the market that would integrate seamlessly into the company's existing business process and information infrastructure. The available off-the-shelf products each covered only one or two steps of a complex business process, which would not solve the problem.

After analyzing the situation, we suggested that the client develop a custom solution for validating and digitizing paper documents.

Our team has studied the business process and identified stages that can be automated:

  • scanning;
  • determining the type of document;
  • determining the completeness of the document;
  • preparing a package of documents;
  • integrating with an accounting program;
  • integrating with an electronic document repository.

Our decision was not to write a separate product for each of the stages, but to combine them into a single product — ICDocs (short for intelligent compiler of documents).

A solution for a difficult task

The most difficult stage in terms of automation was determining the type of document.

To solve this task, we tested several hypotheses.

The first hypothesis was to work with images. We planned to train a neural network on a set of images corresponding to document forms. By comparing a scan of a specific document with the images it had learned, the neural network was supposed to determine the type of document and the counterparty specified in it.

Practice showed that this is a bad approach. A number of documents (for example, invoices) have no single generally accepted form. The number of fields, the relative position of elements, and the way mandatory fields are filled in all differ. Even long training, which requires substantial system resources, would not give an acceptable result, and identifying each document would take longer than processing it manually.

Such a solution would make no sense for the client's business.

So we decided to work with text instead of images. Regardless of the form used by the counterparty, a bill of lading must contain the name of the document in text form, the waybill, shipment and contract numbers, and other textual information that allows it to be processed correctly.

To determine document types, ICDocs uses random forest machine learning combined with a metric-based vector analysis of word placement.

This approach proved more effective. By analyzing the presence of the “right” words, we got close to 78% correct recognition at the start. ICDocs recognized the type of each document on its own; the operator only had to confirm the results. Thus, a single product was created for the entire process.
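The idea of classifying by the presence of the “right” words can be illustrated with a minimal sketch. This is not the ICDocs implementation and not a random forest: it is a deliberately simpler stand-in that turns OCR text into word-count vectors and picks the closest labeled sample by cosine similarity. The sample texts, the labels and the function names are all assumptions made up for the illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical labeled samples: OCR text paired with a document type.
TRAINING = [
    ("bill of lading shipper consignee carrier port of loading", "bill_of_lading"),
    ("waybill number driver vehicle cargo weight consignee", "waybill"),
    ("invoice number total amount due vat payment terms", "invoice"),
    ("acceptance and transfer certificate accepted by transferred by",
     "acceptance_certificate"),
]


def vectorize(text: str) -> Counter:
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text: str) -> tuple[str, float]:
    """Return the label of the most similar sample and the similarity score."""
    vec = vectorize(text)
    score, label = max(
        (cosine(vec, vectorize(sample)), label) for sample, label in TRAINING
    )
    return label, score


if __name__ == "__main__":
    label, score = classify("INVOICE No. 123, total amount due: 500, payment terms net 30")
    print(label, round(score, 2))
```

In a setup like this, the score could be shown to the operator next to the proposed type, so that low-similarity documents get a closer look before confirmation.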

One product for the entire process

In addition to document type recognition, we have implemented other functions in ICDocs.

1. Digitization of paper documents

Prior to the introduction of ICDocs, documents were scanned manually or semi-automatically. The operator started each scan, collected the resulting files from the scanner software and processed them. We wrote a scanner driver that scans a batch of documents and sends the scanned images to ICDocs. Now the operator only has to load the paper documents into the scanner.

2. Verification of how documents are filled in

After determining the type of document, ICDocs verifies that it is filled in correctly: whether the required fields are filled in and whether the information in them complies with the standard. To perform this function correctly, the system must learn, so at first the documents are verified with the participation of an operator. By confirming or rejecting the correctness of individual fields and of whole documents, the operator teaches ICDocs to recognize the “right” documents and to send incorrect ones for revision.

ICDocs looks at the key fields and characteristics of the document (Figure 2).

Figure 2 — Document fields and characteristics verified by ICDocs
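A rule-based version of this kind of field check might look like the sketch below. It only illustrates the idea of “required fields are present and match a standard”; the field names, the formats and the `verify_fields` helper are hypothetical and are not taken from ICDocs, which learns from operator feedback rather than from fixed rules.

```python
import re

# Hypothetical rules: document type -> required field -> pattern for its value.
REQUIRED_FIELDS = {
    "waybill": {
        "waybill_number": r"\d{6,}",
        "contract_number": r"[A-Z]{2}-\d{4}",
        "shipment_date": r"\d{4}-\d{2}-\d{2}",
    },
}


def verify_fields(doc_type, fields):
    """Return a list of problems; an empty list means the document passed.

    Document types without rules yield no problems in this sketch.
    """
    problems = []
    for name, pattern in REQUIRED_FIELDS.get(doc_type, {}).items():
        value = fields.get(name, "").strip()
        if not value:
            problems.append(f"{name}: missing")
        elif not re.fullmatch(pattern, value):
            problems.append(f"{name}: unexpected format {value!r}")
    return problems


if __name__ == "__main__":
    recognized = {"waybill_number": "1234567", "shipment_date": "08.05.2020"}
    for problem in verify_fields("waybill", recognized):
        print(problem)
```

A document with a non-empty problem list would be the one sent back for revision.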

3. Sorting documents by counterparties and packages

To sort documents by counterparties and packages, ICDocs has been integrated with the standard BluJay accounting program. The system requests valid contract and shipment numbers, compares them with data from the package of documents and “links” the package of documents to the relevant counterparty and contract.
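The linking step can be sketched as a lookup of recognized numbers against a list of valid ones. The code below does not call BluJay: `VALID_CONTRACTS` is a hypothetical in-memory snapshot standing in for whatever the accounting system returns, and all numbers and counterparty names are invented.

```python
# Hypothetical snapshot of valid records:
# (contract number, shipment number) -> counterparty.
VALID_CONTRACTS = {
    ("KT-2020", "SH-000123"): "Acme Freight LLC",
    ("KT-2021", "SH-000456"): "Northwind Cargo",
}


def link_package(documents):
    """Link a package to a counterparty and contract.

    `documents` is a list of dicts holding the numbers recognized on each
    document. Returns a dict describing the link, or None when the documents
    disagree with each other or the numbers are not valid.
    """
    keys = {(d.get("contract_number"), d.get("shipment_number")) for d in documents}
    if len(keys) != 1:
        return None  # empty package, or documents point at different contracts
    contract_number, shipment_number = next(iter(keys))
    counterparty = VALID_CONTRACTS.get((contract_number, shipment_number))
    if counterparty is None:
        return None  # numbers are not known to the accounting system
    return {
        "contract_number": contract_number,
        "shipment_number": shipment_number,
        "counterparty": counterparty,
    }


if __name__ == "__main__":
    package = [
        {"type": "waybill", "contract_number": "KT-2020", "shipment_number": "SH-000123"},
        {"type": "invoice", "contract_number": "KT-2020", "shipment_number": "SH-000123"},
    ]
    print(link_package(package))
```

Returning None for inconsistent packages keeps the decision with the operator instead of guessing which number is the correct one.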

4. Checking the completeness of document packages

The completeness of document packages is checked within ICDocs. The system checks that every required document from the list is present and compares the expected number of pages in each document with the actual number. If a document is missing from the package or a document is missing pages, ICDocs informs the operator about the problem.
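The two checks described here, required documents and page counts, are easy to express in code. The following is a minimal sketch under assumed data structures: the list of required document types and the `expected_pages` / `actual_pages` keys are hypothetical, not the ICDocs data model.

```python
# Hypothetical list of document types every package must contain.
REQUIRED_DOCUMENTS = ["framework_agreement", "bill_of_lading", "waybill", "invoice"]


def check_package(package):
    """Return a list of completeness problems for one package of documents.

    Each document is a dict with "type", "expected_pages" and "actual_pages".
    """
    problems = []
    present = {doc["type"] for doc in package}
    for required in REQUIRED_DOCUMENTS:
        if required not in present:
            problems.append(f"missing document: {required}")
    for doc in package:
        if doc["actual_pages"] != doc["expected_pages"]:
            problems.append(
                f"{doc['type']}: expected {doc['expected_pages']} pages, "
                f"found {doc['actual_pages']}"
            )
    return problems


if __name__ == "__main__":
    package = [
        {"type": "framework_agreement", "expected_pages": 4, "actual_pages": 4},
        {"type": "waybill", "expected_pages": 2, "actual_pages": 1},
        {"type": "invoice", "expected_pages": 1, "actual_pages": 1},
    ]
    for problem in check_package(package):
        print(problem)
```

Every entry in the returned list corresponds to a message the operator would see.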

5. Data export

Upon completion of the verification of a package of documents, ICDocs automatically exports data to the company's information systems: accounting, CRM, and document archives. When transferred, the documents are already collected in a package and are associated with the counterparty, the contract and the shipment.

6. Backup

In addition to the regular storage, ICDocs saves backup copies of documents. Thus, all processing of document packages — from digitization to storage — is implemented in one product (Figure 3).

Figure 3 — Automated processing of document batches using ICDocs

Not just logistics

The ICDocs functionality is applicable not only in logistics, but also in other areas. Here are some examples of cases in which it will be useful.

  • The organization receives and processes large volumes of paper documents every day. Incoming packages of documents must be promptly checked for correctness, sorted, passed on for processing and saved to the archive.
  • Several agreements have been concluded with each partner. Each agreement sets different working conditions (for example, different delivery staff, payment methods or approaches), and every request or transaction has to be matched to the corresponding agreement.
  • Partners operate through several legal entities with different organizational forms, OKVED codes and separate agreements for each legal entity. The package of documents must be checked for completeness, for a match with the required contract number and for correctly filled-in information. The legal entity, signature, and legal details must match those specified in the agreement.

At the same time, ICDocs is, strictly speaking, not a universal product. Each company structures its paper document processing differently and uses different job roles and information systems. The first step in integrating ICDocs will always be to study the relevant business process and adapt the product to the customer's tasks and infrastructure.
