Client
A large distributor of imported goods with a product range of hundreds of SKUs on the market.
The situation at the start of the project
Most brands in the client's portfolio had no ready digital data on product composition. The only source of information was the digital packaging artwork.
The challenge
Since 2024, there has been a requirement to upload product composition data to the national catalog. The digitization process was done manually, which significantly slowed the work down and led to errors.
Business Outcome
The composition digitization process became almost fully automated. Now it is enough to upload an image — the system does the rest: it extracts the text, finds the barcode, links the composition to a specific product, and builds structured data.
The finished files can be used immediately to upload into a PIM system or any other system holding product data, and then transferred to the national catalog. Manual work is reduced to verifying the result — this cuts time spent and lowers the risk of errors.
Comment from the KT.Team manager
"When we started, we had no comparable cases — this was the first project where we needed not just to recognize barcodes and text, but to precisely isolate the relevant areas on packaging, often positioned at an angle or in different projections. This required the team to deeply work through the approach and tune every stage of the pipeline.
We built the recognition service to fit the client's business. When new types of packaging come to market, the service can be extended to support them. As a result, our client is ready to expand its product range, while the team is free to focus on higher-priority tasks."
We'll curate materials for your task
We'll reply within 30 minutes and send relevant cases, diagrams, or analyses tailored to your context.
Business request at the start
Automate composition recognition across hundreds of packages to eliminate labor-intensive manual processing and speed up data preparation for upload to the national registry.
Development time
6 months from the first discussions and only 3 months from the start of development to launching a working system.
Result
The composition recognition service cut data processing time several times over: instead of 30 minutes per package, the system now automatically processes up to 10 images in 2 minutes. Recognition accuracy is 80–95%.
How it works
The service runs automatically on a schedule and goes through the following steps:
If any image could not be recognized, this is also recorded in the report — the user immediately sees which files require manual review.
The finished report can be uploaded into any master system — using the barcode, the data is correctly matched to the right product.
- checks the uploaded PDF files and finds the product composition block (the "Ingredients" section);
- recognizes the text and extracts the composition information;
- finds the barcode on the packaging to precisely link the composition to the product;
- builds a report in which each barcode is matched with its recognized composition.
Technology stack
The recognition service is built as a distributed solution using modern computer vision tools and architectural approaches:
Composition recognition and extraction use a multi-stage image processing pipeline that combines both classic computer vision algorithms and OCR tools:
- Go development language for high performance.
- Microservice architecture with asynchronous interaction through the RabbitMQ message queue: the service receives tasks, processes them, and returns the results back to the queue.
- Integration with SMB/FTP for uploading PDF documents.
- Two separate modules for working with files: one tracks new uploads, the other builds the final Excel reports.
- Conversion: converting PDFs into images.
- Contour detection: outlining object boundaries with OpenCV.
- Contour hierarchy: detecting nested areas (for example, on packaging).
- Clustering: grouping elements with DBSCAN to isolate text blocks.
- Filtering: selecting image regions that are highly likely to contain text and a barcode.
- Text recognition: using Tesseract OCR.
- Composition extraction: searching for keywords (for example, "ingredients") allowing for distortions, normalizing text, and, when needed, inverting colors and rotating the image.
- Barcode recognition: using the ZXing library or searching for a 13-digit sequence in the OCR results.
What happened next
The system is already running in production and helps the client process composition data quickly. To date, it has successfully recognized more than 500 packages. Improvements and updates are added as new products prepare to enter the market.
To enable automatic data transfer to the national catalog, the PIM system includes a tool that generates Excel templates tailored to each HS code (TN VED). The forms are filled automatically with data from PIM, accounting for transformations and mapping to reference tables. The user only needs to download the finished file and upload it to their national catalog account — but that is another story.


