AI tools
RAG — a corporate knowledge base you can trust
Employees spend 19–30% of the workday searching for information, while an LLM with no access to your data makes answers up. The KT RAG stack addresses both.
RAG is not a “chat with PDF” — it is a loosely coupled layer of corporate memory: retrieval, vector storage, chunking and reranking over your wikis and policies, where every answer is grounded in a verifiable source and decoupled from any specific model vendor.
AI layer
An AI assistant must take action, not just reply with text
Core thesis of the AI block: a pilot with measurable impact, private data under control, agent actions logged, and quality verified by evals before scaling.
Assistant ≠ chatbot
A chatbot answers; an assistant checks the regulations, queries systems, records the deviation and proposes the next step.
Control plane
Agent registry, owners, permissions, memory, evals, trace logs, kill switch and budgets, all managed at the enterprise layer.
Data
RAG returns an answer with a source citation; the LLM Gateway masks personal data before it reaches the model and restores it in the response.
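The mask-then-restore step can be illustrated with a minimal sketch. It assumes e-mail addresses are the only personal data and uses a simple regular expression; the function names and the `<PII_n>` placeholder format are hypothetical, not the gateway's actual interface.

```python
import re

# Deliberately simple pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_pii(text: str) -> tuple[str, dict[str, str]]:
    """Replace each e-mail address with a placeholder and remember the mapping."""
    mapping: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        value = match.group(0)
        # Reuse the placeholder if the same value appears more than once.
        for token, original in mapping.items():
            if original == value:
                return token
        token = f"<PII_{len(mapping)}>"
        mapping[token] = value
        return token

    return EMAIL_RE.sub(_swap, text), mapping


def restore_pii(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


prompt = "Send the report to anna@example.com and copy anna@example.com"
masked, mapping = mask_pii(prompt)
```

The model only ever sees `masked`; the mapping stays on the gateway side and is applied to the response with `restore_pii`.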
Industry solutions
What you can build with RAG
Capabilities
RAG capabilities
Retrieval over corporate sources
The model answers from your documents, wiki and databases, not from internet "memory". An employee gets the answer in seconds instead of spending hours searching, recovering part of the 19–30% of the workday lost to search.
Grounding and source citations
Every answer shows which document it came from — answers are verifiable, and hallucinations are cut off at the architecture level, not by coaxing the model.
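One way to read "cut off at the architecture level" is an abstain gate: when retrieval finds nothing above a relevance threshold, the pipeline returns no answer instead of calling the model. The sketch below assumes hypothetical `retrieve` and `generate` callables and an arbitrary `min_score` threshold; none of these names come from the stack itself.

```python
def grounded_answer(question, retrieve, generate, min_score=0.5):
    """Answer only from retrieved passages; abstain when nothing relevant is found."""
    # Keep only passages the retriever considers relevant enough.
    hits = [h for h in retrieve(question) if h["score"] >= min_score]
    if not hits:
        # No evidence: refuse instead of letting the model guess.
        return {"answer": None, "sources": []}
    context = "\n".join(h["text"] for h in hits)
    sources = sorted({h["source"] for h in hits})
    return {"answer": generate(question, context), "sources": sources}


# Hypothetical stand-ins for a real retriever and a real LLM call.
def fake_retrieve(question):
    if "vacation" in question.lower():
        return [{"text": "Vacation requests need manager approval.",
                 "source": "hr/vacation.md", "score": 0.82}]
    return [{"text": "Office map.", "source": "office/map.md", "score": 0.1}]


def fake_generate(question, context):
    return f"According to the policy: {context}"
```

Because the source list is built from the same passages that were passed to the model, every returned answer carries its citations by construction.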
Vector store (pgvector / Qdrant)
Semantic search over millions of chunks: finds the answer by meaning, not by exact word match. pgvector when the data is already in Postgres, Qdrant when you need high-load search with filters.
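Search "by meaning" comes down to nearest-neighbour lookup over embedding vectors. pgvector or Qdrant would do this at scale; the sketch below is an in-memory stand-in using cosine similarity, one common choice of metric. The chunk names and three-dimensional vectors are invented for illustration and stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the ids of the k chunks most similar to the query vector."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical index: chunk id -> embedding.
index = {
    "vacation-policy": [0.9, 0.1, 0.0],
    "expense-policy": [0.1, 0.9, 0.0],
    "office-map": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "how do I take time off?"
results = top_k(query, index, k=2)
```

The query shares no words with the chunk ids, yet the closest vector wins, which is the property exact-match search lacks.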
Chunking and content preparation
Documents are split into meaningful chunks with metadata — the model gets "less but more precise" context, which directly raises relevance and lowers query cost.
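A minimal sketch of chunking with metadata, assuming paragraphs separated by blank lines are the "meaningful" unit and that a character budget is an acceptable size limit. The `Chunk` fields and the `max_chars` default are illustrative choices, not the production settings.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str    # document the chunk came from, used later for citations
    position: int  # order of the chunk within the document


def chunk_document(text: str, source: str, max_chars: int = 200) -> list[Chunk]:
    """Pack whole paragraphs into chunks of roughly max_chars.

    A paragraph is never split; one longer than max_chars becomes its own chunk.
    """
    chunks: list[Chunk] = []
    buffer = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{buffer}\n\n{para}" if buffer else para
        if buffer and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(Chunk(buffer, source, len(chunks)))
            buffer = para
        else:
            buffer = candidate
    if buffer:
        chunks.append(Chunk(buffer, source, len(chunks)))
    return chunks
```

Keeping `source` and `position` on every chunk is what later allows an answer to point back to the document it came from.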
Reranking (cross-encoder)
The second stage reorders candidates by actual relevance: recall@10 rises from 74% to 89% and answer accuracy improves by 33–40%, at a cost of about 120 ms of added latency. High ROI for minimal latency.
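The effect of a second-stage reranker on recall@k can be shown on a toy example. In this sketch `overlap_score` is only a stand-in for a cross-encoder (it counts shared words, whereas a real model scores the query and passage jointly), and the candidates are invented; the numbers it produces are unrelated to the figures quoted above.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of relevant items that appear in the first k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def rerank(query, candidates, score_fn):
    """Reorder first-stage candidates by a (query, text) relevance score."""
    return sorted(candidates, key=lambda c: score_fn(query, c["text"]), reverse=True)


def overlap_score(query, text):
    # Stand-in for a cross-encoder: number of words shared by query and text.
    return len(set(query.lower().split()) & set(text.lower().split()))


# Hypothetical first-stage output, in the order the vector search returned it.
candidates = [
    {"id": "c1", "text": "office parking rules"},
    {"id": "c2", "text": "how to book a meeting room"},
    {"id": "c3", "text": "vacation request approval steps"},
    {"id": "c4", "text": "vacation days per year"},
]
query = "vacation approval steps"
relevant = {"c3", "c4"}

before = recall_at_k([c["id"] for c in candidates], relevant, k=2)
reranked = rerank(query, candidates, overlap_score)
after = recall_at_k([c["id"] for c in reranked], relevant, k=2)
```

The first stage retrieves more candidates than are finally kept; reranking decides which of them make the cut, which is why recall at a fixed k can rise.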
LLM-wiki — a layer of verified answers
An add-on to RAG: on top of the stack we maintain a vetted corporate wiki, and for critical questions the system returns a pre-verified answer — cutting hallucinations even further.
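The "verified answer first" behaviour can be sketched as a lookup that runs before the RAG pipeline. This sketch assumes exact matching on a normalized question, which is the simplest possible rule; a real system would likely match by meaning. The entries, `fake_rag` and all names here are hypothetical.

```python
def normalize(question: str) -> str:
    """Lower-case, trim punctuation at the ends and collapse whitespace."""
    return " ".join(question.lower().strip(" ?!.").split())


def answer(question, verified, rag_fallback):
    """Return a vetted answer when one exists, otherwise fall back to RAG."""
    key = normalize(question)
    if key in verified:
        return {**verified[key], "verified": True}
    return {**rag_fallback(question), "verified": False}


# Hypothetical vetted entries, keyed by normalized question.
VERIFIED = {
    "who approves vacation requests": {
        "answer": "Your direct manager.",
        "source": "wiki/hr/vacation",
    },
}


def fake_rag(question):
    # Stand-in for the full retrieval and generation pipeline.
    return {"answer": f"Generated answer for: {question}", "source": "rag"}
```

The `verified` flag lets the interface show users whether they are reading a vetted answer or a generated one.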
RAG for support and employees
40–50% of routine requests are resolved automatically with a source in the answer; the internal assistant cuts regulation lookup time from minutes to seconds.
A loosely coupled, detachable stack
Storage, retrieval and model are decoupled: swap the LLM or vector DB without rewriting everything. The solution moves easily between teams and contractors — no vendor lock-in.
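Decoupling of this kind is usually achieved by having the pipeline depend on small interfaces rather than on a vendor SDK. A minimal sketch, with hypothetical interface names (`Retriever`, `Generator`) and toy implementations standing in for a vector database and an LLM:

```python
from typing import Protocol


class Retriever(Protocol):
    def search(self, query: str, k: int) -> list[str]: ...


class Generator(Protocol):
    def complete(self, prompt: str) -> str: ...


class RagPipeline:
    """Depends only on the two interfaces above, not on any vendor SDK."""

    def __init__(self, retriever: Retriever, generator: Generator) -> None:
        self.retriever = retriever
        self.generator = generator

    def answer(self, question: str, k: int = 3) -> str:
        context = "\n".join(self.retriever.search(question, k))
        return self.generator.complete(f"Context:\n{context}\n\nQuestion: {question}")


class KeywordRetriever:
    """Toy retriever: returns documents sharing a word with the query."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        return [d for d in self.docs if words & set(d.lower().split())][:k]


class UppercaseGenerator:
    """Toy 'model' that echoes the prompt in upper case."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class LengthGenerator:
    """A second toy 'model', swapped in without touching RagPipeline."""

    def complete(self, prompt: str) -> str:
        return f"{len(prompt)} characters of prompt"
```

Swapping the model is then a one-line change at construction time, for example `RagPipeline(retriever, LengthGenerator())` instead of `RagPipeline(retriever, UppercaseGenerator())`.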
Quality evaluation and anti-hallucination
precision@K, provenance coverage and hallucination rate metrics are built into the pipeline — answer quality is measured, not declared, and does not silently degrade after changes.
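The three metrics can be computed from a labelled evaluation set. The definitions below are assumptions made for this sketch: precision@K as the share of the top K results that are relevant, provenance coverage as the share of answers with at least one cited source, and hallucination rate as the share of answers a reviewer marked as unsupported. The record fields (`sources`, `supported`) are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for i in ranked_ids[:k] if i in relevant_ids) / k


def provenance_coverage(answers):
    """Share of answers that cite at least one source."""
    if not answers:
        return 0.0
    return sum(1 for a in answers if a["sources"]) / len(answers)


def hallucination_rate(answers):
    """Share of answers marked as not supported by their sources."""
    if not answers:
        return 0.0
    return sum(1 for a in answers if not a["supported"]) / len(answers)


# Hypothetical evaluation records, e.g. produced by a review pass.
eval_answers = [
    {"sources": ["hr.md"], "supported": True},
    {"sources": [], "supported": False},
    {"sources": ["it.md"], "supported": True},
    {"sources": ["it.md"], "supported": True},
]
```

Running these functions on a fixed evaluation set after every change is one way to catch a regression before users do.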
Approach
How we implement RAG
Minimal core modification
We don't fork or patch the RAG core. RAG stays on the standard upgradable version — business logic goes into separate microservices alongside it, so platform updates don't break your customizations.
International standards, not homegrown hacks
Where a mature international solution exists, we use it instead of inventing our own protocol or platform. Before writing code, we study how the problem is already solved in the industry.
Transferability
The solution is loosely coupled and documented: it can be handed over between teams and contractors without rewriting. You are not tied to us.
AI compatibility
RAG in the AI stack
Grounding for any LLM
The RAG layer feeds verified context into the model (GPT, Claude, open-source) — grounding answers in your data no matter which LLM you use today or switch to tomorrow.
Integration with MCP / the context layer
We connect the corporate knowledge base to agents via MCP as a standard source: RAG owns "what we know", MCP owns "how the agent fetches it". Both layers are detachable and reusable.
Operating behind the LLM & Security Gateway
Retrieval and model calls pass through a gateway: model routing, budgets, observability and PII obfuscation before sending — corporate knowledge does not leak out.
Foundation for AI agents
Agents that serve users and enter data rely on RAG as the source of truth — turning a "chatty" assistant into a tool that answers from company facts.
Integration with the Sloy platform
RAG/LLM-wiki embeds into Sloy as a corporate-memory layer for enterprise agent management: a single knowledge store, grounding and provenance shared across multiple agents and scenarios.
Projects
Cases
AI ingredient recognition by barcode
- Processing time cut from 30 minutes to 2 minutes per batch of 10 images
- Composition recognition accuracy is 80–95%
OSNO-VA: AI accountant
- RAG: implementation and integration.
AI analytics of the real estate market
- RAG: implementation and integration.
Contacts
Let's discuss your project
Leave your current contact details and describe your task. We will come back with clarifying questions and a proposal for the next step.


