Why a knowledge graph

Atlas holds every fact it knows about SAP — every object, every relationship, every piece of evidence — in a knowledge graph, stored in Keystone. That was a choice with live alternatives. This page explains why Atlas picked the shape it did, by looking at how the same question feels in each option.

A typical Atlas lookup sounds something like this: find every custom program, CDS view, or service in this system that uses a deprecated BAPI — and for each result, tell me which evidence supports the claim and how confident Atlas is in it. Three stores can answer that question, but they feel different doing it.

One question, three stores

Relational (Postgres): recursive CTE over hand-maintained link tables. Every new edge type is a schema migration.

    WITH RECURSIVE deps AS (
        SELECT ... FROM object_ref
        UNION ALL
        SELECT ... JOIN deps
    )
    SELECT * FROM deps;

Works. Slow to evolve.

Vector store: top-k nearest neighbours by embedding similarity. Returns text that resembles the question, not the callers.

    query = embed("uses BAPI_PO_CREATE1")
    results = ann.search(k=20)

Wrong relevance function.

Keystone (RDF): one typed traversal; new edge types need no schema migration.

    SELECT ?caller WHERE {
        ?caller atlas:calls sap:BAPI_PO_CREATE1 .
    }

Each result comes with its evidence tier attached. Fits the problem.
Same question — "what calls this deprecated BAPI?" — across three stores.

In a relational database, the answer is a recursive join over link tables the team maintains by hand. It works. It slows down the first time Atlas needs a new kind of edge — a new annotation family, a new stability tier — because each new edge type is a schema migration that has to land before the join can be written. Over a year of evolving SAP conventions, the migrations become the dominant cost.
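The relational version can be sketched in a few lines. The table and column names below (object_ref, caller, callee) are illustrative, not Atlas's actual schema, and SQLite stands in for Postgres since both support WITH RECURSIVE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object_ref (caller TEXT, callee TEXT)")
conn.executemany(
    "INSERT INTO object_ref VALUES (?, ?)",
    [
        ("ZPROG_PO_WRAPPER", "BAPI_PO_CREATE1"),      # direct caller
        ("ZCDS_PO_VIEW", "ZPROG_PO_WRAPPER"),         # indirect, via the wrapper
        ("ZPROG_UNRELATED", "BAPI_USER_GET_DETAIL"),  # noise
    ],
)

# Transitive closure: everything that reaches the deprecated BAPI.
rows = conn.execute(
    """
    WITH RECURSIVE deps(caller) AS (
        SELECT caller FROM object_ref WHERE callee = 'BAPI_PO_CREATE1'
        UNION
        SELECT o.caller FROM object_ref o JOIN deps d ON o.callee = d.caller
    )
    SELECT caller FROM deps
    """
).fetchall()
print(sorted(r[0] for r in rows))  # ['ZCDS_PO_VIEW', 'ZPROG_PO_WRAPPER']

# A new edge type (a new annotation family, a new stability tier) would
# mean altering object_ref or adding a table, i.e. a schema migration,
# before this query could be extended to traverse it.
```

The query itself is fine; the cost is that every new kind of edge touches the schema before it touches the query.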

In a vector store, the answer is the top twenty rows most similar to the question in embedding space. That is the wrong relevance function. Atlas’s claims are structured facts about named objects — the call site either exists or it does not. Similarity is an excellent tool for ranking prose; it is a poor tool for enumerating callers.
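A toy contrast makes the failure mode concrete. The documents, query, and bag-of-words cosine below are all invented for illustration; a real embedding model is far better than this, but it optimizes the same relevance function:

```python
from collections import Counter
import math

def cos(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

docs = [
    "Release note: BAPI_PO_CREATE1 is deprecated, use the new API",  # prose about the BAPI
    "ZPROG_PO_WRAPPER CALL FUNCTION BAPI_PO_CREATE1",                # the actual call site
]
query = "which programs use the deprecated BAPI_PO_CREATE1"
ranked = sorted(docs, key=lambda d: cos(query, d), reverse=True)

# The release note outranks the call site: the store returns text that
# resembles the question, not the callers.
```

Similarity ranks the document that talks *about* deprecation above the one line of code that actually matters to the question.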

In a knowledge graph, the answer is one SPARQL traversal. New edge types are new predicates, so they need no schema migration. Every triple carries its provenance natively as another triple — Atlas never has to wedge an evidence trail into a schema that was not designed to carry one.

Atlas picks the graph because the graph is the shape of the problem.

The graph does three jobs, all central to how Atlas plans.

Entity resolution. Different sources call the same object by different names. BAPI_PO_CREATE1 appears in the Simplification Item Catalog under one label, in api.sap.com under a canonical URL, in an OData service under a technical name, and in a customer extract under a local Z-alias. The graph resolves all of them to one identity and every later query sees one object.
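The shape of the resolved record can be sketched in plain Python. The field names and the customer Z-alias below are invented for illustration, not Atlas's schema:

```python
CANONICAL = "atlas:entity/BAPI_PO_CREATE1"

# Four source-local names for the same object (the Z-alias is hypothetical).
observations = [
    ("simplification-item-catalog", "BAPI_PO_CREATE1"),
    ("api.sap.com",                 "Purchase Order Create API"),
    ("odata-service-catalog",       "PurchaseOrder_POST"),
    ("customer-extract",            "Z_BAPI_PO_CREATE1"),  # hypothetical local Z-alias
]

node = {"id": CANONICAL, "sourcedFrom": []}
for source, label in observations:
    # Each source-local label survives as a provenance record on the
    # canonical node rather than being discarded at resolution time.
    node["sourcedFrom"].append({"source": source, "label": label})

# One identity, four provenance records: downstream queries only ever
# see the canonical node.
```

The point is the asymmetry: resolution happens once, at write time, so no later query has to know four names for one object.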

Four sources, one identity

    Simplification Item Catalog: BAPI_PO_CREATE1 (alias)
    api.sap.com: Purchase Order Create API (canonical URL)
    OData service catalog: PurchaseOrder_POST (binds to)
    Your system extract: ZPROG_PO_WRAPPER → … (uses)

All four resolve to the canonical graph node atlas:entity/BAPI_PO_CREATE1: 4 sourcedFrom triples, 1 identity, status deprecated (SI-4321). Every downstream query sees one node. The four names Atlas read are kept as provenance. This is the entity-resolution job the graph was chosen to do.
Entity resolution — four sources collapse into one identity, provenance preserved.

Traversal for planning. The plan graph Atlas builds for a task is itself a query over the knowledge graph: give me the ordered set of nodes required to build this view, given these evidence constraints. That query is a natural fit for SPARQL, which is why Atlas writes it there rather than translating to recursive SQL that would fight the engine every time the shape of the graph grows.

Evidence provenance. Every assertion Atlas writes carries a sourcedFrom triple with a version and a timestamp. You can always reconstruct why Atlas said what it said, and when a source refreshes overnight, every claim downstream of that source is automatically re-scored.
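The refresh behaviour can be sketched as a toy claim store. Field names and values are invented; in Atlas the claims live in the graph, not in a Python list:

```python
from datetime import datetime, timezone

# Each claim carries its source, the source version, and a timestamp,
# so "why did Atlas say this?" is always answerable.
claims = [
    {
        "claim": "BAPI_PO_CREATE1 is deprecated",
        "sourcedFrom": "simplification-item-catalog",
        "version": "2024-10",
        "seen": datetime(2024, 10, 2, tzinfo=timezone.utc),
        "score": 0.9,
    },
]

def rescore(claims: list[dict], refreshed_source: str, new_version: str) -> list[dict]:
    """When a source refreshes overnight, every claim downstream of it
    is marked for re-scoring; nothing else is touched."""
    for c in claims:
        if c["sourcedFrom"] == refreshed_source:
            c["version"] = new_version
            c["score"] = None  # recomputed by the next scoring pass
    return claims

rescore(claims, "simplification-item-catalog", "2024-11")
```

The invariant being sketched: re-scoring is keyed off the sourcedFrom edge, so a source refresh propagates to exactly the claims it supports.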

Hot transactional writes still go through Postgres in the DJED platform layer. Free-text search over raw documents still goes through the crawler's OpenSearch index. Vector similarity for fuzzy natural-language matching still goes through a separate embeddings service. The graph holds resolved facts, not raw text or live transactional state.

SPARQL is harder than SQL for most engineers, and nothing changes that. Atlas offsets the learning curve two ways: every reference-worthy query ships as a tested example in the Keystone reference, and the typed client that the Go services and the UI use covers the common lookups without anyone writing raw SPARQL. You can fall through to raw queries when you need to; most of the time you will not have to.