Why a knowledge graph

Atlas holds every fact it knows about SAP — every object, every relationship, every piece of evidence — in a knowledge graph, stored in Keystone. That was a choice with live alternatives. This page explains why Atlas picked the shape it did, by looking at how the same question feels in each option.

A typical Atlas lookup sounds something like this: find every custom program, CDS view, or service in this system that uses a deprecated BAPI — and for each result, tell me which evidence supports the claim and how confident Atlas is in it. Three stores can answer that question, but they feel different doing it.

One question, three stores

Relational (Postgres): recursive CTE over hand-maintained link tables. Every new edge type is a schema migration.

    WITH RECURSIVE deps AS (
        SELECT ... FROM object_ref
        UNION ALL
        SELECT ... JOIN deps
    )
    SELECT * FROM deps;

Works. Slow to evolve.

Vector store: top-k nearest neighbours by embedding similarity. Returns text that resembles the question, not the callers.

    query = embed("uses BAPI_PO_CREATE1")
    results = ann.search(k=20)

Wrong relevance function.

Keystone (RDF): one typed traversal; new edge types need no schema migration.

    SELECT ?caller WHERE {
        ?caller atlas:calls sap:BAPI_PO_CREATE1 .
    }

Each result comes with its evidence tier attached. Fits the problem.
Same question — "what calls this deprecated BAPI?" — across three stores.

In a relational database, the answer is a recursive join over link tables the team maintains by hand. It works. It slows down the first time Atlas needs a new kind of edge — a new annotation family, a new stability tier — because each new edge type is a schema migration that has to land before the join can be written. Over a year of evolving SAP conventions, the migrations become the dominant cost.
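The relational version can be sketched in a few lines. The table and column names below (object_ref, caller, callee) are illustrative, not Atlas's actual schema, and SQLite stands in for Postgres since both support WITH RECURSIVE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object_ref (caller TEXT, callee TEXT)")
conn.executemany(
    "INSERT INTO object_ref VALUES (?, ?)",
    [
        ("ZPROG_PO_WRAPPER", "BAPI_PO_CREATE1"),      # direct caller
        ("ZCDS_PO_VIEW", "ZPROG_PO_WRAPPER"),         # indirect, via the wrapper
        ("ZPROG_UNRELATED", "BAPI_USER_GET_DETAIL"),  # noise
    ],
)

# Transitive closure: everything that reaches the deprecated BAPI.
rows = conn.execute(
    """
    WITH RECURSIVE deps(caller) AS (
        SELECT caller FROM object_ref WHERE callee = 'BAPI_PO_CREATE1'
        UNION
        SELECT o.caller FROM object_ref o JOIN deps d ON o.callee = d.caller
    )
    SELECT caller FROM deps
    """
).fetchall()
print(sorted(r[0] for r in rows))  # ['ZCDS_PO_VIEW', 'ZPROG_PO_WRAPPER']

# A new edge type (a new annotation family, a new stability tier) would
# mean altering object_ref or adding a table, i.e. a schema migration,
# before this query could be extended to traverse it.
```

The query itself is fine; the cost is that every new kind of edge touches the schema before it touches the query.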

In a vector store, the answer is the top twenty rows most similar to the question in embedding space. That is the wrong relevance function. Atlas’s claims are structured facts about named objects — the call site either exists or it does not. Similarity is an excellent tool for ranking prose; it is a poor tool for enumerating callers.
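A toy contrast makes the failure mode concrete. The documents, query, and bag-of-words cosine below are all invented for illustration; a real embedding model is far better than this, but it optimizes the same relevance function:

```python
from collections import Counter
import math

def cos(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

docs = [
    "Release note: BAPI_PO_CREATE1 is deprecated, use the new API",  # prose about the BAPI
    "ZPROG_PO_WRAPPER CALL FUNCTION BAPI_PO_CREATE1",                # the actual call site
]
query = "which programs use the deprecated BAPI_PO_CREATE1"
ranked = sorted(docs, key=lambda d: cos(query, d), reverse=True)

# The release note outranks the call site: the store returns text that
# resembles the question, not the callers.
```

Similarity ranks the document that talks *about* deprecation above the one line of code that actually matters to the question.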

In a knowledge graph, the answer is one SPARQL traversal. New edge types are new predicates, so they need no schema migration. Every triple carries its provenance natively as another triple — Atlas never has to wedge an evidence trail into a schema that was not designed to carry one.

Atlas picks the graph because the graph is the shape of the problem.

The graph does three jobs, all central to how Atlas plans.

Entity resolution. Different sources call the same object by different names. BAPI_PO_CREATE1 appears in the Simplification Item Catalog under one label, in api.sap.com under a canonical URL, in an OData service under a technical name, and in a customer extract under a local Z-alias. The graph resolves all of them to one identity and every later query sees one object.
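The shape of the resolved record can be sketched in plain Python. The field names and the customer Z-alias below are invented for illustration, not Atlas's schema:

```python
CANONICAL = "atlas:entity/BAPI_PO_CREATE1"

# Four source-local names for the same object (the Z-alias is hypothetical).
observations = [
    ("simplification-item-catalog", "BAPI_PO_CREATE1"),
    ("api.sap.com",                 "Purchase Order Create API"),
    ("odata-service-catalog",       "PurchaseOrder_POST"),
    ("customer-extract",            "Z_BAPI_PO_CREATE1"),  # hypothetical local Z-alias
]

node = {"id": CANONICAL, "sourcedFrom": []}
for source, label in observations:
    # Each source-local label survives as a provenance record on the
    # canonical node rather than being discarded at resolution time.
    node["sourcedFrom"].append({"source": source, "label": label})

# One identity, four provenance records: downstream queries only ever
# see the canonical node.
```

The point is the asymmetry: resolution happens once, at write time, so no later query has to know four names for one object.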

Four sources, one identity

    Simplification Item Catalog: BAPI_PO_CREATE1 (alias)
    api.sap.com: Purchase Order Create API (canonical URL)
    OData service catalog: PurchaseOrder_POST (binds to)
    Your system extract: ZPROG_PO_WRAPPER → … (uses)

All four resolve to the canonical graph node atlas:entity/BAPI_PO_CREATE1: 4 sourcedFrom triples, 1 identity, status deprecated (SI-4321). Every downstream query sees one node. The four names Atlas read are kept as provenance. This is the entity-resolution job the graph was chosen to do.
Entity resolution — four sources collapse into one identity, provenance preserved.

Traversal for planning. The plan graph Atlas builds for a task is itself a query over the knowledge graph: give me the ordered set of nodes required to build this view, given these evidence constraints. That query is a natural fit for SPARQL, which is why Atlas writes it there rather than translating to recursive SQL that would fight the engine every time the shape of the graph grows.

Evidence provenance. Every assertion Atlas writes carries a sourcedFrom triple with a version and a timestamp. You can always reconstruct why Atlas said what it said, and when a source refreshes overnight, every claim downstream of that source is automatically re-scored.
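The refresh behaviour can be sketched as a toy claim store. Field names and values are invented; in Atlas the claims live in the graph, not in a Python list:

```python
from datetime import datetime, timezone

# Each claim carries its source, the source version, and a timestamp,
# so "why did Atlas say this?" is always answerable.
claims = [
    {
        "claim": "BAPI_PO_CREATE1 is deprecated",
        "sourcedFrom": "simplification-item-catalog",
        "version": "2024-10",
        "seen": datetime(2024, 10, 2, tzinfo=timezone.utc),
        "score": 0.9,
    },
]

def rescore(claims: list[dict], refreshed_source: str, new_version: str) -> list[dict]:
    """When a source refreshes overnight, every claim downstream of it
    is marked for re-scoring; nothing else is touched."""
    for c in claims:
        if c["sourcedFrom"] == refreshed_source:
            c["version"] = new_version
            c["score"] = None  # recomputed by the next scoring pass
    return claims

rescore(claims, "simplification-item-catalog", "2024-11")
```

The invariant being sketched: re-scoring is keyed off the sourcedFrom edge, so a source refresh propagates to exactly the claims it supports.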

Hot transactional writes still go through Postgres in the DJED platform layer. Free-text search over raw documents still goes through the crawler's OpenSearch index. Vector similarity for fuzzy natural-language matching still goes through a separate embeddings service. The graph holds resolved facts, not raw text or live transactional state.

SPARQL is harder than SQL for most engineers, and nothing changes that. Atlas offsets the learning curve two ways: every reference-worthy query ships as a tested example in the Keystone reference, and the typed client that the Go services and the UI use covers the common lookups without anyone writing raw SPARQL. You can fall through to raw queries when you need to; most of the time you will not have to.