Ontology Everywhere!

Demystifying another buzzword

Jun 30, 2026

Ontologies have been around since the 80/90s. In the 00’s, the academic semantic web crowd was planning and building them while the rest of us were still figuring out data warehouses and “big data”.

Then Palantir went public and suddenly “the Ontology” popped up along their mysterious product at least in my bubble. With the LLM boom, the concept showed up in different shapes: Inmon started writing about ontologies on Substack, Microsoft Fabric added ontology as a first-class workload in late 2025, Databricks shipped “Genie Ontology” in June 2026, and Google introduced its OKF knowledge structuring approach.

Source: https://trends.google.com/explore?q=ontology&date=all&geo=Worldwide

Structuring information is becoming a competitive advantage.
So I dug in to understand what ontologies are and what they aren’t.

What an ontology actually is

The canonical definition comes from Tom Gruber in 1993:
“a specification of a conceptualization.” That’s accurate but not useful day-to-day. Jessica Talisman, who has spent 25-plus years building these systems at Amazon and Adobe, expands it: an ontology is a formal, explicit specification of a shared conceptualization - an agreement about what things exist in a domain, how they relate, and what we call them.

Three parts matter:

Formal - written in a language with explicit rules that a machine can reason over
Explicit - documented, not implied. No one has to guess what “Customer” means
Shared - agreed upon by a community. An ontology one person builds in a silo is a model, not an ontology

The “formal language” part usually meant OWL (Web Ontology Language) or RDFS (RDF Schema) - standards from the W3C. But don’t worry:
No need to learn them to understand what ontologies do.

The theoretical differentiator from simpler structures - controlled vocabularies, taxonomies, thesauri - is that ontologies support inference. What most enterprise ontology tools actually do today, though, is typed-edge traversal.

A procurement ontology says Suppliers have Contracts, Contracts have SLAs, SLAs have Penalty Clauses. When a shipment arrives late, the system traces Shipment → Contract → SLA → Penalty Clause and flags the financial impact without a developer writing that specific join. That’s useful, but it’s pre-wired traversal, not logical reasoning. True formal inference would derive something never explicitly linked: if actual_delivery > sla_deadline, this shipment is in breach. This is where LLMs might come into play to add that capability, while “grounding” them in bespoke knowledge and company context.

Then why does it look like an ERD?

Open Palantir’s Foundry, Microsoft Fabric’s ontology preview, or Protégé (the Stanford ontology editor), and you’ll see the same thing: boxes, lines, relationships, cardinalities. The similarity is real. A conceptual data model and an ontology both describe entities and their relationships. The difference is what they’re for.

Inmon draws a clean line: data models are internal-facing, ontologies are external-facing. It’s a useful historical framing - ontologies were built for the semantic web, for sharing knowledge across organizations. But virtually every enterprise ontology product today is built to describe your business, not some universal vocabulary. Palantir models enterprise decisions, Databricks extracts meaning from your warehouse, Fabric binds to your OneLake. The ontology boom is really about making your internal semantics explicit enough that AI agents can reason over them without a developer hand-holding every query.

The version of “external” that survives just means precise enough for a non-human consumer. Talisman’s advice to build on FIBO (Financial Industry Business Ontology) or Schema.org rather than inventing your own terms from scratch is about not reinventing core concepts when standard definitions already exist.

The practical difference between a data model and an ontology shows up in three places:

Relationship richness - a data model has 1:1, 1:N, N:M. An ontology has typed relationships - part-whole, causation, temporal - that carry meaning beyond cardinality.
Inference - an ontology encodes rules a machine can apply (all Customers are Legal Entities, therefore the system can flag any Customer missing a Tax ID). This is constraint propagation rather than full logical reasoning, but it’s automatic enforcement data models don’t provide.

Don’t confuse this with object-oriented class hierarchies. OOP classes exist for code reuse - a Customer class inherits methods. Ontology classes exist for describing what things are and how they relate - a Customer inherits that it must have a Tax ID, that it can sign a Contract. (The Ontology Development 101 guide from Stanford makes this exact distinction.)

Language - ontologies use formal languages like OWL with machine-readable semantics rather than diagrams.

In practice, the line blurs - especially as Modern Data Stack semantic layer tooling (dbt, Cube, AtScale) normalizes defining entities, metrics, and relationships in YAML. These tools solve measurement consistency: ensuring every dashboard calculates “monthly recurring revenue” the same way. Ontologies solve meaning: representing that an Active Customer is a subclass of Customer with a relationship to at least one Active Subscription. DataHub frames this cleanly - measurement vs meaning, different problems. Jérémy Ravenel puts it more accessibly: a semantic layer is the menu (simplifying choices for the customer), an ontology is the recipe book (structured steps the kitchen follows). If your conceptual data model has entities with rich descriptions, typed relationships, and constraints in YAML, you’re most of the way to an ontology. A well-specified semantic layer sits in the same neighborhood: closer to a lightweight ontology than most people realize. Also knowledge graphs are related: they store relationships as explicit edges so you can traverse without pre-defining paths, but they don’t add formal inference rules.

What about knowledge graphs?

Sometimes you’ll hear “ontology” and “knowledge graph” used interchangeably. They’re related, but the distinction matters. The simplest analogy: an ontology is a database’s table schemas. A knowledge graph is the records of those tables.

If the ontology says “Customers sign Contracts,” the knowledge graph says “Acme Corp signed Contract #4815 on June 1, 2026.” You can have an ontology without a knowledge graph (just the schema). You can’t have a knowledge graph without at least some schema - otherwise it’s just unstructured nodes and edges in a graph database.

Talisman’s Ontology Pipeline puts a finer point on it: the ontology is stage five (formal rule bases), the knowledge graph is stage six (synthesis layer). But the ontology is more than just a schema for a graph. It’s a formal specification that can be implemented in different ways:

Formal/standards-based (OWL + reasoners like Pellet, HermiT; rule engines like Drools, CLIPS) - the academic path dating back to the semantic web era. Write formal markup, let a reasoner infer new facts. Not what most enterprises run in production, but the purest expression of the idea.
Simplified forms - taxonomies and controlled vocabularies (SKOS, ISO 25964) without inference. Good for organizing terminology and a practical middle ground between nothing and a full ontology.
Graph databases (Neo4j, Amazon Neptune) - entities as nodes, relationships as edges. Supports traversal but not automatic inference without a rules layer.
Platform-native ontologies - entity types bound to your operational data. Fabric handles graph, querying, and governance with NL2Ontology. Databricks Genie Ontology auto-extracts from tables and queries using PageRank-like authority weighting. Palantir goes furthest: the ontology is the architectural core with actions, write-backs, and its own security model.

One more axis: does the ontology read, or does it also write?

Most ontology implementations are read-only - you query or reason over them. OWL reasoners infer facts you look up. Graph databases let you traverse relationships. Fabric’s ontology layer is built for asking business questions. Databricks’ Genie Ontology is a context store for agent answers. None of these write back to your operational systems.

As said before, Palantir is the exception. Their Ontology models “decisions” - pairing data objects with actions as first-class citizens with their own permissions. A purchase order approval isn’t just something the ontology helps you reason about; it’s a typed, governed object in the ontology itself, with a defined path to writing back into operational systems in real time. This matters for the debate below because the cost of a modeling error depends on whether the system can act on it — a wrong answer is a lot easier to undo than a wrong purchase order.

The big fight: curated vs automated

The sharpest contradiction in my research was about who builds the ontologies.

Inmon and Talisman are unambiguous: ontology building is “a human activity, not a mechanical one.” It requires thousands of decisions about what matters, what to include, what to leave out. Concepts in the ontology “can be found, counted and analyzed” while those absent “will pass through unrecognized.” The ontology builder is making choices about what matters - a responsibility, not a configuration parameter.

Talisman’s Ontology Pipeline captures this: it progresses from controlled vocabulary → metadata → taxonomy → thesaurus → ontology → knowledge graph, gated by SHACL validation at each stage. Every step requires human judgment - there’s no automation shortcut for deciding how many levels your taxonomy needs. It’s also standards-heavy (ISO 25964, SKOS-XL, SHACL). This is a mature practice with 25-plus years of methodology behind it.

Then there’s Databricks. The Genie Ontology is automatic: it extracts knowledge from your data estate using PageRank-like authority weighting, with - based on my current understanding - no humans making decisions. It claims 84.5% first-attempt accuracy against general coding agents’ 52.4%.
Besides my natural skepticism against any self-reported vendor benchmark which tell you what marketing wants you to know, my bigger skepticism is around: The risk is that it learns from how people already query their data, not from what the data actually means - pattern-matching on existing usage, not reasoning about business meaning. It’s the difference between a city that creates rules from observing how people drive, versus one where people drive based on rules.

Microsoft Fabric sits in the middle: you define entity types manually, then the ontology graph is built automatically from your OneLake bindings.

Palantir, unsurprisingly, leans curated - their ontology is built on a four-fold integration of data, logic, action, and security. Curation is doubly important when the ontology executes actions rather than just answering questions.

This tension can also be explained by two business models talking past each other. Databricks needs self-serve automation because their margin depends on land-and-expand without consulting overhead. Palantir’s margin depends on forward-deployed engineering and high-touch curation. Talisman has spent 25-plus years building these systems - of course ontology building is “a human activity, not a mechanical one.”

The practical answer is probably tiered regardless of vendor: start with automation to get the skeleton right, then curate the high-stakes concepts manually.

Hot take: don’t get stuck on the definition

A lot of people will read this and say “no one agrees on what ontology means, so how do I build one?” The skeptical reading deserves a fair hearing - prestige terminology gets diluted by marketing (we’ve seen this with “platform,” “data mesh,” “AI-native”). But the number of serious, independent players converging on ontology from different starting points means the market is in motion.

Don’t get stuck on finding the way to get started. Thirty years ago, “data warehouse” meant Inmon’s corporate information factory or Kimball’s dimensional star schema, and people argued endlessly about which was correct. Today we call both of them data warehousing and mostly don’t fight about it. The same arc is happening with ontologies - we’re in the messy middle, not the end state.

The specific definition matters less than the principle: as AI agents become primary consumers of our data, we need a layer that explicitly describes what things mean, how they relate, and what rules govern them. Whether you call that an ontology, a semantic layer, a knowledge graph, or a digital twin is less important than building towards it.

Structure your knowledge. Make it accessible. The market will sort out the labels.

p.s. If you’re wondering where to start, the Stanford Protégé guide (7-step methodology with competency questions) and Jessica Talisman’s Ontology Pipeline (controlled vocabulary to knowledge graph) are both free. One is academic, one is practitioner. Pick your lane.

Hands-On Data

Discussion about this post

Ready for more?