How We Think About Ontologies at Codd AI
When we built Codd AI, we had to answer a question that sounds simple but turns out to be surprisingly hard: what does an AI actually need to know about your business before it can answer questions about your data reliably?
The obvious answer is metrics. Define revenue, define churn, define utilisation rate. Give the AI the formulas and it can compute the numbers. This is what every BI semantic layer does, and it is genuinely useful. But it is not sufficient. A formula tells the AI how to compute a number. It does not tell the AI what that number means, how the entities it touches relate to each other, or why a query about "floating charges" should resolve to the blanket_collateral table rather than the collateral table.
What the AI actually needs, before it touches a single metric, is a structured understanding of the business domain: the concepts that exist, how they relate, and what each piece of data is for. In building Codd AI, we found ourselves building exactly that layer. We called it the Concept Model, and in building it we found ourselves navigating one of the most confused terminological landscapes in the data industry: the world of ontologies.
This post is our attempt to make sense of that landscape: to explain what ontologies actually are in the computer science sense, why the market has turned the term into noise, and why a practical, analytics-focused variant of ontological thinking is the right foundation for AI analytics. We will be direct about where Codd AI sits in this picture, and honest about what we are and are not building.
What an Ontology Actually Is
The word ontology comes from philosophy, from the Greek ontos (being) and logos (study). In philosophy it means the study of what exists. In computer science, Tom Gruber defined it in 1993 as "a formal, explicit specification of a shared conceptualization." Tim Berners-Lee and colleagues then brought the concept to a wider audience through their 2001 Scientific American article on the Semantic Web, proposing ontologies as the foundation for machine-readable knowledge on the internet.
The key word is formal. A spreadsheet of business definitions is not an ontology. A data dictionary is not an ontology. Even a well-labelled ER diagram is not an ontology. What makes something an ontology, in the strict sense, is that a machine can reason over it, drawing new conclusions from existing facts without being explicitly programmed to do so.
The W3C built a stack of standards to make this possible:
RDF (Resource Description Framework) expresses everything as triples: subject, predicate, object. Every entity gets a globally unique URI. AcmeCorp hasLoan Loan_4521. The simplicity is deceptive. Triples compose into arbitrarily complex knowledge graphs.
OWL (Web Ontology Language) sits on top of RDF and adds logic. It lets you define classes, subclass hierarchies, property constraints, and inference rules. An OWL reasoner can automatically conclude that if a customer's loan is overdue by more than 90 days, they belong to the class NonPerformingBorrower, without that classification ever being explicitly stored.
SPARQL is the query language for this world, the equivalent of SQL for relational databases.
Together these technologies enable something qualitatively different from a database: automated reasoning. A system built on OWL can tell you not just what is stored, but what is implied. It can verify logical consistency, detect contradictions, and classify new instances automatically.
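To make the triple-and-inference idea concrete, here is a minimal sketch in plain Python. This is not a real RDF store or OWL reasoner, and the entity names and the single rule are illustrative; it only shows the shape of the idea: facts stored as subject-predicate-object triples, and a classification that is derived rather than stored.

```python
# Facts as subject-predicate-object triples (illustrative, not real RDF).
facts = {
    ("AcmeCorp", "hasLoan", "Loan_4521"),
    ("Loan_4521", "overdueDays", 120),
}

def infer_classes(triples):
    """Derive class memberships that are implied by the facts, not stored."""
    inferred = set()
    for subj, pred, obj in triples:
        # Rule: a loan overdue by more than 90 days marks its
        # borrower as a NonPerformingBorrower.
        if pred == "overdueDays" and isinstance(obj, int) and obj > 90:
            for s2, p2, o2 in triples:
                if p2 == "hasLoan" and o2 == subj:
                    inferred.add((s2, "a", "NonPerformingBorrower"))
    return inferred

print(infer_classes(facts))
# {('AcmeCorp', 'a', 'NonPerformingBorrower')}
```

An OWL reasoner generalises this pattern: the rules are expressed declaratively as axioms, and the reasoner applies them exhaustively and checks them for logical consistency.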
What Codd AI Does, and Does Not, Build
When we encountered this formal definition, we asked ourselves honestly: is this what Codd AI builds? The answer is no, and we think being honest about that is important.
Codd AI does not build OWL ontologies. We do not run description logic reasoners. We do not expose SPARQL endpoints. These are powerful technologies for knowledge management systems, and they are the wrong tool for analytics teams who need answers from databases that already exist.
What Codd AI builds instead is what we call an Analytics Ontology: a structured, machine-usable representation of a business domain, reverse-engineered from existing data and business knowledge, designed specifically to ground AI analytics rather than to support formal logical reasoning. In the product, practitioners interact with this as the Concept Model, the layer that sits above the data model and metrics, and that makes both of them work correctly for AI.
The distinction matters. We are not claiming to do what OWL does. We are claiming to do something that OWL is too heavy to deliver for analytics teams, and that a BI metric layer is too thin to achieve. The Analytics Ontology is a purpose-built middle path, and the rest of this post explains why that path exists, what it contains, and why we believe it is the right foundation for AI analytics in 2026.
How the Market Got Confused
The Semantic Layer That Wasn't Semantic
The confusion starts with a term coined in the early 1990s. Business Objects built a tool that let business users query databases by dragging and dropping familiar concepts rather than writing SQL, and called the abstraction layer a "semantic layer." Three decades later, every major BI platform uses the same term to describe essentially the same thing: YAML-based metric definitions that compile to SQL at query time. dbt's metrics layer, Looker's LookML, AtScale, Cube, Power BI's semantic model. These tools are genuinely valuable. But they are SQL compilers with business glossaries, not semantic systems in any formal sense. Even their more recent AI-assisted features still operate within this paradigm: they help you define and manage metrics faster, but they do not model what those metrics mean in your business context.
Metric definition and ontology engineering are fundamentally different disciplines. Knowing that revenue is calculated as SUM(order_total) WHERE order_status = 'completed' tells you how to compute a number. It does not tell you what revenue means, why it dropped in Q3, or how it relates to pipeline, bookings, and billings in your specific business context.
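The compile-to-SQL pattern that metric layers implement can be sketched in a few lines. This is a deliberately minimal toy, not dbt's or Looker's actual syntax; the metric fields and generator are illustrative assumptions. The point is what it captures, and what it does not: computation, but no meaning.

```python
# A declarative metric definition, roughly what a metric layer stores
# (field names are illustrative, not any vendor's real schema).
metric = {
    "name": "revenue",
    "aggregation": "SUM",
    "column": "order_total",
    "table": "orders",
    "filter": "order_status = 'completed'",
}

def compile_metric(m, group_by=None):
    """Compile a metric definition into a SQL query string."""
    sql = f"SELECT {m['aggregation']}({m['column']}) AS {m['name']} FROM {m['table']}"
    if m.get("filter"):
        sql += f" WHERE {m['filter']}"
    if group_by:
        sql = sql.replace("SELECT ", f"SELECT {group_by}, ", 1)
        sql += f" GROUP BY {group_by}"
    return sql

print(compile_metric(metric, group_by="region"))
# SELECT region, SUM(order_total) AS revenue FROM orders WHERE order_status = 'completed' GROUP BY region
```

Everything in that dictionary is about how to compute a number. Nothing in it says what revenue means, what it relates to, or when it is the wrong metric for the question being asked.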
The Ontology Overcorrection
On the other side of the confusion sits the knowledge graph and semantic web community, which sometimes dismisses the BI world's metric layers as fake ontologies while building formal OWL systems that are technically rigorous but practically inaccessible.
Real OWL ontologies require ontology engineers, specialists who understand description logics, axiom design, and reasoner behaviour. They take months to build from scratch. They assume you are starting with a blank slate and modelling your domain from first principles. They were not designed for the common analytics reality: you have an existing database with hundreds of tables, inconsistent naming conventions, sparse documentation, and a business team that needs answers by next Thursday.
The result is a gap. Analytics practitioners are told that "semantic layers" give them semantic technology, which they do not. They are also told that real ontologies require expertise and timelines they do not have, which is largely true. So they proceed without either, with SQL metrics that answer quantitative questions correctly but cannot support the contextual, relational reasoning that AI agents need.
Why This Gap Is Now Critical
For most of the history of BI, this gap did not matter much. BI was about dashboards and reports. A well-defined metric layer was sufficient.
Generative AI changed the equation. LLMs are now being asked to serve as analytics agents, to answer business questions in natural language, to connect data across domains, to surface insights without a human writing each query. And LLMs without grounding hallucinate. They generate confident, plausible-sounding answers that are factually wrong because they have no grounded understanding of what your data actually means.
A SQL metric layer does not solve this. It tells the AI how to compute numbers. It does not tell the AI what a "customer" is in your context, how customers relate to facilities and collateral in your domain, which columns are identifiers versus measures versus filters, or what business rules constrain valid answers.
This is the gap that a genuine ontology approach, even a practical, analytics-focused one, fills. It is the gap Codd AI was built to close.
The Analytics Ontology: A Practical Middle Path
The Problem with Both Extremes
The BI semantic layer is too thin. It defines metrics but not meaning. It cannot support the kind of relational reasoning AI agents need.
The formal OWL ontology is too heavy. It requires specialists, assumes a blank slate, takes months to build, and produces outputs that analytics tools cannot consume natively.
What analytics teams actually need is something in between: an approach that captures the semantic richness of an ontology (classes, relationships, hierarchies, property roles, business context) without requiring ontology engineers or OWL expertise, and that is grounded in the reality of existing databases rather than built from scratch.
This is the Analytics Ontology: a structured, machine-usable representation of your business domain, built from your existing data and business knowledge, designed to ground AI analytics rather than to support formal logical reasoning.
It is not a replacement for formal ontologies in knowledge management systems. It is a purpose-built variant for analytics, standing in the same relationship to a formal OWL ontology that a BI semantic layer stands in to the Semantic Web: analytically focused, practically achievable, and grounded in real data.
A Note on Terminology
The term "Analytics Ontology" is precise and defensible, but it carries the word "ontology," which triggers unproductive conversations with practitioners who either fear the term or misunderstand it. In product surfaces, the practitioner-facing name is the Concept Model, immediately understood by analysts who have never heard the word ontology. In thought leadership and expert conversations, the architectural term is Analytics Ontology, precise and intellectually honest about what it is and is not. Both refer to the same layer, which sits at the front of a natural pipeline: Concept Model (meaning) → Data Model (structure) → Metrics (measurement).
The Five Components of an Analytics Ontology
1. Business Concepts (Classes)
The named entities in your domain, not database tables, but the business objects those tables represent. Customer, CreditFacility, CollateralAsset, BlanketCollateral. Each concept has a business name, a plain-language description, and a set of synonyms.
The synonyms matter more than they might seem. When a business user asks "show me all floating charges," the system needs to know that "floating charge" is a synonym for blanket collateral lien, which maps to the blanket_collateral table. Without that synonym chain, natural language queries fail on terminology mismatches between how users speak and how data is stored. In Codd AI, synonyms are captured at the concept level and carried through to the data model and metric generation layers, so the synonym chain is preserved end to end.
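The synonym chain can be illustrated with a small sketch. The concept and table names come from the example above; the lookup function itself is a hedged toy, not Codd AI's implementation.

```python
# Concept-level synonyms mapping business language to physical tables
# (structure illustrative; concept and table names from the example above).
concepts = {
    "BlanketCollateral": {
        "table": "blanket_collateral",
        "synonyms": {"floating charge", "blanket lien", "blanket collateral lien"},
    },
    "CollateralAsset": {
        "table": "collateral",
        "synonyms": {"pledged asset", "security"},
    },
}

def resolve_term(term, concepts):
    """Map a user's phrase to the concept, and table, it refers to."""
    t = term.lower().strip()
    for name, spec in concepts.items():
        if t == name.lower() or t in spec["synonyms"]:
            return name, spec["table"]
    return None  # terminology mismatch: the query would fail here

print(resolve_term("floating charge", concepts))
# ('BlanketCollateral', 'blanket_collateral')
```

Without the synonym entry, "floating charge" returns nothing, and the AI falls back to guessing from column names, which is exactly the failure mode the Concept Model exists to prevent.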
2. Concept Relationships
How business concepts connect to each other, expressed as typed, directional relationships in plain language. In banking: Customer Has Many CreditFacility. CollateralAsset Component Of BlanketCollateral. In healthcare: Patient Has Many Encounter. Encounter Generates ClaimSubmission. Provider Performs Procedure. The domain changes, but the structure is the same.
These relationships serve two purposes. First, they define the semantic structure of the domain: what things exist and how they relate. Second, and equally important for large schemas, they act as a table clustering mechanism. When you have hundreds of tables, the concept relationships tell the system which tables belong together and should be processed as a coherent unit. This transforms what would otherwise be an overwhelming all-at-once schema ingestion into manageable, semantically coherent clusters, each processed with full domain context rather than noise from unrelated tables.
This clustering function is one of the most practically valuable aspects of the Analytics Ontology approach, and one that purely formal ontology tools, designed for from-scratch modelling rather than reverse engineering, do not address.
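The clustering mechanism is, at its core, connected components over the concept graph. The sketch below is an illustrative simplification with example relationships drawn from the banking and healthcare examples above; the real pipeline's grouping logic is richer than this.

```python
# Typed, directional concept relationships treated as graph edges.
relationships = [
    ("Customer", "has_many", "CreditFacility"),
    ("CreditFacility", "secured_by", "CollateralAsset"),
    ("CollateralAsset", "component_of", "BlanketCollateral"),
    ("Patient", "has_many", "Encounter"),
]

def clusters(edges):
    """Connected components via DFS over the undirected concept graph."""
    adjacency = {}
    for a, _, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for node in adjacency:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            n = stack.pop()
            if n in group:
                continue
            group.add(n)
            stack.extend(adjacency[n] - group)
        seen |= group
        groups.append(group)
    return groups

for g in clusters(relationships):
    print(sorted(g))
# ['BlanketCollateral', 'CollateralAsset', 'CreditFacility', 'Customer']
# ['Encounter', 'Patient']
```

Each component becomes a processing unit: the banking concepts are ingested together with their shared context, and the healthcare concepts separately, rather than the whole schema at once.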
3. Concept Types
A classification of each concept by its analytical role:
- Fact (Transactional) records events and states over time. OutstandingBalanceSnapshot, ObligationDetail. These drive metrics and change frequently.
- Dimension (Reference) provides context and classification. Customer, DateDimension, CollateralType. These are the lenses through which metrics are sliced.
- Bridge (Junction) connects many-to-many relationships. FacilityCollateralLinkage. These are the join tables the AI needs to traverse correctly, without which cross-domain queries fail silently.
- Isolated (No relationships yet) are concepts with no relationships identified. Not a permanent classification but a signal to the human reviewer that this concept needs attention before the data model is generated.
Two of these types, Fact and Dimension, will be familiar to BI practitioners from Kimball dimensional modelling, and that is intentional. The Analytics Ontology does not ask practitioners to abandon familiar vocabulary; it extends it with the additional precision that AI-powered analytics requires.
4. Properties with Roles
Every concept has properties, but not all properties are equal from an AI perspective. An Analytics Ontology assigns each property a role that tells the AI what to do with it, not just what type it is.
Three roles are meaningfully distinct for AI query generation:
- Link Properties are columns that join to other concepts. collateral_id links to CollateralAsset. customer_id links to Customer. These tell the AI how to construct joins without guessing from column name patterns, which fails on real-world schemas where cust_ref links to party_master or oblig_key links to loan_account.
- Measure Properties are columns that can be aggregated. market_value, forced_sale_value, total_outstanding, overdue_days. These tell the AI what to compute, what belongs in a SUM, AVG, COUNT, MIN, or MAX.
- Dimension Properties are columns that filter or group results. blanket_type, blanket_status, country, asset_class_covered. These tell the AI what belongs in WHERE clauses and GROUP BY expressions.
A fourth role handles system columns:
- Audit covers columns like created_at, modified_by, and system_id that the AI should deprioritise when answering business questions. Without this role, AI agents include system columns in query results, producing technically correct but practically useless output.
It is worth noting what this list does not include. Labels like "classifier" and "descriptor" were considered and rejected. They are human-readable annotations that mean nothing to an AI during query generation. A property role only earns its place if it changes what the AI does with that column. These four roles all do. Decorative labels do not.
The practical consequence is concrete. When a user asks "what is the average coverage limit by blanket type?", the AI needs to know that coverage_limit is a Measure (AVG it), blanket_type is a Dimension (GROUP BY it), and the join to get there uses a Link Property. Without explicit roles, the AI infers this from column names and data types, which works on clean schemas and fails on messy real-world ones.
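The role-driven behaviour can be sketched directly. The column names and roles follow the example above; the generator is a deliberately small toy that shows the principle, not Codd AI's query engine.

```python
# Property roles for the blanket_collateral concept (names from the example).
roles = {
    "coverage_limit": "measure",
    "blanket_type": "dimension",
    "blanket_status": "dimension",
    "collateral_id": "link",
    "created_at": "audit",
}

def average_by(measure, dimension, table, roles):
    """Build an aggregate query only when the roles permit it."""
    assert roles[measure] == "measure", f"{measure} is not aggregable"
    assert roles[dimension] == "dimension", f"{dimension} cannot group results"
    return (f"SELECT {dimension}, AVG({measure}) "
            f"FROM {table} GROUP BY {dimension}")

print(average_by("coverage_limit", "blanket_type", "blanket_collateral", roles))
# SELECT blanket_type, AVG(coverage_limit) FROM blanket_collateral GROUP BY blanket_type
```

Note what the roles rule out as well as what they permit: averaging created_at or grouping by coverage_limit fails loudly here, where a name-guessing system would silently produce a plausible but wrong query.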
5. Confidence Scores
Unlike a formal ontology, which treats its assertions as ground truth because they were placed there by human experts, an Analytics Ontology is AI-generated and human-reviewed. Every relationship, every property role assignment, and every concept description carries a confidence score: the AI's assessment of how certain it is about that assertion.
This is genuinely novel. Formal ontologies have no equivalent. An Analytics Ontology makes uncertainty visible, directing human reviewers to the places where their attention is most needed.
In Codd AI's implementation, confidence of 95% or above renders in green, from 70% up to 95% in amber, and below 70% in red. A reviewer opening the Concept Model sees immediately where their attention is needed, not by reading every card, but by scanning for colour. This is how a system of hundreds of concepts becomes reviewable in hours rather than days.
The confidence score is not decorative. It is the mechanism by which an AI-generated model becomes a human-verified one, and the feature that most clearly distinguishes an Analytics Ontology from both a formal OWL ontology (which asserts certainty) and a BI semantic layer (which has no concept of confidence at all).
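The triage logic amounts to a small threshold function. The thresholds follow the ones above, with the assumption that amber runs right up to the green boundary at 95%; the assertion names are illustrative.

```python
def review_colour(confidence: float) -> str:
    """Map an AI confidence score to a review-priority colour."""
    if confidence >= 0.95:
        return "green"   # accept quickly
    if confidence >= 0.70:
        return "amber"   # worth a look
    return "red"         # needs human attention first

# Illustrative AI-generated assertions with their confidence scores.
assertions = {
    "Customer has_many CreditFacility": 0.97,
    "blanket_type is a Dimension property": 0.82,
    "oblig_key links to LoanAccount": 0.55,
}
for claim, conf in assertions.items():
    print(f"{review_colour(conf):5}  {claim}")
```

The reviewer's scan order falls out of the colours: red first, amber next, green last, which is what makes hundreds of assertions tractable.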
How It Works in Practice: Reverse Engineering from Existing Data
The most important design choice in the Analytics Ontology approach is direction. Formal ontologies are built top-down: domain experts model the world, then data is mapped to the model. Analytics Ontologies are built bottom-up: the model is reverse-engineered from existing data and business knowledge.
This is not a compromise. It is the right approach for analytics, for three reasons.
First, analytics teams have existing data. They cannot start from a blank slate. Any approach that requires modelling the domain before touching the data will not survive contact with reality.
Second, the data contains knowledge. Column names, table structures, foreign key patterns, and existing documentation encode significant domain understanding. Ignoring this in favour of from-scratch modelling throws away the most reliable source of truth available.
Third, the goal is grounding, not publication. A formal ontology needs to be logically complete and consistent because it will be used by reasoners that will expose any gap. An Analytics Ontology needs to be good enough to ground AI queries correctly, a much lower bar that is practically achievable.
The Four-Step Process
Step 1: Ingest metadata and business knowledge
The process begins with two inputs: technical metadata from the database (tables, columns, data types, existing foreign keys, any available column descriptions) and business knowledge documents (data dictionaries, process documentation, regulatory glossaries, anything that describes the domain in business language).
These two inputs address a fundamental asymmetry. Technical metadata tells you how the data is stored. Business knowledge tells you what it means. An Analytics Ontology needs both, and the combination of GenAI with both inputs simultaneously is what makes automated generation possible at quality levels that would otherwise require dedicated ontology engineering teams.
In Codd AI, these inputs are called Knowledge Cells: the business and technical documents that feed the Concept Model generation step. They are not just a document store. They are the knowledge foundation from which the AI extracts domain understanding before generating a single concept or relationship.
Step 2: Generate the Concept Model
Using GenAI informed by both inputs, Codd AI generates a draft Analytics Ontology, identifying business concepts, inferring relationships, classifying concept types, and assigning property roles. Every element carries a confidence score.
The output is a concept graph: a visual representation of the domain with concepts as nodes and typed relationships as edges. At this stage the concept graph also serves its second function, partitioning the schema into coherent clusters. Concepts and their related tables form natural processing groups, so that the next step processes semantically related tables together rather than attempting to make sense of the entire schema at once.
Step 3: Human review and verification
The concept graph is presented to a domain expert or data analyst for review in the Concept Model Review screen. The reviewer can correct concept names, add or remove relationships, reclassify concept types, adjust property roles, and add synonyms from their domain knowledge.
Low-confidence elements are highlighted, directing review effort to where it matters most. High-confidence elements can be accepted quickly. The goal is not perfection but sufficiency: a Concept Model that correctly represents the domain well enough to ground AI queries reliably.
This step is also where the two-layer architecture becomes tangible. The reviewer is not looking at tables and columns. They are looking at business concepts and their relationships, expressed in the language of the domain rather than the language of the database. A commercial banking analyst reviewing Customer, Has Many, CreditFacility is doing something fundamentally different from a data engineer reviewing an ER diagram. They are verifying business truth, not database structure.
Step 4: Generate the data model
Once the Concept Model is verified, it feeds the data model generation step. Each concept maps to its physical tables. The concept relationships inform join path generation. Property roles inform the classification of each column. Business descriptions and synonyms carry through as metadata.
The result is a data model that is not just a structural diagram but a semantically enriched representation: every table and column annotated with its business meaning, every join path justified by a concept relationship, every column classified by its analytical role. This data model then grounds metric generation, completing the pipeline:
Knowledge Cells → Concept Model → Data Model → Metrics
(what we know) → (what things mean) → (how things connect) → (what we measure)
An important design principle: the Concept Model captures business meaning (what things are, how they relate, what each property is for), while the Data Model captures physical structure (cardinality, foreign keys, data types, join paths). This separation keeps the Concept Model stable even when the underlying schema changes, and it means domain experts and data engineers can review different layers independently, each answering questions in their own language.
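The separation can be sketched as two structures linked only by the concept name. All field names here are illustrative assumptions, not Codd AI's actual schema; the point is that a physical rename touches one layer and leaves the other untouched.

```python
# Business meaning: stable, reviewed by domain experts.
concept_layer = {
    "BlanketCollateral": {
        "description": "A lien over a pool of assets rather than a single item.",
        "synonyms": ["floating charge", "blanket lien"],
        "type": "dimension",
    },
}

# Physical structure: volatile, reviewed by data engineers.
data_layer = {
    "BlanketCollateral": {
        "table": "blanket_collateral",
        "primary_key": "blanket_id",
        "columns": {"coverage_limit": "DECIMAL", "blanket_type": "VARCHAR"},
    },
}

def remap_table(data_layer, concept, new_table):
    """A schema rename touches only the data layer; meaning is untouched."""
    data_layer[concept]["table"] = new_table
    return data_layer

remap_table(data_layer, "BlanketCollateral", "blkt_coll_v2")
# The concept layer, including its synonyms, is unaffected by the rename.
print(concept_layer["BlanketCollateral"]["synonyms"])
```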
What This Means for AI Analytics
The practical consequence of getting this right is significant.
Without an Analytics Ontology, an AI analytics agent has:
- Metric definitions (how to compute numbers)
- Column names (which it interprets based on training data patterns)
- Data types (which tell it INTEGER from STRING but not Measure from Identifier)
With an Analytics Ontology, the same agent has:
- Business concept definitions with synonyms ("floating charge" maps to blanket_collateral)
- Typed relationships between concepts (how to join without guessing)
- Property roles (what to aggregate, what to filter, what to join through, what to ignore)
- Business context from domain documents (what a "blanket lien" means in commercial banking)
- Confidence-scored, human-verified assertions (where the model is certain and where a human confirmed it)
The difference in query quality is not marginal. It is the difference between an AI that generates syntactically correct SQL that answers the wrong question, and one that generates business-correct SQL that actually reflects what the user asked.
Consider a concrete example. A user in a commercial banking context asks: "What is the average coverage limit by blanket type for active liens?"
Without an Analytics Ontology, the AI must infer from column names that coverage_limit is aggregable, that blanket_type is a grouping dimension, that blanket_status = 'Active' is the right filter, and that the relevant table is blanket_collateral rather than collateral or facility. On a clean, well-named schema it might get this right. On the kind of schema that exists in most real organisations, where columns are named cvg_lmt, blkt_typ, and sts_cd, it will not.
With a Codd AI Concept Model, each of these inferences is replaced by a verified fact. The AI knows coverage_limit is a Measure because a human confirmed it. It knows blanket_type is a Dimension. It knows the correct table and the correct filter. The query it generates is not a best guess. It is grounded in a structured understanding of the domain that was built from the data itself and verified by someone who knows the business.
Why the Terminology Debate Matters
The data industry has been arguing about ontologies for thirty years without resolving the argument, because the two sides are talking about different things. Formal ontologists are right that what BI vendors call semantic layers are not semantic in any meaningful sense. BI vendors are right that formal OWL ontologies are impractical for analytics teams with existing schemas, limited budgets, and urgent deadlines.
The Analytics Ontology resolves this not by declaring a winner but by recognising that analytics has its own requirements. The goal is not formal reasoning or logical completeness. The goal is AI grounding: giving an AI agent a verified, structured understanding of the business domain it is querying, built from the data that already exists, reviewable by the people who understand the domain, and practical enough to build in hours rather than months.
Getting the terminology right matters for a practical reason. When an organisation hears "semantic layer" and believes it has invested in semantic technology, it stops looking for what it actually needs. When it hears "ontology" and dismisses the concept as too complex, it also stops looking. The term "Analytics Ontology" is an attempt to hold the space between these two failures: to claim that ontological thinking is relevant and achievable for analytics teams, while being honest that it is a distinct, purpose-built variant rather than a full formal ontology.
Conclusion
The data industry needs to stop arguing about whether BI semantic layers are "real" ontologies and start asking a more useful question: what does an AI analytics agent actually need to query business data correctly?
It needs to know what things are, not just what tables exist but what business concepts they represent, what they mean, and what other concepts they relate to. It needs to know what to do with each column, not just its data type but whether to aggregate it, filter by it, join through it, or ignore it. It needs this knowledge to be verified by humans who understand the domain, not just inferred by an AI from column names.
Codd AI's answer is the Analytics Ontology, grounded in existing data, generated by AI, verified by practitioners, and expressed in the plain language of the business domain rather than the formal language of description logics. In the product, it lives as the Concept Model, the first step in a pipeline that moves from business meaning to physical structure to measurable metrics.
For organisations building AI analytics infrastructure, the choice is not between a BI semantic layer and a formal OWL ontology. It is between AI that works from metric definitions and column names, and AI that works from a structured, verified understanding of the business domain.
The Analytics Ontology, and its practitioner-facing implementation as the Concept Model, is how you give AI the latter, without hiring ontology engineers or abandoning your existing data stack.
Built on the principle that meaning and measurement are different things, and that AI needs both.
If you want to explore what an Analytics Ontology looks like for your data, schedule a 30-minute call with our team.


