Step 6. Map your data

Before you can score what you can build, you need to understand what data you actually have. This step is about taking stock.

The most useful way to do that is to build a data map: a structured inventory of the things that exist in your business, the events that happen to them, and where the data about them lives. Data teams often call this an ontology, but the concept is straightforward: you list the nouns (entities) and verbs (events) of your business and note which systems capture them.

The good news is that if you have completed the process annotation exercise in Step 2, you have already done most of the work. Your customer journey stages map directly to events. Your partners and resources map to entities. The annotation layers give you the attributes.

Start by interviewing the operators

Before you open a spreadsheet or look at a database, you need to talk to the people who run the business day to day.

This is not a data discovery exercise. It is a language exercise. Your goal is to understand how operational stakeholders think about the business -- the words they use, the things they consider distinct, the events that feel significant to them. The data map you build should reflect their mental model, not the column names in your source systems.

This matters more than it sounds. If a head of collections thinks about their work in terms of "customers in difficulty," your data map should have an entity called something like that -- not a status flag with a value of 3. If a sales manager thinks in terms of "deals moving through stages," your events should look like that, not like CRM audit log entries.

When the data model matches the mental model, queries become natural. A business user -- or an AI agent acting on their behalf -- can ask a question in plain language and get a direct answer, because the data is already organised the way they think. When the ontology is built around system artefacts instead of business concepts, every question requires interpretation, and agentic analytics breaks down.
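As a rough illustration of the difference, the sketch below wraps a raw status flag in the language operators use. Only the flag value 3 and the phrase "customers in difficulty" come from the example above; the other flag values, the SourceRow type, and the function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical source-system codes. Only the value 3 comes from the
# collections example; the other values are invented for illustration.
STATUS_FLAG_TO_BUSINESS_NAME = {
    1: "customer up to date",
    2: "customer in arrears",
    3: "customer in difficulty",
}


@dataclass(frozen=True)
class SourceRow:
    """A row as the source system stores it (assumed shape)."""
    customer_id: str
    status_flag: int


def business_status(row: SourceRow) -> str:
    """Translate a raw status flag into the term operators use."""
    return STATUS_FLAG_TO_BUSINESS_NAME.get(row.status_flag, "unknown status")


def customers_in_difficulty(rows: list[SourceRow]) -> list[str]:
    """Answer the question the way the head of collections would ask it."""
    return [
        row.customer_id
        for row in rows
        if business_status(row) == "customer in difficulty"
    ]


rows = [SourceRow("C-001", 1), SourceRow("C-002", 3), SourceRow("C-003", 3)]
print(customers_in_difficulty(rows))  # ['C-002', 'C-003']
```

The point of the sketch is that the translation happens once, in the data map, rather than every time someone asks a question.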

For each stage of the customer journey, sit with the person who runs it and ask:

What do you call the things you work with day to day? What is a "customer" to you, what is an "account," what is a "deal"?
What counts as something happening? What are the moments that matter in your work?
How do you know when something has gone well, or gone wrong?
What would you want to be able to look up or track that you currently cannot?

Then build your entity and event names from their answers. The test of a good data map is whether an operational stakeholder can read it and immediately recognise their business in it.

How to build your data map

Once you have done the interviews, work through the following:

Extract entities. List every noun that came up -- every person, organisation, product, or asset that operational stakeholders work with and refer to by name.
Extract events. List what happens at each stage. Every transition, every decision point, every moment that stakeholders described as significant.
Define attributes. For each entity and event, list the attributes that matter for analytics. Use the names stakeholders use, not system field names.
Map relationships. How do entities relate to each other? A user belongs to an account. An account has a subscription. A subscription has payments.
Map to source systems. For each attribute, identify which system holds the data. This is where gaps tend to surface: attributes that stakeholders care about but that no system captures cleanly.
Assess data access. For each source system, note how the data can be reached: API (real-time or batch), direct database connection, file export (CSV, SFTP), or manual extract. Note any access constraints -- authentication requirements, rate limits, data sharing agreements, or systems where access has not yet been granted. This assessment feeds directly into your feasibility scores in Step 7.
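The steps above can be captured in a simple structure. The following is a minimal sketch, not a prescribed schema: the class names, field names, and the access_gaps helper are all assumptions made for illustration, and the access flags in the sample data are invented. It records entities and events with their attributes, relationships, and source systems, then lists every attribute that sits in a system you cannot yet reach.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str           # the name stakeholders use, not the system field name
    source_system: str  # which system holds the data


@dataclass
class SourceSystem:
    name: str
    access_method: str  # e.g. "api", "database", "file_export", "manual"
    access_granted: bool
    constraints: list[str] = field(default_factory=list)


@dataclass
class MapItem:
    name: str
    kind: str  # "entity" or "event"
    attributes: list[Attribute]
    related_to: list[str] = field(default_factory=list)


def access_gaps(items, systems):
    """Return (item, attribute, system) for attributes that cannot be reached yet."""
    gaps = []
    for item in items:
        for attr in item.attributes:
            system = systems.get(attr.source_system)
            if system is None or not system.access_granted:
                gaps.append((item.name, attr.name, attr.source_system))
    return gaps


# Sample data: the access flags and constraints are hypothetical.
systems = {
    "CRM": SourceSystem("CRM", "api", True, ["rate limits"]),
    "Billing system": SourceSystem(
        "Billing system", "api", False, ["access not yet granted"]
    ),
}
items = [
    MapItem(
        "Account",
        "entity",
        [Attribute("Company name", "CRM"), Attribute("MRR", "Billing system")],
        related_to=["Subscription"],
    ),
    MapItem(
        "Subscription converted",
        "event",
        [Attribute("Timestamp", "Billing system"), Attribute("Plan", "Billing system")],
    ),
]

for gap in access_gaps(items, systems):
    print(gap)
```

A spreadsheet with the same columns works just as well; the structure matters more than the tool. The list of gaps is the kind of input a feasibility score can draw on.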

Example — B2B SaaS

Entities:

Entity	Key Attributes	Source	Access
Account	Company name, industry, size, plan, MRR, contract start date, region	CRM, billing system	CRM: REST API available. Billing: REST API available (Stripe/Chargebee typical).
User	Role, signup date, last active date, feature permissions	Product database	Direct DB connection or REST API depending on stack.
Subscription	Plan type, billing period, MRR, trial start date, renewal date	Billing system	REST API available. Webhook events for state changes.
Feature	Name, release date, access tier	Product database	Direct DB or internal API. Often requires engineering to expose.
Opportunity	Stage, value, close date, source, owner	CRM	REST API available. Rate limits apply on most platforms.
Campaign	Name, channel, spend, start date, UTM parameters	Marketing platform	API available (Google Ads, Meta, etc.). Spend data often requires separate pull.

Events:

Event	Key Attributes	Source	Access
Trial started	Timestamp, source campaign, signup channel	Product database, marketing platform	Product events via API or direct DB. Attribution requires UTM join to marketing platform.
User signed up	Timestamp, role, invite or organic	Product database	API or direct DB.
Feature activated	Timestamp, user, feature name, account	Product events	Event stream (e.g. Segment, Mixpanel, Amplitude) or direct DB.
Subscription converted	Timestamp, plan, MRR, conversion channel	Billing system	Webhook or API.
Subscription upgraded	Timestamp, old plan, new plan, MRR change	Billing system	Webhook or API.
Payment received	Timestamp, amount, payment method, status	Billing system	Webhook or API.
Support ticket raised	Timestamp, category, severity, resolution time	Support system	REST API (Zendesk, Intercom, etc.). Resolution time often requires calculated field.
Account churned	Timestamp, reason (if captured), MRR lost	CRM, billing system	Billing webhook for cancellation. Churn reason typically manual field in CRM -- often incomplete.
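The "Trial started" row notes that attribution requires a UTM join to the marketing platform. A minimal sketch of that join follows, assuming both sides expose a shared utm_campaign value. The record shapes, field names, and sample values are hypothetical; real product events and campaign exports will look different.

```python
# Hypothetical campaign records from the marketing platform.
campaigns = [
    {"name": "Spring webinar", "channel": "paid social", "utm_campaign": "spring_webinar"},
    {"name": "Brand search", "channel": "paid search", "utm_campaign": "brand_search"},
]

# Hypothetical trial-start events from the product database.
trial_events = [
    {"account": "A-1", "timestamp": "2024-03-01T09:00:00", "utm_campaign": "spring_webinar"},
    {"account": "A-2", "timestamp": "2024-03-02T14:30:00", "utm_campaign": None},
    {"account": "A-3", "timestamp": "2024-03-03T11:15:00", "utm_campaign": "unknown_tag"},
]


def attribute_trials(trial_events, campaigns):
    """Left-join trial events to campaigns on the UTM campaign value.

    Events with a missing or unrecognised UTM value are kept and
    labelled as unattributed rather than dropped.
    """
    by_utm = {c["utm_campaign"]: c for c in campaigns}
    attributed = []
    for event in trial_events:
        campaign = by_utm.get(event["utm_campaign"])
        attributed.append(
            {
                "account": event["account"],
                "timestamp": event["timestamp"],
                "source_campaign": campaign["name"] if campaign else None,
                "signup_channel": campaign["channel"] if campaign else "unattributed",
            }
        )
    return attributed


for row in attribute_trials(trial_events, campaigns):
    print(row)
```

Keeping unmatched events in the output is a deliberate choice here: the share of unattributed trials is itself a useful signal of how complete the attribution data is.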

Example — Non-bank lender

Entities:

Entity	Key Attributes	Source	Access
Borrower	Name, date of birth, income, employment type, credit score, segment	Application system, credit bureau	Application system: API or direct DB. Credit bureau: batch file (typically SFTP, daily or on-demand). Data sharing agreement required.
Loan	Amount, rate, product type, term, channel, status	Application system, servicing platform	Application system: API or direct DB. Servicing platform: varies widely -- some support API, others require scheduled extract.
Property	Address, valuation, property type	Application system, valuation provider	Application system: API or direct DB. Valuation provider: API (e.g. CoreLogic, Valex) -- licensing required.
Broker	Name, firm, volume tier, quality rating	Broker portal, commission system	Broker portal: direct DB or internal API. Commission system: often a separate legacy system requiring file extract.
Payment	Amount, date, method, days past due	Servicing platform	Scheduled extract or API depending on platform. Real-time access uncommon in older servicing systems.

Events:

Event	Key Attributes	Source	Access
Application submitted	Timestamp, channel, product type	Application system	API or direct DB. Usually well-structured.
Pre-approval issued	Timestamp, conditions, indicative rate	Application system, decisioning engine	Direct DB. Decisioning engine output may require separate extract if third-party.
Application declined	Timestamp, decline reason	Decisioning engine	Direct DB or API. Decline reason codes often require a lookup table to interpret.
Documents received	Timestamp, document type, completeness	Document management system	API or direct DB. Completeness scoring often not captured -- may need to be derived.
Loan settled	Timestamp, funded amount	Application system	API or direct DB.
Payment missed	Timestamp, amount due, days overdue	Servicing platform	Scheduled extract typical. Real-time alerting requires event stream or polling.
Hardship arrangement created	Timestamp, reason, arrangement type	Case management system	Often a separate system from servicing -- direct DB or file extract. Data quality varies.
Loan discharged	Timestamp, reason, balance at discharge	Servicing platform	Scheduled extract or API. Discharge reason often manually entered -- check completeness.
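Two of the access notes above lend themselves to a quick check while you build the map: decline reason codes that need a lookup table, and manually entered fields whose completeness is uncertain. The sketch below shows one way to handle both. The reason codes, their descriptions, the record shapes, and the function names are all hypothetical.

```python
# Hypothetical lookup table; real codes come from the decisioning engine.
DECLINE_REASON_LOOKUP = {
    "D01": "Serviceability shortfall",
    "D02": "Adverse credit history",
}


def decode_decline_reason(code):
    """Translate a decline reason code, flagging any code the table lacks."""
    return DECLINE_REASON_LOOKUP.get(code, f"Unmapped code: {code}")


def completeness(records, field_name):
    """Return the fraction of records with a non-empty value for field_name."""
    if not records:
        return 0.0
    filled = sum(
        1 for record in records if str(record.get(field_name) or "").strip()
    )
    return filled / len(records)


# Hypothetical discharge records with a manually entered reason field.
discharges = [
    {"loan": "L-1", "reason": "Refinanced"},
    {"loan": "L-2", "reason": ""},
    {"loan": "L-3", "reason": None},
    {"loan": "L-4", "reason": "Property sold"},
]

print(decode_decline_reason("D01"))
print(decode_decline_reason("D99"))
print(completeness(discharges, "reason"))  # 0.5
```

Recording the completeness figure next to the attribute in the data map gives you something concrete to refer to when judging how usable that attribute is.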