Step 6. Map your data

Before you can score what you can build, you need to understand what data you actually have. This step is about taking stock.

The most useful way to do that is to build a data map: a structured inventory of the things that exist in your business, the events that happen to them, and where the data lives. In data engineering this is called an ontology, but the concept is straightforward — you are listing the nouns (entities) and verbs (events) of your business, and noting which systems capture them.

The good news is that if you have completed the process annotation exercise in Step 2, you have already done most of the work. Your customer journey stages map directly to events. Your partners and resources map to entities. The annotation layers give you the attributes.

Start by interviewing the operators

Before you open a spreadsheet or look at a database, you need to talk to the people who run the business day to day.

This is not a data discovery exercise. It is a language exercise. Your goal is to understand how operational stakeholders think about the business -- the words they use, the things they consider distinct, the events that feel significant to them. The data map you build should reflect their mental model, not the column names in your source systems.

This matters more than it sounds. If a head of collections thinks about their work in terms of "customers in difficulty," your data map should have an entity called something like that -- not a status flag with a value of 3. If a sales manager thinks in terms of "deals moving through stages," your events should look like that, not like CRM audit log entries.

When the data model matches the mental model, queries become natural. A business user -- or an AI agent acting on their behalf -- can ask a question in plain language and get a direct answer, because the data is already organised the way they think. When the ontology is built around system artefacts instead of business concepts, every question requires interpretation, and agentic analytics breaks down.
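
To make the difference concrete, here is a minimal Python sketch. It assumes a hypothetical source table whose `status` column holds numeric codes, where the value 3 is what the head of collections calls a customer in difficulty. The field names (`cust_id`, `status`) and the labels for codes 1 and 2 are invented for illustration and do not come from any real system.

```python
# Hypothetical mapping from a system status code to the operators' own words.
# Only code 3 echoes the example in the text; codes 1 and 2 are assumptions.
STATUS_TO_BUSINESS_TERM = {
    1: "customer up to date",
    2: "customer behind on a payment",
    3: "customer in difficulty",
}


def to_business_view(row: dict) -> dict:
    """Describe a source row in business language instead of system codes."""
    return {
        "customer_id": row["cust_id"],
        "situation": STATUS_TO_BUSINESS_TERM.get(row["status"], "unknown"),
    }


# Illustrative source rows, as they might come out of a servicing system.
source_rows = [
    {"cust_id": "C-001", "status": 1},
    {"cust_id": "C-002", "status": 3},
]

business_view = [to_business_view(r) for r in source_rows]

# The question "which customers are in difficulty?" now reads the way it is asked.
customers_in_difficulty = [
    r["customer_id"]
    for r in business_view
    if r["situation"] == "customer in difficulty"
]
print(customers_in_difficulty)
```

The translation happens once, in the data map, so nobody asking a question later needs to know what status 3 means.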

For each stage of the customer journey, sit with the person who runs it and ask:

  • What do you call the things you work with day to day? What is a "customer" to you, what is an "account," what is a "deal"?
  • What counts as something happening? What are the moments that matter in your work?
  • How do you know when something has gone well, or gone wrong?
  • What would you want to be able to look up or track that you currently cannot?

Then build your entity and event names from their answers. The test of a good data map is whether an operational stakeholder can read it and immediately recognise their business in it.

How to build your data map

Once you have done the interviews, work through the following:

  • Extract entities. List every noun that came up -- every person, organisation, product, or asset that operational stakeholders work with and refer to by name.
  • Extract events. List what happens at each stage: every transition, every decision point, and every moment that stakeholders described as significant.
  • Define attributes. For each entity and event, list the attributes that matter for analytics. Use the names stakeholders use, not system field names.
  • Map relationships. How do entities relate to each other? A user belongs to an account. An account has a subscription. A subscription has payments.
  • Map to source systems. For each attribute, identify which system holds the data. This is where reality starts to bite.
  • Assess data access. For each source system, note how the data can be reached: API (real-time or batch), direct database connection, file export (CSV, SFTP), or manual extract. Note any access constraints -- authentication requirements, rate limits, data sharing agreements, or systems where access has not yet been granted. This assessment feeds directly into your feasibility scores in Step 7.
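
The steps above can be captured in a small structure. The following is a rough Python sketch of one possible shape for a data map, not a prescribed schema: the class names (`SourceSystem`, `Attribute`, `MapItem`, `Relationship`) and the helper `access_by_source` are hypothetical, and the sample values are borrowed from the B2B SaaS example in this step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceSystem:
    name: str
    access: str  # e.g. "API", "direct DB", "file export", "manual extract"
    constraints: List[str] = field(default_factory=list)


@dataclass
class Attribute:
    name: str  # the name stakeholders use, not the system field name
    source: SourceSystem


@dataclass
class MapItem:
    name: str
    kind: str  # "entity" or "event"
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Relationship:
    subject: str
    verb: str
    target: str


# Source systems, with how each can be reached and any known constraints.
crm = SourceSystem("CRM", "API", ["rate limits"])
billing = SourceSystem("Billing system", "API")
product_db = SourceSystem("Product database", "direct DB")

# Entities and events, each attribute pointing at the system that holds it.
data_map = [
    MapItem("Account", "entity", [Attribute("Company name", crm), Attribute("MRR", billing)]),
    MapItem("User", "entity", [Attribute("Role", product_db)]),
    MapItem("Payment received", "event", [Attribute("Amount", billing)]),
]

relationships = [
    Relationship("User", "belongs to", "Account"),
    Relationship("Account", "has", "Subscription"),
    Relationship("Subscription", "has", "Payment"),
]


def access_by_source(items: List[MapItem]) -> Dict[str, str]:
    """Collect each source system once, with the way its data can be reached."""
    summary: Dict[str, str] = {}
    for item in items:
        for attribute in item.attributes:
            summary[attribute.source.name] = attribute.source.access
    return summary


for name, access in sorted(access_by_source(data_map).items()):
    print(f"{name}: {access}")
```

A spreadsheet with the same columns works just as well. What matters is that every attribute points to a named source system and every source system carries an access note.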

Example — B2B SaaS

Entities:

| Entity | Key Attributes | Source | Access |
|---|---|---|---|
| Account | Company name, industry, size, plan, MRR, contract start date, region | CRM, billing system | CRM: REST API available. Billing: REST API available (Stripe/Chargebee typical). |
| User | Role, signup date, last active date, feature permissions | Product database | Direct DB connection or REST API depending on stack. |
| Subscription | Plan type, billing period, MRR, trial start date, renewal date | Billing system | REST API available. Webhook events for state changes. |
| Feature | Name, release date, access tier | Product database | Direct DB or internal API. Often requires engineering to expose. |
| Opportunity | Stage, value, close date, source, owner | CRM | REST API available. Rate limits apply on most platforms. |
| Campaign | Name, channel, spend, start date, UTM parameters | Marketing platform | API available (Google Ads, Meta etc). Spend data often requires separate pull. |

Events:

| Event | Key Attributes | Source | Access |
|---|---|---|---|
| Trial started | Timestamp, source campaign, signup channel | Product database, marketing platform | Product events via API or direct DB. Attribution requires UTM join to marketing platform. |
| User signed up | Timestamp, role, invite or organic | Product database | API or direct DB. |
| Feature activated | Timestamp, user, feature name, account | Product events | Event stream (e.g. Segment, Mixpanel, Amplitude) or direct DB. |
| Subscription converted | Timestamp, plan, MRR, conversion channel | Billing system | Webhook or API. |
| Subscription upgraded | Timestamp, old plan, new plan, MRR change | Billing system | Webhook or API. |
| Payment received | Timestamp, amount, payment method, status | Billing system | Webhook or API. |
| Support ticket raised | Timestamp, category, severity, resolution time | Support system | REST API (Zendesk, Intercom etc). Resolution time often requires calculated field. |
| Account churned | Timestamp, reason (if captured), MRR lost | CRM, billing system | Billing webhook for cancellation. Churn reason typically manual field in CRM -- often incomplete. |
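
The "Trial started" row notes that attribution requires a UTM join to the marketing platform. As a rough illustration of what that join involves, the Python sketch below matches trial events to campaigns on a shared `utm_campaign` value. All field names and sample records are assumptions made for the example; real product events and marketing exports will be shaped differently.

```python
# Hypothetical campaign records pulled from a marketing platform.
campaigns = [
    {"utm_campaign": "spring_launch", "name": "Spring launch", "channel": "paid search"},
    {"utm_campaign": "webinar_q2", "name": "Q2 webinar", "channel": "email"},
]

# Hypothetical "Trial started" events from the product database.
trial_started = [
    {"account": "A-1", "timestamp": "2024-04-02T09:15:00", "utm_campaign": "spring_launch"},
    {"account": "A-2", "timestamp": "2024-04-03T14:40:00", "utm_campaign": "webinar_q2"},
    {"account": "A-3", "timestamp": "2024-04-05T11:05:00", "utm_campaign": None},
]

# Index campaigns by UTM value so each event can be matched in one lookup.
campaign_by_utm = {c["utm_campaign"]: c for c in campaigns}


def attribute_trials(events, lookup):
    """Attach source campaign and signup channel to each trial event."""
    attributed = []
    for event in events:
        campaign = lookup.get(event["utm_campaign"])
        attributed.append({
            **event,
            "source_campaign": campaign["name"] if campaign else None,
            "signup_channel": campaign["channel"] if campaign else "unattributed",
        })
    return attributed


for row in attribute_trials(trial_started, campaign_by_utm):
    print(row["account"], row["source_campaign"], row["signup_channel"])
```

Events with no matching UTM value are kept and marked as unattributed rather than dropped, so the size of the attribution gap stays visible when you score feasibility.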

Example — Non-bank lender

Entities:

| Entity | Key Attributes | Source | Access |
|---|---|---|---|
| Borrower | Name, date of birth, income, employment type, credit score, segment | Application system, credit bureau | Application system: API or direct DB. Credit bureau: batch file (typically SFTP, daily or on-demand). Data sharing agreement required. |
| Loan | Amount, rate, product type, term, channel, status | Application system, servicing platform | Application system: API or direct DB. Servicing platform: varies widely -- some support API, others require scheduled extract. |
| Property | Address, valuation, property type | Application system, valuation provider | Application system: API or direct DB. Valuation provider: API (e.g. CoreLogic, Valex) -- licensing required. |
| Broker | Name, firm, volume tier, quality rating | Broker portal, commission system | Broker portal: direct DB or internal API. Commission system: often a separate legacy system requiring file extract. |
| Payment | Amount, date, method, days past due | Servicing platform | Scheduled extract or API depending on platform. Real-time access uncommon in older servicing systems. |

Events:

| Event | Key Attributes | Source | Access |
|---|---|---|---|
| Application submitted | Timestamp, channel, product type | Application system | API or direct DB. Usually well-structured. |
| Pre-approval issued | Timestamp, conditions, indicative rate | Application system, decisioning engine | Direct DB. Decisioning engine output may require separate extract if third-party. |
| Application declined | Timestamp, decline reason | Decisioning engine | Direct DB or API. Decline reason codes often require a lookup table to interpret. |
| Documents received | Timestamp, document type, completeness | Document management system | API or direct DB. Completeness scoring often not captured -- may need to be derived. |
| Loan settled | Timestamp, funded amount | Application system | API or direct DB. |
| Payment missed | Timestamp, amount due, days overdue | Servicing platform | Scheduled extract typical. Real-time alerting requires event stream or polling. |
| Hardship arrangement created | Timestamp, reason, arrangement type | Case management system | Often a separate system from servicing -- direct DB or file extract. Data quality varies. |
| Loan discharged | Timestamp, reason, balance at discharge | Servicing platform | Scheduled extract or API. Discharge reason often manually entered -- check completeness. |
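
Several rows above flag manually entered fields whose completeness should be checked, such as the discharge reason. One simple way to run that check is sketched below in Python. The `completeness` helper and the sample records are hypothetical, and the sketch treats empty strings, whitespace, and missing values all as "not captured".

```python
def completeness(records, field_name):
    """Share of records where the field holds a non-blank value (0.0 to 1.0)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field_name) or "").strip())
    return filled / len(records)


# Hypothetical "Loan discharged" events from a servicing platform extract.
loan_discharged = [
    {"loan": "L-100", "reason": "refinanced", "balance_at_discharge": 0.0},
    {"loan": "L-101", "reason": "", "balance_at_discharge": 0.0},
    {"loan": "L-102", "reason": None, "balance_at_discharge": 0.0},
    {"loan": "L-103", "reason": "property sold", "balance_at_discharge": 0.0},
]

share = completeness(loan_discharged, "reason")
print(f"Discharge reason captured on {share:.0%} of discharged loans")
```

Recording a figure like this next to each manually entered attribute turns "check completeness" into a number you can carry into the feasibility scoring.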