Step 6. Map your data
Before you can score what you can build, you need to understand what data you actually have. This step is about taking stock.
The most useful way to do that is to build a data map: a structured inventory of the things that exist in your business, the events that happen to them, and where the data lives. Data teams often call this an ontology. The concept is straightforward: you are listing the nouns (entities) and verbs (events) of your business, and noting which systems capture them.
The good news is that if you have completed the process annotation exercise in Step 2, you have already done most of the work. Your customer journey stages map directly to events. Your partners and resources map to entities. The annotation layers give you the attributes.
Start by interviewing the operators
Before you open a spreadsheet or look at a database, you need to talk to the people who run the business day to day.
This is not a data discovery exercise. It is a language exercise. Your goal is to understand how operational stakeholders think about the business -- the words they use, the things they consider distinct, the events that feel significant to them. The data map you build should reflect their mental model, not the column names in your source systems.
This matters more than it sounds. If a head of collections thinks about their work in terms of "customers in difficulty," your data map should have an entity called something like that -- not a status flag with a value of 3. If a sales manager thinks in terms of "deals moving through stages," your events should look like that, not like CRM audit log entries.
When the data model matches the mental model, queries become natural. A business user -- or an AI agent acting on their behalf -- can ask a question in plain language and get a direct answer, because the data is already organised the way they think. When the ontology is built around system artefacts instead of business concepts, every question requires interpretation, and agentic analytics breaks down.
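Here is a minimal sketch of the contrast. It translates a system artefact into a business concept once, inside the data map, so nobody downstream has to do it again. It assumes a collections system that stores a numeric status flag in which 3 means a customer in difficulty. The codes, labels and the `CustomerRecord` shape are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical status codes from a collections system, translated into the
# terms the head of collections uses. Real codes must come from the system
# owner and the operators, not from guesswork.
STATUS_LABELS = {
    1: "current",
    2: "in arrears",
    3: "customer in difficulty",
}


@dataclass
class CustomerRecord:
    customer_id: str
    status_flag: int  # raw value as stored in the source system


def business_status(record: CustomerRecord) -> str:
    """Return the business label for a record; unknown codes are labelled, not dropped."""
    return STATUS_LABELS.get(record.status_flag, f"unmapped status {record.status_flag}")


def customers_in_difficulty(records: list) -> list:
    """Answer the operator's question in the operator's terms."""
    return [r.customer_id for r in records if business_status(r) == "customer in difficulty"]


records = [
    CustomerRecord("C-001", 1),
    CustomerRecord("C-002", 3),
    CustomerRecord("C-003", 7),
]
print(customers_in_difficulty(records))  # ['C-002']
print(business_status(records[2]))       # unmapped status 7
```

Unknown codes come back labelled instead of being silently dropped. That gives you a concrete list of questions to take back to the operators.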
For each stage of the customer journey, sit with the person who runs it and ask:
- What do you call the things you work with day to day? What is a "customer" to you, what is an "account," what is a "deal"?
- What counts as something happening? What are the moments that matter in your work?
- How do you know when something has gone well, or gone wrong?
- What would you want to be able to look up or track that you currently cannot?
Then build your entity and event names from their answers. The test of a good data map is whether an operational stakeholder can read it and immediately recognise their business in it.
How to build your data map
Once you have done the interviews, work through the following:
- Extract entities. List every noun that came up -- every person, organisation, product, or asset that operational stakeholders work with and refer to by name.
- Extract events. List what happens at each stage. Every transition, every decision point, every moment that stakeholders described as significant.
- Define attributes. For each entity and event, list the attributes that matter for analytics. Use the names stakeholders use, not system field names.
- Map relationships. How do entities relate to each other? A user belongs to an account. An account has a subscription. A subscription has payments.
- Map to source systems. For each attribute, identify which system holds the data. This is where reality starts to bite.
- Assess data access. For each source system, note how the data can be reached: API (real-time or batch), direct database connection, file export (e.g. CSV files delivered over SFTP), or manual extract. Note any access constraints -- authentication requirements, rate limits, data sharing agreements, or systems where access has not yet been granted. This assessment feeds directly into your feasibility scores in Step 7.
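You can capture the steps above in a simple structure long before anything is loaded into a warehouse. The sketch below shows one way to do it in Python. The class names (`DataMap`, `Node`, `Attribute`) and the `gaps` check are illustrative, not a standard schema. The example values are drawn loosely from the B2B SaaS example that follows and are illustrative only. The `gaps` method lists attributes with no identified source or no access yet, which is the kind of input the Step 7 feasibility scoring needs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Access(Enum):
    """Ways a source system's data can be reached (from the list above)."""
    API = "API"
    DIRECT_DB = "direct database connection"
    FILE_EXPORT = "file export"
    MANUAL = "manual extract"
    NOT_GRANTED = "access not yet granted"


@dataclass
class Attribute:
    name: str                        # the stakeholders' term, not the system field name
    source: Optional[str] = None     # system that holds the data, if known
    access: Optional[Access] = None  # how that system can be reached, if assessed


@dataclass
class Node:
    name: str
    kind: str                        # "entity" or "event"
    attributes: list = field(default_factory=list)


@dataclass
class DataMap:
    nodes: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (from, verb, to) triples

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def relate(self, src: str, verb: str, dst: str) -> None:
        # Refuse relationships between names that are not in the map,
        # which catches typos and undefined entities early.
        missing = [n for n in (src, dst) if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown entities: {missing}")
        self.relationships.append((src, verb, dst))

    def gaps(self) -> list:
        """Attributes with no identified source, or no usable access route yet."""
        found = []
        for node in self.nodes.values():
            for attr in node.attributes:
                if attr.source is None:
                    found.append(f"{node.name}.{attr.name}: no source identified")
                elif attr.access in (None, Access.NOT_GRANTED):
                    found.append(f"{node.name}.{attr.name}: no access to {attr.source} yet")
        return found


# Illustrative values only.
dm = DataMap()
dm.add(Node("Account", "entity", [Attribute("MRR", "billing system", Access.API)]))
dm.add(Node("User", "entity", [Attribute("last active date", "product database", Access.DIRECT_DB)]))
dm.add(Node("Feature", "entity", [Attribute("release date", "product database", Access.NOT_GRANTED)]))
dm.add(Node("Account churned", "event", [Attribute("reason")]))
dm.relate("User", "belongs to", "Account")

for gap in dm.gaps():
    print(gap)
```

Stakeholder names serve as the keys, so the map can be printed and read back to the operators. That is the recognition test described earlier.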
Example — B2B SaaS
Entities:
| Entity | Key Attributes | Source | Access |
|---|---|---|---|
| Account | Company name, industry, size, plan, MRR, contract start date, region | CRM, billing system | CRM: REST API available. Billing: REST API available (Stripe/Chargebee typical). |
| User | Role, signup date, last active date, feature permissions | Product database | Direct DB connection or REST API depending on stack. |
| Subscription | Plan type, billing period, MRR, trial start date, renewal date | Billing system | REST API available. Webhook events for state changes. |
| Feature | Name, release date, access tier | Product database | Direct DB or internal API. Often requires engineering to expose. |
| Opportunity | Stage, value, close date, source, owner | CRM | REST API available. Rate limits apply on most platforms. |
| Campaign | Name, channel, spend, start date, UTM parameters | Marketing platform | API available (Google Ads, Meta, etc.). Spend data often requires a separate pull. |
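The Campaign entity carries UTM parameters, and the Trial started event below depends on joining to them for attribution. The sketch below shows that join. It assumes each trial record keeps the `utm_campaign` value captured at signup and each campaign record stores the value it was tagged with. The field names and sample rows are hypothetical. Lower-casing before matching is an assumption about how tags drift in practice, so check your own data before relying on it.

```python
from collections import Counter

# Hypothetical sample rows from the marketing platform and the product database.
campaigns = [
    {"name": "Spring webinar", "channel": "paid social", "utm_campaign": "spring_webinar"},
    {"name": "Brand search", "channel": "paid search", "utm_campaign": "brand_search"},
]
trials = [
    {"account": "Acme", "utm_campaign": "brand_search"},
    {"account": "Globex", "utm_campaign": "spring_webinar"},
    {"account": "Initech", "utm_campaign": "Brand_Search"},  # inconsistent casing
    {"account": "Umbrella", "utm_campaign": None},           # organic or untagged
]


def attribute_trials(trials: list, campaigns: list) -> Counter:
    """Count 'Trial started' events per source campaign via a utm_campaign join."""
    by_tag = {c["utm_campaign"].lower(): c["name"] for c in campaigns}
    counts = Counter()
    for trial in trials:
        tag = (trial["utm_campaign"] or "").lower()  # treat a missing tag as empty
        counts[by_tag.get(tag, "unattributed")] += 1
    return counts


print(attribute_trials(trials, campaigns))
```

The "unattributed" count is worth tracking in its own right. A large share suggests attribution is a data-capture problem before it is an analytics one.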
Events:
| Event | Key Attributes | Source | Access |
|---|---|---|---|
| Trial started | Timestamp, source campaign, signup channel | Product database, marketing platform | Product events via API or direct DB. Attribution requires UTM join to marketing platform. |
| User signed up | Timestamp, role, invite or organic | Product database | API or direct DB. |
| Feature activated | Timestamp, user, feature name, account | Product events | Event stream (e.g. Segment, Mixpanel, Amplitude) or direct DB. |
| Subscription converted | Timestamp, plan, MRR, conversion channel | Billing system | Webhook or API. |
| Subscription upgraded | Timestamp, old plan, new plan, MRR change | Billing system | Webhook or API. |
| Payment received | Timestamp, amount, payment method, status | Billing system | Webhook or API. |
| Support ticket raised | Timestamp, category, severity, resolution time | Support system | REST API (Zendesk, Intercom, etc.). Resolution time often requires a calculated field. |
| Account churned | Timestamp, reason (if captured), MRR lost | CRM, billing system | Billing webhook for cancellation. Churn reason typically manual field in CRM -- often incomplete. |
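Several of these events (Subscription converted, Subscription upgraded, Account churned) arrive from the billing system as raw state changes, typically via webhook. Translating them into named business events is where the stakeholders' language gets applied. The sketch below assumes webhook payloads have already been normalised into simple before/after snapshots with hypothetical `status` and `mrr` fields. It also assumes each account has a single subscription, so cancelling it means the account has churned.

```python
from typing import Optional

# Hypothetical, simplified snapshots of a subscription before and after a
# billing webhook. Real payloads differ by provider; normalise into this
# shape first.


def classify_change(before: dict, after: dict) -> Optional[str]:
    """Return the business event implied by a subscription state change, if any."""
    if before["status"] == "trial" and after["status"] == "active":
        return "Subscription converted"
    if before["status"] == "active" and after["status"] == "cancelled":
        # Only valid under the one-subscription-per-account assumption.
        return "Account churned"
    if before["status"] == "active" and after["status"] == "active" and after["mrr"] > before["mrr"]:
        return "Subscription upgraded"
    return None  # a system change with no business name yet


changes = [
    ({"status": "trial", "mrr": 0}, {"status": "active", "mrr": 99}),
    ({"status": "active", "mrr": 99}, {"status": "active", "mrr": 249}),
    ({"status": "active", "mrr": 249}, {"status": "cancelled", "mrr": 0}),
    ({"status": "active", "mrr": 99}, {"status": "active", "mrr": 99}),
]
for before, after in changes:
    print(classify_change(before, after))
```

Review changes that return `None` with the operators instead of discarding them. They are either noise or an event nobody has named yet.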
Example — Non-bank lender
Entities:
| Entity | Key Attributes | Source | Access |
|---|---|---|---|
| Borrower | Name, date of birth, income, employment type, credit score, segment | Application system, credit bureau | Application system: API or direct DB. Credit bureau: batch file (typically SFTP, daily or on-demand). Data sharing agreement required. |
| Loan | Amount, rate, product type, term, channel, status | Application system, servicing platform | Application system: API or direct DB. Servicing platform: varies widely -- some support API, others require scheduled extract. |
| Property | Address, valuation, property type | Application system, valuation provider | Application system: API or direct DB. Valuation provider: API (e.g. CoreLogic, Valex) -- licensing required. |
| Broker | Name, firm, volume tier, quality rating | Broker portal, commission system | Broker portal: direct DB or internal API. Commission system: often a separate legacy system requiring file extract. |
| Payment | Amount, date, method, days past due | Servicing platform | Scheduled extract or API depending on platform. Real-time access uncommon in older servicing systems. |
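Days past due is listed as a Payment attribute. However, a scheduled extract may carry only due and paid dates, or a figure that is already stale when it lands. If those dates are available, the attribute can be derived as at any reporting date. This sketch uses a deliberately simple convention, which is an assumption and not a regulatory definition: grace periods and partial payments are ignored. Agree the real definition with the collections team before using it.

```python
from datetime import date
from typing import Optional


def days_past_due(due_date: date, paid_date: Optional[date], as_of: date) -> int:
    """Days an instalment is overdue as at `as_of`.

    Simplified convention (an assumption): an instalment paid on or before
    `as_of` is not past due; grace periods and partial payments are ignored.
    """
    if paid_date is not None and paid_date <= as_of:
        return 0
    return max(0, (as_of - due_date).days)


as_of = date(2024, 3, 31)
print(days_past_due(date(2024, 3, 1), None, as_of))              # 30
print(days_past_due(date(2024, 3, 1), date(2024, 3, 5), as_of))  # 0
print(days_past_due(date(2024, 4, 1), None, as_of))              # 0
```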
Events:
| Event | Key Attributes | Source | Access |
|---|---|---|---|
| Application submitted | Timestamp, channel, product type | Application system | API or direct DB. Usually well-structured. |
| Pre-approval issued | Timestamp, conditions, indicative rate | Application system, decisioning engine | Direct DB. Decisioning engine output may require separate extract if third-party. |
| Application declined | Timestamp, decline reason | Decisioning engine | Direct DB or API. Decline reason codes often require a lookup table to interpret. |
| Documents received | Timestamp, document type, completeness | Document management system | API or direct DB. Completeness scoring often not captured -- may need to be derived. |
| Loan settled | Timestamp, funded amount | Application system | API or direct DB. |
| Payment missed | Timestamp, amount due, days overdue | Servicing platform | Scheduled extract typical. Real-time alerting requires event stream or polling. |
| Hardship arrangement created | Timestamp, reason, arrangement type | Case management system | Often a separate system from servicing -- direct DB or file extract. Data quality varies. |
| Loan discharged | Timestamp, reason, balance at discharge | Servicing platform | Scheduled extract or API. Discharge reason often manually entered -- check completeness. |
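Several notes in both examples, such as the churn reason in the CRM and the discharge reason in the servicing platform, come down to one check: how often a manually entered field actually holds something usable. The sketch below shows that check and assumes records arrive as dictionaries. The field name, the sample rows and the set of placeholder values treated as empty are all hypothetical. Agree them with whoever enters the data.

```python
# Values people type when they have nothing to enter (hypothetical list).
PLACEHOLDERS = {"", "-", "n/a", "na", "unknown", "tbc"}


def completeness(rows: list, field_name: str) -> float:
    """Share of rows in which a manually entered field holds a usable value."""
    if not rows:
        return 0.0
    filled = 0
    for row in rows:
        value = row.get(field_name)  # missing keys count as empty
        if value is not None and str(value).strip().lower() not in PLACEHOLDERS:
            filled += 1
    return filled / len(rows)


discharges = [
    {"loan_id": "L1", "discharge_reason": "Refinanced elsewhere"},
    {"loan_id": "L2", "discharge_reason": ""},
    {"loan_id": "L3", "discharge_reason": None},
    {"loan_id": "L4", "discharge_reason": "Property sold"},
    {"loan_id": "L5", "discharge_reason": "N/A"},
]
print(f"discharge reason completeness: {completeness(discharges, 'discharge_reason'):.0%}")
```

A low score is the kind of finding worth carrying into the feasibility scoring in Step 7. Any analysis that depends on the field can only be as good as the share of rows that have it.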