Understanding analytics

Part 4 of 6

The main tasks in building advanced analytics

Turning an idea into a working model involves a repeatable set of tasks. In traditional analytics, people carry them out step by step; in agentic analytics, AI agents can perform or assist with many of them, so the same work happens faster and with less manual effort.

1. Defining the problem and success

What it is: Being clear about what you're predicting or deciding, for whom, and how you'll measure success (e.g. accuracy, profit, fairness).

Why it matters: A model built for the wrong question or the wrong metric will disappoint in the real world.

2. Getting and preparing data

What it is: Pulling the right data from the right sources, cleaning it (missing values, duplicates, errors), and aligning it so it's ready for modelling. This is often the slowest and messiest part of the process.

Why it matters: Models are only as good as the data they're trained and run on.

Each cleaning and joining decision (missing values, duplicates, outliers, join keys, training data vs production data) requires judgement; there is rarely one “correct” recipe. That is why this step routinely consumes more calendar time than leadership expects.
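To make "each step is a decision" concrete, here is a minimal cleaning sketch using pandas. The orders table, column names, and the choices made (drop duplicates, impute with the median) are illustrative assumptions; dropping or flagging rows would be equally defensible.

```python
import pandas as pd

# Hypothetical raw orders table: a duplicate row and a missing amount.
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "customer_id": ["A", "A", "B", "C"],
    "amount": [100.0, 100.0, None, 250.0],
})

# Decision 1: treat exact duplicate rows as accidental double-loads and drop them.
clean = orders.drop_duplicates().copy()

# Decision 2: impute the missing amount with the median of the remaining values
# (dropping the row, or keeping it with a "missing" flag, are alternatives).
clean["amount"] = clean["amount"].fillna(clean["amount"].median())
```

Neither decision is "correct" in the abstract; each depends on what the duplicates and gaps mean in the business, which is exactly why this step needs judgement.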

3. Feature engineering

What it is: Creating the inputs the algorithm will use. Raw data (e.g. "date of first purchase") is turned into features (e.g. "days since first purchase," "month of year," "is weekend?") that are more useful for the model.

Why it matters: Good features often matter more than choosing the fanciest algorithm. This step is where domain knowledge (how the business works) meets the data.

Turning dates, amounts, categories, counts, text, and interactions into model-ready inputs is a craft: encoding choices, leakage risk, and “what would we do in production?” all matter. Domain experts (operations, risk, marketing) are usually the best source of feature ideas; turning those ideas into a repeatable pipeline is where teams get stretched. A short feature workshop with stakeholders (see Change management) is a practical way to generate and prioritise features.
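The date example above can be sketched in a few lines of pandas. The customer table, the reference date, and the feature names are illustrative assumptions, not a prescribed feature set.

```python
import pandas as pd

# Hypothetical customer table with one raw date column.
customers = pd.DataFrame({
    "customer_id": ["A", "B"],
    "first_purchase": pd.to_datetime(["2024-01-06", "2024-03-15"]),
})
as_of = pd.Timestamp("2024-04-01")  # the point in time the model would score at

# Turn the raw date into model-ready features.
features = pd.DataFrame({
    "customer_id": customers["customer_id"],
    "days_since_first_purchase": (as_of - customers["first_purchase"]).dt.days,
    "first_purchase_month": customers["first_purchase"].dt.month,
    "first_purchase_is_weekend": customers["first_purchase"].dt.dayofweek >= 5,
})
```

Note the `as_of` date: computing features relative to a fixed scoring time, rather than "today", is one simple guard against leakage between training and production.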

4. Model training (and selection)

What it is: Using historical data to train the algorithm — i.e. to set its internal parameters so it fits the patterns in that data. Often you try several algorithms or configurations and pick the one that performs best on held-out data.

Why it matters: Training is where the algorithm "learns." The choice of algorithm and how you train it (e.g. how much data, what you optimise for) drives performance and behaviour.
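"Try several algorithms and pick the one that performs best on held-out data" can be sketched with scikit-learn. The synthetic dataset and the two candidate models here are illustrative assumptions; in practice the candidates and the metric come from the problem definition in step 1.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical historical data, split so some of it is held out from training.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Two candidate algorithms; each is trained only on the training slice.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Score each candidate on the held-out slice and keep the best.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
```

The held-out score, not the training score, decides the winner; that distinction is what the bias and variance discussion below is about.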

Bias and variance (in plain language). When you train a model, you care about two kinds of error:

  • Bias is error from the model being too simple — it misses real patterns in the data (e.g. a straight line when the relationship is curved). The model underfits: it's not using the data well. You see this when performance is poor on both the training set and on new data.
  • Variance is error from the model being too reactive to the training data — it memorises noise and quirks. The model overfits: it looks great on training data but does worse on new data.

There is a tradeoff: simpler models tend to have higher bias and lower variance; more complex models tend to have lower bias and higher variance. The goal is to find the sweet spot (e.g. via validation, regularisation, or cross-validation) where the model captures real structure without fitting noise. If your model does well on training data but poorly on held-out or production data, you're likely overfitting — try a simpler model, more data, or stronger regularisation.
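The tradeoff can be seen directly by fitting polynomials of different degrees to the same noisy data (a NumPy sketch; the curved signal, noise level, and degrees are illustrative assumptions):

```python
import numpy as np

# A hypothetical curved signal with noise, split into train and test halves.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.3, 60)
x_tr, y_tr = x[:40], y[:40]
x_te, y_te = x[40:], y[40:]

def errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))
    return mse(x_tr, y_tr), mse(x_te, y_te)

train_lo, test_lo = errors(1)    # too simple: underfits (high bias)
train_ok, test_ok = errors(3)    # roughly matches the curve
train_hi, test_hi = errors(15)   # very flexible: fits noise (high variance)
```

The high-degree fit drives its training error down by memorising noise, which is why only the test error reveals the problem.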

5. Validation and testing

What it is: Checking that the model works as intended: accuracy on unseen data, fairness across groups, robustness to bad inputs, and behaviour at the edges (e.g. extreme values, rare events).

Why it matters: A model that looks great on the data it was trained on can fail or behave badly in production if it wasn't properly validated.
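One part of validation, checking fairness across groups, can be sketched in plain Python. The records and group labels below are illustrative assumptions; the point is that a single overall accuracy number can hide a large gap between groups.

```python
# Hypothetical validation records: (group, actual outcome, model prediction).
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 0),
]

def accuracy(rows):
    """Fraction of rows where the prediction matches the actual outcome."""
    return sum(actual == pred for _, actual, pred in rows) / len(rows)

overall = accuracy(records)
by_group = {
    g: accuracy([r for r in records if r[0] == g]) for g in ("A", "B")
}
```

Here the overall number looks middling, but the model is markedly worse for group B than group A; validation that stops at the overall figure would miss this.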

6. Deployment and monitoring

What it is: Putting the model into the systems people use (dashboards, apps, workflows) and then monitoring its performance and inputs over time so you can detect drift or degradation.

Why it matters: A model that isn't used doesn't create value; one that isn't monitored can silently become wrong or unfair.
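A very simple drift monitor can be sketched with the standard library. The feature values and the three-standard-deviation threshold are illustrative assumptions; production monitoring typically tracks many features and uses richer statistics.

```python
from statistics import mean, stdev

# Hypothetical values of one input feature: what the model saw at training
# time, and what it is receiving in production now.
training_values = [10, 12, 11, 13, 12, 11, 10, 12]
live_values = [18, 19, 17, 20, 18, 19, 18, 17]

baseline_mean = mean(training_values)
baseline_sd = stdev(training_values)

def drift_alert(values, threshold=3.0):
    """Flag drift when the live mean sits more than `threshold` baseline
    standard deviations from the training mean (a simple z-score check)."""
    return abs(mean(values) - baseline_mean) / baseline_sd > threshold

# The live feature has shifted well outside the training range, so the
# model is now seeing inputs unlike anything it learned from.
```

Without a check like this, the model would keep producing predictions on the shifted inputs, silently degrading.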

7. Iteration and maintenance

What it is: Retraining, adjusting, or replacing the model when the business, data, or performance requirements change.

Why it matters: The world changes; models that aren't updated eventually become obsolete.


In traditional analytics, analysts and data scientists do most of these tasks by hand (writing code, running experiments, deploying pipelines). In agentic analytics, many of these steps can be partially or fully carried out by AI agents, with humans setting direction and checking results.

Each of these steps requires specialist judgement. Most mid-market analytics teams are under-resourced for at least three of them, not because they lack capability, but because the system was not designed for them. TrueState handles the execution end-to-end on your warehouse so the work ships instead of stalling in the backlog. See how it fits your stack.