My core advice for analytics architecture is simple:
Keep your BI tool as dumb as possible.
In other words, get the complex business logic out of your visualization layer. A data visualization or BI tool should focus purely on visualizing data effectively to squeeze out insights and information.
Some might have heard the term "headless BI." It’s a concept similar to headless Content Management Systems (CMS). Your backend – in our case, the data warehouse or another layer – defines everything: all the logic, all the metrics. The user-facing parts, like your BI tool, are then primarily responsible for ensuring the data looks good and the dashboards work as intended. The "thinking" should happen elsewhere.
Define Aggregatable Metrics in the Data Warehouse: Any metric that involves a straightforward aggregation like COUNT(...) or SUM(...) should be defined and calculated in your data warehouse layer (e.g., in dbt models or SQL views).
Calculate Ratios and Averages On-the-Fly (Carefully): For metrics like averages or conversion rates, you typically need to calculate them on the fly in the BI tool, but based on pre-aggregated components from the warehouse. For instance, a conversion rate would be SUM(purchases) / SUM(website_visits). The key is to ensure you're not averaging averages or summing up other non-aggregatable or semi-aggregatable metrics that have been pre-calculated incorrectly. The base sums come from the warehouse; the division happens in the BI tool.
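To make this concrete, here is a minimal sketch of what the warehouse side could look like, assuming a hypothetical dbt model fct_daily_funnel built on a hypothetical staging model stg_web_events (both names, and the event_name = 'purchase' condition, are illustrative):

-- models/marts/fct_daily_funnel.sql (hypothetical dbt model)
-- The warehouse pre-aggregates the additive components per day and channel.
select
    event_date,
    channel,
    count(distinct session_id) as website_visits,
    count(distinct case when event_name = 'purchase' then session_id end) as purchases
from {{ ref('stg_web_events') }}  -- illustrative staging model
group by event_date, channel

The BI tool then only computes SUM(purchases) / SUM(website_visits) on top of this table, so the ratio stays correct no matter how users filter or regroup the data.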
To be clear, "dumb" does not mean stripping out filters, drill-downs, or other interactive features. The point is rather that only the necessary elements should be DEFINED inside the BI layer.
Why should you try to keep your BI Tool lean?
There are significant strategic advantages to this approach:
Flexibility and Future-Proofing: The visualization layer often sees the biggest differences in suitability between small and large-scale companies. A BI tool that's perfect for an early-stage startup (think Looker Studio or Metabase) might lack the governance and administrative features needed by a large enterprise (e.g., Looker, Omni, Tableau, or Power BI), and vice versa.
In contrast, you can start with a data warehouse engine like BigQuery or Snowflake and likely never have a compelling reason to switch, regardless of how much you scale. The "bandwidth" or maturity fit for BI tools is often tighter and less forgiving. Don't lock yourself into a BI tool that's too small, too big, too expensive, or too heavy to maintain by embedding all your critical business logic within it.
Easier Migration and Evolution: If your logic resides primarily in the data warehouse, switching BI tools becomes a much simpler task of re-pointing and rebuilding visualizations, not re-engineering your entire metric system.
The Rise of Semantic Layers: Consider whether a semantic layer makes sense for your architecture. These have matured significantly in the last three years. Nowadays, they integrate not only with the most common BI tools, but also with Excel and Google Sheets, enable data science use cases via REST APIs, and can even serve data to your mobile or web applications directly from the data warehouse. (Obviously, you can also throw AI agents into the mix as a potential downstream consumer.)
Rule of thumb: If data is used extensively in two or more places, evaluate whether a semantic layer might be a good fit.
Extend the Principle: "Shift Left" for Data Quality
This philosophy of moving logic upstream ties directly into the "shift left" concept, borrowed from software engineering’s shift left testing.
Imagine your data flowing from left (sources) to right (consumption and action). You want to push tests, data quality fixes, and transformations as far left as is both possible and sensible.
Test your data for validity, accuracy, and usability as early as possible in your pipeline. Don't wait for the entire ETL/ELT process to complete before discovering that something was broken from step one. Early detection saves time, resources, and headaches.
Some issues can only be identified in the core or mart layer, and that's fine.
But validating, for example, whether certain inputs in the CRM match the defined naming conventions can be done in the clean layer, or even directly on the raw data. (Did you know that you can define and run tests on your sources in dbt?)
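For illustration, here is a minimal sketch of a singular dbt test on a source. The crm.accounts source and the naming convention (a two-letter uppercase region prefix followed by a dash) are assumptions for the example, and regexp_like is Snowflake-style syntax that may differ in your warehouse:

-- tests/assert_crm_account_names_follow_convention.sql (hypothetical singular dbt test)
-- A singular test passes when the query returns zero rows;
-- every returned row is a naming-convention violation in the raw CRM data.
select
    account_id,
    account_name
from {{ source('crm', 'accounts') }}  -- illustrative source definition
where not regexp_like(account_name, '^[A-Z]{2}-.+$')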
Fix at the Source: If you have wrongly named clients, don't just patch it in your SQL transformation code, and certainly not in your BI tool. Fix it directly in Salesforce, your product backend, or whatever the source system is.
Why? This ensures that any system or user pulling data from that source gets the corrected information. You manage the truth in one place. If you only fix it in your data warehouse pipeline, other tools integrated directly with Salesforce will still see the wrong data.
I'm aware there's always a trade-off between speed and quality. Sometimes a fix in the source system is cumbersome, or the relevant engineering team has other priorities. You might be forced to implement a temporary fix in your SQL. But, in such cases, make sure to emphasize, educate, and if necessary, escalate why it's crucial to address the issue at its origin.
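If you do end up with such a temporary patch, keep it small, visible, and easy to remove. A sketch of what that might look like, with a hypothetical staging model and client name:

-- models/staging/stg_crm_accounts.sql (hypothetical dbt staging model)
-- Temporary patch for a misnamed client while the fix in Salesforce is pending.
-- TODO: remove once the record is corrected at the source.
select
    account_id,
    case
        when account_name = 'Acme Corp (old)' then 'Acme Corp'
        else account_name
    end as account_name
from {{ source('crm', 'accounts') }}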
The Takeaway
Pushing your logic and quality controls as far "left" (upstream) in your data stack as possible – from the BI tool to the data warehouse, and ideally to the source systems themselves – leads to a more robust, flexible, and reliable data ecosystem. It might seem like more effort initially, but it pays dividends in the long run.