What Does dbt Actually Do?

Jack C
3 min readApr 19, 2023

--

dbt, or data build tool, is now pretty much the industry standard tool for transforming, documenting, and testing your data.

In this article, I want to explain what it actually is — and what it does.

A bit of background

Before we dive into what dbt does, let’s look at how a data pipeline might typically create a centralised data model in a data warehouse, without dbt.

Building a centralised data model means that end users are all pulling insights from the same data — which is far better than ad-hoc SQL queries, but there are still some gaps:

  • Builders separate from users: the people creating data pipelines sit separately from users, who have the business logic, querying them
  • Manually specified dependencies: you have to tell your scheduler (e.g. Airflow) what order to run your SQL models
  • Testing isn’t easy: ensuring your primary keys are unique, or that a column contains set values, is a heavily manual (or forgotten) process
  • Documenting isn’t easy: model documentation either won’t exist, or is done in a different tool (and often is stale!)

On top of all of this — SQL itself has a lot of limitations in terms of functionality when you compare it to something like Python.

What is dbt?

dbt Labs have 2 main products:

  1. dbt Core: a set of open source Python packages, run via the command line, that helps you transform, document, and test your data
  2. dbt Cloud: a web based UI that allows you to both develop and deploy data pipelines (built on top of dbt Core)

You can read more on their site here on the difference. But fundamentally, both are built on the dbt framework which I’ll go on to next.

What does dbt actually do?

  • Lowers the technical barrier: anyone who can write SQL can create full data pipelines, meaning the people with the business context are the ones building the data models
  • Infers dependencies: if SQL model B queries SQL model A, dbt knows to run A -> B. Sounds simple, but it’s a huge benefit!
  • Makes documenting & testing easy: SQL models are paired with yml files, allowing for model & column level documentation & testing
An example yml file that would pair with dim_transactions.sql
  • Brings Python-like functionality to SQL: dbt opens up the use of variables, loops, reusable functions (macros) and a lot of other things not found in native SQL
Example of Python-like functionality

New to dbt, or someone who wants to learn the advanced concepts? My dbt course is now live on Udemy (link), and covers everything from basic to advanced dbt — including building a full project from scratch, 7 hours of video, and 50 downloadable course materials!

--

--

Jack C
Jack C

Written by Jack C

I write about Data Analytics and Analytics Engineering

Responses (2)