Benefits of dbt: (Part 1) Dependencies

Jack C
3 min read · Apr 22, 2023


Following my articles on “What Does dbt Actually Do?” and the differences between dbt Core & dbt Cloud, I wanted to go into detail on 3 of the main benefits of dbt itself.

To recap, the main benefits of dbt are:

  • Lowering the technical barrier for building a data pipeline
  • Inferring dependencies (which we’ll go through here)
  • Making documenting & testing easy
  • Adding Python-like functionality to SQL

The bottom 3 benefits all roll into the first, so I won’t be going into detail on that aspect, as it’s more a consequence of the other 3.

Before we go into what I mean by “inferring dependencies”, let’s first demystify a concept in data pipelines known as a “DAG”.

Let’s say you have a very simple pipeline with 2 SQL files, transactions.sql that extracts raw data and does some basic renaming / recasting of columns, and dim_transactions.sql that takes data from our transactions.sql output to be shown in our BI tool.
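
As a minimal sketch, assuming a raw source table called raw.payments (the table and column names here are hypothetical), the two files might look something like this:

```sql
-- transactions.sql: reads the raw table and does some basic
-- renaming / recasting. Note the hard-coded source table.
select
    id                       as transaction_id,
    cast(amount as numeric)  as amount,
    cast(created_at as date) as transaction_date
from raw.payments
```

```sql
-- dim_transactions.sql: reads from the table that transactions.sql
-- produces, again via a hard-coded name.
select
    transaction_id,
    amount,
    transaction_date
from my_schema.transactions
```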

In practice, you’d probably never have a setup this simple! But, just for illustration, in this case you’d want to run transactions.sql first and then dim_transactions.sql.

We can draw this as a DAG, a Directed Acyclic Graph.
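
For our two files, the graph has a single edge, pointing from the model that has to run first to the model that depends on it:

```
transactions --> dim_transactions
```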

Before a tool like dbt, you’d have to manually specify what order to run your SQL files in.

Now, with dbt, you only need a Jinja expression, {{ ref('transactions') }}, in your SQL file.
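
Keeping the same hypothetical columns as before, dim_transactions.sql becomes:

```sql
-- dim_transactions.sql: the hard-coded table name is replaced by a
-- ref() call, which dbt resolves when it compiles the project.
select
    transaction_id,
    amount,
    transaction_date
from {{ ref('transactions') }}
```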

There’s a lot to unpick here!

  • Jinja is a Python-based templating language used in dbt; the Jinja in your file gets replaced (templated) by actual SQL when dbt compiles your project
  • A Jinja “expression” is a pair of double curly brackets {{ }}
  • The ref function does 2 things: it finds the database & schema you want to use, and infers dependencies

So, to summarise, ref tells dbt that dim_transactions.sql depends on transactions.sql, and looks up which database & schema you want to read data from & write data to. Then, the {{ }} brackets get replaced with the output of the ref function, resulting in the compiled SQL that actually runs against your warehouse.
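
As a sketch, assuming your project is configured to build into a database called analytics and a schema called my_schema, the compiled SQL would look like:

```sql
-- Compiled dim_transactions.sql: dbt has swapped the Jinja
-- expression for a fully-qualified table name.
select
    transaction_id,
    amount,
    transaction_date
from analytics.my_schema.transactions
```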

It’s also useful to be able to switch the database or schema you’re reading from or writing to. For example, you can swap my_schema for prod_schema without changing the SQL that you write!
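
For instance, assuming a dev environment that writes to my_schema and a prod environment that writes to prod_schema (both hypothetical, in the same analytics database as above), the one model file compiles to different SQL per environment:

```sql
-- Compiled against the hypothetical dev environment:
select transaction_id, amount, transaction_date
from analytics.my_schema.transactions

-- Compiled against the hypothetical prod environment; the model's
-- source file is unchanged, only the target differs:
select transaction_id, amount, transaction_date
from analytics.prod_schema.transactions
```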

Whilst automating the step of “what order do my SQL files need to be run in?” seems small, when you have a data pipeline with 100s of models it’s a blessing to have a tool that does it for you!

New to dbt, or someone who wants to learn the advanced concepts? My dbt course is now live on Udemy (link), and covers everything from basic to advanced dbt — including building a full project from scratch, 7 hours of video, and 50 downloadable course materials!
