Transform Data
Time required: 15 minutes
Prerequisites
Introduction
A data transform is a layer of logic that converts one set of data into another; this is typically the 'T' in ELT. A simple transform might, for example, calculate the total amount each customer has spent across all of the orders they have placed. Matatika uses dbt as its transformation tool, which treats transforms as code in the form of templated SQL files.
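To make 'templated SQL' concrete: a model file is an ordinary SELECT statement with Jinja expressions that dbt resolves when it compiles and runs the project. A minimal sketch (all names here are hypothetical placeholders; the worked example in the sections below is the real tutorial content):

-- daily_orders.sql (illustrative): dbt creates a view named daily_orders
-- from the result of this query, resolving the Jinja at compile time
{{ config(materialized='view') }}

select order_date, count(*) as order_count
from {{ source('shop', 'orders') }}
group by order_date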
Create a source
A dbt source is a definition of the “source” database schema. To define a source, create a YAML file (e.g. my_source.yml) under the transforms/models directory (refer to the dbt docs for more information):
version: 2

sources:
  - name: my_source
    schema: "{{ env_var('DBT_SOURCE_SCHEMA') }}"
    tables:
      - name: customers
      - name: orders
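Once defined, the source tables can be referenced from models with the source() Jinja function; dbt resolves the reference against the schema supplied by the DBT_SOURCE_SCHEMA environment variable at compile time. Roughly (exact quoting varies by database adapter):

-- {{ source('my_source', 'orders') }} compiles to something like
--   "<value of DBT_SOURCE_SCHEMA>"."orders"
select * from {{ source('my_source', 'orders') }}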
Create a model
A dbt model translates to a database table that is created and updated by dbt. To create a model for a workspace, first create a new SQL file (e.g. my_model.sql) under the transforms/models directory. Start writing SQL in this file to define the transform logic, using dbt Jinja functions to inject dynamic behaviour:
{{ config(materialized='table') }}

with customers as (
    select * from {{ source('my_source', 'customers') }}
),

orders as (
    select * from {{ source('my_source', 'orders') }}
),

"final" as (
    select
        c.id
      , c."name"
      , sum(o.product_price * o.quantity) as total_spend
    from customers c
    join orders o on o.customer_id = c.id
    group by c.id, c."name"
    order by total_spend desc
)

select * from "final"
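Models can also build on each other with the ref() Jinja function, which makes dbt run the referenced model first and resolve the reference to the table it creates. A hedged sketch (top_customers.sql and the spend threshold are hypothetical):

-- top_customers.sql (illustrative): a downstream model built on my_model
{{ config(materialized='view') }}

select
    id
  , "name"
  , total_spend
from {{ ref('my_model') }}
where total_spend > 1000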
Run your models
dbt models require the source tables to exist before they are run. If the data isn’t already present, you need to run your data import locally.
# ensure dbt is installed
meltano install transformer dbt
# run dbt models
meltano invoke dbt run
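Any extra arguments after the dbt command are passed straight through to dbt, so, assuming the my_model.sql file above, a single model can be run on its own:

# run only the my_model model (newer dbt versions use --select; older ones use --models)
meltano invoke dbt run --select my_model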
See the dbt run docs for more information.
When a pipeline with dbt is run in the Matatika platform, models in the project will be automatically run.