Prompt: Set up a dbt project structure for our e-commerce analytics. We need staging, intermediate, and mart layers. Show me the folder structure, YAML config, and SQL for a customer lifetime value mart model.
This is a comprehensive setup for a standard Kimball-style data warehouse architecture, using the modern dbt workflow.
We will define the structure, and then supply the code examples for the key components: Staging (stg_), Intermediate (int_), and Final Mart (marts/dim_... and marts/fct_...).
---
1. Target Schema Structure
Assuming the following source data tables exist in your raw schema (e.g., in Snowflake/BigQuery):
raw.orders: Transactional order details.
raw.customers: Customer profiles.
raw.products: Product catalog details.
Your compiled dbt project structure should look like this:
``
your_dbt_project/
├── models/
│ ├── staging/ # 1. Staging Layer (Clean, minimal transformation)
│ │ ├── stg_customers.sql
│ │ ├── stg_orders.sql
│ │ └── stg_products.sql
│ ├── intermediate/ # 2. Intermediate Layer (Join staging data, calculate metrics)
│ │ ├── int_customer_agg.sql
│ │ ├── int_order_details.sql
│ │ └── int_order_line_items.sql
│ └── marts/ # 3. Mart Layer (Final, consumable dimensional/fact tables)
│ ├── dim_customer.sql # Dimension Table
│ ├── dim_product.sql # Dimension Table
│ └── fct_orders.sql # Fact Table (Final Grain)
├── dbt_project.yml # Project configuration
└── profiles.yml # Connection configuration (handled by CLI)
`
---
2. Step-by-Step Model Code
A. Staging Layer (staging/)
Goal: Select only necessary columns, apply basic type casting, and standardize names. No heavy logic here.
models/staging/stg_customers.sql
`sql
{{ config(materialized='view') }}
SELECT
customer_id,
first_name,
last_name,
email,
created_at AS customer_signup_date,
-- Example of cleaning/standardizing messy data
UPPER(coalesce(country, 'UNKNOWN')) AS standard_country
FROM
{{ source('raw', 'customers') }}
`
B. Intermediate Layer (intermediate/)
Goal: Perform first-level joins and complex metric calculations that will feed the final marts.
models/intermediate/int_order_line_items.sql
`sql
{{ config(materialized='view') }}
SELECT
o.order_id,
o.customer_id,
o.order_date,
p.product_id,
p.product_name,
p.category,
o.quantity,
o.unit_price,
(o.quantity * o.unit_price) AS line_item_revenue -- Calculated metric
FROM
{{ ref('stg_orders') }} o
INNER JOIN
{{ ref('stg_products') }} p ON o.product_id = p.product_id
`
C. Mart Layer (marts/)
This is what the end-user/BI tool connects to. We'll implement a Slowly Changing Dimension (SCD) Type 1 for customers and a Fact Table for orders.
1. Dimension: Customer (models/marts/dim_customer.sql)
We use incremental here as customer data might change slightly over time.
`sql
{{ config(
materialized='incremental',
unique_key='customer_id',
incremental_strategy='merge' -- Best practice for SCD 1/2
) }}
SELECT
c.customer_id,
-- SCD Type 1 Logic: Overwrite on change
c.first_name,
c.last_name,
c.email,
c.standard_country,
c.customer_signup_date,
CURRENT_TIMESTAMP() AS dbt_loaded_at -- Audit column
FROM
{{ ref('stg_customers') }} c
{% if is_incremental() %}
-- Only process records newer than the last run
WHERE c.customer_signup_date > (SELECT MAX(dbt_loaded_at) FROM {{ this }})
{% endif %}
`
2. Fact: Orders (models/marts/fct_orders.sql)
This table typically joins dimensions to the transaction facts.
`sql
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge'
) }}
SELECT
o.order_id,
o.order_date,
-- Foreign Keys (linking to Dimensions)
c_dim.customer_id AS customer_fk,
p_dim.product_id AS product_fk,
-- Facts/Metrics
o.quantity,
o.unit_price,
o.line_item_revenue,
-- Date Dimension Keys (if you build a full dedicated date dimension)
-- DATE_DAY(o.order_date) AS date_key
FROM
{{ ref('int_order_line_items') }} o
INNER JOIN
{{ ref('dim_customer') }} c_dim ON o.customer_id = c_dim.customer_id
{# Add other dimension joins here #}
{% if is_incremental() %}
-- Only process orders placed since the last run
WHERE o.order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
`
---
3. Execution Summary
Required Configurations
- dbt_project.yml
: You must define your sources so dbt knows where raw. tables are.
`yaml
models/dbt_project.yml
name: 'my_data_warehouse'
version: '1.0.0'
profile: 'default' # Must match your CLI profile
sources:
database: your_raw_database
schema: raw_schema # Schema where raw tables live
tables:
- name: customers
- name: orders
- name: products
`
- Run Order: Because of the {{ ref(...) }}
function, dbt automatically figures out the dependency graph. You run:
- dbt run
Execution Flow Achieved by dbt
- Staging: stg_customers
reads from raw.customers.
- Intermediate: int_order_line_items
reads from stg_orders and stg_products.
- Dimension: dim_customer
reads from stg_customers.
- Fact: fct_orders
reads from int_order_line_items and then joins to the newly created/updated dim_customer` table.
This provides a clean, reproducible, and scalable data modeling pipeline.