12 expert agents with 200+ skills. Running 24/7 on local compute. Built for South African business.
Each agent is grounded in domain expertise — not generic chat. They know SA regulations, local suppliers, and your market context.
Real deliverables generated by AI agents with skill-grounded expertise. Not generic advice — actionable work product.
Prompt: We have 3 years of transactional data in PostgreSQL (50M rows). Design a data warehouse architecture for customer LTV analysis, churn prediction, and cohort analysis. Include the dimensional model with specific fact and dimension tables.
This is a classic warehousing challenge that requires a structured approach integrating OLTP data into an analytical (OLAP) structure. Since you are dealing with time-series transactional data (sales/behavior), the Star Schema built around a Fact Table and supporting Dimension Tables is the industry standard and most efficient pattern for this analysis.
Given the scope (LTV, Cohorting), we will need robust dimensionality, particularly around time and customer identity.
Here is the complete architecture proposal, broken down by components and implementation details.
---
The goal is to analyze what happened (Facts) to whom (Dimensions) and when (Dimensions) over time.
(Star schema: a central fact table joined to four surrounding dimension tables.)
$$\text{Dim\_Date} \rightarrow \text{Fact\_Sales} \leftarrow \text{Dim\_Customer}$$
$$\text{Dim\_Product} \rightarrow \text{Fact\_Sales} \leftarrow \text{Dim\_Promotion}$$
---
Dimension tables are relatively stable and describe the entities involved. Each one carries an integer surrogate key (e.g., date_key, customer_key) that the Fact Table references as a foreign key.
1. Dim_Date (The Time Dimension - CRITICAL)
This table must be pre-populated with every date you expect to analyze.
- date_key (Primary Key, e.g., YYYYMMDD)
- full_date (DATE type)
- day_of_week (VARCHAR, e.g., 'Wednesday')
- day_number (INT, 1-31)
- month (INT, 1-12)
- month_name (VARCHAR)
- quarter (VARCHAR, e.g., 'Q2')
- year (INT)

2. Dim_Customer (The Who Dimension - CRITICAL)
This handles customer attributes. Use SCD Type 2 logic here if customer attributes change over time (e.g., "Customer changed subscription tier from Bronze to Silver").
- customer_key (Primary Key, Surrogate Key, INT)
- customer_id (Natural Key, from Source System, VARCHAR)
- first_name, last_name (VARCHAR)
- acquisition_date (DATE)
- acquisition_channel (VARCHAR)
- is_active (BOOLEAN)
- valid_from_date, valid_to_date (To track historical attributes)

3. Dim_Product (The What Dimension)
Describes what was sold or used.
- product_key (Primary Key, Surrogate Key, INT)
- product_id (Natural Key, from Source System, VARCHAR)
- product_name (VARCHAR)
- category (VARCHAR, e.g., 'Electronics')
- subcategory (VARCHAR)
- brand (VARCHAR)

4. Dim_Promotion (The How Dimension)
Describes any discount or campaign that impacted the transaction.
- promotion_key (Primary Key, Surrogate Key, INT)
- promo_code (Natural Key, VARCHAR)
- promotion_type (VARCHAR, e.g., 'Percentage Off', 'BOGO')
- description (VARCHAR)

---
The Fact table records every measurable event. Since you are tracking transactions, this will be a Transactional Fact Table.
Fact_Sales
- sales_key (Primary Key, Identity, BIGINT)
- date_key (FK to Dim_Date)
- customer_key (FK to Dim_Customer)
- product_key (FK to Dim_Product)
- promotion_key (FK to Dim_Promotion, can be NULL if no promo)
- quantity_sold (INT)
- unit_price (DECIMAL)
- extended_price (DECIMAL) (Quantity * Unit Price)
- discount_amount (DECIMAL)
- net_sales_amount (DECIMAL) (The actual monetary value recorded for LTV calculation)

---
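As a rough sketch of how the tables above fit together, the following creates the schema in an in-memory SQLite database and pre-populates Dim_Date for one year. SQLite stands in for your real warehouse purely so the example runs anywhere; the column types are approximations, and the 2024 date range is an assumption for illustration.

```python
import sqlite3
from datetime import date, timedelta

# Column types are SQLite approximations of the types listed above.
DDL = """
CREATE TABLE Dim_Date (
    date_key    INTEGER PRIMARY KEY,  -- YYYYMMDD
    full_date   TEXT NOT NULL,
    day_of_week TEXT NOT NULL,
    day_number  INTEGER NOT NULL,
    month       INTEGER NOT NULL,
    month_name  TEXT NOT NULL,
    quarter     TEXT NOT NULL,
    year        INTEGER NOT NULL
);
CREATE TABLE Dim_Customer (
    customer_key        INTEGER PRIMARY KEY,  -- surrogate key
    customer_id         TEXT NOT NULL,        -- natural key from the source system
    first_name          TEXT,
    last_name           TEXT,
    acquisition_date    TEXT,
    acquisition_channel TEXT,
    is_active           INTEGER,
    valid_from_date     TEXT,
    valid_to_date       TEXT
);
CREATE TABLE Dim_Product (
    product_key  INTEGER PRIMARY KEY,
    product_id   TEXT NOT NULL,
    product_name TEXT,
    category     TEXT,
    subcategory  TEXT,
    brand        TEXT
);
CREATE TABLE Dim_Promotion (
    promotion_key  INTEGER PRIMARY KEY,
    promo_code     TEXT NOT NULL,
    promotion_type TEXT,
    description    TEXT
);
CREATE TABLE Fact_Sales (
    sales_key        INTEGER PRIMARY KEY AUTOINCREMENT,
    date_key         INTEGER NOT NULL REFERENCES Dim_Date(date_key),
    customer_key     INTEGER NOT NULL REFERENCES Dim_Customer(customer_key),
    product_key      INTEGER NOT NULL REFERENCES Dim_Product(product_key),
    promotion_key    INTEGER REFERENCES Dim_Promotion(promotion_key),  -- NULL if no promo
    quantity_sold    INTEGER NOT NULL,
    unit_price       NUMERIC NOT NULL,
    extended_price   NUMERIC NOT NULL,
    discount_amount  NUMERIC NOT NULL DEFAULT 0,
    net_sales_amount NUMERIC NOT NULL
);
"""


def date_rows(start, end):
    """Yield one Dim_Date row per calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield (
            int(current.strftime("%Y%m%d")),      # date_key
            current.isoformat(),                  # full_date
            current.strftime("%A"),               # day_of_week
            current.day,                          # day_number
            current.month,                        # month
            current.strftime("%B"),               # month_name
            f"Q{(current.month - 1) // 3 + 1}",   # quarter
            current.year,                         # year
        )
        current += timedelta(days=1)


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany(
    "INSERT INTO Dim_Date VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    date_rows(date(2024, 1, 1), date(2024, 12, 31)),
)
date_count = conn.execute("SELECT COUNT(*) FROM Dim_Date").fetchone()[0]
```

In a production warehouse you would extend the date range to cover all three years of history plus some future dates, so every fact row can resolve its date_key.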
| Business Question | Required Calculation Path | Relevant Tables |
| :--- | :--- | :--- |
| Customer LTV | Sum of net_sales_amount over time, grouped by customer_key. | Fact_Sales $\rightarrow$ Dim_Customer |
| Cohort Analysis | Grouping events by the date_key when the customer was first acquired (Dim_Customer.acquisition_date). | Dim_Customer.acquisition_date $\rightarrow$ Fact_Sales Grouped by Cohort Year/Month |
| Best Selling Products | Sum of quantity_sold or net_sales_amount, grouped by product_key. | Fact_Sales $\rightarrow$ Dim_Product |
| Monthly Performance | Aggregating metrics by date_key attributes (Year, Month, Quarter). | Fact_Sales $\rightarrow$ Dim_Date |
| Promo Impact | Analyzing net sales amount, filtered by a specific promotion_key. | Fact_Sales $\rightarrow$ Dim_Promotion |
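To make the first two rows of the table concrete, here is a minimal sketch of the LTV and cohort queries, run against a tiny in-memory SQLite database. The sample rows are invented for illustration, and only the columns the queries need are created. Note that LTV is grouped by the natural key customer_id here, on the assumption that SCD Type 2 may give one customer several customer_key values over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Customer (
    customer_key INTEGER PRIMARY KEY,
    customer_id TEXT,
    acquisition_date TEXT
);
CREATE TABLE Fact_Sales (
    customer_key INTEGER,
    date_key INTEGER,
    net_sales_amount REAL
);
-- Invented sample data, for illustration only.
INSERT INTO Dim_Customer VALUES
    (1, 'C-001', '2023-01-15'),
    (2, 'C-002', '2023-01-20'),
    (3, 'C-003', '2023-02-03');
INSERT INTO Fact_Sales VALUES
    (1, 20230115, 100.0),
    (1, 20230210, 50.0),
    (2, 20230120, 80.0),
    (3, 20230203, 40.0),
    (3, 20230305, 60.0);
""")

# Customer LTV: total net sales per customer.
LTV_SQL = """
SELECT c.customer_id, SUM(f.net_sales_amount) AS lifetime_value
FROM Fact_Sales f
JOIN Dim_Customer c ON c.customer_key = f.customer_key
GROUP BY c.customer_id
ORDER BY c.customer_id
"""

# Cohort analysis: customers and revenue grouped by acquisition month.
COHORT_SQL = """
SELECT substr(c.acquisition_date, 1, 7) AS cohort_month,
       COUNT(DISTINCT c.customer_id)    AS customers,
       SUM(f.net_sales_amount)          AS revenue
FROM Fact_Sales f
JOIN Dim_Customer c ON c.customer_key = f.customer_key
GROUP BY cohort_month
ORDER BY cohort_month
"""

ltv = conn.execute(LTV_SQL).fetchall()
cohorts = conn.execute(COHORT_SQL).fetchall()
```

The same SQL shape carries over to a real warehouse; only the date-truncation function for the cohort month will differ by platform.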
---
This architecture requires an Extract, Transform, Load (ETL/ELT) process:
1. Extract: Pull the raw data from the source systems (e.g., Sales_Transactions table, Customer_CRM table).
2. Load dimensions: Populate Dim_Product and Dim_Date. For Dim_Customer, implement SCD Type 2 logic for critical attributes.
3. Look up keys: Resolve the surrogate keys (date_key, customer_key, etc.) using the natural keys found in the raw data.
4. Load facts: Populate Fact_Sales using the loaded surrogate keys and the calculated measures.

Two points to keep in mind:
- A pre-populated Dim_Date is mandatory for time-series analysis.
- Apply SCD Type 2 to the Dim_Customer table to accurately model changing customer profiles over their lifetime.
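The SCD Type 2 step is the part of this process that most often goes wrong, so here is a minimal in-memory sketch of the idea. It assumes a single tracked attribute, a hypothetical `tier` field matching the Bronze-to-Silver example, and uses a `None` valid_to_date to mark the current version. A real pipeline would do this with a MERGE or a snapshot tool rather than Python lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_key: int               # surrogate key, one per version
    customer_id: str                # natural key, shared by all versions
    tier: str                       # hypothetical tracked attribute
    valid_from_date: date
    valid_to_date: Optional[date]   # None marks the current version


def apply_scd2(dim: List[CustomerRow], customer_id: str, new_tier: str,
               change_date: date) -> None:
    """Close the current version and append a new one if the tier changed."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to_date is None),
        None,
    )
    if current is not None and current.tier == new_tier:
        return  # nothing changed, keep the existing version open
    if current is not None:
        current.valid_to_date = change_date  # expire the old version
    next_key = max((r.customer_key for r in dim), default=0) + 1
    dim.append(CustomerRow(next_key, customer_id, new_tier, change_date, None))


def key_as_of(dim: List[CustomerRow], customer_id: str, as_of: date) -> Optional[int]:
    """Surrogate-key lookup: find the version that was valid on a given date."""
    for r in dim:
        if (r.customer_id == customer_id
                and r.valid_from_date <= as_of
                and (r.valid_to_date is None or as_of < r.valid_to_date)):
            return r.customer_key
    return None


dim_customer: List[CustomerRow] = []
apply_scd2(dim_customer, "C-001", "Bronze", date(2023, 1, 1))
apply_scd2(dim_customer, "C-001", "Silver", date(2023, 6, 1))
apply_scd2(dim_customer, "C-001", "Silver", date(2023, 9, 1))  # no-op

key_in_march = key_as_of(dim_customer, "C-001", date(2023, 3, 1))
key_in_july = key_as_of(dim_customer, "C-001", date(2023, 7, 1))
```

The `key_as_of` lookup is what the fact load relies on: a sale in March links to the Bronze version of the customer, while a sale in July links to the Silver version.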
Prompt: Set up a dbt project structure for our e-commerce analytics. We need staging, intermediate, and mart layers. Show me the folder structure, YAML config, and SQL for a customer lifetime value mart model.
This is a comprehensive setup for a standard Kimball-style data warehouse architecture, using the modern dbt workflow.
We will define the structure, and then supply the code examples for the key components: Staging (stg_), Intermediate (int_), and Final Mart (marts/dim_... and marts/fct_...).
---
Assuming the following source data tables exist in your raw schema (e.g., in Snowflake/BigQuery):
- raw.orders: Transactional order details.
- raw.customers: Customer profiles.
- raw.products: Product catalog details.

Your compiled dbt project structure should look like this:
```
your_dbt_project/
├── models/
│   ├── staging/            # 1. Staging Layer (Clean, minimal transformation)
│   │   ├── stg_customers.sql
│   │   ├── stg_orders.sql
│   │   └── stg_products.sql
│   ├── intermediate/       # 2. Intermediate Layer (Join staging data, calculate metrics)
│   │   ├── int_customer_agg.sql
│   │   ├── int_order_details.sql
│   │   └── int_order_line_items.sql
│   └── marts/              # 3. Mart Layer (Final, consumable dimensional/fact tables)
│       ├── dim_customer.sql    # Dimension Table
│       ├── dim_product.sql     # Dimension Table
│       └── fct_orders.sql      # Fact Table (Final Grain)
├── dbt_project.yml         # Project configuration
└── profiles.yml            # Connection configuration (handled by CLI)
```
---
Goal: Select only necessary columns, apply basic type casting, and standardize names. No heavy logic here.
models/staging/stg_customers.sql
```sql
{{ config(materialized='view') }}

SELECT
    customer_id,
    first_name,
    last_name,
    email,
    created_at AS customer_signup_date,
    -- Example of cleaning/standardizing messy data
    UPPER(COALESCE(country, 'UNKNOWN')) AS standard_country
FROM
    {{ source('raw', 'customers') }}
```
Goal: Perform first-level joins and complex metric calculations that will feed the final marts.
models/intermediate/int_order_line_items.sql
```sql
{{ config(materialized='view') }}

SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    p.product_id,
    p.product_name,
    p.category,
    o.quantity,
    o.unit_price,
    (o.quantity * o.unit_price) AS line_item_revenue  -- Calculated metric
FROM
    {{ ref('stg_orders') }} o
INNER JOIN
    {{ ref('stg_products') }} p ON o.product_id = p.product_id
```
This is what the end-user/BI tool connects to. We'll implement a Slowly Changing Dimension (SCD) Type 1 for customers and a Fact Table for orders.
1. Dimension: Customer (models/marts/dim_customer.sql)
We use incremental here as customer data might change slightly over time.
```sql
{{ config(
    materialized='incremental',
    unique_key='customer_id',
    incremental_strategy='merge'  -- Upserts on unique_key, which gives SCD Type 1 (overwrite) behavior
) }}

SELECT
    c.customer_id,
    -- SCD Type 1 logic: overwrite on change
    c.first_name,
    c.last_name,
    c.email,
    c.standard_country,
    c.customer_signup_date,
    CURRENT_TIMESTAMP() AS dbt_loaded_at  -- Audit column
FROM
    {{ ref('stg_customers') }} c

{% if is_incremental() %}
-- Only process customers who signed up since the last load.
-- Caveat: this filter skips edits to existing customers. To overwrite changed
-- attributes, filter on a last-modified timestamp from the source if one exists.
WHERE c.customer_signup_date > (SELECT MAX(dbt_loaded_at) FROM {{ this }})
{% endif %}
```
2. Fact: Orders (models/marts/fct_orders.sql)
This table typically joins dimensions to the transaction facts.
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',  -- Assumes one row per order; widen the key if an order can hold several products
    incremental_strategy='merge'
) }}

SELECT
    o.order_id,
    o.order_date,
    -- Foreign keys (linking to dimensions)
    c_dim.customer_id AS customer_fk,
    p_dim.product_id AS product_fk,
    -- Facts/metrics
    o.quantity,
    o.unit_price,
    o.line_item_revenue
    -- Date dimension key (if you build a dedicated date dimension).
    -- Syntax varies by warehouse; on Snowflake, for example:
    -- , CAST(TO_CHAR(o.order_date, 'YYYYMMDD') AS INT) AS date_key
FROM
    {{ ref('int_order_line_items') }} o
INNER JOIN
    {{ ref('dim_customer') }} c_dim ON o.customer_id = c_dim.customer_id
INNER JOIN
    {{ ref('dim_product') }} p_dim ON o.product_id = p_dim.product_id
{# Add other dimension joins here #}

{% if is_incremental() %}
-- Only process orders placed since the last run
WHERE o.order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
```
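The choice of unique_key matters more than it might appear. The sketch below simulates a merge (upsert) in plain Python to show what happens when the key is coarser than the table's grain. It is an illustration of the concept only, not how dbt or any warehouse implements MERGE, and the sample rows are invented. A real warehouse may raise an error on duplicate source keys instead of silently keeping the last row.

```python
from typing import Dict, List, Sequence, Tuple


def merge_rows(target: List[Dict], new_rows: List[Dict],
               unique_key: Sequence[str]) -> List[Dict]:
    """Upsert new_rows into target: rows with a matching key are overwritten."""
    index: Dict[Tuple, Dict] = {
        tuple(row[k] for k in unique_key): row for row in target
    }
    for row in new_rows:
        index[tuple(row[k] for k in unique_key)] = row
    return list(index.values())


# One order containing two different products (invented sample data).
batch = [
    {"order_id": 1001, "product_id": "P-1", "line_item_revenue": 20.0},
    {"order_id": 1001, "product_id": "P-2", "line_item_revenue": 35.0},
]

# Keyed on order_id alone, the second line overwrites the first.
keyed_on_order = merge_rows([], batch, unique_key=["order_id"])

# Keyed on the true grain (order + product), both lines survive.
keyed_on_line = merge_rows([], batch, unique_key=["order_id", "product_id"])
```

If your orders can contain several products, the fact table's key should reflect the line-item grain, otherwise revenue will be undercounted or the merge will fail.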
---
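Source definitions: You must define your sources so dbt knows where the raw.* tables live. Project-level settings go in dbt_project.yml, while sources are declared in a separate YAML file inside models/ (for example, models/staging/sources.yml).
```yaml
# dbt_project.yml
name: 'my_data_warehouse'
version: '1.0.0'
profile: 'default'  # Must match your CLI profile
---
# models/staging/sources.yml (separate file)
version: 2
sources:
  - name: raw                     # Referenced as source('raw', ...)
    database: your_raw_database
    schema: raw_schema            # Schema where raw tables live
    tables:
      - name: customers
      - name: orders
      - name: products
```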
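Because models reference each other through the ref() function, dbt automatically figures out the dependency graph. When you run the project (e.g., with dbt run), the models build in dependency order:
- stg_customers reads from raw.customers.
- int_order_line_items reads from stg_orders and stg_products.
- dim_customer reads from stg_customers.
- fct_orders reads from int_order_line_items and then joins to the newly created/updated dim_customer table.

This provides a clean, reproducible, and scalable data modeling pipeline.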
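To see why the build order falls out of the references alone, here is a small sketch that topologically sorts the models described in this answer using Python's standard library. It illustrates the idea of dependency resolution only; it is not dbt's own implementation.

```python
from graphlib import TopologicalSorter

# Each model maps to the set of things it reads from (its ref() and source() calls).
dependencies = {
    "stg_customers": {"raw.customers"},
    "stg_orders": {"raw.orders"},
    "stg_products": {"raw.products"},
    "int_order_line_items": {"stg_orders", "stg_products"},
    "dim_customer": {"stg_customers"},
    "fct_orders": {"int_order_line_items", "dim_customer"},
}

# static_order() yields every node after all of its dependencies.
build_order = list(TopologicalSorter(dependencies).static_order())


def builds_before(first: str, second: str) -> bool:
    """True if `first` appears earlier than `second` in the build order."""
    return build_order.index(first) < build_order.index(second)
```

Several valid orderings exist, so the exact sequence may vary, but fct_orders always comes after both int_order_line_items and dim_customer.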
Prompt: We want to predict customer churn for our SaaS platform. We have 18 months of usage data, payment history, and support tickets. Frame this as an ML problem: target variable, features, model selection, and how to measure success.
This is a classic and high-value Machine Learning problem. Since the goal is to predict customer attrition (churn), the output will be a classification problem.
Here is a comprehensive framework for structuring this project, from problem definition to deployment considerations.
---
The most critical step is defining what "churn" means operationally.
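As one possible operational definition, the sketch below labels a customer as churned if they show no activity in the 30 days after a snapshot date. Both the 30-day window and the "no activity" rule are assumptions for illustration; your own definition might be based on cancellation or non-renewal instead.

```python
from datetime import date, timedelta
from typing import Iterable


def label_churn(activity_dates: Iterable[date], snapshot: date,
                window_days: int = 30) -> int:
    """Return 1 (churned) if there is no activity in (snapshot, snapshot + window_days]."""
    window_end = snapshot + timedelta(days=window_days)
    was_active = any(snapshot < d <= window_end for d in activity_dates)
    return 0 if was_active else 1


snapshot = date(2024, 1, 31)

# Active again on 5 February, so not churned.
retained_label = label_churn([date(2024, 1, 10), date(2024, 2, 5)], snapshot)

# No activity after the snapshot, so churned.
churned_label = label_churn([date(2024, 1, 10)], snapshot)
```

Whatever definition you choose, build the features only from data on or before the snapshot date, so the model never sees information from the period it is trying to predict.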
The success of the model isn't just high accuracy; it must drive revenue action.
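A quick sketch shows why accuracy alone is misleading on churn data. With an invented 10% churn rate, a model that never predicts churn scores 90% accuracy while catching no churners at all. The helper functions are plain-Python versions of standard metrics, written out here for clarity.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (churn = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random churner is scored above a random non-churner."""
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented data: 9 customers stay, 1 churns; the "model" always predicts stay.
y_true = [0] * 9 + [1]
always_stay = [0] * 10
naive_accuracy = accuracy(y_true, always_stay)
naive_precision, naive_recall = precision_recall(y_true, always_stay)

# A small scored example for AUC.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Recall on high-value customers and AUC tell you whether the model actually ranks churners above non-churners, which is what a retention team needs.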
---
Features must capture changes in customer behavior, not just static facts. These fall into several behavioral buckets.
These quantify how and how often the customer uses the product.
Static and transactional data related to the contract.
Indicates friction points.
Does the customer adopt the "sticky" features?
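As an example of a feature that captures behavioral change rather than a static fact, the sketch below compares login counts in the most recent 30 days against the 30 days before that. The window length and feature names are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Optional


def usage_velocity(login_dates: Iterable[date], snapshot: date,
                   window_days: int = 30) -> Dict[str, Optional[float]]:
    """Compare logins in the latest window with the window immediately before it."""
    logins = list(login_dates)
    recent_start = snapshot - timedelta(days=window_days)
    prior_start = snapshot - timedelta(days=2 * window_days)

    recent = sum(1 for d in logins if recent_start < d <= snapshot)
    prior = sum(1 for d in logins if prior_start < d <= recent_start)

    # A ratio below 1 means usage is declining; None means no baseline to compare.
    ratio = recent / prior if prior else None
    return {"logins_recent": recent, "logins_prior": prior, "login_ratio": ratio}


# Invented customer: four logins in February, one in March.
features = usage_velocity(
    [date(2024, 2, 5), date(2024, 2, 10), date(2024, 2, 20),
     date(2024, 2, 28), date(2024, 3, 15)],
    snapshot=date(2024, 3, 31),
)
```

The same pattern applies to the other buckets, for example support tickets or feature adoption counted over consecutive windows.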
Churn is usually rare (e.g., 5-10% churn rate). Your dataset will be highly imbalanced.
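One common way to handle the imbalance is to weight the rare class more heavily during training. The sketch below computes "balanced" class weights as n_samples / (n_classes * class_count), the same heuristic scikit-learn uses for class_weight="balanced". The 5% churn rate is an invented example.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Invented dataset: 950 retained customers (0) and 50 churners (1).
labels = [0] * 950 + [1] * 50
weights = balanced_class_weights(labels)
```

Here each churner counts 19 times as much as each retained customer, so the model cannot minimize its loss simply by predicting that everyone stays.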
---
Start with a simple Logistic Regression model. This provides feature importance coefficients that are highly interpretable ("For every extra logged-in day, the odds of churn decrease by X%").
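The "odds decrease by X%" reading comes directly from exponentiating a coefficient. The sketch below shows the arithmetic with hypothetical fitted values; the coefficients, intercept, and feature names are invented, not results from any real model.

```python
import math
from typing import Dict


def odds_change_pct(coefficient: float, delta: float = 1.0) -> float:
    """Percent change in the odds of churn when a feature increases by `delta`."""
    return (math.exp(coefficient * delta) - 1.0) * 100.0


def churn_probability(intercept: float, coefficients: Dict[str, float],
                      features: Dict[str, float]) -> float:
    """Logistic model: sigmoid of the intercept plus the weighted features."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted values, for illustration only.
intercept = -1.0
coefficients = {"logged_in_days": -0.05, "support_tickets": 0.30}

# Effect of one extra logged-in day on the odds of churn (about -4.9%).
per_day_change = odds_change_pct(coefficients["logged_in_days"])

p_inactive = churn_probability(intercept, coefficients,
                               {"logged_in_days": 0, "support_tickets": 0})
p_active = churn_probability(intercept, coefficients,
                             {"logged_in_days": 20, "support_tickets": 0})
```

Note that the percentage applies to the odds, not the probability, so the change in predicted probability depends on where the customer starts.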
Once the best model is selected, do not just trust the black box. You must know why it predicts churn.
---
The project fails if the model is just a Jupyter Notebook file.
The model must be retrained regularly (e.g., weekly or monthly) using the latest data to capture changing market dynamics.
Instead of treating all high-risk customers equally, segment the output:
| Segment | Risk Score Range | Value Status | Recommended Action |
| :--- | :--- | :--- | :--- |
| Red Alert | High (> 0.7) | High Value | Proactive outreach from Account Executive (AE), specialized tutorial, Executive check-in. |
| Amber Alert | Medium (0.3 - 0.7) | Medium/High | Automated email sequence, usage campaign offering, in-app guidance. |
| Green | Low (< 0.3) | Low Value | Monitoring only (Minimal resource expenditure). |
| False Positive | Low Model Score, High Support Cost | N/A | Process review (Did we misclassify an edge case?). |
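The score thresholds in the table translate into a simple routing function. The sketch below maps a risk score to its band and then orders customers within a band by value; the `monthly_revenue` field and the sample customers are assumptions for illustration, since the table does not say how value is measured.

```python
from typing import Dict, List


def risk_band(score: float) -> str:
    """Map a churn probability to the bands in the table above."""
    if score > 0.7:
        return "Red Alert"
    if score >= 0.3:
        return "Amber Alert"
    return "Green"


def prioritize(customers: List[Dict]) -> Dict[str, List[str]]:
    """Group customers by risk band, highest-value customers first in each band."""
    bands: Dict[str, List[Dict]] = {"Red Alert": [], "Amber Alert": [], "Green": []}
    for customer in customers:
        bands[risk_band(customer["score"])].append(customer)
    return {
        band: [c["customer_id"] for c in
               sorted(members, key=lambda c: c["monthly_revenue"], reverse=True)]
        for band, members in bands.items()
    }


# Invented scored customers.
scored = [
    {"customer_id": "C-001", "score": 0.85, "monthly_revenue": 500.0},
    {"customer_id": "C-002", "score": 0.92, "monthly_revenue": 4000.0},
    {"customer_id": "C-003", "score": 0.45, "monthly_revenue": 1200.0},
    {"customer_id": "C-004", "score": 0.10, "monthly_revenue": 300.0},
]
queues = prioritize(scored)
```

Sorting by value within each band keeps the success team's first calls focused on the accounts where churn would cost the most.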
| Aspect | Consideration | Why it matters |
| :--- | :--- | :--- |
| Problem Type | Binary Classification | Predicting Yes/No (Churn/Stay) |
| Key Metrics | AUC, Recall (for HVCs) | Focus on catching valuable stragglers. |
| Feature Focus | Behavioral Change (Velocity) | Static data is weak; trends predict failure. |
| Best Model | XGBoost/LightGBM | Superior handling of complex, non-linear interactions. |
| Output Action | Risk Segmentation (Red/Amber) | Ensures sales/success teams focus effort where it matters most. |
A chatbot answers questions. An agent does work. The difference is defined skills, persistent memory, and tool access.
Browse real deliverables from each agent — marketing plans, financial models, legal analysis, IoT specifications, and more.