Building an E-commerce Data Warehouse with Kettle and MySQL

Business Overview

A small online retailer wants daily insight into sales performance—order volume, revenue, and payment-method splits. The operational system (MySQL schema itcast_shop) contains six transactional tables that must be moved into a dedicated analytics store (itcast_shop_bi) and then aggregated for reporting.

Operational Schema

Table Purpose
ityewu_areas Province → city → district hierarchy
ityewu_goods SKU master data
ityewu_goods_cats Three-level product category tree
ityewu_orders Order header
ityewu_order_goods Order line items (1-to-many with orders)
ityewu_users Customer master

Architecture

ODS (Operational Data Store) → APP (pre-aggregated results) → FineBI dashboards.
ETL engine: Pentaho Data Integration (Kettle).
Repository: MySQL (shared DB repo for version control and scheduling). Repository Setup

  1. Open Spoon → Connect → Repository Manager → Add → Database Repository.
  2. Create a new JDBC connection to itcast_shop_bi, test, and save.
  3. Default admin/admin credentials let the team share jobs/transformations.

ODS Layer – Raw Copy

All source tables are copied 1-for-1 into ods_ tables with an extra etl_date column.

Incremental Strategy

Source ODS Table Frequency Mode
ityewu_areas ods_ityewu_areas Monthly Insert/Update on areaId
ityewu_goods_cats ods_ityewu_goods_cats month Insert/Update on catId
ityewu_goods ods_ityewu_goods daily Insert/Update on goodsId
ityewu_orders ods_ityewu_orders daily Insert/Update on orderId
ityewu_order_goods ods_ityewu_order_goods daily Insert/Update on ogId
ityewu_users ods_ityewu_users daily Insert/Update on userId

Kettle Job Example – Monthly Static Tables

Start
  |
  +-- Transformation: areas_to_ods
  |      Table Input: SELECT *, CURDATE() AS etl_date FROM itcast_shop.ityewu_areas
  |      Insert/Update: target = ods_ityewu_areas, keys = areaId
  |
  +-- Transformation: cats_to_ods
         Table Input: SELECT *, CURDATE() AS etl_date FROM itcast_shop.ityewu_goods_cats
         Insert/Update: target = ods_ityewu_goods_cats, keys = catId

Schedule the job via Start → Repeat → Monthly → 1st 02:00.

APP Layer – Business Aggregates

All result tables are prefixed app_ and refreshed daily.

1. Daily Order Revenue & Count

CREATE TABLE app_order_total (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  dt          DATE,
  total_money DOUBLE,
  total_cnt   INT
);

INSERT INTO app_order_total(dt, total_money, total_cnt)
SELECT
  DATE(createTime)            AS dt,
  SUM(realTotalMoney)         AS total_money,
  COUNT(1)                    AS total_cnt
FROM ods_ityewu_orders
GROUP BY dt;

2. Daily Distinct Buyers

CREATE TABLE app_order_user (
  id             INT AUTO_INCREMENT PRIMARY KEY,
  dt             DATE,
  total_user_cnt INT
);

INSERT INTO app_order_user(dt, total_user_cnt)
SELECT
  DATE(createTime) AS dt,
  COUNT(DISTINCT userId)
FROM ods_ityewu_orders
GROUP BY dt;

3. Daily Payment-Method Split

CREATE TABLE app_order_paytype (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  dt          DATE,
  paytype     VARCHAR(20),
  total_money DOUBLE,
  total_cnt   INT
);

INSERT INTO app_order_paytype(dt, paytype, total_money, total_cnt)
SELECT
  DATE(createTime) AS dt,
  CASE payType
       WHEN 1 THEN 'Alipay'
       WHEN 2 THEN 'WeChat'
       WHEN 3 THEN 'Card'
       ELSE 'Other'
  END              AS paytype,
  SUM(realTotalMoney),
  COUNT(1)
FROM ods_ityewu_orders
GROUP BY dt, payType;

4. Top-5 Power Users by Order Count

CREATE TABLE app_order_user_top5 (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  dt         DATE,
  userid     VARCHAR(20),
  username   VARCHAR(50),
  total_cnt  INT
);

INSERT INTO app_order_user_top5(dt, userid, username, total_cnt)
SELECT
  DATE(createTime) AS dt,
  userId,
  userName,
  COUNT(orderId)   AS total_cnt
FROM ods_ityewu_orders
WHERE DATE(createTime) = CURDATE() - INTERVAL 1 DAY
GROUP BY dt, userId, userName
ORDER BY total_cnt DESC
LIMIT 5;

5. Revenue & Orders by Top-Level Category

Because the category tree is self-referencing, we first materialize a flattened helper:

CREATE TABLE tmp_goods_cat AS
SELECT
  l3.catId  AS cat_id_l3,
  l2.catId  AS cat_id_l2,
  l1.catId  AS cat_id_l1,
  l1.catName AS cat_name_l1
FROM ods_ityewu_goods_cats l3
JOIN ods_ityewu_goods_cats l2 ON l3.parentId = l2.catId
JOIN ods_ityewu_goods_cats l1 ON l2.parentId = l1.catId
WHERE l3.cat_level = 3;

CREATE TABLE app_order_goods_cat (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  dt          DATE,
  cat_name    VARCHAR(50),
  total_money DOUBLE,
  total_cnt   INT
);

INSERT INTO app_order_goods_cat(dt, cat_name, total_money, total_cnt)
SELECT
  DATE(o.createTime)               AS dt,
  c.cat_name_l1,
  SUM(og.goodsNum * og.payPrice),
  COUNT(DISTINCT og.orderId)
FROM tmp_goods_cat c
JOIN ods_ityewu_goods g   ON g.goodsCatId = c.cat_id_l3
JOIN ods_ityewu_order_goods og ON og.goodsId = g.goodsId
JOIN ods_ityewu_orders o       ON o.orderId = og.orderId
GROUP BY dt, c.cat_name_l1;

Automation

A single Kettle job orchestrates the daily flow:

  1. Run incremental transformations for goods, orders, order_goods, users.
  2. Execute the five SQL scripts above to refresh app_ tables.
  3. Trigger FineBI cache refresh via REST call (optional).

Schedule: every day at 01:30 via Start → Repeat → Daily.

Dashboards

FineBI connects to itcast_shop_bi and reads the app_ tables directly, giving analysts drag-and-drop access to:

  • Daily revenue trend
  • Payment mix pie chart
  • Top buyers leaderboard
  • Category performance heat map

Tags: MySQL Kettle Pentaho-Data-Integration FineBI data-warehouse

Posted on Tue, 19 May 2026 03:36:23 +0000 by icd_lx