Analyzing Taobao User Behavior with SQL

Data Source Overview

We analyzed 1,048,575 behavioral records from 8,477 random users between November 18 and December 18, 2014. The dataset contains these fields:

user_id: Unique user identifier
item_id: Product ID
behavior_type: Action type (1=pv/view, 2=fav/favorite, 3=cart/add to cart, 4=buy/purchase)
user_geohash: Geographic location (contains nulls)
item_category: Product category ID
time: Timestamp of action

Analytical Objectives

  1. Overall shopping metrics: PV, daily PV, UV, purchasing users, repurchase rate
  2. Behavior conversion funnel: Click-to-cart, cart-to-purchase rates
  3. Characteristics of high/low purchase rate users
  4. Temporal behavioral patterns
  5. RFM-based user segmentation

Data Preparation

Data Import

We imported records in to MySQL using ETL tools for efficiency with the table named 'user_behavior'.

Data Cleaning

The user_geohash column was excluded due to null values and encryption limitations. We split the timestamp into separate date and hour columns:

ALTER TABLE user_behavior
ADD COLUMN action_date DATE,
ADD COLUMN action_hour TINYINT;

UPDATE user_behavior
SET action_date = DATE(time),
    action_hour = HOUR(time);

Behavior types were relabeled for readability:

UPDATE user_behavior
SET behavior_type = CASE
    WHEN behavior_type = 1 THEN 'pv'
    WHEN behavior_type = 2 THEN 'fav'
    WHEN behavior_type = 3 THEN 'cart'
    WHEN behavior_type = 4 THEN 'buy'
END;

Analytical Insights

Key Metrics

Core shopping metrics were calculated:

-- Total page views
SELECT COUNT(*) AS total_pv FROM user_behavior;

-- Daily active users
SELECT AVG(daily_uv) FROM (
    SELECT action_date, COUNT(DISTINCT user_id) AS daily_uv
    FROM user_behavior
    GROUP BY action_date
) AS daily_stats;

-- Purchasing users
SELECT COUNT(DISTINCT user_id) AS buyers
FROM user_behavior
WHERE behavior_type = 'buy';

Conversion Funnel

Behavioral conversion rates showed:

WITH funnel AS (
    SELECT 
        COUNT(DISTINCT CASE WHEN behavior_type = 'pv' THEN user_id END) AS viewers,
        COUNT(DISTINCT CASE WHEN behavior_type IN ('cart','fav') THEN user_id END) AS engagers,
        COUNT(DISTINCT CASE WHEN behavior_type = 'buy' THEN user_id END) AS buyers
    FROM user_behavior
)
SELECT 
    ROUND((engagers/viewers)*100, 2) AS engagement_rate,
    ROUND((buyers/engagers)*100, 2) AS conversion_rate
FROM funnel;

The overall purchase conversion rate was 1.04%, indicating significant growth potential compared to industry benchmarks.

User Segmentation

High-purchase-rate users typical demonstrated:

  • Targeted browsing (≤5 views per purchase)
  • Minimal cart/favorite actions
  • Purpose-driven shopping patterns

Low-purchase-rate users exhibited two distinct behaviors:

  • Minimal engagement (potential usability barriers)
  • High browsing with low conversion (comparison shopppers)

Temporal Patterns

Daily activity peaked between 19:00-23:00 with consistent weekly patterns showing:

SELECT 
    DAYNAME(action_date) AS weekday,
    COUNT(*) AS actions
FROM user_behavior
GROUP BY weekday
ORDER BY actions DESC;

Friday activity dips were observed, with recovery during weekends. Activity spikes occurred during promotional events.

RFM Analysis

User segmentation using recency and frequency metrics:

WITH buyers AS (
    SELECT user_id
    FROM user_behavior
    WHERE behavior_type = 'buy'
    GROUP BY user_id
    HAVING COUNT(*) >= 1
),
recency_rank AS (
    SELECT 
        user_id,
        DATEDIFF('2014-12-19', MAX(action_date)) AS days_since_last,
        NTILE(4) OVER (ORDER BY DATEDIFF('2014-12-19', MAX(action_date))) AS recency_quartile
    FROM user_behavior
    WHERE user_id IN (SELECT user_id FROM buyers)
    GROUP BY user_id
),
frequency_rank AS (
    SELECT 
        user_id,
        COUNT(*) AS purchase_count,
        NTILE(4) OVER (ORDER BY COUNT(*) DESC) AS frequency_quartile
    FROM user_behavior
    WHERE user_id IN (SELECT user_id FROM buyers)
    AND behavior_type = 'buy'
    GROUP BY user_id
)
SELECT 
    r.user_id,
    CONCAT(r.recency_quartile, f.frequency_quartile) AS rfm_score
FROM recency_rank r
JOIN frequency_rank f ON r.user_id = f.user_id;

RFM scoring enables targeted engagement strategies:

  • High-value users (score 44): Priority retention
  • Frequent but recent shoppers (score 41): Discount opportunities

Automated Reporting

Scheduled ETL processes enable monthly RFM score updates to track user value changes.

Tags: MySQL RFM ETL UserAnalysis Taobao

Posted on Sat, 27 Jun 2026 16:31:44 +0000 by lajkonik86