Data Source Overview
We analyzed 1,048,575 behavioral records from 8,477 random users between November 18 and December 18, 2014. The dataset contains these fields:
user_id: Unique user identifier
item_id: Product ID
behavior_type: Action type (1=pv/view, 2=fav/favorite, 3=cart/add to cart, 4=buy/purchase)
user_geohash: Geographic location (contains nulls)
item_category: Product category ID
time: Timestamp of action
Analytical Objectives
- Overall shopping metrics: PV, daily PV, UV, purchasing users, repurchase rate
- Behavior conversion funnel: Click-to-cart, cart-to-purchase rates
- Characteristics of high/low purchase rate users
- Temporal behavioral patterns
- RFM-based user segmentation
Data Preparation
Data Import
We imported records in to MySQL using ETL tools for efficiency with the table named 'user_behavior'.
Data Cleaning
The user_geohash column was excluded due to null values and encryption limitations. We split the timestamp into separate date and hour columns:
ALTER TABLE user_behavior
ADD COLUMN action_date DATE,
ADD COLUMN action_hour TINYINT;
UPDATE user_behavior
SET action_date = DATE(time),
action_hour = HOUR(time);
Behavior types were relabeled for readability:
UPDATE user_behavior
SET behavior_type = CASE
WHEN behavior_type = 1 THEN 'pv'
WHEN behavior_type = 2 THEN 'fav'
WHEN behavior_type = 3 THEN 'cart'
WHEN behavior_type = 4 THEN 'buy'
END;
Analytical Insights
Key Metrics
Core shopping metrics were calculated:
-- Total page views
SELECT COUNT(*) AS total_pv FROM user_behavior;
-- Daily active users
SELECT AVG(daily_uv) FROM (
SELECT action_date, COUNT(DISTINCT user_id) AS daily_uv
FROM user_behavior
GROUP BY action_date
) AS daily_stats;
-- Purchasing users
SELECT COUNT(DISTINCT user_id) AS buyers
FROM user_behavior
WHERE behavior_type = 'buy';
Conversion Funnel
Behavioral conversion rates showed:
WITH funnel AS (
SELECT
COUNT(DISTINCT CASE WHEN behavior_type = 'pv' THEN user_id END) AS viewers,
COUNT(DISTINCT CASE WHEN behavior_type IN ('cart','fav') THEN user_id END) AS engagers,
COUNT(DISTINCT CASE WHEN behavior_type = 'buy' THEN user_id END) AS buyers
FROM user_behavior
)
SELECT
ROUND((engagers/viewers)*100, 2) AS engagement_rate,
ROUND((buyers/engagers)*100, 2) AS conversion_rate
FROM funnel;
The overall purchase conversion rate was 1.04%, indicating significant growth potential compared to industry benchmarks.
User Segmentation
High-purchase-rate users typical demonstrated:
- Targeted browsing (≤5 views per purchase)
- Minimal cart/favorite actions
- Purpose-driven shopping patterns
Low-purchase-rate users exhibited two distinct behaviors:
- Minimal engagement (potential usability barriers)
- High browsing with low conversion (comparison shopppers)
Temporal Patterns
Daily activity peaked between 19:00-23:00 with consistent weekly patterns showing:
SELECT
DAYNAME(action_date) AS weekday,
COUNT(*) AS actions
FROM user_behavior
GROUP BY weekday
ORDER BY actions DESC;
Friday activity dips were observed, with recovery during weekends. Activity spikes occurred during promotional events.
RFM Analysis
User segmentation using recency and frequency metrics:
WITH buyers AS (
SELECT user_id
FROM user_behavior
WHERE behavior_type = 'buy'
GROUP BY user_id
HAVING COUNT(*) >= 1
),
recency_rank AS (
SELECT
user_id,
DATEDIFF('2014-12-19', MAX(action_date)) AS days_since_last,
NTILE(4) OVER (ORDER BY DATEDIFF('2014-12-19', MAX(action_date))) AS recency_quartile
FROM user_behavior
WHERE user_id IN (SELECT user_id FROM buyers)
GROUP BY user_id
),
frequency_rank AS (
SELECT
user_id,
COUNT(*) AS purchase_count,
NTILE(4) OVER (ORDER BY COUNT(*) DESC) AS frequency_quartile
FROM user_behavior
WHERE user_id IN (SELECT user_id FROM buyers)
AND behavior_type = 'buy'
GROUP BY user_id
)
SELECT
r.user_id,
CONCAT(r.recency_quartile, f.frequency_quartile) AS rfm_score
FROM recency_rank r
JOIN frequency_rank f ON r.user_id = f.user_id;
RFM scoring enables targeted engagement strategies:
- High-value users (score 44): Priority retention
- Frequent but recent shoppers (score 41): Discount opportunities
Automated Reporting
Scheduled ETL processes enable monthly RFM score updates to track user value changes.