Age Calculation from National ID Numbers in Hive

In many enterprise applications, determining user age from national identification numbers is essential for compliance, age verification, and personalized services. This technical guide demonstrates how to implement age calculation from Chinese ID numbers using Hive SQL.

Understanding ID Number Structure

Chinese national ID numbers contain specific segments of information:

  • First 6 digits: Regional code
  • Next 8 digits: Birth date (YYYYMMDD format)
  • Following 3 digits: Sequence number
  • Last character: Check digit (0-9, or X)
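
The segments above can be pulled apart with Hive's built-in substr, which uses 1-based positions. The following is a minimal sketch rather than part of the original solution; the inline sample ID is a commonly published illustrative value, and the column aliases are assumptions made for this example.

```sql
-- Split an 18-character ID into its four segments.
-- The inline subquery supplies one illustrative sample row.
SELECT
  substr(id_number, 1, 6)  AS region_code,     -- characters 1-6
  substr(id_number, 7, 8)  AS birth_yyyymmdd,  -- characters 7-14
  substr(id_number, 15, 3) AS sequence_no,     -- characters 15-17
  substr(id_number, 18, 1) AS check_digit      -- character 18
FROM (SELECT '11010519491231002X' AS id_number) t;
```

For the sample row this returns 110105, 19491231, 002 and X.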

Extracting Birth Date in Hive

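The following statements register a custom UDF and use it to extract the birth date from an ID number. The class name com.example.udf.ExtractBirthdate is a placeholder; the Java implementation behind it is not shown here: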

CREATE TEMPORARY FUNCTION extract_birthdate AS 'com.example.udf.ExtractBirthdate';

-- Extract birth date components
SELECT 
  id_number,
  extract_birthdate(id_number) AS birth_date
FROM user_ids;
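
If a custom UDF is not available, the same result can likely be obtained with built-in functions alone. The sketch below is an alternative to the UDF, not its implementation: it rebuilds a yyyy-MM-dd string from characters 7-14 and casts it to DATE. How impossible dates such as February 31 are treated by the cast can differ between Hive versions, so this sketch assumes the ID has already been validated.

```sql
-- Build a DATE from the birth-date segment using only built-ins.
SELECT
  id_number,
  CAST(concat_ws('-',
         substr(id_number, 7, 4),   -- year
         substr(id_number, 11, 2),  -- month
         substr(id_number, 13, 2)   -- day
       ) AS DATE) AS birth_date
FROM (SELECT '11010519491231002X' AS id_number) t;
```

For the sample row, birth_date is 1949-12-31.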

Calculating Age in Hive

Once we have the birth date, we can calculate the age. The example below delegates this to a second custom UDF (again a placeholder class name):

CREATE TEMPORARY FUNCTION calculate_age AS 'com.example.udf.CalculateAge';

-- Calculate current age
SELECT 
  id_number,
  calculate_age(extract_birthdate(id_number)) AS current_age
FROM user_ids;
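
As an alternative to a custom UDF, age in completed years can be sketched with built-ins: subtract the birth year from the current year, then subtract one more if this year's birthday has not yet arrived. This is an assumption about what calculate_age is meant to return, not its actual code. It requires current_date and date_format, which are available from Hive 1.2.0 onward.

```sql
-- Age in completed years, computed directly from the ID string.
SELECT
  id_number,
  year(current_date) - CAST(substr(id_number, 7, 4) AS INT)
    - CASE
        -- Compare zero-padded MMdd strings: if today's MMdd sorts before
        -- the birth MMdd, the birthday has not been reached this year.
        WHEN date_format(current_date, 'MMdd') < substr(id_number, 11, 4) THEN 1
        ELSE 0
      END AS current_age
FROM (SELECT '11010519491231002X' AS id_number) t;
```

With this comparison, someone born on February 29 is treated as turning a year older on March 1 in non-leap years. If a different convention is required, the CASE branch is the place to change it.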

Complete Implementation

The following script registers both UDFs and combines them in one query. Note that its only validation is a length filter:

-- Create UDF for extracting birth date (JARs are loaded with ADD JAR, not ADD FILE)
ADD JAR /path/to/ExtractBirthdate.jar;
CREATE TEMPORARY FUNCTION extract_birthdate AS 'com.example.udf.ExtractBirthdate';

-- Create UDF for age calculation
ADD JAR /path/to/CalculateAge.jar;
CREATE TEMPORARY FUNCTION calculate_age AS 'com.example.udf.CalculateAge';

-- Main query
SELECT 
  user_id,
  id_number,
  extract_birthdate(id_number) AS birth_date,
  calculate_age(extract_birthdate(id_number)) AS age,
  CASE 
    WHEN calculate_age(extract_birthdate(id_number)) >= 18 THEN 'Adult'
    WHEN calculate_age(extract_birthdate(id_number)) >= 13 THEN 'Teenager'
    ELSE 'Child'
  END AS age_group
FROM user_profiles
WHERE length(id_number) = 18;
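
The query above evaluates the age expression three times per row. One way to avoid that is to compute the age once in a subquery and bucket it in the outer query. The sketch below shows that shape using built-in functions instead of the UDFs so that it can run on its own; the two inline sample IDs are illustrative values, and the subquery aliases (samples, aged) are assumptions made for this example.

```sql
-- Compute age once in the inner query, then derive the age group from it.
SELECT
  id_number,
  age,
  CASE
    WHEN age >= 18 THEN 'Adult'
    WHEN age >= 13 THEN 'Teenager'
    ELSE 'Child'
  END AS age_group
FROM (
  SELECT
    id_number,
    year(current_date) - CAST(substr(id_number, 7, 4) AS INT)
      - IF(date_format(current_date, 'MMdd') < substr(id_number, 11, 4), 1, 0) AS age
  FROM (
    SELECT '11010519491231002X' AS id_number  -- illustrative sample, born 1949-12-31
    UNION ALL
    SELECT '110105201501010017' AS id_number  -- made-up sample, born 2015-01-01
  ) samples
  WHERE length(id_number) = 18
) aged;
```

The same structure works with the UDFs: replace the inner age expression with calculate_age(extract_birthdate(id_number)).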

Handling Edge Cases

When implementing this solution, consider these edge cases:

  • Invalid ID numbers (wrong length or format)
  • Future birth dates
  • Leap year birthdays
  • Time zone differences for date calculations

The main query above guards only against wrong length. The remaining cases must be handled either inside the UDFs or with additional predicates in the query.
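
Several of the checks listed above can be written as plain predicates. The sketch below is one possible approach, not the validation used by the UDFs: a regular expression enforces the format, a round trip through DATE rejects impossible calendar dates, and a string comparison rejects future birth dates. The inline sample rows are illustrative, and three of them are deliberately invalid.

```sql
SELECT id_number
FROM (
  SELECT '11010519491231002X' AS id_number            -- well-formed
  UNION ALL SELECT '1101051949123100'   AS id_number  -- wrong length
  UNION ALL SELECT '11010519490231002X' AS id_number  -- February 31
  UNION ALL SELECT '11010520991231002X' AS id_number  -- future birth date (as of this writing)
) t
WHERE
  -- Format: 17 digits followed by a digit or X.
  id_number RLIKE '^[0-9]{17}[0-9Xx]$'
  -- Calendar validity: the parsed date must format back to the same digits.
  -- Rejects the row whether an impossible date becomes NULL or rolls over.
  AND date_format(
        CAST(concat_ws('-',
               substr(id_number, 7, 4),
               substr(id_number, 11, 2),
               substr(id_number, 13, 2)) AS DATE),
        'yyyyMMdd') = substr(id_number, 7, 8)
  -- No future birth dates (zero-padded yyyyMMdd strings sort chronologically).
  AND substr(id_number, 7, 8) <= date_format(current_date, 'yyyyMMdd');
```

Only the first sample row should pass all three predicates. current_date is evaluated once per query, and the calendar day it returns depends on the time zone Hive uses, so results near midnight can differ between environments. This sketch does not verify the check digit.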

Tags: Hive sql Age Calculation National ID Data Processing

Posted on Fri, 08 May 2026 04:02:33 +0000 by simonb