Database Sharding and User Registration Implementation for a Train Booking System

User Table Structure

The original table structure is designed as follows. Due to the large user base, database sharding becomes necessary.

Sharding Strategy Design

Based on the system design assumptions, the 12306 system needs to support approximately 1 billion registered users, with roughly 10 million new users annually. Before implementing database sharding, we need to determine the optimal number of shards to achieve the best performance. To make this decision, we can estimate the maximum data volume per table based on historical experience. A common practice is to set 20 million records as the threshold per table, which provides a good balance between manageable data volume and efficient CRUD operations.

Given the current user base of 1 billion with 10 million new users per year, and considering a long system lifecycle, the total data volume may reach approximately 3 billion records. With each table holding around 20 million records, we would need approximately 150 tables to accommodate this data.

When performing sharding capacity assessment, it is advisable to overestimate rather than underestimate. This approach helps identify potential data distribution issues early, allowing for timely adjustments. The threshold we chose is relatively conservative because our system has excellent scalability and can easily handle significant data growth. This sharding strategy ensures the system can handle data for centuries without performance degradation.

Selecting the appropriate sharding key is a critical decision that directly impacts system performence and scalability. Here are the key factors to consider:

  1. Access Frequency: The sharding key should account for data access patterns. Keeping frequently accessed data on the same shard improves query performance and reduces cross-shard query overhead.
  2. Data Distribution: The sharding key should ensure uniform data distribution across all shards, preventing hot spots where certain shards receive significantly more traffic than others.
  3. Business Relevance: The sharding key should align closely with business operations to minimize cross-shard queries and complex distributed transactions.
  4. Immutability: Once selected, the sharding key should remain constant and not change frequently with business variations.

Based on these considerations, we select the username field as the sharding key.

Sharding Implementation

The implementation uses ShardingSphere for database sharding operations. For detailed configuration, refer to the official ShardingSphere documentation.

Step 1: Add ShardingSphere Dependency

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-core</artifactId>
    <version>5.3.2</version>
</dependency>

Step 2: Configure Sharding Rules

spring:
  application:
    name: train-booking-user-service
  datasource:
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    url: jdbc:shardingsphere:classpath:shardingsphere-config.yaml

Step 3: User Sharding Configuration

For demonstration purposes, we use 2 databases with 16 tables each.

shardingsphere-config.yaml:

dataSources:
  ds_0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://127.0.0.1:3306/train_user_0?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true&allowMultiQueries=true&serverTimezone=Asia/Shanghai
    username: root
    password: root
  ds_1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://127.0.0.1:3306/train_user_1?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true&allowMultiQueries=true&serverTimezone=Asia/Shanghai
    username: root
    password: root

rules:
  - !SHARDING
    tables:
      t_user:
        actualDataNodes: ds_${0..1}.t_user_${0..31}
        databaseStrategy:
          standard:
            shardingColumn: username
            shardingAlgorithmName: user_database_hash_mod
        tableStrategy:
          standard:
            shardingColumn: username
            shardingAlgorithmName: user_table_hash_mod
      t_passenger:
        actualDataNodes: ds_${0..1}.t_passenger_${0..31}
        databaseStrategy:
          standard:
            shardingColumn: username
            shardingAlgorithmName: passenger_database_hash_mod
        tableStrategy:
          standard:
            shardingColumn: username
            shardingAlgorithmName: passenger_table_hash_mod
      t_user_mail:
        actualDataNodes: ds_${0..1}.t_user_mail_${0..31}
        databaseStrategy:
          standard:
            shardingColumn: mail
            shardingAlgorithmName: t_user_mail_database_hash_mod
        tableStrategy:
          standard:
            shardingColumn: mail
            shardingAlgorithmName: t_user_mail_table_hash_mod
      t_user_phone:
        actualDataNodes: ds_${0..1}.t_user_phone_${0..31}
        databaseStrategy:
          standard:
            shardingColumn: phone
            shardingAlgorithmName: t_user_phone_database_hash_mod
        tableStrategy:
          standard:
            shardingColumn: phone
            shardingAlgorithmName: t_user_phone_table_hash_mod
    shardingAlgorithms:
      user_database_hash_mod:
        type: CLASS_BASED
        props:
          sharding-count: 32
          table-sharding-count: 16
          strategy: standard
          algorithmClassName: org.example.train.framework.starter.database.algorithm.CustomDbHashModAlgorithm
      passenger_database_hash_mod:
        type: CLASS_BASED
        props:
          sharding-count: 32
          table-sharding-count: 16
          strategy: standard
          algorithmClassName: org.example.train.framework.starter.database.algorithm.CustomDbHashModAlgorithm
      t_user_mail_database_hash_mod:
        type: CLASS_BASED
        props:
          sharding-count: 32
          table-sharding-count: 16
          strategy: standard
          algorithmClassName: org.example.train.framework.starter.database.algorithm.CustomDbHashModAlgorithm
      t_user_phone_database_hash_mod:
        type: CLASS_BASED
        props:
          sharding-count: 32
          table-sharding-count: 16
          strategy: standard
          algorithmClassName: org.example.train.framework.starter.database.algorithm.CustomDbHashModAlgorithm
      passenger_table_hash_mod:
        type: HASH_MOD
        props:
          sharding-count: 32
      t_user_mail_table_hash_mod:
        type: HASH_MOD
        props:
          sharding-count: 32
      t_user_phone_table_hash_mod:
        type: HASH_MOD
        props:
          sharding-count: 32
      user_table_hash_mod:
        type: HASH_MOD
        props:
          sharding-count: 32
  - !ENCRYPT
    tables:
      t_user:
        columns:
          id_card:
            cipherColumn: id_card
            encryptorName: common_encryptor
          phone:
            cipherColumn: phone
            encryptorName: common_encryptor
          mail:
            cipherColumn: mail
            encryptorName: common_encryptor
          address:
            cipherColumn: address
            encryptorName: common_encryptor
      t_passenger:
        columns:
          id_card:
            cipherColumn: id_card
            encryptorName: common_encryptor
          phone:
            cipherColumn: phone
            encryptorName: common_encryptor
        queryWithCipherColumn: true
    encryptors:
      common_encryptor:
        type: AES
        props:
          aes-key-value: d6oadClrrb9A3GWo
props:
  sql-show: true

Cache Penetration Solution

Cache penetration occurs when a cache system receives requests for data that does not exist, causing every request to hit the database or other storage systems, effectively bypassing the cache. This significantly degrades system performance and typically happens during malicious attacks, high concurrency scenarios with non-existent cache entries, or after cache data expires.

Cache penetration leads to the following issues:

  1. Increased database load due to frequent queries, reducing system throughput.
  2. Memory exhaustion on cache servers from storing large amounts of non-existent data requests.
  3. Poor user experience due to longer response times when data cannot be served from cache.

We solve this problem using a Bloom filter combined with caching.

User Registration API

The user registration flow is illustrated below:

Chain of Responsibility Pattern

In this system, the chain of responsibility pattern is implemented in three steps: defining the method input parameters, creating the business filter interface, and implementing specific filter executors.

Step 1: Define Method Input Parameters

Since this is a user registration endpoint, we can directly use the registration request entity for validation.

@Data
public class UserRegisterRequest {

    private String username;
    private String password;
    private String realName;
    private Integer idType;
    private String idCard;
    private String phone;
    private String mail;
    private Integer userType;
    private Integer verifyState;
    private String postCode;
    private String address;
    private String region;
    private String telephone;
}

Step 2: Define Business Filter Interface

public interface UserRegisterChainFilter<T extends UserRegisterRequest> extends AbstractChainHandler<UserRegisterRequest> {

    @Override
    default String mark() {
        return UserChainMarkEnum.USER_REGISTER_FILTER.name();
    }
}

Step 3: Implement Specific Filter Handlers

When defining filter handlers, the getOrder() method must be used to determine execution order. Typically, validation strategies that execute quickly and in-memory should run first, while handlers requiring external interactions such as Redis should execute later.

For example, if a user does not provide an ID card number, executing username validation first would be wasteful since parameter validation will fail anyway, resulting in unnecessary cache queries. Although the performance impact may be small, these unnecessary operations should be avoided.

By properly using the getOrder() interface, we can optimize the execution sequence, prioritizing efficient operations and avoiding unnecessary performance overhead.

User Table Related Operations

The login functionality allows users to authenticate using either username, email, or phone number combined with a password. It is important to note that in our sharding strategy, data is partitioned by username. Therefore, querying user information without the username will cause read dispersion issues.

To address this, we introduced two routing tables: the user phone table and user email table. These tables contain the phone number, email address, and their corresponding usernames. This design enables flexible authentication using phone number, email, or username during login.

Additional Operations

After user registration, the username becomes unavailable. We need to add it to the Bloom filter to prevent others from registering with the same username.

Regarding username reusability, we have carefully considered this requirement. Usernames from deleted acounts should be available for new registrations. Therefore, we implemented an extension for reusable usernames.

This means we need to delete both the cache entries and database records for the reusable username. Once deleted, subsequent users cannot use this username through this extension point. This design ensures proper username reuse while preventing issues with deleted account information affecting new users.

Tags: database-sharding MySQL ShardingSphere java spring-boot

Posted on Sat, 20 Jun 2026 16:47:10 +0000 by ari_aaron