User Table Structure
The original table structure is designed as follows. Due to the large user base, database sharding becomes necessary.
Sharding Strategy Design
Based on the system design assumptions, the 12306 system needs to support approximately 1 billion registered users, with roughly 10 million new users annually. Before implementing database sharding, we need to determine the optimal number of shards to achieve the best performance. To make this decision, we can estimate the maximum data volume per table based on historical experience. A common practice is to set 20 million records as the threshold per table, which provides a good balance between manageable data volume and efficient CRUD operations.
Given the current user base of 1 billion with 10 million new users per year, and considering a long system lifecycle, the total data volume may reach approximately 3 billion records. With each table holding around 20 million records, we would need approximately 150 tables to accommodate this data.
When performing sharding capacity assessment, it is advisable to overestimate rather than underestimate. This approach helps identify potential data distribution issues early, allowing for timely adjustments. The threshold we chose is relatively conservative because our system has excellent scalability and can easily handle significant data growth. This sharding strategy ensures the system can handle data for centuries without performance degradation.
Selecting the appropriate sharding key is a critical decision that directly impacts system performence and scalability. Here are the key factors to consider:
- Access Frequency: The sharding key should account for data access patterns. Keeping frequently accessed data on the same shard improves query performance and reduces cross-shard query overhead.
- Data Distribution: The sharding key should ensure uniform data distribution across all shards, preventing hot spots where certain shards receive significantly more traffic than others.
- Business Relevance: The sharding key should align closely with business operations to minimize cross-shard queries and complex distributed transactions.
- Immutability: Once selected, the sharding key should remain constant and not change frequently with business variations.
Based on these considerations, we select the username field as the sharding key.
Sharding Implementation
The implementation uses ShardingSphere for database sharding operations. For detailed configuration, refer to the official ShardingSphere documentation.
Step 1: Add ShardingSphere Dependency
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-jdbc-core</artifactId>
<version>5.3.2</version>
</dependency>
Step 2: Configure Sharding Rules
spring:
application:
name: train-booking-user-service
datasource:
driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
url: jdbc:shardingsphere:classpath:shardingsphere-config.yaml
Step 3: User Sharding Configuration
For demonstration purposes, we use 2 databases with 16 tables each.
shardingsphere-config.yaml:
dataSources:
ds_0:
dataSourceClassName: com.zaxxer.hikari.HikariDataSource
driverClassName: com.mysql.cj.jdbc.Driver
jdbcUrl: jdbc:mysql://127.0.0.1:3306/train_user_0?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true&allowMultiQueries=true&serverTimezone=Asia/Shanghai
username: root
password: root
ds_1:
dataSourceClassName: com.zaxxer.hikari.HikariDataSource
driverClassName: com.mysql.cj.jdbc.Driver
jdbcUrl: jdbc:mysql://127.0.0.1:3306/train_user_1?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true&allowMultiQueries=true&serverTimezone=Asia/Shanghai
username: root
password: root
rules:
- !SHARDING
tables:
t_user:
actualDataNodes: ds_${0..1}.t_user_${0..31}
databaseStrategy:
standard:
shardingColumn: username
shardingAlgorithmName: user_database_hash_mod
tableStrategy:
standard:
shardingColumn: username
shardingAlgorithmName: user_table_hash_mod
t_passenger:
actualDataNodes: ds_${0..1}.t_passenger_${0..31}
databaseStrategy:
standard:
shardingColumn: username
shardingAlgorithmName: passenger_database_hash_mod
tableStrategy:
standard:
shardingColumn: username
shardingAlgorithmName: passenger_table_hash_mod
t_user_mail:
actualDataNodes: ds_${0..1}.t_user_mail_${0..31}
databaseStrategy:
standard:
shardingColumn: mail
shardingAlgorithmName: t_user_mail_database_hash_mod
tableStrategy:
standard:
shardingColumn: mail
shardingAlgorithmName: t_user_mail_table_hash_mod
t_user_phone:
actualDataNodes: ds_${0..1}.t_user_phone_${0..31}
databaseStrategy:
standard:
shardingColumn: phone
shardingAlgorithmName: t_user_phone_database_hash_mod
tableStrategy:
standard:
shardingColumn: phone
shardingAlgorithmName: t_user_phone_table_hash_mod
shardingAlgorithms:
user_database_hash_mod:
type: CLASS_BASED
props:
sharding-count: 32
table-sharding-count: 16
strategy: standard
algorithmClassName: org.example.train.framework.starter.database.algorithm.CustomDbHashModAlgorithm
passenger_database_hash_mod:
type: CLASS_BASED
props:
sharding-count: 32
table-sharding-count: 16
strategy: standard
algorithmClassName: org.example.train.framework.starter.database.algorithm.CustomDbHashModAlgorithm
t_user_mail_database_hash_mod:
type: CLASS_BASED
props:
sharding-count: 32
table-sharding-count: 16
strategy: standard
algorithmClassName: org.example.train.framework.starter.database.algorithm.CustomDbHashModAlgorithm
t_user_phone_database_hash_mod:
type: CLASS_BASED
props:
sharding-count: 32
table-sharding-count: 16
strategy: standard
algorithmClassName: org.example.train.framework.starter.database.algorithm.CustomDbHashModAlgorithm
passenger_table_hash_mod:
type: HASH_MOD
props:
sharding-count: 32
t_user_mail_table_hash_mod:
type: HASH_MOD
props:
sharding-count: 32
t_user_phone_table_hash_mod:
type: HASH_MOD
props:
sharding-count: 32
user_table_hash_mod:
type: HASH_MOD
props:
sharding-count: 32
- !ENCRYPT
tables:
t_user:
columns:
id_card:
cipherColumn: id_card
encryptorName: common_encryptor
phone:
cipherColumn: phone
encryptorName: common_encryptor
mail:
cipherColumn: mail
encryptorName: common_encryptor
address:
cipherColumn: address
encryptorName: common_encryptor
t_passenger:
columns:
id_card:
cipherColumn: id_card
encryptorName: common_encryptor
phone:
cipherColumn: phone
encryptorName: common_encryptor
queryWithCipherColumn: true
encryptors:
common_encryptor:
type: AES
props:
aes-key-value: d6oadClrrb9A3GWo
props:
sql-show: true
Cache Penetration Solution
Cache penetration occurs when a cache system receives requests for data that does not exist, causing every request to hit the database or other storage systems, effectively bypassing the cache. This significantly degrades system performance and typically happens during malicious attacks, high concurrency scenarios with non-existent cache entries, or after cache data expires.
Cache penetration leads to the following issues:
- Increased database load due to frequent queries, reducing system throughput.
- Memory exhaustion on cache servers from storing large amounts of non-existent data requests.
- Poor user experience due to longer response times when data cannot be served from cache.
We solve this problem using a Bloom filter combined with caching.
User Registration API
The user registration flow is illustrated below:
Chain of Responsibility Pattern
In this system, the chain of responsibility pattern is implemented in three steps: defining the method input parameters, creating the business filter interface, and implementing specific filter executors.
Step 1: Define Method Input Parameters
Since this is a user registration endpoint, we can directly use the registration request entity for validation.
@Data
public class UserRegisterRequest {
private String username;
private String password;
private String realName;
private Integer idType;
private String idCard;
private String phone;
private String mail;
private Integer userType;
private Integer verifyState;
private String postCode;
private String address;
private String region;
private String telephone;
}
Step 2: Define Business Filter Interface
public interface UserRegisterChainFilter<T extends UserRegisterRequest> extends AbstractChainHandler<UserRegisterRequest> {
@Override
default String mark() {
return UserChainMarkEnum.USER_REGISTER_FILTER.name();
}
}
Step 3: Implement Specific Filter Handlers
When defining filter handlers, the getOrder() method must be used to determine execution order. Typically, validation strategies that execute quickly and in-memory should run first, while handlers requiring external interactions such as Redis should execute later.
For example, if a user does not provide an ID card number, executing username validation first would be wasteful since parameter validation will fail anyway, resulting in unnecessary cache queries. Although the performance impact may be small, these unnecessary operations should be avoided.
By properly using the getOrder() interface, we can optimize the execution sequence, prioritizing efficient operations and avoiding unnecessary performance overhead.
User Table Related Operations
The login functionality allows users to authenticate using either username, email, or phone number combined with a password. It is important to note that in our sharding strategy, data is partitioned by username. Therefore, querying user information without the username will cause read dispersion issues.
To address this, we introduced two routing tables: the user phone table and user email table. These tables contain the phone number, email address, and their corresponding usernames. This design enables flexible authentication using phone number, email, or username during login.
Additional Operations
After user registration, the username becomes unavailable. We need to add it to the Bloom filter to prevent others from registering with the same username.
Regarding username reusability, we have carefully considered this requirement. Usernames from deleted acounts should be available for new registrations. Therefore, we implemented an extension for reusable usernames.
This means we need to delete both the cache entries and database records for the reusable username. Once deleted, subsequent users cannot use this username through this extension point. This design ensures proper username reuse while preventing issues with deleted account information affecting new users.