Overview
TiDB is a cloud-native distributed database that combines horizontal scalability with financial-grade high availability. It delivers real-time HTAP (Hybrid Transactional and Analytical Processing) capabilities and maintains MySQL 5.7 protocol compatibility.
Architecture Components
TiDB Server
The TiDB Server acts as the SQL layer and serves as the entry point for client connections. It handles the following core responsibilities:
- Managing client connections and authentication
- Parsing, compiling, and generating execution plans for SQL statements
- Converting relational data to key-value pairs for storage in TiKV
- Executing DDL operations and handling online schema changes
- Coordinating garbage collection of obsolete MVCC versions, which runs on a configurable interval (the tidb_gc_run_interval system variable, 10 minutes by default)
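The relational-to-key-value mapping can be sketched in a few lines. This minimal Python example is modeled on the key layout described in TiDB's documentation. A row is stored under t{TableID}_r{RowID}. A unique index entry is stored under t{TableID}_i{IndexID}_{value} and points at the row ID. A non-unique index entry also appends the row ID to its key. The readable string keys and the encode_row helper are illustrative assumptions. Real TiDB uses a binary, memory-comparable encoding so that byte order matches value order.

```python
from __future__ import annotations

from typing import Any


def row_key(table_id: int, row_id: int) -> str:
    """Key under which a row's column values are stored (illustrative string form)."""
    return f"t{table_id}_r{row_id}"


def index_key(table_id: int, index_id: int, value: Any, row_id: int | None = None) -> str:
    """Key for an index entry; non-unique indexes append the row ID to stay distinct."""
    key = f"t{table_id}_i{index_id}_{value}"
    return key if row_id is None else f"{key}_{row_id}"


def encode_row(
    table_id: int,
    row_id: int,
    columns: dict[str, Any],
    unique_indexes: dict[int, str],
    nonunique_indexes: dict[int, str],
) -> dict[str, Any]:
    """Return the KV pairs one row would produce (hypothetical helper, not TiDB code)."""
    # Row data: t{table}_r{row} -> column values
    pairs: dict[str, Any] = {row_key(table_id, row_id): list(columns.values())}
    # Unique index: t{table}_i{index}_{value} -> row ID
    for index_id, col in unique_indexes.items():
        pairs[index_key(table_id, index_id, columns[col])] = row_id
    # Non-unique index: t{table}_i{index}_{value}_{row} -> no value needed
    for index_id, col in nonunique_indexes.items():
        pairs[index_key(table_id, index_id, columns[col], row_id)] = None
    return pairs
```

For example, encode_row(10, 1, {"name": "alice", "age": 30}, {1: "name"}, {2: "age"}) yields three pairs. The row lands under t10_r1, the unique name index under t10_i1_alice, and the non-unique age index under t10_i2_30_1.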
Request Processing Flow
Client Request
→ Protocol Parsing
→ SQL Parsing
→ Query Optimization
→ Execution Plan Generation
→ Distributed Execution
→ Result Return
TiKV
TiKV serves as the distributed storage engine for TiDB, providing persistent data storage with strong consistency guarantees.
Storage Architecture
Region (Data Shard)
├── Leader
└── Followers (Multiple Replicas)
    ├── Follower 1
    └── Follower 2
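The leader/follower structure gives each Region its fault tolerance. Under Raft, a write is committed once a majority of the Region's replicas have persisted it. The toy functions below capture only that majority arithmetic. Their names are hypothetical and this is not TiKV code. Elections, terms, and log matching are deliberately left out.

```python
from __future__ import annotations


def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority is still reachable."""
    return replicas - quorum(replicas)


def is_committed(acks: set[str], group: list[str]) -> bool:
    """An entry counts as committed once a majority of the group has persisted it.

    The leader's own durable write is one of the acknowledgements.
    """
    return len(acks & set(group)) >= quorum(len(group))
```

With three replicas, a quorum is two and one replica can fail. With five replicas, a quorum is three and two can fail.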
Core Capabilities
- Durable data persistence
- Strong consistency and high availability through replica management
- MVCC (Multi-Version Concurrency Control) for concurrent access
- Distributed transaction support
- Coprocessor for pushdown computation to storage nodes
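With pushdown, the storage nodes do part of the work before data crosses the network. The sketch below assumes a query such as AVG(age) with a WHERE age >= 20 filter. Each Region filters its own rows and returns a partial (COUNT, SUM) pair, and the SQL layer merges the partials. The function names and the list-of-rows stand-in for a Region are assumptions made for illustration only.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Row = dict


def region_partial_agg(rows: list[Row], predicate: Callable[[Row], bool], column: str) -> tuple[int, int]:
    """Runs 'on the storage node': apply the filter, then compute partial COUNT and SUM."""
    count = 0
    total = 0
    for row in rows:
        if predicate(row):
            count += 1
            total += row[column]
    return count, total


def pushdown_avg(regions: list[list[Row]], predicate: Callable[[Row], bool], column: str) -> Optional[float]:
    """Runs 'in the SQL layer': query every Region in parallel and merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: region_partial_agg(rows, predicate, column), regions))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    # AVG is only computed after merging; None mirrors SQL's NULL for an empty input.
    return total / count if count else None
```

AVG cannot be merged directly across Regions. That is why each partial result carries COUNT and SUM instead, and only filtered, pre-aggregated values leave the storage nodes.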
Internal Implementation
- RocksDB: Persistent key-value storage engine
- Raft: Consensus protocol for distributed consistency
- MVCC: Version management for concurrent transactions
- Transaction: Two-phase commit implementation based on the Percolator model
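TiKV's transaction model is based on Google's Percolator, which combines MVCC and two-phase commit. The simplified, single-process model below illustrates how they fit together. Prewrite locks every key and stages its value at the transaction's start timestamp. Commit writes a commit record, primary key first, at the commit timestamp. Reads return the newest version committed at or before the reader's timestamp. The class and method names are hypothetical. Real concerns such as lock resolution, lock TTLs, rollback records, and column families are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class KeyLocked(Exception):
    """Another transaction holds a lock that affects this operation."""


class WriteConflict(Exception):
    """The key was committed by another transaction at or after our start_ts."""


@dataclass
class MvccStore:
    # key -> {start_ts: value}: data versions staged by prewrite
    data: dict = field(default_factory=dict)
    # key -> (primary_key, start_ts): locks left by prewrite
    locks: dict = field(default_factory=dict)
    # key -> {commit_ts: start_ts}: commit records pointing at data versions
    writes: dict = field(default_factory=dict)

    def get(self, key: str, read_ts: int) -> Any:
        """Snapshot read: newest version committed at or before read_ts."""
        lock = self.locks.get(key)
        if lock is not None and lock[1] <= read_ts:
            # An earlier transaction may still commit; a real store would wait or resolve it.
            raise KeyLocked(key)
        visible = [c for c in self.writes.get(key, {}) if c <= read_ts]
        if not visible:
            return None
        return self.data[key][self.writes[key][max(visible)]]

    def prewrite(self, mutations: dict, primary: str, start_ts: int) -> None:
        """Phase 1: check every key for conflicts, then lock and stage all of them."""
        for key in mutations:
            if key in self.locks:
                raise KeyLocked(key)
            if any(c >= start_ts for c in self.writes.get(key, {})):
                raise WriteConflict(key)
        for key, value in mutations.items():
            self.locks[key] = (primary, start_ts)
            self.data.setdefault(key, {})[start_ts] = value

    def commit(self, keys: list, primary: str, start_ts: int, commit_ts: int) -> None:
        """Phase 2: committing the primary is the atomic commit point; secondaries follow."""
        if self.locks.get(primary) != (primary, start_ts):
            raise RuntimeError("primary lock missing; the transaction must abort")
        for key in [primary] + [k for k in keys if k != primary]:
            self.writes.setdefault(key, {})[commit_ts] = start_ts
            self.locks.pop(key, None)
```

A transaction that prewrites at start_ts 10 and commits at commit_ts 20 is invisible to a snapshot read at 15 and visible to one at 25. A second writer whose start_ts is 18 hits a write conflict on the same key.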
Data Distribution
Data is sharded into Regions, each covering a contiguous key range. A Region typically holds 96 MiB to 144 MiB of data; once it exceeds the upper threshold, TiKV splits it automatically. PD's scheduler continuously rebalances Regions and load across the cluster. Each Region is replicated through its own Raft group (the Multi-Raft model), usually with 3 replicas (the default) or 5, which provides automatic failover and strong consistency.
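With key-range sharding, routing a key means finding the Region whose [start, end) range contains it. The sketch below keeps Regions sorted by start key and locates one with a binary search. It splits a Region in two when the Region's size passes a threshold. Several parts are simplifications for illustration: the RegionMap class, the assumed 144 MiB default, the caller-supplied split key, and the assumption that a split halves the size. In TiKV, the split key is determined by TiKV itself rather than passed in.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass
class Region:
    start_key: str  # inclusive
    end_key: str  # exclusive; "" stands for "end of the key space"
    size_mib: int = 0


class RegionMap:
    """Hypothetical key-range index, kept sorted by start key."""

    def __init__(self, max_size_mib: int = 144):  # assumed threshold for this sketch
        self.max_size_mib = max_size_mib
        # Initially a single Region covers the whole key space.
        self.regions = [Region("", "")]

    def locate(self, key: str) -> Region:
        """Binary-search for the last Region whose start_key <= key."""
        starts = [r.start_key for r in self.regions]
        return self.regions[bisect.bisect_right(starts, key) - 1]

    def maybe_split(self, region: Region, split_key: str) -> bool:
        """Split region at split_key if it exceeds the threshold; return True if split."""
        if region.size_mib <= self.max_size_mib:
            return False
        inside = region.start_key < split_key and (region.end_key == "" or split_key < region.end_key)
        if not inside:
            raise ValueError("split key must fall strictly inside the Region")
        i = self.regions.index(region)
        # Assumption: the split divides the data roughly in half.
        left = Region(region.start_key, split_key, region.size_mib // 2)
        right = Region(split_key, region.end_key, region.size_mib - left.size_mib)
        self.regions[i : i + 1] = [left, right]
        return True
```

Keeping the list sorted is what makes lookup a binary search. Because splits only ever replace one Region with two adjacent ones, the ordering is preserved without re-sorting.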
TiFlash
TiFlash provides columnar storage specifically designed for analytical workloads. Key features include:
- Asynchronous replication from TiKV
- Snapshot isolation for read consistency
- Column-oriented storage format optimized for OLAP queries
- Physical isolation of analytical workloads from transactional processing
- Automatic query routing based on cost optimization
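A toy pivot illustrates why column-oriented storage suits OLAP. An aggregate over one column touches only that column's values instead of every field of every row. Runs of repeated values within a column also compress well. This is a conceptual sketch with hypothetical helper names. It does not reflect TiFlash's actual file format or compression codecs.

```python
from __future__ import annotations

from typing import Any


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into one value list per column."""
    columns: dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_rowwise(rows: list[dict], name: str) -> Any:
    # Row layout: every record is visited even though only one field is needed.
    return sum(row[name] for row in rows)


def sum_columnar(columns: dict[str, list], name: str) -> Any:
    # Column layout: only the requested column's values are scanned.
    return sum(columns[name])


def run_length_encode(values: list) -> list[tuple]:
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded: list[list] = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return [tuple(e) for e in encoded]
```

Both layouts return the same answer. The difference lies in how much data each one has to read and how compactly each can store it.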
The storage engine for a query can be specified explicitly with the READ_FROM_STORAGE optimizer hint, targeting either row-based TiKV or columnar TiFlash depending on workload characteristics.
PD (Placement Driver)
PD functions as the control plane, managing cluster metadata and scheduling operations:
- Storing metadata for all TiKV nodes in the cluster
- Allocating unique identifiers for clusters, regions, and transactions
- Generating global consistent timestamps (TSO)
- Collecting cluster health metrics and performing load balancing
- Providing the TiDB Dashboard for cluster monitoring and administration
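TiDB builds its global timestamps as hybrid values. The physical clock reading in milliseconds is shifted left by 18 bits, and an 18-bit logical counter fills the low bits. The single-process sketch below hands out strictly increasing timestamps even if the local clock stalls or moves backwards. The class name is hypothetical. PD's real allocator also persists an upper bound for allocated timestamps and runs on the elected PD leader, both of which are omitted here.

```python
from __future__ import annotations

import threading
import time
from typing import Callable

LOGICAL_BITS = 18
MAX_LOGICAL = 1 << LOGICAL_BITS


def compose_ts(physical_ms: int, logical: int) -> int:
    """Pack physical milliseconds and the logical counter into one integer."""
    return (physical_ms << LOGICAL_BITS) | logical


def extract_physical(ts: int) -> int:
    """Recover the physical (millisecond) part of a timestamp."""
    return ts >> LOGICAL_BITS


class TimestampOracle:
    """Toy TSO: strictly increasing hybrid timestamps from one process."""

    def __init__(self, clock: Callable[[], int] = lambda: int(time.time() * 1000)):
        self._clock = clock
        self._lock = threading.Lock()
        self._physical = 0
        self._logical = 0

    def get_ts(self) -> int:
        with self._lock:
            now = self._clock()
            if now > self._physical:
                # Clock moved forward: adopt it and reset the logical counter.
                self._physical, self._logical = now, 0
            else:
                # Same or earlier millisecond: advance the logical counter instead.
                self._logical += 1
                if self._logical >= MAX_LOGICAL:
                    # Counter exhausted: move to the next millisecond.
                    self._physical += 1
                    self._logical = 0
            return compose_ts(self._physical, self._logical)
```

Because the physical part occupies the high bits, comparing two timestamps as plain integers orders them by time first and by logical counter second.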