Understanding TiDB Distributed Database Architecture

Overview

TiDB is a cloud-native distributed database that combines horizontal scalability with financial-grade high availability. It delivers real-time HTAP (Hybrid Transactional and Analytical Processing) capabilities and maintains MySQL 5.7 protocol compatibility.

Architecture Components

TiDB Server

The TiDB Server acts as the SQL layer and serves as the entry point for client connections. It handles the following core responsibilities:

Managing client connections and authentication
Parsing, compiling, and generating execution plans for SQL statements
Converting relational data to key-value pairs for storage in TiKV
Executing DDL operations and handling online schema changes
Running garbage collection of obsolete MVCC versions at a configurable interval (10 minutes by default)
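
The row-to-key-value conversion listed above can be illustrated with a minimal sketch. It follows the readable key layout used in TiDB's documentation (t{TableID}_r{RowID} for row data, t{TableID}_i{IndexID}_{value} for a unique index), but it is only an approximation: the real encoding is binary and order-preserving, and the function names here are hypothetical.

```python
from typing import Any, Dict, List, Tuple

KV = Tuple[str, Any]


def row_key(table_id: int, row_id: int) -> str:
    # Row data lives under t{TableID}_r{RowID} (readable form, not the real binary encoding).
    return f"t{table_id}_r{row_id}"


def unique_index_key(table_id: int, index_id: int, indexed_value: Any) -> str:
    # A unique index entry maps the indexed value back to the row ID.
    return f"t{table_id}_i{index_id}_{indexed_value}"


def encode_row(table_id: int, row_id: int, values: List[Any],
               unique_indexes: Dict[int, int]) -> List[KV]:
    """Turn one relational row into key-value pairs.

    unique_indexes maps a (hypothetical) index ID to the position of the
    indexed column inside `values`.
    """
    pairs: List[KV] = [(row_key(table_id, row_id), list(values))]
    for index_id, col_pos in sorted(unique_indexes.items()):
        pairs.append((unique_index_key(table_id, index_id, values[col_pos]), row_id))
    return pairs


# Table 10, row 1, with a unique index (ID 1) on the first column.
pairs = encode_row(10, 1, ["TiDB", "SQL Layer", 10], {1: 0})
for key, value in pairs:
    print(key, "->", value)
```

Because all keys of one table share the same prefix, a table scan becomes a range scan over adjacent keys in TiKV.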

Request Processing Flow

Client Request
    → Protocol Parsing
    → SQL Parsing
    → Query Optimization
    → Execution Plan Generation
    → Distributed Execution
    → Result Return

TiKV

TiKV serves as the distributed storage engine for TiDB, providing persistent data storage with strong consistency guarantees.

Storage Architecture

Region (Data Shard)
    ├── Leader
    └── Followers (Multiple Replicas)
        ├── Follower 1
        └── Follower 2

Core Capabilities

Durable data persistence
Strong consistency and high availability through replica management
MVCC (Multi-Version Concurrency Control) for concurrent access
Distributed transaction support
Coprocessor for pushdown computation to storage nodes
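
To make the MVCC capability concrete, here is a minimal in-memory sketch of versioned reads. It is an illustration under simplifying assumptions, not TiKV's implementation: the class name MVCCStore is hypothetical, versions are kept in Python lists, and a value of None stands in for a delete marker.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class MVCCStore:
    """Keeps every committed version of a key, ordered by commit timestamp."""

    def __init__(self) -> None:
        # key -> (sorted commit timestamps, values at the same positions)
        self._data: Dict[str, Tuple[List[int], List[Optional[str]]]] = {}

    def put(self, key: str, value: Optional[str], commit_ts: int) -> None:
        ts_list, values = self._data.setdefault(key, ([], []))
        pos = bisect.bisect_right(ts_list, commit_ts)
        ts_list.insert(pos, commit_ts)
        values.insert(pos, value)

    def get(self, key: str, read_ts: int) -> Optional[str]:
        # A reader sees the newest version committed at or before its read timestamp.
        if key not in self._data:
            return None
        ts_list, values = self._data[key]
        pos = bisect.bisect_right(ts_list, read_ts)
        return values[pos - 1] if pos > 0 else None


store = MVCCStore()
store.put("a", "v1", commit_ts=10)
store.put("a", "v2", commit_ts=20)
print(store.get("a", read_ts=15))  # the reader at ts 15 still sees the version from ts 10
```

Writers never overwrite an existing version, so a long-running reader keeps a stable snapshot while newer versions are added alongside it.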

Internal Implementation

RocksDB: Persistent key-value storage engine
Raft: Consensus protocol for distributed consistency
MVCC: Version management for concurrent transactions
Transaction: Two-phase commit implementation
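
The two-phase commit flow can be sketched roughly as follows. TiKV's transaction model is based on Percolator; this sketch keeps only the prewrite and commit phases and leaves out the primary lock, rollback, and lock resolution. The class and method names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


class TxnStore:
    def __init__(self) -> None:
        self.locks: Dict[str, int] = {}                        # key -> start_ts of the lock holder
        self.pending: Dict[Tuple[str, int], Any] = {}          # (key, start_ts) -> uncommitted value
        self.committed: Dict[str, List[Tuple[int, Any]]] = {}  # key -> [(commit_ts, value), ...]

    def prewrite(self, mutations: Dict[str, Any], start_ts: int) -> bool:
        """Phase 1: lock every key, or fail without changing anything."""
        for key in mutations:
            if key in self.locks:
                return False  # another transaction holds the lock
            versions = self.committed.get(key, [])
            if versions and versions[-1][0] >= start_ts:
                return False  # write conflict: someone committed after we started
        for key, value in mutations.items():
            self.locks[key] = start_ts
            self.pending[(key, start_ts)] = value
        return True

    def commit(self, keys: List[str], start_ts: int, commit_ts: int) -> None:
        """Phase 2: make the prewritten values visible and release the locks."""
        for key in keys:
            if self.locks.get(key) != start_ts:
                raise RuntimeError(f"lock on {key!r} is not held by this transaction")
        for key in keys:
            value = self.pending.pop((key, start_ts))
            self.committed.setdefault(key, []).append((commit_ts, value))
            del self.locks[key]


txn_store = TxnStore()
ok = txn_store.prewrite({"a": 1, "b": 2}, start_ts=10)
blocked = txn_store.prewrite({"b": 3}, start_ts=11)  # "b" is still locked
txn_store.commit(["a", "b"], start_ts=10, commit_ts=12)
```

In this sketch the start and commit timestamps are plain integers supplied by the caller; in a real cluster they would come from PD's timestamp service.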

Data Distribution

Data is sharded into Regions, each covering a contiguous key range. A Region targets about 96 MiB by default; once it grows past its maximum size threshold (144 MiB with that default), it is split automatically. PD's scheduler continuously rebalances Regions and their leaders across the cluster. Replication follows the Multi-Raft model: every Region forms its own Raft group, usually with 3 or 5 replicas, which provides automatic failover and strong consistency.
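
Key-range routing and splitting can be sketched in a few lines. This is a simplified model under stated assumptions: RegionMap and its methods are hypothetical names, the split threshold is an illustrative constant, and each half is assumed to receive half of the data after a split.

```python
import bisect
from dataclasses import dataclass
from typing import List

MAX_REGION_SIZE_MB = 144  # assumed split threshold, for illustration only


@dataclass
class Region:
    start_key: str        # inclusive
    end_key: str          # exclusive; "" means "no upper bound"
    size_mb: float = 0.0


class RegionMap:
    def __init__(self) -> None:
        # Start with a single Region covering the whole key space.
        self.regions: List[Region] = [Region("", "")]

    def locate(self, key: str) -> Region:
        # Regions are kept sorted by start key, so routing is a binary search.
        starts = [r.start_key for r in self.regions]
        return self.regions[bisect.bisect_right(starts, key) - 1]

    def split(self, region: Region, split_key: str) -> None:
        inside = region.start_key < split_key and (
            region.end_key == "" or split_key < region.end_key
        )
        if not inside:
            raise ValueError("split key must fall strictly inside the Region")
        idx = self.regions.index(region)
        half = region.size_mb / 2
        self.regions[idx:idx + 1] = [
            Region(region.start_key, split_key, half),
            Region(split_key, region.end_key, half),
        ]


regions = RegionMap()
whole = regions.locate("m")
whole.size_mb = 150
if whole.size_mb > MAX_REGION_SIZE_MB:
    regions.split(whole, "k")
```

After the split, keys below "k" route to the left Region and keys from "k" upward route to the right one, and each new Region can be moved to a different node independently.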

TiFlash

TiFlash provides columnar storage specifically designed for analytical workloads. Key features include:

Asynchronous replication from TiKV
Snapshot isolation for read consistency
Column-oriented storage format optimized for OLAP queries
Physical isolation of analytical workloads from transactional processing
Automatic query routing based on cost optimization

The storage engine can also be specified explicitly with optimizer hints, so that a query reads from row-based TiKV storage or columnar TiFlash storage depending on workload characteristics.
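
As an illustration of engine selection by hint, the sketch below builds a query that uses TiDB's READ_FROM_STORAGE optimizer hint. The helper function and the orders table are hypothetical, and the resulting string would still have to be sent through whatever MySQL-compatible client the application already uses.

```python
from typing import List

VALID_ENGINES = {"TIKV", "TIFLASH"}


def select_with_storage_hint(select_body: str, engine: str, tables: List[str]) -> str:
    """Prefix a SELECT body with a READ_FROM_STORAGE hint for the given tables."""
    engine = engine.upper()
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown storage engine: {engine}")
    hint = f"READ_FROM_STORAGE({engine}[{', '.join(tables)}])"
    return f"SELECT /*+ {hint} */ {select_body}"


# Send an aggregate to the columnar replica and a point lookup to the row store.
analytical = select_with_storage_hint("COUNT(*) FROM orders", "tiflash", ["orders"])
transactional = select_with_storage_hint("* FROM orders WHERE id = 42", "tikv", ["orders"])
print(analytical)
print(transactional)
```

A table needs a TiFlash replica before the TIFLASH hint can take effect; without a hint, the optimizer picks the engine based on estimated cost.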

PD (Placement Driver)

PD functions as the control plane, managing cluster metadata and scheduling operations:

Storing metadata for all TiKV nodes in the cluster
Allocating unique identifiers for clusters, regions, and transactions
Generating globally consistent, monotonically increasing timestamps (TSO)
Collecting cluster health metrics and performing load balancing
Providing the TiDB Dashboard for cluster monitoring and administration
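
The timestamp allocation can be sketched as a hybrid of physical and logical time. A TSO value packs a millisecond wall-clock timestamp into the high bits and an 18-bit logical counter into the low bits. The allocator below is a single-process illustration with hypothetical names; it omits what PD actually does to make this safe across restarts and leader changes, such as persisting a time window.

```python
import time
from typing import Callable, Optional, Tuple

LOGICAL_BITS = 18
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


class TSOAllocator:
    def __init__(self, clock_ms: Optional[Callable[[], int]] = None) -> None:
        # The clock is injectable so the behavior can be checked deterministically.
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._physical = 0
        self._logical = 0

    def next(self) -> int:
        now = self._clock_ms()
        if now > self._physical:
            # The clock moved forward: adopt it and restart the counter.
            self._physical, self._logical = now, 0
        else:
            # Same millisecond (or the clock went backwards): bump the counter.
            self._logical += 1
            if self._logical > LOGICAL_MASK:
                self._physical, self._logical = self._physical + 1, 0
        return (self._physical << LOGICAL_BITS) | self._logical


def split_tso(ts: int) -> Tuple[int, int]:
    """Return (physical milliseconds, logical counter) for a TSO value."""
    return ts >> LOGICAL_BITS, ts & LOGICAL_MASK


allocator = TSOAllocator(clock_ms=lambda: 1000)  # frozen clock for the example
first, second = allocator.next(), allocator.next()
print(split_tso(first), split_tso(second))
```

Even with the clock frozen, every call returns a strictly larger value, which is the property that transaction start and commit timestamps rely on.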

Tags: distributed-database tidb NewSQL Raft-consensus HTAP

Posted on Mon, 01 Jun 2026 17:27:12 +0000 by Chris1981
