Database Indexing and Query Execution Plans

Purpose of Indexes

Indexes serve a similar function to a book's table of contents, designed to enhance query performance.

Types of Index Algorithms

  • B-tree indexes
  • Hash indexes
  • R-tree indexes
  • Full-text indexes
  • GIS indexes

B-tree Index Classification

How Secondary Indexes Build B-tree Structure

-- Process steps:
-- 1. Extract column values from the table to generate B-tree structure
-- 2. Sort all values automatically
-- 3. Distribute sorted values evenly to leaf nodes (16K)
-- 4. Generate pointers to corresponding data pages
-- 5. Create branch and root nodes based on data volume and key length

-- Example table structure
id  name  age  gender
SELECT * FROM t1 WHERE id=10;

-- Note: ID column queries use sequential I/O, while other columns may use random I/O

Clustered Indexes

Prerequisites

-- 1. Primary key column automatically becomes clustered index
-- 2. If no primary key, unique key becomes clustered index
-- 3. Clustered index should be defined during table creation, typically using irrelevant column (ID)

How Secondary Indexes Build B-tree Structure

-- 1. Set primary key column (ID) during table creation
-- 2. Data will be stored on disk ordered by ID column (clustered index organized table)
-- 3. Sorted complete rows become leaf nodes (data pages = leaf nodes)

Differences Between Clustered and Secondary Indexes

  • Any column can have secondary indexes when needed, as long as names differ
  • Only one clustered index per table, usually the primary key
  • Secondary index leaf nodes store only sorted index column values + clustered index column values
  • Clustered index leaf nodes store complete sorted row data
  • MySQL stores table data as clustered index organized tables

Secondary Index Subtypes

  1. Regular single-column secondary index
  2. Composite indexes: multiple columns as index conditions, theoretically well-designed to reduce table lookups
  3. Unique indexes: all index column values are unique

Factors Affecting Index Tree Height

-- 1. Data volume - Solution: partitioning, sharding, distributed databases
-- 2. Long index column values - Solution: prefix indexes
-- 3. Data types:
-- Variable-length strings using CHAR - Solution: use VARCHAR for variable strings
-- ENUM type usage: enum('Shandong','Hebei','Heilongjiang','Jilin','Liaoning','Shaanxi'...)

Basic Index Management

Querying Indexes

DESC city;
-- OR
SHOW INDEX FROM city\G;

Single-column Regular Secondary Index

ALTER TABLE city ADD INDEX idx_name(name);

Multi-column Composite Index

ALTER TABLE city ADD INDEX idx_c_p(countrycode,population);

Unique Index

ALTER TABLE city ADD UNIQUE INDEX uidx_dis(district);

Prefix Index

ALTER TABLE city ADD INDEX idx_dis(district(5));

Dropping Indexes

ALTER TABLE city DROP INDEX idx_name;
ALTER TABLE city DROP INDEX idx_c_p;
ALTER TABLE city DROP INDEX idx_dis;

Index Optimization Testing

Before Optimization - 755s

mysqlslap --defaults-file=/etc/my.cnf \
--concurrency=100 --iterations=1 --create-schema='test' \
--query="select * from test.t100w where k2='MN89'" engine=innodb \
--number-of-queries=2000 -uroot -p123 -verbose

# Result: Average 755.861 seconds

After Optimization - 1.67s

mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='test' --query="select * from test.t100w where k2='MN89'" engine=innodb \
--number-of-queries=2000 -uroot -p123 -verbose

# Result: Average 1.678 seconds

Execution Plan Analysis

Purpose

Extract the optimizer-selected execution plan for database administrator evaluation of statement efficiency.

Obtaining Execution Plans

  • DESC SQL_statement
  • EXPLAIN SQL_statement
DESC SELECT * FROM school.student WHERE sname='zhang3';

Key Information to Focus On

-- table: city                              --> queried table    **
-- possible_keys: CountryCode,idx_co_po      --> potential indexes  **
-- key: CountryCode                          --> actual index used    ***
-- type: all                                --> index type        *****
-- Extra: Using index condition              --> additional info        *****

Query Types (Type)

  • Full table scan: ALL
  • Index scans: index, range, ref, eq_ref, const(system), NULL
-- index: full index scan
SELECT countrycode FROM city;

-- range: index range scan (> < >= <=, BETWEEN AND, OR, IN, LIKE)
SELECT * FROM city WHERE id>2000;
SELECT * FROM city WHERE countrycode LIKE 'CH%';

-- For secondary indexes, != and NOT IN don't use indexes
-- For primary key indexes, != and NOT IN use range

-- Examples with IN/OR
SELECT * FROM city WHERE countrycode='CHN' OR countrycode='USA';
SELECT * FROM city WHERE countrycode IN ('CHN','USA');

-- Better rewritten as UNION ALL
SELECT * FROM city WHERE countrycode='CHN' 
UNION ALL 
SELECT * FROM city WHERE countrycode='USA';

-- ref: secondary index equality lookup
-- eq_ref: multi-table joins where child table uses primary/unique key
SELECT b.name,a.name,a.population  
FROM city AS a 
JOIN country AS b ON a.countrycode=b.code 
WHERE a.population<100;

-- const(system): primary/unique key equality lookup
SELECT * FROM city WHERE id=100;

Extra Field Explanations

  1. filesort in Extra indicates file sorting occurred
  2. Check if ORDER BY, GROUP BY, DISTICNT conditions have indexes
  3. Create composite indexes based on clause execution order

EXPLAIN/DESC Usage Scenarios

-- Interview question: Company's business is slow, analyze database causes
-- Two scenarios:

-- Emergency slowness (sudden hang):
-- 1. SHOW PROCESSLIST; get statements causing database hang
-- 2. EXPLAIN to analyze execution plan and index usage
-- 3. Create indexes, modify statements

-- Persistent slowness:
-- 1. Log slow queries to slowlog, analyze slowlog
-- 2. EXPLAIN to analyze execution plan and index usage
-- 3. Create indexes, modify statements

Key Length Coverage

For utf8mb4:

  • VARCHAR(10): nullable(1) + 4*10 + 2 = 43 bytes
  • CHAR(10): nullable(1) + 4*10 = 41 bytes
  • INT: nullable(1) + 4 = 5 bytes

Purpose: Determine composite index coverage length - generally longer is better.

VARCHAR(4) utf8mb4: 44 + 3 bytes CHAR(4): 44 + 1 byte

Characteristics:

  1. Can store 4 arbitrary characters
  2. Regardless of character, numeric, or Chinese, maximum reserved length is 4 bytes per character
  3. Each Chinese character occupies 4 bytes
  4. Numbers and letters actually occupy 1 byte each
SELECT LENGTH(column_name) FROM test_table;

Key Length Application Focus

For composite index sequence (k1,k2,k3,k4):

When all index columns use equality condtiions, order doesn't matter due to optimizer rearrangement:

-- Both have same key_length = 56
SELECT * FROM test WHERE k1='aa' AND k2='China' AND k3='aaaa' AND k4='Hello China';
SELECT * FROM test WHERE k2='China' AND k3='aaaa' AND k4='Hello China' AND k1='aa';

For non-contiguous conditions missing k2:

-- Only k1 condition met
SELECT * FROM test WHERE k1='aa' AND k3='aaaa' AND k4='Hello China';

Conditions That Don't Use Indexes

-- <> and NOT IN don't use secondary indexes
EXPLAIN SELECT * FROM telephone_table WHERE tel_number <> '110';
EXPLAIN SELECT * FROM telephone_table WHERE tel_number NOT IN ('110','119');

-- >, <, >=, <=, NOT IN don't use secondary indexes
-- Individual >, <, IN may or may not use indexes depending on result set size

-- OR/IN should be rewritten as UNION
EXPLAIN SELECT * FROM telephone_table WHERE tel_number IN ('110','119');
-- Rewrite as:
EXPLAIN SELECT * FROM telephone_table WHERE tel_number='110'
UNION ALL
SELECT * FROM telephone_table WHERE tel_number='119';

-- LIKE with leading % doesn't use indexes
EXPLAIN SELECT * FROM telephone_table WHERE tel_number LIKE '31%';  -- Uses range scan
EXPLAIN SELECT * FROM telephone_table WHERE tel_number LIKE '%110'; -- No index used

-- For %search% patterns, consider Elasticsearch + MongoDB for dedicated search services

Index Usage Guidelines

Index Creation Principles

  1. Tables must have primary keys, typically irrelevant auto-incrementing columns
  2. Columns frequently used in WHERE conditions, ORDER BY, GROUP BY, JOIN ON, DISTINCT
  3. Use columns with many unique values as leading columns in composite indexes
  4. For long index columns, use prefix indexes
  5. Reduce index entries: remove unused indexes, clean infrequently used ones
  6. Perform index maintenance during off-peak hours
  7. Don't create indexes for small tables

Conditions Where Indexes Aren't Used

  1. No query conditions or conditions lack indexes
  2. Query results represent more than 25% of original table data
  3. Index failure due to outdated statistics
  4. Functions on indexed columns or arithmetic operations
  5. Implicit conversions causing index failure
  6. <> and NOT IN don't use secondary indexes
  7. LIKE with leading % doesn't use indexes
  8. Composite index issues

Tags: database indexing B-Tree Query-Optimization execution-plan

Posted on Sat, 30 May 2026 17:45:23 +0000 by jamz310