Purpose of Indexes
Indexes serve a similar function to a book's table of contents, designed to enhance query performance.
Types of Index Algorithms
- B-tree indexes
- Hash indexes
- R-tree indexes
- Full-text indexes
- GIS indexes
B-tree Index Classification
How Secondary Indexes Build B-tree Structure
-- Process steps:
-- 1. Extract column values from the table to generate B-tree structure
-- 2. Sort all values automatically
-- 3. Distribute sorted values evenly to leaf nodes (16K)
-- 4. Generate pointers to corresponding data pages
-- 5. Create branch and root nodes based on data volume and key length
-- Example table structure
id name age gender
SELECT * FROM t1 WHERE id=10;
-- Note: ID column queries use sequential I/O, while other columns may use random I/O
Clustered Indexes
Prerequisites
-- 1. Primary key column automatically becomes clustered index
-- 2. If no primary key, unique key becomes clustered index
-- 3. Clustered index should be defined during table creation, typically using irrelevant column (ID)
How Secondary Indexes Build B-tree Structure
-- 1. Set primary key column (ID) during table creation
-- 2. Data will be stored on disk ordered by ID column (clustered index organized table)
-- 3. Sorted complete rows become leaf nodes (data pages = leaf nodes)
Differences Between Clustered and Secondary Indexes
- Any column can have secondary indexes when needed, as long as names differ
- Only one clustered index per table, usually the primary key
- Secondary index leaf nodes store only sorted index column values + clustered index column values
- Clustered index leaf nodes store complete sorted row data
- MySQL stores table data as clustered index organized tables
Secondary Index Subtypes
- Regular single-column secondary index
- Composite indexes: multiple columns as index conditions, theoretically well-designed to reduce table lookups
- Unique indexes: all index column values are unique
Factors Affecting Index Tree Height
-- 1. Data volume - Solution: partitioning, sharding, distributed databases
-- 2. Long index column values - Solution: prefix indexes
-- 3. Data types:
-- Variable-length strings using CHAR - Solution: use VARCHAR for variable strings
-- ENUM type usage: enum('Shandong','Hebei','Heilongjiang','Jilin','Liaoning','Shaanxi'...)
Basic Index Management
Querying Indexes
DESC city;
-- OR
SHOW INDEX FROM city\G;
Single-column Regular Secondary Index
ALTER TABLE city ADD INDEX idx_name(name);
Multi-column Composite Index
ALTER TABLE city ADD INDEX idx_c_p(countrycode,population);
Unique Index
ALTER TABLE city ADD UNIQUE INDEX uidx_dis(district);
Prefix Index
ALTER TABLE city ADD INDEX idx_dis(district(5));
Dropping Indexes
ALTER TABLE city DROP INDEX idx_name;
ALTER TABLE city DROP INDEX idx_c_p;
ALTER TABLE city DROP INDEX idx_dis;
Index Optimization Testing
Before Optimization - 755s
mysqlslap --defaults-file=/etc/my.cnf \
--concurrency=100 --iterations=1 --create-schema='test' \
--query="select * from test.t100w where k2='MN89'" engine=innodb \
--number-of-queries=2000 -uroot -p123 -verbose
# Result: Average 755.861 seconds
After Optimization - 1.67s
mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='test' --query="select * from test.t100w where k2='MN89'" engine=innodb \
--number-of-queries=2000 -uroot -p123 -verbose
# Result: Average 1.678 seconds
Execution Plan Analysis
Purpose
Extract the optimizer-selected execution plan for database administrator evaluation of statement efficiency.
Obtaining Execution Plans
DESC SQL_statementEXPLAIN SQL_statement
DESC SELECT * FROM school.student WHERE sname='zhang3';
Key Information to Focus On
-- table: city --> queried table **
-- possible_keys: CountryCode,idx_co_po --> potential indexes **
-- key: CountryCode --> actual index used ***
-- type: all --> index type *****
-- Extra: Using index condition --> additional info *****
Query Types (Type)
- Full table scan: ALL
- Index scans: index, range, ref, eq_ref, const(system), NULL
-- index: full index scan
SELECT countrycode FROM city;
-- range: index range scan (> < >= <=, BETWEEN AND, OR, IN, LIKE)
SELECT * FROM city WHERE id>2000;
SELECT * FROM city WHERE countrycode LIKE 'CH%';
-- For secondary indexes, != and NOT IN don't use indexes
-- For primary key indexes, != and NOT IN use range
-- Examples with IN/OR
SELECT * FROM city WHERE countrycode='CHN' OR countrycode='USA';
SELECT * FROM city WHERE countrycode IN ('CHN','USA');
-- Better rewritten as UNION ALL
SELECT * FROM city WHERE countrycode='CHN'
UNION ALL
SELECT * FROM city WHERE countrycode='USA';
-- ref: secondary index equality lookup
-- eq_ref: multi-table joins where child table uses primary/unique key
SELECT b.name,a.name,a.population
FROM city AS a
JOIN country AS b ON a.countrycode=b.code
WHERE a.population<100;
-- const(system): primary/unique key equality lookup
SELECT * FROM city WHERE id=100;
Extra Field Explanations
- filesort in Extra indicates file sorting occurred
- Check if ORDER BY, GROUP BY, DISTICNT conditions have indexes
- Create composite indexes based on clause execution order
EXPLAIN/DESC Usage Scenarios
-- Interview question: Company's business is slow, analyze database causes
-- Two scenarios:
-- Emergency slowness (sudden hang):
-- 1. SHOW PROCESSLIST; get statements causing database hang
-- 2. EXPLAIN to analyze execution plan and index usage
-- 3. Create indexes, modify statements
-- Persistent slowness:
-- 1. Log slow queries to slowlog, analyze slowlog
-- 2. EXPLAIN to analyze execution plan and index usage
-- 3. Create indexes, modify statements
Key Length Coverage
For utf8mb4:
- VARCHAR(10): nullable(1) + 4*10 + 2 = 43 bytes
- CHAR(10): nullable(1) + 4*10 = 41 bytes
- INT: nullable(1) + 4 = 5 bytes
Purpose: Determine composite index coverage length - generally longer is better.
VARCHAR(4) utf8mb4: 44 + 3 bytes CHAR(4): 44 + 1 byte
Characteristics:
- Can store 4 arbitrary characters
- Regardless of character, numeric, or Chinese, maximum reserved length is 4 bytes per character
- Each Chinese character occupies 4 bytes
- Numbers and letters actually occupy 1 byte each
SELECT LENGTH(column_name) FROM test_table;
Key Length Application Focus
For composite index sequence (k1,k2,k3,k4):
When all index columns use equality condtiions, order doesn't matter due to optimizer rearrangement:
-- Both have same key_length = 56
SELECT * FROM test WHERE k1='aa' AND k2='China' AND k3='aaaa' AND k4='Hello China';
SELECT * FROM test WHERE k2='China' AND k3='aaaa' AND k4='Hello China' AND k1='aa';
For non-contiguous conditions missing k2:
-- Only k1 condition met
SELECT * FROM test WHERE k1='aa' AND k3='aaaa' AND k4='Hello China';
Conditions That Don't Use Indexes
-- <> and NOT IN don't use secondary indexes
EXPLAIN SELECT * FROM telephone_table WHERE tel_number <> '110';
EXPLAIN SELECT * FROM telephone_table WHERE tel_number NOT IN ('110','119');
-- >, <, >=, <=, NOT IN don't use secondary indexes
-- Individual >, <, IN may or may not use indexes depending on result set size
-- OR/IN should be rewritten as UNION
EXPLAIN SELECT * FROM telephone_table WHERE tel_number IN ('110','119');
-- Rewrite as:
EXPLAIN SELECT * FROM telephone_table WHERE tel_number='110'
UNION ALL
SELECT * FROM telephone_table WHERE tel_number='119';
-- LIKE with leading % doesn't use indexes
EXPLAIN SELECT * FROM telephone_table WHERE tel_number LIKE '31%'; -- Uses range scan
EXPLAIN SELECT * FROM telephone_table WHERE tel_number LIKE '%110'; -- No index used
-- For %search% patterns, consider Elasticsearch + MongoDB for dedicated search services
Index Usage Guidelines
Index Creation Principles
- Tables must have primary keys, typically irrelevant auto-incrementing columns
- Columns frequently used in WHERE conditions, ORDER BY, GROUP BY, JOIN ON, DISTINCT
- Use columns with many unique values as leading columns in composite indexes
- For long index columns, use prefix indexes
- Reduce index entries: remove unused indexes, clean infrequently used ones
- Perform index maintenance during off-peak hours
- Don't create indexes for small tables
Conditions Where Indexes Aren't Used
- No query conditions or conditions lack indexes
- Query results represent more than 25% of original table data
- Index failure due to outdated statistics
- Functions on indexed columns or arithmetic operations
- Implicit conversions causing index failure
- <> and NOT IN don't use secondary indexes
- LIKE with leading % doesn't use indexes
- Composite index issues