Optimizing MySQL query efficiency when dealing with hundreds of millions of rows requires a comprehensive approach that includes indexing, query rewriting, partitioning, and hardware configuration. Below are best practices and examples to improve query performance on large datasets.

1. Introduction
Processing large-scale data demands efficient queries. Optimized queries reduce response times and resource consumption, boosting overall system performance. This article covers multiple strategies with code examples.
2. Index Optimization
Index Types
MySQL supports several index types:
- B-Tree: Suitable for most queries.
- Hash: Best for exact matches; available in the MEMORY and NDB storage engines, not as a user-defined index type in InnoDB.
- Full-text: For text search.
- Spatial: For geospatial data.
Index Design Best Practices
- Create indexes on columns used in WHERE clauses.
- Use composite indexes for multi-column conditions.
- Avoid indexing columns with low selectivity (many duplicates).
- Regularly maintain indexes (rebuild, drop unused).
Example:
-- Create single-column indexes
CREATE INDEX idx_user_id ON users(user_id);
CREATE INDEX idx_user_age ON users(age);
-- Create a composite index
CREATE INDEX idx_user_name_age ON users(name, age);
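One way to check whether these indexes are actually used is EXPLAIN. The sketch below assumes the users table and the three indexes from the example; the literal values are placeholders. A composite index on (name, age) can serve conditions on name alone or on name and age together, but generally not a condition on age alone.

```sql
-- Can use idx_user_name_age: the condition covers the leading column (name)
EXPLAIN SELECT user_id FROM users WHERE name = 'Alice';

-- Can use both columns of idx_user_name_age
EXPLAIN SELECT user_id FROM users WHERE name = 'Alice' AND age = 30;

-- Skips the leading column, so idx_user_name_age is a poor fit;
-- idx_user_age is the candidate index here
EXPLAIN SELECT user_id FROM users WHERE age = 30;
```

In the output, the key column names the index the optimizer chose and rows is its estimate of rows examined. A key of NULL together with type ALL indicates a full table scan.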
3. Query Optimization
Query Rewriting
Rewrite queries to improve performance:
- Avoid SELECT *; fetch only required columns.
- Prefer JOIN over subqueries.
- Use EXISTS instead of IN when checking for existence.
Before:
SELECT * FROM users
WHERE user_id IN (SELECT user_id FROM orders WHERE order_date > '2023-01-01');
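After:
-- DISTINCT keeps one row per user even when a user has several matching orders,
-- so the result matches the IN version above
SELECT DISTINCT u.* FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.order_date > '2023-01-01';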
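The EXISTS form recommended in the list can be sketched as follows, assuming the same users and orders tables. Unlike a plain JOIN, it returns each user at most once with no need for DISTINCT, because the subquery only tests whether a matching order exists.

```sql
SELECT u.user_id, u.name
FROM users u
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.user_id = u.user_id
      AND o.order_date > '2023-01-01'
);
```

An index on orders(user_id, order_date) would let the subquery be answered from the index alone. That index is a hypothetical addition; it is not defined in the examples in this article.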
Efficient Pagination
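Avoid large OFFSET values: MySQL still reads and discards every skipped row before returning the page. Keyset pagination instead filters on the last key already seen, so each page starts from an index lookup.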
Traditional (slow for large offsets):
SELECT * FROM users ORDER BY user_id LIMIT 1000 OFFSET 5000;
Keyset pagination (faster; 5000 stands for the last user_id returned on the previous page):
SELECT * FROM users WHERE user_id > 5000 ORDER BY user_id LIMIT 1000;
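When the sort column is not unique, the keyset condition needs a tie-breaker. A minimal sketch, assuming pages are ordered by age and then user_id, and that the last row of the previous page had age = 30 and user_id = 5000 (both are placeholder values the application would supply):

```sql
SELECT user_id, name, age
FROM users
WHERE age > 30
   OR (age = 30 AND user_id > 5000)  -- same age: continue after the last user_id seen
ORDER BY age, user_id
LIMIT 1000;
```

An index that begins with age supports this ordering. With InnoDB, a secondary index on age also stores the primary key, so if user_id is the primary key an index such as idx_user_age may already be sufficient.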
4. Partitioning
Partition large tables into smaller, manageable pieces.
Horizontal Partitioning (by range)
CREATE TABLE users (
    user_id INT,
    name VARCHAR(50),
    age INT,
    PRIMARY KEY (user_id, age)
) PARTITION BY RANGE (age) (
    PARTITION p0 VALUES LESS THAN (20),
    PARTITION p1 VALUES LESS THAN (30),
    PARTITION p2 VALUES LESS THAN (40),
    PARTITION p3 VALUES LESS THAN (50),
    PARTITION p4 VALUES LESS THAN MAXVALUE
);
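Partitioning pays off mainly when queries filter on the partitioning column, so that MySQL can skip partitions that cannot contain matching rows. The sketch below checks this against the range-partitioned users table above; the split point 60 and the partition names p50s and pmax are illustrative choices, not part of the original example.

```sql
-- The partitions column of the output should list only p1 (ages 20 to 29)
EXPLAIN SELECT user_id, name FROM users WHERE age BETWEEN 20 AND 29;

-- No filter on age: every partition has to be examined
EXPLAIN SELECT user_id, name FROM users WHERE name = 'Alice';

-- Split the catch-all partition when a new range is needed
ALTER TABLE users REORGANIZE PARTITION p4 INTO (
    PARTITION p50s VALUES LESS THAN (60),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);
```

MySQL 5.7 and later include the partitions column in plain EXPLAIN output; older versions need EXPLAIN PARTITIONS instead.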
Vertical Partitioning (split columns into separate tables)
-- Base table with core columns
CREATE TABLE users_basic (
    user_id INT PRIMARY KEY,
    name VARCHAR(50)
);
-- Extension table with additional columns
CREATE TABLE users_extra (
    -- Primary key enforces at most one extension row per user
    user_id INT PRIMARY KEY,
    age INT,
    FOREIGN KEY (user_id) REFERENCES users_basic(user_id)
);
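With the columns split this way, queries that need only the core columns read the narrower users_basic table, and the extra columns are joined in on demand. A sketch assuming the two tables above and a placeholder user_id of 42:

```sql
-- Common path: touches only the narrow table
SELECT user_id, name
FROM users_basic
WHERE user_id = 42;

-- Less frequent path: pull in the extra columns with a join
SELECT b.user_id, b.name, e.age
FROM users_basic b
LEFT JOIN users_extra e ON e.user_id = b.user_id
WHERE b.user_id = 42;
```

The LEFT JOIN keeps users that have no row in users_extra; their age comes back as NULL.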
5. Hardware and Configuration
Hardware Considerations
- CPU: Multi-core, high-frequency processors improve concurrency.
- Memory: Large RAM allows caching and reduces I/O.
- Storage: SSDs significantly enhance read/write speeds over HDDs.
MySQL Configuration
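Adjust key parameters in my.cnf. The values below are starting points; size them to the server's RAM and workload:
[mysqld]
# Cache for InnoDB data and indexes; on a dedicated database server it is often set to 50-75% of RAM
innodb_buffer_pool_size = 4G
max_connections = 500
# MySQL 5.7 and earlier only: disables the query cache.
# Remove this line on MySQL 8.0, where the query cache no longer exists.
query_cache_size = 0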
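Whether the buffer pool is large enough can be estimated from the server's own counters. The statements below are a sketch; the 8 GB target is an illustrative value, not a recommendation.

```sql
-- Current buffer pool size in bytes
SELECT @@innodb_buffer_pool_size;

-- Innodb_buffer_pool_read_requests counts logical reads;
-- Innodb_buffer_pool_reads counts reads that had to go to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- Resize without a restart (MySQL 5.7.5 and later); 8589934592 bytes = 8 GB
SET GLOBAL innodb_buffer_pool_size = 8589934592;
```

If Innodb_buffer_pool_reads grows quickly relative to Innodb_buffer_pool_read_requests, the working set probably does not fit in memory. A value changed with SET GLOBAL is lost on restart unless my.cnf is updated as well.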
6. Complete Example
Combine indexing, query rewriting, and partitioning:
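-- Create users table
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT
);
-- Create partitioned orders table by year
CREATE TABLE orders (
    order_id INT,
    user_id INT,
    order_date DATE,
    amount DECIMAL(10, 2),
    -- Every unique key, including the primary key, must contain the partitioning column
    PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    -- Catch-all so that inserts dated 2025 or later do not fail
    PARTITION pmax VALUES LESS THAN MAXVALUE
);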
-- Create indexes
CREATE INDEX idx_user_name ON users(name);
CREATE INDEX idx_order_date ON orders(order_date);
-- Optimized query (DISTINCT avoids repeating a user once per matching order)
SELECT DISTINCT u.user_id, u.name, u.age FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.order_date > '2023-01-01';
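To see how rows are spread across the partitions of the orders table, and whether the date filter lets MySQL skip any of them, something like the following can be run. It assumes the tables above exist in the current schema.

```sql
-- Estimated row count per partition of orders
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'orders'
ORDER BY PARTITION_ORDINAL_POSITION;

-- For orders, the partitions column should omit p2022 with this date filter
EXPLAIN SELECT u.user_id, u.name
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.order_date > '2023-01-01';
```

For InnoDB tables, TABLE_ROWS is an estimate rather than an exact count.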
7. Conclusion
Optimizing MySQL for hundreds of millions of rows involves a holistic approach: proper indexing, query rewriting, partitioning, and hardware/configuration tuning. The techniques and examples provided help reduce latency and improve throughput. Apply them to achieve better performance in your large-scale databases.