Key Differences Between MySQL Storage Engines and Performance Optimization Techniques

MySQL Storage Engine Characteristics

Storage engines operate at the table level.

MyISAM

  • Lacks transaction support but ensures atomicity for single operations.
  • Does not enforce foreign keys. Uses table-level locking: read locks permit concurrent reads but block writes; write locks block both reads and writes. Write requests take precedence over reads even if they arrive later. This makes MyISAM less suitable for workloads with frequent updates mixed with queries.
  • Consists of three files per table: index file, structure definition file, and data file.
  • Caches total row count; SELECT COUNT(*) executes instantly without WHERE. With a filter, full scan is required.
  • Employs non-clustered indexes—index entries store pointers to data locations. Secondary indexes mirror primary ones except uniqueness enforcement.
  • Supports full-text and spatial indexes.
  • AUTO_INCREMENT columns can be part of composite indexes.

Index entry → points to data address.

InnoDB

  • Fully supports transactions with all four isolation levels and crash recovery.
  • Provides row-level locking and foreign key constraints, enabling concurrent writes.
  • Does not cache row count; SELECT COUNT(*) scans the entire table unless filtered. Filtered counts behave identically across engines.
  • AUTO_INCREMENT column must have its own dedicated index.
  • DELETE FROM removes rows individually rather than rebuilding the table.
  • Table storage may be in a shared tablespace or split into multiple independent files limited by OS file size (e.g., 2 GB per file).
  • Primary index is clustered—data resides within the index. Secondary indexes store primary key values, requiring an extra lookup via the primary index. Using auto-increment primary keys avoids costly B+Tree restructuring.
  • No native FULLTEXT index support.

Index entry → stores actual data.

MyISAM suits read-heavy scenarios; InnoDB fits write-intensive or integrity-critical applications.

Index Structures Explained

Indexes are ordered data structures improving retrieval speed.

Binary Tree

Left child key < parent key < right child key. Inefficient for sorted input sequences.

Red-Black Tree (Balanced Binary Tree)

Self-balancing variant suited for sequential data but performance degrades with large datasets due to rebalancing overhead and increased height.

Hash Index

Constant-time lookups under ideal conditions but cannot handle range queries.

hash_index[key] -> record_location

B-Tree

Each node holds multiple keys and pointers to child nodes arranged in sorted order, reducing depth compared to binary trees.

B+Tree

All leaf nodes contain records and link sequentially for efficient range scans. Internal nodes store only keys and pointers, increasing fan-out and minimizing tree height. Disk-friendly layout uses fixed-size blocks.

internal_node: [k1, k2, ..., kn] + [ptr1...ptr_n+1]
leaf_node: [sorted_keys_and_data] + next_leaf_ptr

Query and Schema Optimization Strategies

  • Add targeted indexes: Avoid full scans on large tables; omit indexes on low-cardinality fields (e.g., gender).
  • Composite indexes: Order columns by query frequency from left to right. An index on (region, age, salary) enables searches on (region), (region, age), and the full triple.
  • Exclude NULLs: Columns containing NULL are omitted from indexes; avoid defaulting to NULL.
  • Sort optimization: If ORDER BY follows an indexed WHERE, ensure sort columns are within the same index.
  • LIKE usage: Prefer prefix patterns ('abc%'); leading wildcards prevent index use.
  • Avoid computations on indexed columns: Transform WHERE YEAR(ts) < 2020 into WHERE ts < '2020-01-01'.
  • Replace NOT IN / <>: Use NOT EXISTS or range alternatives (id > 3 OR id < 3).

SQL Writing Practices

  1. Create indexes on WHERE and ORDER BY columns.
  2. Avoid IS NULL checks on indexed fields.
  3. Avoid != or <> operators on indexed fields.
  4. Replace OR between indexed and non-indexed fields with UNION ALL.
-- Instead of:
SELECT id FROM tbl WHERE num = 10 OR name = 'admin';
-- Use:
SELECT id FROM tbl WHERE num = 10
UNION ALL
SELECT id FROM tbl WHERE name = 'admin';
  1. Prefer BETWEEN over IN for ranges; replace IN with EXISTS when apppropriate.
  2. For leading-wildcard searches, consider full-text search.
  3. Avoid parameters in WHERE clauses during compilation; force index usage if needed.
  4. Refactor expressions: num/2 = 100num = 200.
  5. Move functions out of indexed column filters: extract prefix matches or date ranges.
  6. Keep indexed column usage aligned with leftmost index prefix.
  7. Eliminate meaningless queries like SELECT COUNT(*) FROM tbl without filters.
  8. Limit indexes to necessary columns (≤ 6), balancing read vs. write cost.
  9. Minimize updates to clustered index columns.
  10. Use numeric types over strings for pure numbers.
  11. Prefer VARCHAR over CHAR for variable-length data.
  12. Select specific columns instead of *.
  13. Favor table variables over temporary tables for small datasets.
  14. Batch-create large temp tables using SELECT INTO; truncate before dropping.
  15. Avoid cursors; opt for set-based operations. Use fast-forward cursors only for small sets.
  16. Set NOCOUNT ON in stored procedures to suppress packet traffic.
  17. Reduce long-running transactions to improve concurrency.
  18. Do not return excessive result sets to clients unnecessarily.

CHAR vs VARCHAR

  • CHAR(len) allocates fixed length; pads with spaces up to len (max 255). Truncates excess.
  • VARCHAR(len) uses variable length (max 65,535 bytes depending on charset and row size). Stores only actual characters plus length prefix.

Transaction Isolation Levels and Concurrency Issues

  • Dirty Read: Transaction A reads uncommitted changes from B, then B rolls back—A sees invalid data.
  • Non-repeatable Read: A reads same row multiple times; B modifies and commits between reads—A gets different results.
  • Phantom Read: A runs a bulk update; meanwhile B inserts new rows matching A's criteria—A misses these rows, appearing as phantom data.

Non-repeatable reads stem from updates; phantoms stem from inserts/deletes. Row locks fix non-repeatables; table locks address phantoms.

Tags: MySQL Storage Engines indexing performance optimization transactions

Posted on Sat, 30 May 2026 18:53:44 +0000 by pazzy