Database Optimization and Core Concepts

Database Fundamentals

A database is essentially a file system designed for data storage, composed of file systems and disk storage. Each I/O operation involves seek time and rotational latency.

1. Database Design Principles

E-R Model

Modern physical databases are designed using the Entity-Relationship model:

  • E represents entities
  • R represents relationships
  • Each entity becomes a table in the database
  • Relationships define associations between entities: one-to-one, one-to-many, many-to-many
  • Relationships translate to columns in relational databases

Normalization Forms

Database design guidelines derived from research and practical experience:

  • First Normal Form (1NF): Atomic values, no divisible columns
  • Second Normal Form (2NF): Unique identifiers for records
  • Third Normal Form (3NF): Non-key attributes depend only on primary key

2. Transactoins

Basic Concepts

Transactions manage multi-step operations that must succeed or fail together.

Operations:

  • BEGIN TRANSACTION
  • ROLLBACK
  • COMMIT

Transaction Characteristics (ACID)

  • Atomicity: Indivisible unit, all operations succeed or fail together
  • Durability: Data persists after commit or rollback
  • Isolation: Independent execution of concurrent transactions
  • Consistency: Data integrity maintained before and after transactions

Isolation Levels

Issues with concurrent transactions:

  • Dirty Read: Reading uncommitted data from other transactions
  • Non-repeatable Read: Inconsistent results in same transaction
  • Phantom Read: Missing newly inserted records

Isolation levels:

  • READ UNCOMMITTED: Allows all issues
  • READ COMMITTED: Prevents dirty reads
  • REPEATABLE READ: Prevents non-repeatable reads
  • SERIALIZABLE: Complete isolation

3. Locking Mechanisms

Purpose of Locks

Locks ensure data consistency in multi-user environments by controlling concurrent access.

Lock Types

  • Shared Lock (Read Lock): Multiple readers can access simultaneously
  • Exclusive Lock (Write Lock): Single writer blocks others

Granularity

  • Table Lock: Locks entire table, minimal overhead
  • Row Lock: Fine-grained locking, maximum concurrency

4. Concurrency Control Strategies

Optimistic Locking

Assumes conflicts are rare. Uses version numbers that increment with each update.

Pessimistic Locking

Assumes conflicts will occur. Acquires locks before accessing data.

5. Pagination Implementation

MySQL

SELECT * FROM table_name LIMIT offset, count;
-- offset = (page_number - 1) * records_per_page

Oracle

SELECT * FROM (
    SELECT ROWNUM rn, tt.* FROM (
        SELECT * FROM emp ORDER BY salary DESC
    ) tt WHERE ROWNUM <= 10
) WHERE rn > 5;

6. MySQL Storage Engines

  • InnoDB: Transaction-safe with ACID compliance, supports foreign keys
  • MyISAM: Fast access, no transaction support, suitable for read-heavy applications
  • MEMORY: Stores data in RAM, data lost on shutdown
  • MERGE: Combines multiple identical MyISAM tables

7. Join Operations

Inner Join

SELECT e.name, d.name 
FROM employees e INNER JOIN departments d ON e.dept_id = d.id;

Outer Joins

-- Left Join
SELECT * FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id;

-- Right Join  
SELECT * FROM table1 t1 RIGHT JOIN table2 t2 ON t1.id = t2.id;

8. Statement vs PreparedStatement

PreparedStatement Advantages

  • Prevents SQL injection attacks
  • Better performance through precompilation
  • Parameterized queries with placeholder (?) syntax
String sql = "SELECT * FROM users WHERE username = ? AND password = ?";
PreparedStatement stmt = connection.prepareStatement(sql);
stmt.setString(1, username);
stmt.setString(2, password);
ResultSet rs = stmt.executeQuery();

9. Connection Pooling

Connection pools maintain a cache of database connections to improve resource utilization and response times.

Popular implementations:

  • C3P0
  • Druid (Alibaba)

10. Partitioning and Sharding

Partitioning Benefits

  • Improved query performance on large datasets
  • Easier maintenance operations
  • Better index efficiency

Sharding Benefits

  • Horizontal scaling across multiple servers
  • Reduced individual table sizes
  • Distributed load management

11. Query Optimization

Process

  1. Identify slow queries using slow query log
  2. Analyze with EXPLAIN statement
  3. Optimize based on findings

EXPLAIN Output Key Metrics

  • type: Access method efficiency (system > const > ref > range > index > ALL)
  • rows: Estimated rows to examine
  • Extra: Additional information about execution plan

Index Usage Guidelines

  • Follow leftmost prefix principle for composite indexes
  • Avoid leading wildcards in LIKE clauses
  • Use proper data types (quote strings)

12. Sharding Implementation

Mycat Features

  • Database middleware supporting MySQL clusters
  • Transparent application-level access
  • Supports both vertical and horizonatl partitioning

13. Indexing Strategies

Data Structure

B+Tree implementation provides:

  • Sequential I/O instead of random access
  • Efficient range queries
  • Sorted data organization

Index Types

-- Regular Index
CREATE INDEX idx_name ON table_name(column_name);

-- Unique Index  
CREATE UNIQUE INDEX idx_unique ON table_name(column_name);

-- Primary Key Index (automatically created)
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100)
);

-- Composite Index
ALTER TABLE users ADD INDEX idx_composite (name, age, city);

14. SQL Optimization Techniques

NULL Handling

SELECT IFNULL(score_sum, 0) AS total_score FROM students;

Duplicate Removal

SELECT DISTINCT department FROM employees;

MySQL Performance Tuning

System Configuration

  • Increase I/O threads for multi-core systems
  • Configure innodb_buffer_pool_size based on available memory
  • Set innodb_file_per_table for better maintenance
  • Adjust innodb_autoinc_lock_mode for batch inserts

Storage Engine Optimization

  • InnoDB default page size: 16KB
  • Design tables to fit within page boundaries
  • Use appropriate data types (TINYINT for boolean values)

Index Strategy

  • Clustered index per table (primary key)
  • Secondary indexes with back-pointer to clustered index
  • Covering indexes eliminate need for table lookups

Scaling Approaches

  • Vertical partitioning: Split wide tables into related subsets
  • Horizontal partitioning: Distribute rows across multiple tables
  • Hot/cold data separation for optimal storage usage

Locking and MVCC

Isolation Level Evolution

  • MySQL default: REPEATABLE READ (historical reasons)
  • Enterprise preference: READ COMMITTED + ROW format
  • MVCC enables different version visibility per isolation level

Replication Improvements

  • Semi-synchronous replication ensures slave acknowledgment
  • Multi-threaded replication for parallel processing
  • Enhanced group commit reduces disk I/O

Tags: MySQL database-design transactions indexing Optimization

Posted on Tue, 19 May 2026 14:20:25 +0000 by wolf