Database Fundamentals
A database is essentially a file system designed for data storage, composed of file systems and disk storage. Each I/O operation involves seek time and rotational latency.
1. Database Design Principles
E-R Model
Modern physical databases are designed using the Entity-Relationship model:
- E represents entities
- R represents relationships
- Each entity becomes a table in the database
- Relationships define associations between entities: one-to-one, one-to-many, many-to-many
- Relationships translate to columns in relational databases
Normalization Forms
Database design guidelines derived from research and practical experience:
- First Normal Form (1NF): Atomic values, no divisible columns
- Second Normal Form (2NF): Unique identifiers for records
- Third Normal Form (3NF): Non-key attributes depend only on primary key
2. Transactoins
Basic Concepts
Transactions manage multi-step operations that must succeed or fail together.
Operations:
- BEGIN TRANSACTION
- ROLLBACK
- COMMIT
Transaction Characteristics (ACID)
- Atomicity: Indivisible unit, all operations succeed or fail together
- Durability: Data persists after commit or rollback
- Isolation: Independent execution of concurrent transactions
- Consistency: Data integrity maintained before and after transactions
Isolation Levels
Issues with concurrent transactions:
- Dirty Read: Reading uncommitted data from other transactions
- Non-repeatable Read: Inconsistent results in same transaction
- Phantom Read: Missing newly inserted records
Isolation levels:
- READ UNCOMMITTED: Allows all issues
- READ COMMITTED: Prevents dirty reads
- REPEATABLE READ: Prevents non-repeatable reads
- SERIALIZABLE: Complete isolation
3. Locking Mechanisms
Purpose of Locks
Locks ensure data consistency in multi-user environments by controlling concurrent access.
Lock Types
- Shared Lock (Read Lock): Multiple readers can access simultaneously
- Exclusive Lock (Write Lock): Single writer blocks others
Granularity
- Table Lock: Locks entire table, minimal overhead
- Row Lock: Fine-grained locking, maximum concurrency
4. Concurrency Control Strategies
Optimistic Locking
Assumes conflicts are rare. Uses version numbers that increment with each update.
Pessimistic Locking
Assumes conflicts will occur. Acquires locks before accessing data.
5. Pagination Implementation
MySQL
SELECT * FROM table_name LIMIT offset, count;
-- offset = (page_number - 1) * records_per_page
Oracle
SELECT * FROM (
SELECT ROWNUM rn, tt.* FROM (
SELECT * FROM emp ORDER BY salary DESC
) tt WHERE ROWNUM <= 10
) WHERE rn > 5;
6. MySQL Storage Engines
- InnoDB: Transaction-safe with ACID compliance, supports foreign keys
- MyISAM: Fast access, no transaction support, suitable for read-heavy applications
- MEMORY: Stores data in RAM, data lost on shutdown
- MERGE: Combines multiple identical MyISAM tables
7. Join Operations
Inner Join
SELECT e.name, d.name
FROM employees e INNER JOIN departments d ON e.dept_id = d.id;
Outer Joins
-- Left Join
SELECT * FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id;
-- Right Join
SELECT * FROM table1 t1 RIGHT JOIN table2 t2 ON t1.id = t2.id;
8. Statement vs PreparedStatement
PreparedStatement Advantages
- Prevents SQL injection attacks
- Better performance through precompilation
- Parameterized queries with placeholder (?) syntax
String sql = "SELECT * FROM users WHERE username = ? AND password = ?";
PreparedStatement stmt = connection.prepareStatement(sql);
stmt.setString(1, username);
stmt.setString(2, password);
ResultSet rs = stmt.executeQuery();
9. Connection Pooling
Connection pools maintain a cache of database connections to improve resource utilization and response times.
Popular implementations:
- C3P0
- Druid (Alibaba)
10. Partitioning and Sharding
Partitioning Benefits
- Improved query performance on large datasets
- Easier maintenance operations
- Better index efficiency
Sharding Benefits
- Horizontal scaling across multiple servers
- Reduced individual table sizes
- Distributed load management
11. Query Optimization
Process
- Identify slow queries using slow query log
- Analyze with EXPLAIN statement
- Optimize based on findings
EXPLAIN Output Key Metrics
- type: Access method efficiency (system > const > ref > range > index > ALL)
- rows: Estimated rows to examine
- Extra: Additional information about execution plan
Index Usage Guidelines
- Follow leftmost prefix principle for composite indexes
- Avoid leading wildcards in LIKE clauses
- Use proper data types (quote strings)
12. Sharding Implementation
Mycat Features
- Database middleware supporting MySQL clusters
- Transparent application-level access
- Supports both vertical and horizonatl partitioning
13. Indexing Strategies
Data Structure
B+Tree implementation provides:
- Sequential I/O instead of random access
- Efficient range queries
- Sorted data organization
Index Types
-- Regular Index
CREATE INDEX idx_name ON table_name(column_name);
-- Unique Index
CREATE UNIQUE INDEX idx_unique ON table_name(column_name);
-- Primary Key Index (automatically created)
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100)
);
-- Composite Index
ALTER TABLE users ADD INDEX idx_composite (name, age, city);
14. SQL Optimization Techniques
NULL Handling
SELECT IFNULL(score_sum, 0) AS total_score FROM students;
Duplicate Removal
SELECT DISTINCT department FROM employees;
MySQL Performance Tuning
System Configuration
- Increase I/O threads for multi-core systems
- Configure innodb_buffer_pool_size based on available memory
- Set innodb_file_per_table for better maintenance
- Adjust innodb_autoinc_lock_mode for batch inserts
Storage Engine Optimization
- InnoDB default page size: 16KB
- Design tables to fit within page boundaries
- Use appropriate data types (TINYINT for boolean values)
Index Strategy
- Clustered index per table (primary key)
- Secondary indexes with back-pointer to clustered index
- Covering indexes eliminate need for table lookups
Scaling Approaches
- Vertical partitioning: Split wide tables into related subsets
- Horizontal partitioning: Distribute rows across multiple tables
- Hot/cold data separation for optimal storage usage
Locking and MVCC
Isolation Level Evolution
- MySQL default: REPEATABLE READ (historical reasons)
- Enterprise preference: READ COMMITTED + ROW format
- MVCC enables different version visibility per isolation level
Replication Improvements
- Semi-synchronous replication ensures slave acknowledgment
- Multi-threaded replication for parallel processing
- Enhanced group commit reduces disk I/O