Principles and Practices for Relational Database Design and Performance Tuning

Database Design Principles

Designing an effective data model requires careful consideration of multiple factors:

What data needs to be stored? What entities and attributes should be captured?
What constraints are necessary to ensure data integrity during insertion, deletion, and update operations?
How can data redundancy be minimized to prevent uncontrolled growth as usage scales?
How can the design facilitate ease of maintenance and usage for database administrators?
How do different application scenarios affect table structure decisions?

Poor design often leads to operational problems, including data anomalies during updates, incorrect information representation, loss of critical data, excessive storage waste, and degraded application performance. Conversely, a well-structured database conserves storage, enforces data integrity, and simplifies application development through more straightforward query logic.

Normalization

Normalization in relational databases involves a series of formal rules that define structural standards for data tables. These rules, called Normal Forms (NF), establish hierarchical levels of design quality. The common forms are First (1NF), Second (2NF), Third (3NF), Boyce-Codd (BCNF), Fourth (4NF), and Fifth (5NF) Normal Form. Higher normalization typically reduces data redundancy. In prcatice, designs often adhere to 3NF or BCNF, but sometimes performance requirements justify intentional rule violations through denormalization.

Fundamental Concepts

Understanding normalization requires familiarity with key terminology:

Superkey: Any set of attributes that can uniquely identify a row (e.g., (player_id), (player_id, name), (citizen_id, age)).
Candidate Key: A minimal superkey. For a player table, candidate keys could be (player_id) or (citizen_id).
Primary Key: The selected candidate key used as the main identifier (e.g., (player_id)).
Foreign Key: An attribute that references the primary key of another table (e.g., team_id in the player table).
Prime Attribute: Any attribute that is part of a candidate key.
Non-Prime Attribute: An attribute not part of any candidate key.

First Normal Form (1NF)

1NF mandates atomicity—each field's value must be an indivisible, minimal data unit.

Second Normal Form (2NF)

A table in 1NF also satisfies 2NF if every non-prime attribute is fully functionally dependent on the entire primary key, not just a part of it.

Problem Example: A player_game table with attributes (player_id, game_id, name, age, game_time, venue, score) uses a composite primary key (player_id, game_id). Issues arise because name and age depend only on player_id, while game_time and venue depend only on game_id. This partial dependency leads to:

Redundancy: Player details repeat for each game; game details repeat for each player.
Insertion Anomaly: Cannot add a new game without assigning a player.
Deletion Anomaly: Deleting a player might accidentally remove game information.
Update Anomaly: Changing a game time requires updates across multiple rows, risking inconsistency.

Solution: Decompose into three tables: player(player_id, name, age), game(game_id, time, venue), and participation(player_id, game_id, score). Each table now satisfies 2NF.

Third Normal Form (3NF)

A table in 2NF satisfies 3NF if no non-prime attribute depends on another non-prime attribute (i.e., no transitive dependencies on the primary key). All non-key attributes must be directly dependent on the primary key.

Summary of Normal Forms

1NF: Ensures column atomicity.
2NF: Ensures complete dependency of non-key columns on the entire primary key.
3NF: Eliminates transitive dependencies between non-key columns.

Benefits: Normalization reduces data redundancy and improves data integrity. Drawbacks: Higher normalization can decrease query performance by increasing the number of required table joins, potentially making certain indexing strategies less effective.

Denormalization

Denormalization intentionally introduces redundancy to improve read performance, trading storage space for query speed. It reduces the need for complex joins.

Considerations for Denormalization:

Increases storage requirements.
Risks data inconsistency if redundant data is not synchronized.
Can complicate data maintenance, especially with frequent updates.
May not provide benefits for small datasets and can overcomplicate design.

Appropriate Use Cases:

When the redundant field is rarely updated but frequently queried.
For preserving historical snapshots (e.g., order shipping addresses that should not change after an order is placed).
In data warehousing scenarios where analysis of historical data is prioritized over real-time update efficiency.

BCNF (Boyce-Codd Normal Form)

BCNF is a stricter form of 3NF. It eliminates scenarios where a prime attribute is functionally dependent on part of a candidate key. A relation R is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey.

Entity-Relationship (ER) Modeling

ER modeling describes real-world entities, their attributes, and relationships. It aids in clarifying business logic during database design.

Components:

Entity: A real-world object (e.g., Customer, Product). Represented by a rectangle.
Attribute: A property of an entity (e.g., customer_name, product_price). Represented by an oval.
Relationship: An association between entities (e.g., Purchases). Represented by a diamond.

Relationship Cardinalities:

One-to-One (1:1): A single instance of one entity relates to a single instance of another.
One-to-Many (1:M): A single instance of one entity relates to multiple instances of another.
Many-to-Many (M:N): Multiple instances of one entity relate to multiple instances of another.

General Design Guidelines

A common principle is "Fewer Tables, Fewer Fields, Fewer Composite Keys, More Keys" (三少一多).

Fewer Tables: Simplifies the E-R diagram and operations.
Fewer Fields Per Table: Reduces redundancy potential, but balance with retrieval efficiency.
Fewer Fields in Composite Primary Keys: Minimizes index space and complexity.
More Use of Keys (Primary/Foreign): Increases entity reuse and inter-table relationships, enhancing independence and association. This principle emphasizes simplicity and reusability.

Database Object Implementation Standards

Database Naming

Length: Keep within 32 characters.
Characters: Use only lowercase English letters, numbers, and underscores. Start with a letter.
Format: business_subsystem (e.g., crm_fund).
Avoid Keywords: Do not use reserved words like type, order.
Character Set: Explicitly specify as utf8 or utf8mb4 during creation.
Permissions: Application accounts should have minimal, database-specific permissions, ideally without DROP privileges.
Temporary/Backup Databases: Prefix with tmp_ or bak_, suffixed with a date.

Table and Column Naming

Length: Keep names within 32 characters.
Case: Use lowercase with underscores separating words.
Prefix: Use a module-related prefix for tables within the same module (e.g., crm_fund_item).
Character Set & Engine: Explicitly specify (e.g., utf8mb4, InnoDB).
Comment: Every table must have a COMMENT.
Avoid Keywords: Do not use reserved words.
Field Naming: Use meaningful English words/abbreviations (e.g., corp_id, not corporation_id).
Boolean Fields: Prefix with is_ (e.g., is_enabled).
Binary Data: Do not store large files (images, documents) directly. Store file paths instead.
Primary Key: Name the auto-incrementing primary key id (type INT UNSIGNED or BIGINT UNSIGNED). For business identifiers (e.g., user_id, order_id), create a UNIQUE KEY index instead of using them as the primary key to avoid InnoDB page splitting from random inserts.
Audit Columns: Core tables should include create_time and update_time timestamp fields.
NOT NULL: Prefer NOT NULL columns with defined DEFAULT values over NULL.
Consistency: Columns storing the same data across tables must have identical names and data types.
Intermediate/Backup Tables: Prefix with tmp_ or bak_ and clean them periodically.

Example Table Creation:

CREATE TABLE user_info (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
    `user_id` BIGINT NOT NULL COMMENT 'User identifier',
    `full_name` VARCHAR(45) NOT NULL COMMENT 'User full name',
    `email` VARCHAR(30) NOT NULL COMMENT 'Email address',
    `nickname` VARCHAR(45) NOT NULL COMMENT 'Nickname',
    `birth_date` DATE NOT NULL COMMENT 'Date of birth',
    `gender` TINYINT DEFAULT '0' COMMENT 'Gender',
    `bio` VARCHAR(150) DEFAULT NULL COMMENT 'Short biography',
    `resume_path` VARCHAR(300) NOT NULL COMMENT 'Path to resume file',
    `registration_ip` INT NOT NULL COMMENT 'IP at registration',
    `create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `approval_status` TINYINT NOT NULL COMMENT 'Profile status: 1=Approved, 2=Pending, 3=Rejected',
    PRIMARY KEY (`id`),
    UNIQUE KEY `uk_user_id` (`user_id`),
    KEY `idx_full_name` (`full_name`),
    KEY `idx_created_status` (`create_time`, `approval_status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Base user information';

Indexing Guidelines

Primary Key: For InnoDB, use id (INT/BIGINT AUTO_INCREMENT). Do not update primary key values.
Index Type: Use BTREE for InnoDB/MyISAM.
Naming:
- Primary key: pk_ prefix.
- Unique key: uk_ prefix.
- Regular index: idx_ prefix.
- Use lowercase; suffix with column name/abbreviation (e.g., idx_sample_mid for member_id on sample).
Limit: Aim for ≤6 indexes per table.
Composite Indexes: Prefer composite indexes. Place the most selective column first.
Join Optimization: Ensure indexed join columns on the driven table.
Avoid Redundancy: Eliminate redundant indexes (e.g., if idx(a,b) exists, idx(a) is redundant).

SQL Authoring Practices

SELECT: Specify columns explicitly; avoid SELECT *.
INSERT: Specify column names.
DML Operations: Always include a WHERE clause using an indexed lookup (except for small static tables).
Bulk INSERT: Limit value lists to ~5000 rows per statement to avoid replication lag.
UNION: Prefer UNION ALL over UNION; limit to 5 or fewer union clauses.
JOINs: Limit to 5 tables per JOIN in production.
Sorting/Grouping: Minimize ORDER BY, GROUP BY, DISTINCT in the database. Perform sorting in application logic when possible. Keep filtered result sets for these operations under 1000 rows.
Schema Changes: Consolidate multiple ALTER operations. Major alters on large tables (>100M rows) require review and should run during off-peak hours.
Batch Operations: Control transaction duration with necessary sleep intervals.
Transaction Size: Keep transactions short (≤5 SQL statements).
Updates in Transactions: Use primary key or unique key conditions in UPDATE/DELETE WHERE clauses to minimize gap locks and deadlocks.

Database Performance Tuning Strategies

Tuning Objectives and Diagnosis

Performance tuning aims to maximize resource efficiency, increase throughput, improve response times, and eliminate bottlenecks. Problem identification relies on user feedback, log analysis, and monitoring server resources (CPU, memory, I/O) and internal database metrics (active sessions, locks, transactions).

Tuning Methodology

Step 1: Select the Appropriate DBMS

Choose based on requirements: commercial RDBMS (SQL Server, Oracle) for high transaction integrity; open-source MySQL with InnoDB (transactions) or MyISAM (non-transactional); NoSQL (key-value, document, columnar) for specific scalability or flexibility needs.

Step 2: Optimize Schema Design

Adhere to normalization principles (1NF, 2NF, 3NF) for clean, maintainable schemas.
Apply denormalization strategically for read-heavy, complex query patterns.
Choose optimal data types: use numeric over character types where possible; use CHAR for fixed-length strings, VARCHAR for variable; prefer smaller types.

Step 3: Optimize Logical Queries

Rewrite SQL for efficiency: transform subqueries, simplify conditions, eliminate unnecessary joins, avoid functions on indexed columns in WHERE clauses.

Step 4: Optimize Physical Queries / Access Paths

Focus on effective index creation and usage. Determine optimal access paths: table scan vs. index scan; choose efficient join algorithms (nested loops, hash, merge); optimize multi-table join order to reduce search space.

Step 5: Implement Caching

Use in-memory stores like Redis or Memcached for frequently accessed data to reduce database load.

Step 6: Database-Level Optimizations

Read/Write Splitting: Direct writes to a master, reads to replicas.
Sharding (Partitioning): Distribute data across multiple database instances. Increases complexity.

MySQL Server Configuration Tuning

Hardware Optimization

Increase memory for caching.
Use high-speed disk systems (e.g., SSDs).
Distribute disk I/O effectively.
Utilize multiple processors.

Key Configuration Parameters (`my.cnf` / `my.ini`)

innodb_buffer_pool_size: Cache for InnoDB data and indexes. Critical for performance. Set to ~70-80% of available memory on a dedicated server.
key_buffer_size: Size of the buffer used for MyISAM index caching (if using MyISAM tables).
table_open_cache: Number of open tables cached. Increase if Opened_tables status variable is high.
query_cache_type & query_cache_size: Query caching (deprecated in MySQL 8.0). Use with caution.
sort_buffer_size & join_buffer_size: Per-connection buffers for sorting and join operations without indexes. Increase cautiously.
read_buffer_size: Buffer for sequential table scans.
innodb_flush_log_at_trx_commit: Durability vs. performance trade-off for InnoDB redo log flushing. (1=fully durable, 2=commit writes log buffer to OS cache every second, 0=log buffer to OS cache every second).
innodb_log_buffer_size: Size of the InnoDB redo log buffer.
max_connections: Maximum permitted simultaneous client connections.
back_log: Connection request backlog size.
thread_cache_size: Number of threads cached for reuse.
wait_timeout / interactive_timeout: Seconds a connection can be idle before being closed.

Schema Optimization Techniques

1. Splitting Tables (Hot/Cold Data Separation)

Separate frequently accessed columns from infrequently accessed ones into different tables (e.g., users and user_profiles).

2. Introducing Summary / Aggregate Tables

For expensive, frequent aggregate queries, create and periodically refresh a summary table.

3. Adding Redundant Columns

Intentionally add redundant columns to expensive join paths, trading storage for query speed.

4. Optimizing Data Types

Use the smallest data type that accommodates the required range (e.g., TINYINT vs INT).
Prefer UNSIGNED for non-negative integers.
Store IP addresses as integers (INET_ATON(), INET_NTOA()).
Avoid TEXT/BLOB types if possible; store separately.
Use TIMESTAMP (4 bytes) over DATETIME (8 bytes) for time within its range.
Use DECIMAL for precise fixed-point arithmetic (financial).
Use ENUM sparingly due to alteration costs; consider TINYINT.

5. Optimizing Data Insertion Speed

For InnoDB:

Temporarily disable unique checks (SET UNIQUE_CHECKS=0).
Temporarily disable foreign key checks (SET FOREIGN_KEY_CHECKS=0).
Use multi-row INSERT statements.
For bulk loads, use LOAD DATA INFILE.
Disable autocommit and commit in batches.

6. Using `NOT NULL` Constraints

Improves storage efficiency and simplifies comparisons.
Makes indexing more efficient.

7. Table Maintenance

ANALYZE TABLE tbl_name: Updates index cardinality statistics (read lock).
CHECK TABLE tbl_name: Checks table for errors (read lock).
OPTIMIZE TABLE tbl_name: Reclaims unused space and defragments tables with variable-length rows (e.g., after many DELETEs). Equivalent to ALTER TABLE ... FORCE for InnoDB.

Large Table Optimization Strategies

Limit Query Range: Always scope queries with WHERE conditions (e.g., last 30 days).
Read/Write Splitting: Separate reads to replicas.
Vertical Partitioning: Split a table by columns into different tables/servers. Reduces I/O per row but can increase join complexity.
Horizontal Partitioning / Sharding: Split a table by rows across multiple databases/servers.
- Client-side Sharding: Logic embedded in application (e.g., ShardingSphere, TDDL).
- Proxy-based Sharding: Logic in a middleware layer (e.g., MyCat, Vitess).

Additional Tuning Considerations

Statement Timeout: Use SET [GLOBAL|SESSION] MAX_EXECUTION_TIME=<milliseconds> in MySQL 8.0+ to limit query runtime.
Hidden Indexes (MySQL 8.0+): Mark an index as invisible to the optimizer (ALTER TABLE ... ALTER INDEX ... INVISIBLE) for testing performance impact without dropping the index.
General Tablespaces: Create shared tablespaces for better storage management.

Tags: MySQL Database Design Normalization Performance Tuning SQL Optimization

Posted on Mon, 08 Jun 2026 17:50:19 +0000 by GoodWill

Freaks City