MySQL Database Architecture, Querying, and Performance Tuning

Data Definition and Schema Management

Table Construction and Inspection

Defining the schema involves specifying column names, data types, and constraints. Tables can be inspected using metadata commands to verify structure.

Data Type Selection

Choosing the correct storage type impacts efficiency and accuracy.

Numeric Types Integers and decimals serve mathematical operations.

String Types Char and Varchar handle text, with Varchar being variable length.

Temporal Types Both DATETIME and TIMESTAMP store date and time combinations including year, month, day, hour, minute, and second. They often support fractional seconds.

Distinctions

  1. Range: DATETIME supports a vast range ('1000-01-01' to '9999-12-31'), suitable for historical or future dates. TIMESTAMP is limited by the Unix epoch ('1970-01-01' to '2038-01-19'), constrained by 32-bit integers.
  2. Time Zone Awareness: DATETIME stores values exactly as inserted without conversion. TIMESTAMP converts input from the session time zone to UTC for storage and converts back to the session time zone upon retrieval.
  3. Automatic Updates: TIMESTAMP columns can be configured to auto-update on row modification using DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP. DATETIME lacks this automatic behavior.
  4. Storage Footprint: DATETIME typically consumes 8 bytes. TIMESTAMP uses 4 bytes.
  5. Usage Context: Use DATETIME for absolute times like birth dates or scheduled events where time zone conversion is undesirable. Use TIMESTAMP for auditing records like created_at or updated_at where global consistency across time zones is required.

Schema Alterations

Structure changes are performed via alteration statements.

ALTER TABLE employees CHANGE COLUMN dept_code department_id VARCHAR(30);

Data Manipulation Operations

Insertion Records

Bulk insertion improves performance over single-row statements.

INSERT INTO employees (emp_id, full_name, gender, age, birth_date, location) 
VALUES (1001, 'John Doe', 'M', 25, '1998-05-10', 'New York'),
       (1002, 'Jane Smith', 'F', 30, '1993-11-20', 'London'),
       (1003, 'Alice Johnson', 'F', 28, '1995-03-15', 'Paris');

Updates and Deletions

Modifications require careful filtering to avoid unintended data loss.

Integrity Constraints

Primary Keys

Unique identifiers ensure row uniqueness. Single Column: Defined directly on the column. Composite Keys: Multiple columns form a unique identifier; none can be NULL.

Auto Increment

Automatically generates sequential numeric values for primary keys.

Not Null and Unique

Not Null: Enforces value presence. Unique: Ensures distinct values across rows. Note that multiple NULL values are permitted in a unique column as NULL is not equal to NULL.

ALTER TABLE employees DROP INDEX unique_email;

Defaults and Zero Fill

Default values apply when none are provided. Zero fill pads numeric values with leading zeros (e.g., 123 becomes 0000000123 in a defined width).

Foreign Keys

Enforces referential integrity between tables.

CREATE TABLE divisions (
  div_id VARCHAR(20) PRIMARY KEY,
  div_name VARCHAR(20)
);

CREATE TABLE personnel (
  person_id VARCHAR(20) PRIMARY KEY,
  person_name VARCHAR(20),
  age INT,
  division_id VARCHAR(20),
  CONSTRAINT fk_div FOREIGN KEY (division_id) REFERENCES divisions(div_id)
);

ALTER TABLE personnel ADD CONSTRAINT fk_div FOREIGN KEY(division_id) REFERENCES divisions(div_id);

Deleting data from a parent table referenced by a child table is restricted to maintain integrity, whereas child records can typically be removed freely.

Data Querying Techniques

Basic Selection and Filtering

Queries utilize operators for filtering and sorting. Sorting applies to numeric, string, and temporal data types.

Aggregation and Grouping

Aggregates compute summary data.

SELECT category_id, SUM(price) AS total_value 
FROM inventory 
GROUP BY category_id 
HAVING total_value >= 6000 
ORDER BY total_value DESC;

When grouping, the SELECT clause should only contain grouped columns or aggregate functions. Filtering groups requires HAVING, not WHERE.

Pagination

Retrieving specific pages involves calculating offsets.

-- Fetch page n with 50 items per page
SELECT * FROM inventory LIMIT 50 OFFSET (n - 1) * 50;

-- Fetch first 5 items of page n
SELECT * FROM inventory LIMIT 5 OFFSET (n - 1) * 50;

Data can be aggregated into summary tables via insertion from selection.

INSERT INTO inventory_summary SELECT category_id, COUNT(*) FROM inventory GROUP BY category_id;

Pattern Matching

Regular expressions provide advanced filtering capabilities.

-- Starts with 'Item'
SELECT * FROM inventory WHERE item_name REGEXP '^Item';

-- Ends with 'Item'
SELECT * FROM inventory WHERE item_name REGEXP 'Item$';

-- Complex pattern
SELECT * FROM inventory WHERE item_name REGEXP '(ab){2,4}';

Patterns match sequences, with quantifiers controlling repetition.

Multi-Table Relationships

Join Types

Inner Join: Returns only matching records from both tables. Outer Join: Includes non-matching records.

  • Left Join: All records from the left table, NULLs for missing right matches.
  • Right Join: All records from the right table, NULLs for missing left matches.
  • Full Join: All records from both tables (simulated in MySQL using UNION).

Union: Combines result sets, removing duplicates. UNION ALL retains duplicates.

Subqueries

Subqueries nest within main queries. Scalar: Returns a single value.

SELECT employee_name, salary
FROM staff
WHERE salary > (SELECT AVG(salary) FROM staff);

Row: Returns a single row with multiple columns.

SELECT employee_name, salary, dept_id
FROM staff
WHERE (salary, dept_id) = (SELECT MAX(salary), dept_id FROM staff);

Table: Returns a dataset used as a temporary table or in IN clauses.

SELECT employee_name, salary, dept_id
FROM staff
WHERE (dept_id, salary) IN (
    SELECT dept_id, MAX(salary)
    FROM staff
    GROUP BY dept_id
);

Correlated subqueries reference outer table columns, ensuring logical dependencies like manager IDs existing within the employee ID set.

Functions and Analytics

Aggregate Functions

String aggregation concatenates values.

SELECT GROUP_CONCAT(emp_name SEPARATOR ';') FROM personnel;

SELECT department, GROUP_CONCAT(emp_name ORDER BY salary DESC SEPARATOR ';') 
FROM personnel GROUP BY department;

Window Functions

Perform calculations across rows related to the current row. Ranking: ROW_NUMBER(), RANK(), DENSE_RANK() identify top earners per department. Running Totals: SUM() OVER() accumulates values. With out ORDER BY, it sums the entire partition. With ORDER BY, it performs a running sum.

-- Sum from start to current row
SUM(salary) OVER (PARTITION BY dept ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- Sum of current row and 3 preceding rows
SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)

Distribution: CUME_DIST() and PERCENT_RANK() calculate relative standing. Navigation: LAG() and LEAD() access previous or subsequent rows. FIRST_VALUE() and LAST_VALUE() retrieve boundary values within a window.

Views and Programmability

Views

Virtual tables simplify complex queries.

CREATE OR REPLACE VIEW staff_view AS 
SELECT emp_name, job_title FROM personnel;

Views are queried like standard tables.

Stored Procedures

Encapsulate logic server-side. Delimiters may need changing (e.g., to //). Variables: Local, User-defined, and System variables manage state. Control Flow: IF, CASE, WHILE, REPEAT, LOOP handle logic. LEAVE and ITERATE control loop execution. Cursors: Iterate over result sets row-by-row. Handlers manage exceptions, such as error code 1329 for no data found. Functions: Return single values compared to procedures.

Triggers

Automate actions on data events (INSERT, UPDATE, DELETE) using NEW and OLD references.

Indexing and Storage

Index Types

Normal: Standard B-Tree lookup. Unique: Enforces uniqueness. Primary: Implicitly unique and not null. Composite: Multiple columns. Full-Text: Optimized for text search (best created after data load). Spatial: For geometric data.

Index Structures

Hash: Direct key-value lookup. B-Tree: Balanced tree structure for range queries. B+ Tree: Common in MySQL, leaves contain data pointers.

Optimization Strategies

Explain Plan: Analyze id, select_type, type, and key_len to understand execution. Leftmost Prefix: Composite indexes require matching the leftmost columns first.

-- Index on (name, status)
WHERE name = 'John' -- Uses index
WHERE name = 'John' AND status = 'Active' -- Uses index
WHERE status = 'Active' -- Skips index

Cardinality: High cardinality columns (unique values) benefit more from indexing than low cardinality ones (e.g., gender). Bulk Loading: LOAD DATA is faster than individual inserts. Sorting data by primary key before loading improves efficiency. Temporarily disabling unique checks during bulk loads can speed up insertion.

Transactions and Concurrency

ACID Properties

Transactions ensure Atomicity, Consistency, Isolation, and Durability. Rollbacks undo uncommitted changes.

Isolation Levels

Read Uncommitted: Allows dirty reads. Read Committed: Prevents dirty reads, allows non-repeatable reads. Repeatable Read: Default in MySQL. Prevents non-repeatable reads but may allow phantoms (though MVCC mitigates this). Serializable: Highest isolation, locks ranges.

Locking Mechanisms

Table Locks: Lock the entire table; restrictive for concurrency. Row Locks: Lock specific records; higher concurrency support.

Logging and Diagnostics

Log Types

Error Log: Records startup/shutdown and critical errors. Binary Log: Records all changes for replication and recovery. Query Log: Records all queries (high overhead). Slow Query Log: Captures queries exceeding time thresholds.

Performence Tuning

Profiling: SHOW PROFILES analyzes execution time per stage. Optimizer Trace: Provides deep insight into query optimization decisions. SQL Optimization:

  • Avoid SELECT *.
  • Optimize ORDER BY to use indexes.
  • Replace subqueries with joins where possible.
  • Optimize LIMIT with large offsets by using deferred join techniques.

Connectivity

Application connectivity via JDBC or PyMySQL requires managing connection pools and preparing statements to prevent SQL injection. Result sets should be processed efficiently to minimize memory usage.

Tags: MySQL database sql Optimization Backend

Posted on Thu, 11 Jun 2026 17:44:32 +0000 by Grayda