Data Definition and Schema Management
Table Construction and Inspection
Defining the schema involves specifying column names, data types, and constraints. Tables can be inspected using metadata commands to verify structure.
Data Type Selection
Choosing the correct storage type impacts efficiency and accuracy.
Numeric Types Integers and decimals serve mathematical operations.
String Types Char and Varchar handle text, with Varchar being variable length.
Temporal Types
Both DATETIME and TIMESTAMP store date and time combinations including year, month, day, hour, minute, and second. They often support fractional seconds.
Distinctions
- Range:
DATETIMEsupports a vast range ('1000-01-01' to '9999-12-31'), suitable for historical or future dates.TIMESTAMPis limited by the Unix epoch ('1970-01-01' to '2038-01-19'), constrained by 32-bit integers. - Time Zone Awareness:
DATETIMEstores values exactly as inserted without conversion.TIMESTAMPconverts input from the session time zone to UTC for storage and converts back to the session time zone upon retrieval. - Automatic Updates:
TIMESTAMPcolumns can be configured to auto-update on row modification usingDEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP.DATETIMElacks this automatic behavior. - Storage Footprint:
DATETIMEtypically consumes 8 bytes.TIMESTAMPuses 4 bytes. - Usage Context: Use
DATETIMEfor absolute times like birth dates or scheduled events where time zone conversion is undesirable. UseTIMESTAMPfor auditing records likecreated_atorupdated_atwhere global consistency across time zones is required.
Schema Alterations
Structure changes are performed via alteration statements.
ALTER TABLE employees CHANGE COLUMN dept_code department_id VARCHAR(30);
Data Manipulation Operations
Insertion Records
Bulk insertion improves performance over single-row statements.
INSERT INTO employees (emp_id, full_name, gender, age, birth_date, location)
VALUES (1001, 'John Doe', 'M', 25, '1998-05-10', 'New York'),
(1002, 'Jane Smith', 'F', 30, '1993-11-20', 'London'),
(1003, 'Alice Johnson', 'F', 28, '1995-03-15', 'Paris');
Updates and Deletions
Modifications require careful filtering to avoid unintended data loss.
Integrity Constraints
Primary Keys
Unique identifiers ensure row uniqueness. Single Column: Defined directly on the column. Composite Keys: Multiple columns form a unique identifier; none can be NULL.
Auto Increment
Automatically generates sequential numeric values for primary keys.
Not Null and Unique
Not Null: Enforces value presence. Unique: Ensures distinct values across rows. Note that multiple NULL values are permitted in a unique column as NULL is not equal to NULL.
ALTER TABLE employees DROP INDEX unique_email;
Defaults and Zero Fill
Default values apply when none are provided. Zero fill pads numeric values with leading zeros (e.g., 123 becomes 0000000123 in a defined width).
Foreign Keys
Enforces referential integrity between tables.
CREATE TABLE divisions (
div_id VARCHAR(20) PRIMARY KEY,
div_name VARCHAR(20)
);
CREATE TABLE personnel (
person_id VARCHAR(20) PRIMARY KEY,
person_name VARCHAR(20),
age INT,
division_id VARCHAR(20),
CONSTRAINT fk_div FOREIGN KEY (division_id) REFERENCES divisions(div_id)
);
ALTER TABLE personnel ADD CONSTRAINT fk_div FOREIGN KEY(division_id) REFERENCES divisions(div_id);
Deleting data from a parent table referenced by a child table is restricted to maintain integrity, whereas child records can typically be removed freely.
Data Querying Techniques
Basic Selection and Filtering
Queries utilize operators for filtering and sorting. Sorting applies to numeric, string, and temporal data types.
Aggregation and Grouping
Aggregates compute summary data.
SELECT category_id, SUM(price) AS total_value
FROM inventory
GROUP BY category_id
HAVING total_value >= 6000
ORDER BY total_value DESC;
When grouping, the SELECT clause should only contain grouped columns or aggregate functions. Filtering groups requires HAVING, not WHERE.
Pagination
Retrieving specific pages involves calculating offsets.
-- Fetch page n with 50 items per page
SELECT * FROM inventory LIMIT 50 OFFSET (n - 1) * 50;
-- Fetch first 5 items of page n
SELECT * FROM inventory LIMIT 5 OFFSET (n - 1) * 50;
Data can be aggregated into summary tables via insertion from selection.
INSERT INTO inventory_summary SELECT category_id, COUNT(*) FROM inventory GROUP BY category_id;
Pattern Matching
Regular expressions provide advanced filtering capabilities.
-- Starts with 'Item'
SELECT * FROM inventory WHERE item_name REGEXP '^Item';
-- Ends with 'Item'
SELECT * FROM inventory WHERE item_name REGEXP 'Item$';
-- Complex pattern
SELECT * FROM inventory WHERE item_name REGEXP '(ab){2,4}';
Patterns match sequences, with quantifiers controlling repetition.
Multi-Table Relationships
Join Types
Inner Join: Returns only matching records from both tables. Outer Join: Includes non-matching records.
- Left Join: All records from the left table, NULLs for missing right matches.
- Right Join: All records from the right table, NULLs for missing left matches.
- Full Join: All records from both tables (simulated in MySQL using UNION).
Union: Combines result sets, removing duplicates. UNION ALL retains duplicates.
Subqueries
Subqueries nest within main queries. Scalar: Returns a single value.
SELECT employee_name, salary
FROM staff
WHERE salary > (SELECT AVG(salary) FROM staff);
Row: Returns a single row with multiple columns.
SELECT employee_name, salary, dept_id
FROM staff
WHERE (salary, dept_id) = (SELECT MAX(salary), dept_id FROM staff);
Table: Returns a dataset used as a temporary table or in IN clauses.
SELECT employee_name, salary, dept_id
FROM staff
WHERE (dept_id, salary) IN (
SELECT dept_id, MAX(salary)
FROM staff
GROUP BY dept_id
);
Correlated subqueries reference outer table columns, ensuring logical dependencies like manager IDs existing within the employee ID set.
Functions and Analytics
Aggregate Functions
String aggregation concatenates values.
SELECT GROUP_CONCAT(emp_name SEPARATOR ';') FROM personnel;
SELECT department, GROUP_CONCAT(emp_name ORDER BY salary DESC SEPARATOR ';')
FROM personnel GROUP BY department;
Window Functions
Perform calculations across rows related to the current row.
Ranking: ROW_NUMBER(), RANK(), DENSE_RANK() identify top earners per department.
Running Totals: SUM() OVER() accumulates values. With out ORDER BY, it sums the entire partition. With ORDER BY, it performs a running sum.
-- Sum from start to current row
SUM(salary) OVER (PARTITION BY dept ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-- Sum of current row and 3 preceding rows
SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
Distribution: CUME_DIST() and PERCENT_RANK() calculate relative standing.
Navigation: LAG() and LEAD() access previous or subsequent rows. FIRST_VALUE() and LAST_VALUE() retrieve boundary values within a window.
Views and Programmability
Views
Virtual tables simplify complex queries.
CREATE OR REPLACE VIEW staff_view AS
SELECT emp_name, job_title FROM personnel;
Views are queried like standard tables.
Stored Procedures
Encapsulate logic server-side. Delimiters may need changing (e.g., to //).
Variables: Local, User-defined, and System variables manage state.
Control Flow: IF, CASE, WHILE, REPEAT, LOOP handle logic. LEAVE and ITERATE control loop execution.
Cursors: Iterate over result sets row-by-row. Handlers manage exceptions, such as error code 1329 for no data found.
Functions: Return single values compared to procedures.
Triggers
Automate actions on data events (INSERT, UPDATE, DELETE) using NEW and OLD references.
Indexing and Storage
Index Types
Normal: Standard B-Tree lookup. Unique: Enforces uniqueness. Primary: Implicitly unique and not null. Composite: Multiple columns. Full-Text: Optimized for text search (best created after data load). Spatial: For geometric data.
Index Structures
Hash: Direct key-value lookup. B-Tree: Balanced tree structure for range queries. B+ Tree: Common in MySQL, leaves contain data pointers.
Optimization Strategies
Explain Plan: Analyze id, select_type, type, and key_len to understand execution.
Leftmost Prefix: Composite indexes require matching the leftmost columns first.
-- Index on (name, status)
WHERE name = 'John' -- Uses index
WHERE name = 'John' AND status = 'Active' -- Uses index
WHERE status = 'Active' -- Skips index
Cardinality: High cardinality columns (unique values) benefit more from indexing than low cardinality ones (e.g., gender).
Bulk Loading: LOAD DATA is faster than individual inserts. Sorting data by primary key before loading improves efficiency. Temporarily disabling unique checks during bulk loads can speed up insertion.
Transactions and Concurrency
ACID Properties
Transactions ensure Atomicity, Consistency, Isolation, and Durability. Rollbacks undo uncommitted changes.
Isolation Levels
Read Uncommitted: Allows dirty reads. Read Committed: Prevents dirty reads, allows non-repeatable reads. Repeatable Read: Default in MySQL. Prevents non-repeatable reads but may allow phantoms (though MVCC mitigates this). Serializable: Highest isolation, locks ranges.
Locking Mechanisms
Table Locks: Lock the entire table; restrictive for concurrency. Row Locks: Lock specific records; higher concurrency support.
Logging and Diagnostics
Log Types
Error Log: Records startup/shutdown and critical errors. Binary Log: Records all changes for replication and recovery. Query Log: Records all queries (high overhead). Slow Query Log: Captures queries exceeding time thresholds.
Performence Tuning
Profiling: SHOW PROFILES analyzes execution time per stage.
Optimizer Trace: Provides deep insight into query optimization decisions.
SQL Optimization:
- Avoid
SELECT *. - Optimize
ORDER BYto use indexes. - Replace subqueries with joins where possible.
- Optimize
LIMITwith large offsets by using deferred join techniques.
Connectivity
Application connectivity via JDBC or PyMySQL requires managing connection pools and preparing statements to prevent SQL injection. Result sets should be processed efficiently to minimize memory usage.