Fundamental Data Types and Storage Classes
Understanding how relational databases store information is critical for schema design. The most frequently utilized data categories include:
| Data Type | Purpose & Characteristics |
|---|---|
INT / BIGINT |
Stores whole numbers. Suitable for primary keys, counts, and identifiers. |
DECIMAL(m, n) |
Fixed-point numeric type. m defines total digit count, n specifies decimal precision. Ideal for financial or scientific calculations. |
VARCHAR(n) |
Variable-length string storage. Occupies only the space required plus overhead, up to n characters. |
BLOB / TEXT |
Handles large binary or textual objects such as images, documents, or logs. |
DATE |
Stores calendar dates (YYYY-MM-DD format) without time components. |
TIMESTAMP / DATETIME |
Records precise date and time (YYYY-MM-DD HH:MM:SS), often tracking creation or modification events. |
Database and Table Lifecycle Management
Schema operations follow standard DDL conventions. Using backticks or double quotes for identifiers prevents collisions with reserved keywords.
-- Provision a new schema
CREATE DATABASE IF NOT EXISTS `enterprise_catalog`;
USE `enterprise_catalog`;
-- Define a structured table with constraints
CREATE TABLE `project_members` (
`member_code` INT AUTO_INCREMENT PRIMARY KEY,
`full_name` VARCHAR(50) NOT NULL,
`department` VARCHAR(30) DEFAULT 'General',
`onboard_date` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Populate records with explicit column mapping
INSERT INTO `project_members` (`full_name`, `department`)
VALUES ('Alex Chen', 'Engineering');
-- Insert multiple rows in a single transaction
INSERT INTO `project_members` (`full_name`, `department`, `onboard_date`)
VALUES ('Maria Rossi', 'Analytics', '2023-05-12 09:00:00'),
('David Kim', 'Engineering', '2023-06-01 10:30:00');
To modify an existing schema dynamically:
-- Append a new attribute to an existing table
ALTER TABLE `project_members` ADD COLUMN `performance_score` DECIMAL(4,2);
-- Remove an obsolete column
ALTER TABLE `project_members` DROP COLUMN `performance_score`;
Constraint Enforcement and Data Integrity
Constraints guarantee reliability and prevent invalid state transitions during DML operations:
NOT NULL: Blocks insertion or updates that would leave a column empty.UNIQUE: Ensures every row contains a distinct value for the specified column.DEFAULT 'value': Automatically supplies a fallback when no explicit input is provided.AUTO_INCREMENT: Generates sequential integers automatically, heavily utilized for surrogate primary keys.
Data Manipulation and Query Execution
Core DML statements manage the lifecycle of stored records:
INSERT INTO: Appends new rows to a table.UPDATE: Modifies existing records based on a filtering condition.DELETE FROM: Removes specific rows (always pair with aWHEREclause in production to avoid full table wipes).DROP: Permanently eliminates entire tables or databases from the schema.
Query retrieval supports ordering, pagination, and value filtering:
-- Retrieve unique departments, sorted descending, limiting to top 3 results
SELECT DISTINCT department
FROM project_members
ORDER BY department DESC
LIMIT 3;
-- Filter rows using pattern matching and set inclusion
SELECT full_name, department
FROM project_members
WHERE department IN ('Engineering', 'Product')
OR full_name LIKE '%son';
Aggregatoin and Post-Group Filtering
Aggregate functions compute metrics acros multiple rows. Common operators include COUNT(), AVG(), SUM(), MIN(), and MAX().
Common Pitfall: COUNT(*) tallies every row regardless of nulls, whereas COUNT(column_name) skips NULL entries. Conversely, SUM() ignores NULL values entirely without throwing errors, and treats 0 as a valid additive operand.
The HAVING clause filters aggregated groups, whereas WHERE filters raw rows before grouping occurs:
-- Identify departments with an average salary exceeding 85,000
SELECT department, COUNT(member_code) AS headcount, AVG(salary) AS avg_pay
FROM project_members
GROUP BY department
HAVING avg_pay > 85000;
-- Incorrect approach (WHERE cannot reference aggregates)
-- WHERE avg_pay > 85000 -- Syntax Error
Set Operations and Relational Joins
| Operator | Behavior |
|---|---|
UNION |
Merges result sets from two queries. Requires identical column counts and compatible data types. Automatically removes duplicates. |
JOIN ... ON |
Combines rows based on a matching predicate, returning only intersecting records. |
LEFT JOIN |
Returns all records from the left table, padding unmatched right-side columns with NULL. |
RIGHT JOIN |
Returns all records from the right table, padding unmatched left-side columns with NULL. |
Date and Time Manipulation
Temporal calculations rely on built-in functions (syntax may vary slightly between MySQL and SQL Server):
CURRENT_TIMESTAMP/GETDATE(): Returns the exact execution time.DATE_ADD()/DATEADD(): Shifts a date by a specified interval (e.g., days, months).DATEDIFF(): Computes the span between two dates.EXTRACT()/DATEPART(): Isolates specific components like year, month, or hour from a datetime value.
Conditional Logic and NULL Coalescing
The CASE expression enables conditional branching within queries:
SELECT
full_name,
CASE
WHEN performance_score >= 90 THEN 'Excellent'
WHEN performance_score >= 75 THEN 'Satisfactory'
ELSE 'Needs Improvement'
END AS rating
FROM project_members;
For handling missing values, COALESCE() returns the first non-NULL expression from a list, streamlining fallback logic.
Numeric and String Function Pitfalls
ROUND(x, d)applies standard arithmetic rounding toddecimal places.TRUNCATE(x, d)orTRUNC()simply discards digits beyonddwithout rounding.MOD(a, b)ora % bcalculates the remainder.POWER(a, b)orPOW(a, b)computes exponentiation.LENGTH()orCHAR_LENGTH()measures string size. An empty string''yields0, whileNULLreturnsNULL.STUFF(string, start, length, replacement)deletes a substring and inserts new content at the specified position.VALUESis the standard syntax for multi-row insertion. The legacyVALUEkeyword is largely deprecated and should be avoided.SELECT ... INTO new_table FROM ...creates a new table implicitly and copies data (DDL/DML hybrid), whereasINSERT INTO existing_table SELECT ...populates an already defined structure.
Advanced Objects and Transaction Controls
Cursors enable row-by-row processing when set-based operations are insufficient:
DECLARE emp_cursor CURSOR FOR
SELECT full_name, base_salary FROM payroll_records WHERE status = 'ACTIVE';
DECLARE @emp_name VARCHAR(100);
DECLARE @current_pay DECIMAL(10,2);
DECLARE @status_code INT;
OPEN emp_cursor;
FETCH NEXT FROM emp_cursor INTO @emp_name, @current_pay;
SET @status_code = @@FETCH_STATUS;
WHILE @status_code = 0
BEGIN
-- Apply custom row-level logic here
IF @current_pay > 100000 THEN
UPDATE payroll_records SET tier = 'Senior' WHERE full_name = @emp_name;
END IF;
FETCH NEXT FROM emp_cursor INTO @emp_name, @current_pay;
SET @status_code = @@FETCH_STATUS;
END;
CLOSE emp_cursor;
DEALLOCATE emp_cursor;
Views encapsulate complex queries into virtual tables, enhancing security and abstraction. User-Defined Functions (UDFs) allow custom logic reuse, though they remain database-scoped and cannot cross schema boundaries without explicit references.
Privilege Management controls access via GRANT and REVOKE. Administrators assign system roles (e.g., CONNECT, RESOURCE) or object-level permissions (SELECT, UPDATE, DELETE). Always follow the principle of least privilege.
Transaction Isolation Levels dictate how concurrent transactions interact:
- Read Uncommitted: Allows dirty reads; lowest consistency, highest concurrency.
- Read Committed: Prevents dirty reads but permits non-repeatable reads.
- Repeatable Read: Guarantees consistent reads within a transaction, preventing non-repeatable reads but potentially allowing phantom rows.
- Serializable: Strictest level; executes transactions sequential to eliminate all anomalies, at the cost of performance.