Multi-Version Concurrency Control (MVCC)
1. What is MVCC?
MVCC stands for Multi-Version Concurrency Control. It allows concurrency control by managing multiple versions of data rows. This technology makes consistent reads possible under InnoDB transaction isolation levels. It allows reading rows that are being updated by another transaction, seeing their values before modification, without waiting for locks to be released.
MVCC does not have a formal standard and its implementation varies across DBMS. This chapter focuses on InnoDB's MVCC implementation.
2. Snapshot Read vs. Current Read
MVCC in MySQL InnoDB primarily improves database concurrency by handling read-write conflicts better, enabling lock-free, non-blocking concurrent reads (snapshot reads), as opposed to current reads (which are lock-based and pessimistic). MVCC is essentially an implementation of optimistic locking.
2.1 Snapshot Read
A snapshot read, also known as a consistent read, reads snapshot data. Simple SELECT statements that do not lock are snapshot reads, i.e., non-blocking reads without locks. Snapshot reads are based on MVCC and often avoid locking overhead to improve concurrency. Snapshot reads may return historical versions, not necessarily the latest data. This requires an isolation level other than SERIALIZABLE; SERIALIZABLE degrades snapshot reads to current reads.
2.2 Current Read
A currant read reads the latest version of a record and locks it to prevent modification by other concurrent transactions. Locked SELECT queries, as well as INSERT, UPDATE, and DELETE operations, use current reads.
Examples:
SELECT * FROM student LOCK IN SHARE MODE; -- Shared lock
SELECT * FROM student FOR UPDATE; -- Exclusive lock
INSERT INTO student VALUES ...; -- Exclusive lock
DELETE FROM student WHERE ...; -- Exclusive lock
UPDATE student SET ...; -- Exclusive lock
3. Background Review
3.1 Transaction Isolation Levels
There are 4 isolation levels addressing 3 concurrency problems:
![image-20220405153617536]
![image-20220405153632021]
3.2 Hidden Fields and Undo Log Version Chain
For InnoDB tables, each clustered index record contains two essential hidden columns:
trx_id: The transaction ID that last modified the record.roll_pointer: A pointer to the previous version of the record stored in the Undo log.
4. ReadView: The Core of MVCC
MVCC relies on hidden fields, Undo Log, and ReadView.
4.1 What is a ReadView?
A ReadView is a read view created when a transaction performs a snapshot read. When a transaction starts, InnoDB creates a snapshot of the current database system and constructs an array of active transaction IDs (those that have started but not yet committed).
4.2 Design Principles
- For
READ UNCOMMITTED, the latest record version is read directly (uncommitted changes are visible). - For
SERIALIZABLE, InnoDB uses locks to access records. - For
READ COMMITTEDandREPEATABLE READ, the transaction must only see changes that have been committed. The key problem is determining which version in the version chain is visible to the current transaction. The ReadView solves this.
The ReadView contains 4 important fields:
creator_trx_id: The ID of the transaction that created this ReadView.- Note: Transaction IDs are only assigned when modifying records (INSERT, DELETE, UPDATE); read-only transactions have an ID of 0.
trx_ids: A list of IDs of all active read-write transactions at the time of ReadView creation.up_limit_id: The smallest ID among active transactions.low_limit_id: The next transaction ID to be assigned. It is the maximum transaction ID in the system (not necessarily the maximum in trx_ids). Example: If transactions 1, 2, 3 exist, and transaction 3 commits, a new read transaction will have trx_ids = [1,2], up_limit_id = 1, and low_limit_id = 4.
4.3 ReadView Visibility Rules
To determine if a record version is visible for a given ReadView:
- If
trx_idof the version equalscreator_trx_id, it's a modification by the current transaction, so the version is visible. - If
trx_id<up_limit_id, the version was created by a transaction that committed before the ReadView, so it is visible. - If
trx_id>=low_limit_id, the version was created by a transaction started after the ReadView, so it is not visible. - If
up_limit_id<=trx_id<low_limit_id:- If
trx_idis intrx_ids, the transaction was still active when the ReadView was created, so the version is not visible. - Otherwise, the transaction had already committed, so the version is visible.
- If
4.4 MVCC Read Operation Flow
- Get the transaction's own ID.
- Get the ReadView.
- Compare the retrieved data's version with the ReadView.
- If the version does not match the rules, fetch an older snapshot from the Undo Log.
- Return the data that matches the rules.
For READ COMMITTED, each SELECT in a transaction fetches a new ReadView:
![image-20220405154948505]
Note: Different ReadViews may cause non-repeatable reads or phantoms.
For REPEATABLE READ, the first SELECT fetches a ReadView, and subsequent SELECTs reuse it:
![image-20220405155041964]
5. Examples
5.1 Under READ COMMITTED
A new ReadView is generated before each data read.
5.2 Under REPEATABLE READ
Only the first query generates a ReadView; subsequent queries reuse it.
5.3 Solving Phantom Reads
Table student initially has one row (id=1, trx_id=10).
![image-20220405155640520]
Transaction A (id=20) and Transaction B (id=30) execute concurrently.
Step 1: Transaction A runs:
SELECT * FROM student WHERE id >= 1;
Before querying, MySQL generates a ReadView for A: trx_ids=[20,30], up_limit_id=20, low_limit_id=31, creator_trx_id=20.
The row with id=1 has trx_id=10, which is less than up_limit_id (20), so it is visible. Transaction A sees one row (id=1).
Step 2: Transaction B inserts two new rows and commits:
INSERT INTO student(id,name) VALUES(2,'Li Si');
INSERT INTO student(id,name) VALUES(3,'Wang Wu');
The table now has three rows.
![image-20220405155909223]
Step 3: Transaction A executes the same SELECT again. Since it is REPEATABLE READ, A reuses the same ReadView. All three rows match the WHERE clause, so they are retrieved. Then ReadView rules are applied:
- Row id=1:
trx_id=10<up_limit_id→ visible. - Rows id=2 and 3:
trx_id=30is betweenup_limit_id(20) andlow_limit_id(31). Since 30 is intrx_ids([20,30]), these rows are not visible (they were created by a concurrent, active transaction at ReadView creation time).
![image-20220405155941753]
Conclusion: Transaction A sees only the original row (id=1), same as before. No phantom reads occurred, confirming that MySQL's REPEATABLE READ prevents phantom reads.
6. Summary
MVCC manages the version chain during snapshot reads under READ COMMITTED and REPEATABLE READ isolation levels, allowing concurrent read-write and write-read operations without blocking, thus improving system performance.
The key is the ReadView mechanism. The main difference between the two isolation levels is when a ReadView is generated:
READ COMMITTED: A new ReadView is generated before each regular SELECT.REPEATABLE READ: A ReadView is generated only once, before the first regular SELECT, and reused for subsequent SELECT statements.
Other MySQL Logs
1. Supported Log Types
MySQL supports several log files:
- Slow Query Log: Records all queries taking longer than
long_query_time. Useful for optimization. - General Query Log: Records connection start/end times and all commands sent from clients. Useful for auditing and troubleshooting.
- Error Log: Records issues during startup, operation, or shutdown. Useful for server maintenance.
- Binary Log: Records all data-changing statements. Used for replication and disaster recovery.
- Relay Log: On replicas, temporarily stores the master's binary log contents for replication.
- Data Definition Log: Records DDL metadata operations (new in MySQL 8).
Except for the binary log, all logs are plain text files, created by default in the MySQL data directory.
1.2 Drawbacks of Logging
- Logging can reduce MySQL performance.
- Logs can consume significant disk space.
2. General Query Log
This log records all user operations, including connections, SQL commands, etc. It is helpful for recreating scenarios after data anomalies.
2.1 Check Current Status
SHOW VARIABLES LIKE '%general%';
2.2 Enable Logging
- Permanent:
[mysqld]
general_log=ON
general_log_file=[path/[filename]]
- Temporary:
SET GLOBAL general_log=on;
SET GLOBAL general_log_file='path/filename';
SET GLOBAL general_log=off; -- to disable
SHOW VARIABLES LIKE 'general_log%';
2.3 Disable Logging
- Permanent:
[mysqld]
general_log=OFF
- Temporary:
SET GLOBAL general_log=off;
SHOW VARIABLES LIKE 'general_log%';
3. Error Log
This log is enabled by default and cannot be disabled.
3.1 Enable Error Logging
[mysqld]
log-error=[path/[filename]]
3.2 View Error Log
SHOW VARIABLES LIKE 'log_err%';
3.3 Delete / Flush Error Log
# Replace with empty file
install -omysql -gmysql -m0644 /dev/null /var/log/mysqld.log
# Flush logs
mysqladmin -uroot -p flush-logs
4. Binary Log (binlog)
4.1 Default Settings
SHOW VARIABLES LIKE '%log_bin%';
4.2 Configure Binary Log
- Permanent:
[mysqld]
log-bin=atguigu-bin
binlog_expire_logs_seconds=600
max_binlog_size=100M
To use a specific directory:
[mysqld]
log-bin="/var/lib/mysql/binlog/atguigu-bin"
Make sure the directory is owned by the MySQL user:
chown -R -v mysql:mysql binlog
- Temporary (session-level):
SET sql_log_bin=0;
Note: sql_log_bin is a session variable, not a global variable.
4.3 View Binary Log
# View with SQL statements
mysqlbinlog -v "/var/lib/mysql/binlog/atguigu-bin.000002"
# Suppress base64-encoded row events
mysqlbinlog -v --base64-output=DECODE-ROWS "/var/lib/mysql/binlog/atguigu-bin.000002"
More options:
mysqlbinlog --no-defaults --help
# View last 100 lines
mysqlbinlog --no-defaults --base64-output=decode-rows -vv atguigu-bin.000002 | tail -100
# Search by position
mysqlbinlog --no-defaults --base64-output=decode-rows -vv atguigu-bin.000002 | grep -A20 '4939002'
To display binlog events as a table:
SHOW BINLOG EVENTS [IN 'log_name'] [FROM pos] [LIMIT [offset,] row_count];
Parameters:
IN 'log_name': Specifies the binlog file (default is the first file).FROM pos: Starting position.LIMIT [offset]: Offset (default 0).row_count: Number of rows (default all).
Example:
SHOW BINLOG EVENTS IN 'atguigu-bin.000002';
4.4 Restore Data Using Binary Log
Syntax:
mysqlbinlog [option] filename | mysql -uuser -ppass
Where filename is the log file. Important options:
--start-dateand--stop-date: Specify the time range.--start-positionand--stop-position: Specify the position range.
Note: Restore smaller-numbered binlog files first.
4.5 Delete Binary Logs
Delete logs before a specific file or date:
PURGE {MASTER | BINARY} LOGS TO 'specified_log_filename';
PURGE {MASTER | BINARY} LOGS BEFORE 'specified_date';
5. Binary Log In-Depth
5.1 Write Mechanism
During transaction execution, logs are written to a binlog cache. At commit, the cache is flushed to the binlog file. Since a transaction's binlog must be written as one unit, each thread is allocated a block of memory for the cache.
![image-20220405163025361]
The timing of write and fsync is controlled by the sync_binlog parameter (default 0).
sync_binlog=0: Onlywriteon commit; the system decides when tofsync. This improves performance but risks losing binlog in the page cache on crash.
![image-20220405163125180]
sync_binlog=1:fsyncon every commit, similar to redo log flush.
![image-20220405163205364]
sync_binlog=N(N > 1):writeon every commit, butfsynconly after N transactions. This is a compromise between performance and safety. On crash, the last N transactions may be lost.
5.2 Binary Log vs. Redo Log
- Redo log: A physical log that records "what was changed on which data page." It is generated by the InnoDB storage engine.
- Binlog: A logical log that records the original SQL logic (e.g., "add 1 to column c of row id=2"). It is generated by the MySQL Server layer.
- Both provide persistence, but for different purposes:
- Redo log enables crash recovery for InnoDB.
- Binlog ensures data consistency in MySQL cluster architectures.
5.3 Two-Phase Commit
When updating data, both redo log and binlog are written. Redo log can be written incrementally during transaction execution, while binlog is written only at commit. This difference in write timing can cause inconsistency between the two logs. To solve this, InnoDB uses a two-phase commit protocol.
![image-20220405163716222]
If a crash occurs after binlog write but before commit, the two-phase commit ensures consistency:
![image-20220405163902977]
If a crash occurs after prepare but before commit, the transaction is not automatically rolled back. MySQL checks whether the binlog is complete; if the transaction ID matches, the transaction is committed.
![image-20220405163927129]
The system checks if the redo log (in prepare state) has a matching binlog. If yes, the transaction is considered complete and committed.
6. Relay Log
6.1 Introduction
Relay logs exist only on replica servers in a master-slave architecture. The replica reads the binary log from the master and writes it to its local relay log. Then the replica's SQL thread reads the relay log and applies changes, keeping the replica synchronized.
6.2 Common Recovery Error
If a replica server crashes and its hostname changes, the relay log (which contains the server name) may become unreadable. The solution is to revert the hostname to its original value.
Master-Slave Replication
1. Overview
1.1 Improving Database Concurrency
Applications typically have a "read-more, write-less" pattern. A common solution to handle high read load is a data base cluster with a master-slave architecture and read-write splitting. However, before adopting this, consider simpler approaches first:
- Optimize SQL and indexes.
- Use caching (e.g., Redis) to store hot data in memory.
- Implement master-slave replication with read-write splitting.
1.2 Benefits of Replication
- Read-write splitting: Reduces load on master.
- Data backup: Provides an additional copy of data.
- High availability: Allows failover to a replica if the master fails.
2. Replication Mechanism
2.1 How Replication Works
Replication is based on binlog synchronization and involves three threads:
![image-20220405164559961]
- Binlog dump thread (master): Sends the binary log to replicas.
- I/O thread (replica): Connects to the master, requests binlog updates, and writes them to the relay log.
- SQL thread (replica): Reads the relay log and applies the events to synchronize data.
![image-20220405164718627]
Replication Steps:
- The master records write operations in its binary log.
- The replica copies the master's binary log events to its relay log.
- The replica executes events from the relay log, applying changes.
MySQL replication is asynchronous and serialized. After a restart, replication continues from the last checkpoint.
Main Problem: Replication delay (lag).
2.2 Replication Rules
- Each replica has one master.
- Each replica must have a unique server ID.
- A master can have multiple replicas.
3. Data Consistency Issues
Requirements:
- Read and write data are eventually consistent.
- Writes must go to the write (master) database.
- Reads may or may not go to the read (replica) database.
3.1 Understanding Replication Lag
Since binlog is transferred over the network, there is always some delay, often around 500 ms. This can cause the replica to serve stale data, leading to inconsistency.
3.2 Causes of Replication Lag
With a stable network, the transmission time is negligible. The main contributor to lag is the time difference between when the replica receives the binlog and when it finishes executing the transaction. This happens when the replica's SQL thread consumes relay log events slower than the master produces binlog events. Reasons include:
- Lower replica machine performance.
- High replica load (e.g., queries).
- Large transactions.
3.3 Reducing Replication Lag
- Reduce concurrent large transactions and optimize business logic.
- Optimize SQL, avoid slow queries, and batch operations in smaller chunks (e.g., update-sleep loops).
- Upgrade replica hardware to reduce the performance gap.
- Minimize network distance between master and replica and increase bandwidth.
- For critical real-time reads, route them to the master; use replicas for backup and read-heavy workloads.
3.4 Solving Consistency Issues with Replication Modes
Different replication modes offer a trade-off between consistency and performance:
Method 1: Asynchronous Replication
![image-20220405165455998]
Method 2: Semi-Synchronous Replication
![image-20220405165513025]
Method 3: Group Replication
Multiple nodes form a replication group. For write transactions, the group must reach consensus (majority, N/2 + 1) before commit. Read transactions can commit without consensus.
![image-20220405165650425]
Database Backup and Recovery
1. Physical vs. Logical Backup
- Physical backup: Backs up data files directly. Faster recovery but more disk space. Example tool:
xtrabackup. - Logical backup: Exports database objects (e.g., SQL statements). Slower recovery but more flexible and smaller space. Example tool:
mysqldump.
2. Logical Backup with mysqldump
2.1 Backup a Single Database
mysqldump -u username -h hostname -p database_name [table1 table2 ...] > backup_file.sql
Examples:
mysqldump -uroot -p atguigu > atguigu.sql
mysqldump -uroot -p atguigudb1 > /var/lib/mysql/atguigu.sql
2.2 Backup All Databases
mysqldump -uroot -p --all-databases > all_database.sql
mysqldump -uroot -p -A > all_database.sql
2.3 Backup Specific Databases
mysqldump -u user -h host -p --databases db1 [db2 ...] > backup.sql
Example:
mysqldump -uroot -p --databases atguigu atguigu12 > two_database.sql
mysqldump -uroot -p -B atguigu atguigu12 > two_database.sql
2.4 Backup Specific Tables
mysqldump -u user -h host -p database_name table1 [table2 ...] > backup.sql
Examples:
mysqldump -uroot -p atguigu book > book.sql
mysqldump -uroot -p atguigu book account > 2_tables_bak.sql
2.5 Backup Part of a Table
mysqldump -uroot -p atguigu student --where="id < 10" > student_part_id10_low_bak.sql
2.6 Exclude Specific Tables
mysqldump -uroot -p atguigu --ignore-table=atguigu.student > no_stu_bak.sql
2.7 Backup Only Structure or Only Data
- Only structure:
mysqldump -uroot -p atguigu --no-data > atguigu_no_data_bak.sql
- Only data:
mysqldump -uroot -p atguigu --no-create-info > atguigu_no_create_info_bak.sql
2.8 Include Stored Procedures, Functions, Events
mysqldump -uroot -p -R -E --databases atguigu > fun_atguigu_bak.sql
3. Restoring Data with mysql
Syntax:
mysql -u root -p [database_name] < backup.sql
3.1 Restore a Single Database from Its Backup
If the backup file includes CREATE DATABASE statements:
mysql -uroot -p < atguigu.sql
If not:
mysql -uroot -p atguigu4 < atguigu.sql
3.2 Full Backup Restore
mysql -u root -p < all.sql
3.3 Restore a Single Database from a Full Backup
Extract the relevant part using sed:
sed -n '/^-- Current Database: `atguigu`/,/^-- Current Database: `/p' all_database.sql > atguigu.sql
Then import atguigu.sql to restore the single database.
3.4 Restore a Single Table from a Database Backup
Extract the table structure and data separately:
cat atguigu.sql | sed -e '/./{H;$!d;}' -e 'x;/CREATE TABLE `class`/!d;q' > class_structure.sql
cat atguigu.sql | grep --ignore-case 'insert into `class`' > class_data.sql
Then import them:
USE atguigu;
SOURCE class_structure.sql;
SOURCE class_data.sql;
4. Exporting and Importing Tables
4.1 Exporting Tables
4.1.1 Using SELECT ... INTO OUTFILE
SHOW GLOBAL VARIABLES LIKE '%secure%';
SELECT * FROM account INTO OUTFILE "/var/lib/mysql-files/account.txt";
4.1.2 Using mysqldump to Export as Text File
mysqldump -uroot -p -T "/var/lib/mysql-files/" atguigu account
# With custom delimiters
mysqldump -uroot -p -T "/var/lib/mysql-files/" atguigu account --fields-terminated-by=',' --fields-optionally-enclosed-by='"'
4.1.3 Using mysql to Export as Text File
mysql -uroot -p --execute="SELECT * FROM account;" atguigu > "/var/lib/mysql-files/account.txt"
4.2 Importing Tables
4.2.1 Using LOAD DATA INFILE
LOAD DATA INFILE '/var/lib/mysql-files/account_0.txt' INTO TABLE atguigu.account;
-- With delimiters
LOAD DATA INFILE '/var/lib/mysql-files/account_1.txt' INTO TABLE atguigu.account FIELDS TERMINATED BY ',' ENCLOSED BY '"';
4.2.2 Using mysqlimport
mysqlimport -uroot -p atguigu '/var/lib/mysql-files/account.txt' --fields-terminated-by=',' --fields-optionally-enclosed-by='"'