In database systems, locks serve as a fundamental concurrency control mechanism. However, for in-memory database systems, the lock manager often becomes a performance bottleneck.
Research indicates that 16-25% of time in memory databases is spent interacting with the lock manager
Traditional lock managers typically implement a hash table where data primary keys serve as IDs and linked lists of lock requests for that data serve as values. These linked lists track lock status and support mutual exclusion operations.
For disk-based databases, hash table lookups, lock acquisition, and linked list operations are negligible. But for in-memory databases, these additional memory accesses, cache misses, CPU cycles, and critical sections can approach or exceed the cost of executing actual transaction logic.
VLL introduces two major changes to the traditional lock manager:
- Moving lock information from central hash tables to the raw data itself
- Removing information about which transactions have pending requests for specific locks, replacing it with counters (CX for pending write requests, CS for pending read requests) maintained in the raw data
When transactions acquire locks, they request all locks at once and are ordered globally based on the sequence of lock requests. This global transaction ordering determines which transaction should be unblocked and executed first.
However, VLL is not a silver bullet. Under high contention workloads, it may lead to reduced concurrency and poor CPU utilization. To address this, Selective Contention Analysis (SCA) was introduced to efficiently compute the most useful subset of contention information when needed.
VLL Algorithm Overview
Lock Structure
The VLL lock management structure consists of:
- Counters for pending write (CX) and read (CS) lock requests stored with each raw data item
- A central global transaction request queue (TxnQueue)
Workflow
The process works as follows:
- When a transaction arrives at a partition, it attempts to lock all records it will access during its lifecycle
Each lock request increments the corresponding record's CX or CS counter 2. If after the request CX=1 and CS=0, the requesting transaction has acquired an exclusive lock
If CX=0, the requesting transaction has acquired a shared lock 3. After acquiring locks, the transaction is added to TxnQueue
Both lock requests and queue additions occur within the same critical section 4. When leaving the critical section, if the transaction acquired all requested locks, it's marked as free; otherwise, it's blocked. Only free transactions can execute immediately
Challenges
Several issues arise:
- In single-threaded VLL, waiting for remote node coordination can waste resources
- With out a central lock management structure, how to release blocked transactions?
- If locks can only be acquired when CX=0, but write requests always increment CX, CX will remain >0 when multiple write transactions are blocked
- As TxnQueue grows, the probability of new transactions acquiring locks immediately decreases
To address these:
- Introduce a "waiting" state for transactions that need remote results to continue, allowing them to suspend without resource waste
- Use background threads to check CX and CS values for blocked transactions in TxnQueue. When CX drops to 1 and CS to 0, the transaction has acquired a exclusive lock; when CX is 0 and CS > 0, it has acquired a shared lock
- Unblock and execute the first blocked transaction in the queue. Being at the queue head means all preceding transactions have completed
- Set a threshold for TxnQueue. When exceeded, stop processing new transactions and search for unblockable transactions. Within the threshold, prioritize new requests
Code and Process Demonstration
Here's the implementation:
func startTransaction(tx *Transaction) {
// Initialize transaction status as Free
tx.Status = FreeStatus
// Check local read set for conflicts
for _, key := range tx.ReadSet {
if isLocal(key) {
data[key].readCount++
if data[key].writeCount > 1 {
tx.Status = BlockedStatus
}
}
}
// Check local write set for conflicts
for _, key := range tx.WriteSet {
if isLocal(key) {
data[key].writeCount++
if data[key].readCount > 0 || data[key].writeCount > 1 {
tx.Status = BlockedStatus
}
}
}
// For distributed transactions that are free, mark as waiting
if tx.Type == DistributedTx && tx.Status == FreeStatus {
tx.Status = WaitingStatus
}
}
func commitTransaction(tx *Transaction) {
// Release read locks
for _, key := range tx.ReadSet {
if isLocal(key) {
data[key].readCount--
}
}
// Release write locks
for _, key := range tx.WriteSet {
if isLocal(key) {
data[key].writeCount--
}
}
}
func runVLL() {
for {
// Handle remote transaction messages
tx, received, msg := selectRemoteMessage()
if received {
processReadResult(tx, msg)
if canExecute(tx) {
executeTransaction(tx)
commitTransaction(tx)
transactionQueue.remove(tx)
}
continue
}
// Handle blocked transactions at queue front
if transactionQueue.peek().Status == BlockedStatus {
tx = transactionQueue.peek()
if tx.Type == DistributedTx {
tx.Status = WaitingStatus
// Send requests to other nodes
} else {
// Unblock and execute head transaction
tx.Status = FreeStatus
executeTransaction(tx)
commitTransaction(tx)
transactionQueue.remove(tx)
}
} else if !transactionQueue.isFull() {
// Process new transaction requests when queue isn't full
tx = getNewRequest()
startTransaction(tx)
switch tx.Status {
case FreeStatus:
executeTransaction(tx)
commitTransaction(tx)
case WaitingStatus:
// Send requests to other nodes
transactionQueue.add(tx)
case BlockedStatus:
transactionQueue.add(tx)
}
}
}
}
Consider this scenario with the following transaction requests:
| txn | read set | write set |
|---|---|---|
| A | x, y | x |
| B | x, y | x |
| C | x | |
| D | y | z |
| E | y |
- Initially, all CX and CS counters are 0, and TxnQueue is empty
- Transaction A requests to read x, y and write x. CX for x increments, CS for y increments, and A is added as a free transaction to TxnQueue
- Transaction B requests to read x, y and write x. Since x.CX > 0, B cannot acquire the write lock and is added as a blocked transaction to TxnQueue
- After A completes, it releases locks (x.CX--, y.CS--), is removed from TxnQueue, and B becomes free, acquiring locks to begin execution
- Transaction C requests to read x, incrementing x.CS. Since x.CX > 0, C cannot guarantee the read and is added as a blocked transaction
- Transaction D requests to read y and write z. y.CS++ and z.CX=1. Since y.CX and z.CX were 0, D acquires all locks and is added as a free transaction
- Transaction E requests to write y. Since y.CS > 0, E fails and is added as a blocked transaction
- After B completes, it releases locks and is removed. C, now at the queue front, becomes free and acquires the x read lock
- Finally, E can safely execute as it doesn't conflict with any preceding transactions. However, VLL only allows the queue head to execute, so E still cannot proceed. SCA removes this limitation
Array VLL
Array VLL uses vector or array data structures to store lock information contiguously. This separates data and lock information in to two separate requests, but in single-threaded environments, performance benefits from core caching.
Challenges with Acquiring All Locks at Once
- Transaction read/write sets may be unknown before execution
- With TxnQueue per node and local critical sections, different nodes may process in inconsistent orders, potentially causing deadlocks
Solutions include:
- Allow transactions to perform necessary reads before entering the critical section
- For deadlocks from inconsistent node ordering:
- Allow deadlocks and use detection protocols to abort deadlocked transactions
- Coordinate nodes for consistent transaction ordering
VLL uses node coordination, which improves performance despite added coordination overhead.
SCA - Improving Performance Under High Contention
For high contention and multi-partition workloads, VLL introduces SCA to simulate standard lock managers by detecting which transactions should inherit released locks.
SCA activates only when needed, such as during CPU idle time. It scans from queue head to tail, analyzing conflicts with preceding transactions.
func performSCA(queue TxnQueue) *Transaction {
// Initialize conflict detection arrays
writeConflicts := make([]int8, 100<<10)
readConflicts := make([]int8, 100<<10)
for _, item := range queue {
// Check blocked transactions
if item.Status == BlockedStatus {
canProceed := true
// Check read set for write conflicts
for _, key := range item.ReadSet {
hashed := hashKey(key)
if writeConflicts[hashed] == 1 {
canProceed = false
}
// Mark read set as occupied
readConflicts[hashed] = 1
}
// Check write set for read/write conflicts
for _, key := range item.WriteSet {
hashed := hashKey(key)
if readConflicts[hashed] == 1 || writeConflicts[hashed] == 1 {
canProceed = false
}
// Mark write set as occupied
writeConflicts[hashed] = 1
}
// Return unblocked transaction
if canProceed {
return item
}
} else {
// For unblocked transactions, just mark occupied
for _, key := range item.ReadSet {
hashed := hashKey(key)
readConflicts[hashed] = 1
}
for _, key := range item.WriteSet {
hashed := hashKey(key)
writeConflicts[hashed] = 1
}
}
}
return nil
}
VLLR - VLL for Range Locking
For frequent operations on many consecutive rows in a single transaction, range locking is more efficient than individual row locking. It prevents phantom reads and is particularly useful for deleting entities sharing common key prefixes.
VLLR locks range prefixes. When locking a range key, the range is represented as set R, generating set P where any element in R has a prefix in P. The simplest approach uses the longest common prefix of the minimum and maximum bit strings in R.
After obtaining P, VLLR uses hierarchical locking: intent lock at database level, then table, then page, finally row lock.
Additionally, VLLR introduces LX and LS counters for the P set on top of CX and CS to track pending write and read requests for prefix ranges.
func requestRangeLock(tx *Transaction, range *KeyRange, exclusive bool) bool {
// Add transaction to queue
transactionQueue.add(tx)
// Get range prefixes
prefixes := range.toPrefixes()
// Request prefix locks
success := true
for _, prefix := range prefixes {
success = success && requestPrefixLock(prefix, exclusive)
}
return success
}
func requestPrefixLock(key string, exclusive bool) bool {
success := true
hashed := hashKey(key)
// Check current key lock availability
if exclusive {
success = success && (readCount[hashed] == 0)
}
success = success && (writeCount[hashed] == 0)
// Check prefix lock availability
if exclusive {
success = success && (prefixReadCount[hashed] == 0)
}
success = success && (prefixWriteCount[hashed] == 0)
// Check all prefix locks
for i := 1; i < len(key); i++ {
prefixHash := hashKey(key[:i])
success = success && (writeCount[prefixHash] == 0)
if exclusive {
success = success && (readCount[prefixHash] == 0)
}
}
// Mark current key lock
if exclusive {
writeCount[hashed]++
} else {
readCount[hashed]++
}
// Mark prefix locks
for i := 1; i < len(key); i++ {
prefixHash := hashKey(key[:i])
if exclusive {
prefixWriteCount[prefixHash]++
} else {
prefixReadCount[prefixHash]++
}
}
return success
}
func releaseRangeLock(tx *Transaction, range *KeyRange, exclusive bool) {
// Release prefix locks
prefixes := range.toPrefixes()
for _, prefix := range prefixes {
releasePrefixLock(prefix, exclusive)
}
// Remove from queue
transactionQueue.remove(tx)
}
func releasePrefixLock(key string, exclusive bool) {
// Release current key lock
hashed := hashKey(key)
if exclusive {
writeCount[hashed]--
} else {
readCount[hashed]--
}
// Release prefix locks
for i := 1; i < len(key); i++ {
prefixHash := hashKey(key[:i])
if exclusive {
prefixWriteCount[prefixHash]--
} else {
prefixReadCount[prefixHash]--
}
}
}