Redis (REmote DIctionary Server) is an open-source, in-memory data structure store used as a database, cache, and message broker. Unlike traditional disk-based databases, Redis keeps data primarily in memory, enabling very high throughput (often exceeding 100,000 operations per second) and low-latency access.
While it is commonly recognized for its role as a caching layer to alleviate database load, Redis distinguishes itself from other in-memory stores like Memcached through several key features:
- Rich Data Types: Support for Strings, Lists, Sets, Sorted Sets, Hashes, and more.
- Persistence: Mechanisms like RDB and AOF to save state to disk.
- Advanced Features: Built-in support for transactions, pub/sub messaging, Lua scripting, and keys with time-to-live (TTL).
- Replication and Clustering: Native support for master-replica replication and for partitioning data across multiple nodes.
Redis Strings: The Foundation
The String is the most fundamental data type in Redis. It can hold various forms of data, including text, integers, floating-point values, and binary data (such as images or serialized objects). Redis Strings are binary safe, meaning they can contain any sequence of bytes, with no restrictions imposed by character encodings or null terminators.
Common Operations
Basic operations involve setting and retrieving values. Redis provides a rich set of commands for manipulating strings:
# Basic set and get
SET user:1001 "John Doe"
GET user:1001
# Batch operations (atomic)
MSET page:index "home" page:title "Welcome"
MGET page:index page:title
# String manipulation
APPEND user:1001 " - Admin"
GETRANGE user:1001 0 3
STRLEN user:1001
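To make the semantics of the manipulation commands concrete, here is a minimal Python sketch that models APPEND, GETRANGE, and STRLEN on a plain dictionary. The store variable and the three helper functions are hypothetical stand-ins, not a Redis client; the point is only that GETRANGE treats both offsets as inclusive and accepts negative offsets counted from the end.

```python
# Hypothetical in-memory stand-in for a Redis keyspace (not a Redis client).
store = {}

def append(key, value):
    """Model APPEND: concatenate (creating the key if missing), return the new length."""
    store[key] = store.get(key, b"") + value
    return len(store[key])

def getrange(key, start, end):
    """Model GETRANGE: both offsets are inclusive; negative offsets count from the end."""
    data = store.get(key, b"")
    n = len(data)
    if start < 0:
        start = max(n + start, 0)
    if end < 0:
        end = max(n + end, 0)
    if n == 0 or start > end or start >= n:
        return b""
    return data[start:min(end, n - 1) + 1]

def strlen(key):
    """Model STRLEN: a missing key has length 0."""
    return len(store.get(key, b""))

store["user:1001"] = b"John Doe"
append("user:1001", b" - Admin")
print(getrange("user:1001", 0, 3))  # b'John'
print(strlen("user:1001"))          # 16
```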
Numeric Operations
If a string value can be interpreted as a number, Redis allows for atomic increment and decrement operations. This is highly useful for counters and ID generation.
# Incrementing values
SET view_count 10
INCR view_count # Returns 11
INCRBY view_count 5 # Returns 16
# Decrementing values
DECR view_count # Returns 15
DECRBY view_count 5 # Returns 10
# Floating point operations
SET temp 36.5
INCRBYFLOAT temp 0.2 # Returns 36.7
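The rule that a value must be "interpretable as a number" can be sketched as follows. This is a simplified Python model, not Redis code: the incrby helper and the dictionary store are hypothetical, and the strict integer format (no leading zeros, no plus sign, signed 64-bit range) reflects how Redis is generally understood to parse integers. Atomicity is not modelled here; on a real server it comes from commands executing one at a time.

```python
import re

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
# Canonical decimal integers only: "0", "42", "-7"; not "007", "+1" or " 1".
_INT_RE = re.compile(rb"0|-?[1-9][0-9]*")

def incrby(store, key, delta=1):
    """Model INCRBY on a dict of bytes values; a missing key counts as 0."""
    raw = store.get(key, b"0")
    if not _INT_RE.fullmatch(raw):
        raise ValueError("value is not an integer or out of range")
    current = int(raw)
    new = current + delta
    if not (INT64_MIN <= current <= INT64_MAX and INT64_MIN <= new <= INT64_MAX):
        raise ValueError("increment or decrement would overflow")
    store[key] = str(new).encode()  # the result is stored back as a string
    return new

counters = {"view_count": b"10"}
print(incrby(counters, "view_count"))      # 11
print(incrby(counters, "view_count", 5))   # 16
print(incrby(counters, "view_count", -1))  # 15
```

Because the counter is created on first use, the same call also works as a simple ID generator: each caller receives a distinct value.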
Distributed Locking
Strings can be used to implement simple distributed locks using the SET command with the NX (only set if not exists) and EX (set expiration) options. This ensures the operation is atomic, avoiding race conditions where a lock is set but expiration fails to apply.
# Set a lock with a 10-second expiry, only if the key does not exist
SET lock:resource "unique_token" EX 10 NX
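The sketch below models the acquire step and a matching release step in plain Python. FakeStore is a hypothetical in-memory stand-in for the server, with an injectable clock so that expiry can be exercised without waiting. The release logic (delete only if the stored token is still ours) is not described above; it is a commonly recommended companion to SET ... NX, and on a real server the compare-and-delete would need to run atomically, for example in a Lua script.

```python
import time
import uuid

class FakeStore:
    """Hypothetical in-memory model of the two server behaviours used here."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key):
        """Return the entry for key, dropping it first if its TTL has passed."""
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]
            entry = None
        return entry

    def set_nx_ex(self, key, value, ttl_seconds):
        """Model SET key value EX ttl NX: succeed only if the key is absent."""
        if self._live(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl_seconds)
        return True

    def delete_if_equal(self, key, value):
        """Model an atomic compare-and-delete."""
        entry = self._live(key)
        if entry is not None and entry[0] == value:
            del self._data[key]
            return True
        return False

def acquire(store, resource, ttl_seconds=10):
    """Return a fresh token if the lock was taken, otherwise None."""
    token = uuid.uuid4().hex
    if store.set_nx_ex(f"lock:{resource}", token, ttl_seconds):
        return token
    return None

def release(store, resource, token):
    """Release the lock only if we still own it."""
    return store.delete_if_equal(f"lock:{resource}", token)
```

The unique token matters when a lock expires while its holder is still working: without the ownership check, the slow holder could delete a lock that another client has since acquired.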
Internal Implementation
To understand how Redis handles data, we must look at its core structures: redisDb, dictEntry, and redisObject.
Key-Value Storage Model
At the top level, Redis organizes data into databases (DB 0 to DB 15 by default). Each database is represented by a redisDb structure, which contains a dictionary (dict). This dictionary uses a hash table implementation to store all key-value pairs. Each entry in the hash table is a dictEntry, which holds the key and a pointer to the value.
The redisObject Wrapper
While keys are always stored as Simple Dynamic Strings (SDS), values are wrapped in a structure called redisObject. This wrapper allows Redis to manage type information, reference counting for memory management, and encoding details.
// Simplified definition from server.h
typedef struct redisObject {
unsigned type:4; // Data type (String, List, Hash, etc.)
unsigned encoding:4; // Internal encoding (raw, embstr, int, etc.)
unsigned lru:LRU_BITS; // LRU time for eviction policies
int refcount; // Reference count for object sharing and cleanup
void *ptr; // Pointer to the actual data structure
} robj;
The encoding field is crucial because the same data type can be stored in different underlying representations depending on the content, optimizing memory usage.
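To get a feel for how compact this header is, the struct can be mirrored with Python's ctypes module. This is only an illustration: LRU_BITS = 24 and the numeric constants for the string type and the embstr encoding are assumed to match server.h, and the exact bit-field layout is ultimately decided by the C compiler.

```python
import ctypes

LRU_BITS = 24            # assumed to match server.h
OBJ_STRING = 0           # assumed type tag for strings
OBJ_ENCODING_EMBSTR = 8  # assumed encoding tag for embstr

class RedisObject(ctypes.Structure):
    """Mirror of the simplified redisObject definition shown above."""
    _fields_ = [
        ("type", ctypes.c_uint, 4),
        ("encoding", ctypes.c_uint, 4),
        ("lru", ctypes.c_uint, LRU_BITS),
        ("refcount", ctypes.c_int),
        ("ptr", ctypes.c_void_p),
    ]

obj = RedisObject(type=OBJ_STRING, encoding=OBJ_ENCODING_EMBSTR,
                  lru=0, refcount=1, ptr=None)

# The three bit-fields share one 32-bit word; refcount adds 4 bytes and
# the pointer adds 8 on a 64-bit platform, so the header is 16 bytes there.
print(ctypes.sizeof(RedisObject))
```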
Simple Dynamic Strings (SDS)
Redis does not use standard C strings (null-terminated char arrays) directly. Instead, it implements a custom library called Simple Dynamic Strings (SDS). SDS structures vary slightly by size (sdshdr5, sdshdr8, sdshdr16, etc.) but generally contain the following metadata:
- len: The current length of the string.
- alloc: The total allocated memory size.
- flags: Header type indicator.
- buf[]: The actual byte array storing data.
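As an illustration of this layout, the following Python sketch packs and unpacks a header shaped like sdshdr8. The one-byte widths for len and alloc, the type tag value of 1, and the trailing null byte are assumptions based on sds.h; the helper names are hypothetical.

```python
import struct

SDS_TYPE_8 = 1        # assumed type tag stored in the low bits of flags
HEADER = "<BBB"       # len (uint8), alloc (uint8), flags (uint8)
HEADER_SIZE = struct.calcsize(HEADER)

def pack_sdshdr8(data, alloc=None):
    """Build header + buf for a string of at most 255 bytes."""
    alloc = len(data) if alloc is None else alloc
    if not len(data) <= alloc <= 255:
        raise ValueError("data does not fit an sdshdr8-style header")
    header = struct.pack(HEADER, len(data), alloc, SDS_TYPE_8)
    # buf holds alloc bytes plus one trailing NUL kept for C compatibility.
    return header + data + b"\x00" * (alloc - len(data) + 1)

def sds_len(blob):
    """O(1): read the length from the header instead of scanning for NUL."""
    return blob[0]

def sds_read(blob):
    """Return exactly len bytes, so embedded NUL bytes are preserved."""
    length, _alloc, _flags = struct.unpack_from(HEADER, blob)
    return blob[HEADER_SIZE:HEADER_SIZE + length]

blob = pack_sdshdr8(b"a\x00b", alloc=8)
print(sds_len(blob), sds_read(blob))  # 3 b'a\x00b'
```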
Why SDS over C Strings?
SDS offers significant advantages over standard C strings:
- O(1) Length Complexity: SDS stores the length explicitly, making STRLEN a constant-time operation. C strings require an O(N) traversal to count characters.
- Binary Safety: C strings rely on the null terminator (\0) to mark the end. SDS uses the len field instead, allowing it to store binary data (images, video) containing null bytes without being truncated.
- Prevention of Buffer Overflow: SDS APIs automatically check available space before writing and expand the memory if necessary, preventing memory corruption.
- Memory Management Optimization: SDS uses space pre-allocation and lazy freeing strategies to reduce the number of realloc calls required during frequent string modifications.
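The effect of space pre-allocation can be estimated with a small simulation. The growth rule used here (double the required size while it is below 1 MiB, otherwise add 1 MiB) is my understanding of the policy in sds.c and should be read as an assumption; grow and count_reallocs are hypothetical helpers.

```python
SDS_MAX_PREALLOC = 1024 * 1024  # 1 MiB, assumed to match sds.h

def grow(alloc, length, addlen):
    """Return the capacity after making room for addlen more bytes."""
    if alloc - length >= addlen:
        return alloc  # enough free space already, no reallocation
    needed = length + addlen
    if needed < SDS_MAX_PREALLOC:
        return needed * 2
    return needed + SDS_MAX_PREALLOC

def count_reallocs(chunks, preallocate=True):
    """Count how many times the buffer must be resized while appending chunks."""
    length = alloc = reallocs = 0
    for n in chunks:
        if preallocate:
            new_alloc = grow(alloc, length, n)
        else:
            new_alloc = max(alloc, length + n)  # exact-fit growth
        if new_alloc != alloc:
            reallocs += 1
            alloc = new_alloc
        length += n
    return reallocs

appends = [10] * 100  # one hundred 10-byte appends
print(count_reallocs(appends, preallocate=False))  # 100
print(count_reallocs(appends, preallocate=True))   # 6
```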
Internal Encodings for Strings
The String type utilizes three distinct internal encodings to balance performance and memory usage:
- INT: Used when the value is an integer that fits in a signed 64-bit long (-2^63 to 2^63-1). Redis stores the integer directly inside the void *ptr field of the redisObject, saving memory by avoiding a separate allocation for the value.
- EMBSTR (Embedded String): Used for strings of 44 bytes or fewer. In this mode, the redisObject and the SDS structure are allocated in a single contiguous memory block. This reduces memory fragmentation and allocation overhead.
- RAW: Used for strings longer than 44 bytes. Redis allocates the redisObject and the SDS separately, requiring two memory allocations and an extra pointer dereference.
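The selection rule can be written down as a short Python function. This is a sketch of the decision described above, not Redis source code: string_encoding is a hypothetical name, and the strict integer format (no leading zeros or plus sign) is an assumption about how Redis recognises integers. The arithmetic comment shows one common explanation for the 44-byte limit, assuming a 64-byte allocation, a 16-byte redisObject, and a 3-byte sdshdr8 header.

```python
import re

LONG_MIN, LONG_MAX = -2**63, 2**63 - 1
EMBSTR_LIMIT = 44  # bytes, the threshold quoted above

# 64-byte block = 16-byte redisObject + 3-byte SDS header + payload + 1 NUL
assert 64 - 16 - 3 - 1 == EMBSTR_LIMIT

_CANONICAL_INT = re.compile(rb"0|-?[1-9][0-9]*")

def string_encoding(value):
    """Return the encoding a freshly SET value would be expected to get."""
    if len(value) <= 20 and _CANONICAL_INT.fullmatch(value):
        if LONG_MIN <= int(value) <= LONG_MAX:
            return "int"
    return "embstr" if len(value) <= EMBSTR_LIMIT else "raw"

for sample in (b"100", b"hello", b"007", b"x" * 44, b"x" * 45):
    print(len(sample), string_encoding(sample))
```

On a live server, OBJECT ENCODING key reports the encoding actually in use, which is the authoritative check.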
Encoding Transitions
Redis dynamically converts between these encodings based on operations:
- An INT encoded object converts to RAW when a string operation such as APPEND modifies it, because the value can no longer be kept as a bare integer in the pointer field.
- An EMBSTR object is effectively read-only. Any modification (e.g., APPEND) will convert it to a RAW encoding, as the contiguous memory layout cannot be easily reallocated.
SET small_val "hello" # Encoding: embstr
APPEND small_val " world" # Encoding: raw (modified)
SET large_val [string > 44 bytes] # Encoding: raw
SET count 100 # Encoding: int
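These transitions are one-way, from compact representations to larger ones: after an in-place modification Redis does not convert the value back to a more compact encoding. Overwriting the key with a new SET creates a fresh object, whose encoding is chosen again from its content.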
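The transitions can be traced with a toy model. StringValue and pick_encoding below are hypothetical names, and the behaviour encoded here (any APPEND yields raw, a new SET picks the encoding afresh) is a simplification of what the text describes rather than a copy of Redis internals.

```python
def pick_encoding(data):
    """Choose the encoding for a newly written value."""
    try:
        number = int(data)
    except ValueError:
        number = None
    # Round-tripping rejects non-canonical forms such as b"007" or b"+1".
    if (number is not None and str(number).encode() == data
            and -2**63 <= number <= 2**63 - 1):
        return "int"
    return "embstr" if len(data) <= 44 else "raw"

class StringValue:
    """Toy model of one string key and its current encoding."""

    def __init__(self, data):
        self.set(data)

    def set(self, data):
        """Model SET: a new object is created and encoded from scratch."""
        self.data = data
        self.encoding = pick_encoding(data)

    def append(self, more):
        """Model APPEND: the value is modified in place and becomes raw."""
        self.data += more
        self.encoding = "raw"
        return len(self.data)

small = StringValue(b"hello")
print(small.encoding)   # embstr
small.append(b" world")
print(small.encoding)   # raw, although the value is only 11 bytes long
small.set(small.data)
print(small.encoding)   # embstr again after a fresh SET
```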
Application Scenarios
The flexibility of String types enables various use cases:
- Caching: Storing serialized objects (JSON, XML) or HTML fragments to accelerate read-heavy applications.
- Counters: Using INCR and DECR for real-time statistics like video views, likes, or article reads.
- Distributed Session: Centralizing user session tokens in a shared Redis instance accessible by multiple application servers.
- Distributed Locks: Implementing mutual exclusion across multiple service instances using the SET ... NX pattern.
- Global ID Generation: Generating unique identifiers using atomic increment operations.
Handling Complex Objects
When storing complex objects (like a user profile), developers have two primary choices:
- JSON Serialization: Store the entire object as a single String key. This is simple but inefficient if you only need to update specific fields (requires reading, modifying, and writing the whole object).
- Key Naming Convention: Use a structured key approach to separate fields.
# Storing user fields separately
MSET user:1001:name "Alice" user:1001:email "alice@example.com" user:1001:age 30
MGET user:1001:name user:1001:email
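The trade-off between the two approaches can be seen in a short Python sketch. A plain dictionary stands in for the Redis keyspace, and all function names are hypothetical; the point is the difference in what an update has to touch.

```python
import json

store = {}  # hypothetical stand-in for the Redis keyspace

def save_as_json(user_id, profile):
    """Option 1: one key holding the whole serialized object."""
    store[f"user:{user_id}"] = json.dumps(profile)

def update_json_field(user_id, field, value):
    """Changing one field means read, modify, and rewrite everything."""
    key = f"user:{user_id}"
    profile = json.loads(store[key])
    profile[field] = value
    store[key] = json.dumps(profile)

def save_as_fields(user_id, profile):
    """Option 2: one key per field, following the user:<id>:<field> convention."""
    for field, value in profile.items():
        store[f"user:{user_id}:{field}"] = str(value)

def update_field(user_id, field, value):
    """Changing one field touches exactly one key."""
    store[f"user:{user_id}:{field}"] = str(value)

profile = {"name": "Alice", "email": "alice@example.com", "age": 30}
save_as_json(1001, profile)
update_json_field(1001, "age", 31)
save_as_fields(1002, profile)
update_field(1002, "age", 31)
```

The per-field layout makes single-field updates cheap but spreads one object over several keys, so reading the full profile takes a multi-key read such as MGET, and values come back as strings that the application must convert.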