Redis Data Types: In-Depth Analysis of Strings and Internal Implementation

Redis (REmote DIctionary Server) is an open-source, in-memory data structure store used as a database, cache, and message broker. Unlike traditional disk-based databases, Redis keeps its working data set in memory, enabling very high throughput (often exceeding 100,000 operations per second) and low-latency access.

While it is commonly recognized for its role as a caching layer to alleviate database load, Redis distinguishes itself from other in-memory stores like Memcached through several key features:

Rich Data Types: Support for Strings, Lists, Sets, Sorted Sets, Hashes, and more.
Persistence: Mechanisms like RDB and AOF to save state to disk.
Advanced Features: Built-in support for transactions, pub/sub messaging, Lua scripting, and keys with time-to-live (TTL).
Replication and Clustering: Native support for master-slave replication and distributed partitioning.

Redis Strings: The Foundation

The String is the most fundamental data type in Redis. It can hold text, integers, floating-point values, and binary data (such as images or serialized objects), up to 512 MB per value. Redis Strings are binary safe: they can contain any byte sequence, including null bytes, regardless of character encoding.

Common Operations

Basic operations involve setting and retrieving values. Redis provides a rich set of commands for manipulating strings:

# Basic set and get
SET user:1001 "John Doe"
GET user:1001

# Batch operations (atomic)
MSET page:index "home" page:title "Welcome"
MGET page:index page:title

# String manipulation
APPEND user:1001 " - Admin"
GETRANGE user:1001 0 3
STRLEN user:1001
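
The offsets passed to GETRANGE are inclusive on both ends, and negative offsets count from the end of the string. The following is a minimal Python sketch of that behaviour, not Redis code; the function name getrange and the in-process byte string are stand-ins chosen for illustration.

```python
def getrange(value: bytes, start: int, end: int) -> bytes:
    """Model of GETRANGE: both offsets are inclusive; negatives count from the end."""
    n = len(value)
    if start < 0 and end < 0 and start > end:
        return b""
    if start < 0:
        start = max(n + start, 0)
    if end < 0:
        end = max(n + end, 0)
    end = min(end, n - 1)          # offsets past the end are clamped
    if n == 0 or start > end:
        return b""
    return value[start:end + 1]


# Mirror the commands above: SET, then APPEND, then GETRANGE and STRLEN.
value = b"John Doe"
value += b" - Admin"               # APPEND extends the stored value
first_four = getrange(value, 0, 3)
last_five = getrange(value, -5, -1)
length = len(value)                # what STRLEN would report
```

Note that the Python slice needs end + 1 because Python slices exclude the upper bound while GETRANGE includes it.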

Numeric Operations

If a string value can be parsed as a number, Redis can increment or decrement it atomically. INCR, DECR, and their BY variants require a 64-bit signed integer; INCRBYFLOAT also accepts floating-point values. This is highly useful for counters and ID generation.

# Incrementing values
SET view_count 10
INCR view_count      # Returns 11
INCRBY view_count 5  # Returns 16

# Decrementing values
DECR view_count      # Returns 15
DECRBY view_count 5  # Returns 10

# Floating point operations
SET temp 36.5
INCRBYFLOAT temp 0.2 # Returns 36.7
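
To make the integer rules concrete, here is a simplified Python model of INCRBY. It assumes a plain dict as a stand-in for the keyspace; the function name incrby and the exact error messages are illustrative, not the server's. It treats a missing key as 0, rejects values that are not canonical integers, and refuses results outside the signed 64-bit range.

```python
import re

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
# Canonical decimal integers only: no leading zeros, no "+", no spaces.
_CANONICAL_INT = re.compile(rb"0|-?[1-9][0-9]*")


def incrby(store: dict, key: str, delta: int) -> int:
    """Model of INCRBY on a dict mapping keys to byte strings."""
    raw = store.get(key, b"0")     # a missing key counts as 0
    if not _CANONICAL_INT.fullmatch(raw):
        raise ValueError("value is not an integer or out of range")
    current = int(raw)
    if not INT64_MIN <= current <= INT64_MAX:
        raise ValueError("value is not an integer or out of range")
    result = current + delta
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("increment or decrement would overflow")
    store[key] = str(result).encode()   # the value is stored back as a string
    return result


store = {"view_count": b"10"}
after_incr = incrby(store, "view_count", 1)    # like INCR
after_incrby = incrby(store, "view_count", 5)  # like INCRBY ... 5
```

In a real server the atomicity comes from Redis executing each command to completion before the next one; this single-threaded model only illustrates the value rules.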

Distributed Locking

Strings can be used to implement simple distributed locks using the SET command with the NX (only set if not exists) and EX (set expiration) options. This ensures the operation is atomic, avoiding race conditions where a lock is set but expiration fails to apply.

# Set a lock with a 10-second expiry, only if the key does not exist
SET lock:resource "unique_token" EX 10 NX
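
The reason for storing a unique token is that a client should only release a lock it still owns. The sketch below models this in plain Python under stated assumptions: LockStore, set_nx_ex, and release are hypothetical names, the store is an in-process dict rather than a Redis server, and the compare-then-delete release step is a common companion pattern that the text above does not cover (on a real server it would need to run atomically, for example in a Lua script).

```python
import secrets
import time
from typing import Callable


class LockStore:
    """In-process model of SET key token EX ttl NX plus a token-checked release."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data = {}            # key -> (token, expires_at)
        self._clock = clock

    def set_nx_ex(self, key: str, token: str, ttl_seconds: float) -> bool:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False           # NX: the key exists and has not expired
        self._data[key] = (token, now + ttl_seconds)
        return True

    def release(self, key: str, token: str) -> bool:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock() or entry[0] != token:
            return False           # expired, missing, or owned by someone else
        del self._data[key]
        return True


locks = LockStore()
my_token = secrets.token_hex(16)   # unique per lock attempt
acquired = locks.set_nx_ex("lock:resource", my_token, 10)
released = locks.release("lock:resource", my_token)
```

The expiry acts as a safety net: if the holder crashes, the lock becomes available again after the TTL instead of being held forever.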

Internal Implementation

To understand how Redis handles data, we must look at its core structures: redisDb, dictEntry, and redisObject.

Key-Value Storage Model

At the top level, Redis organizes data into databases (DB 0 to DB 15 by default). Each database is represented by a redisDb structure, which contains a dictionary (dict). This dictionary uses a hash table implementation to store all key-value pairs. Each entry in the hash table is a dictEntry, which holds the key and a pointer to the value.

The redisObject Wrapper

While keys are always stored as Simple Dynamic Strings (SDS), values are wrapped in a structure called redisObject. This wrapper allows Redis to manage type information, reference counting for memory management, and encoding details.

// Simplified definition from server.h
typedef struct redisObject {
    unsigned type:4;        // Data type (String, List, Hash, etc.)
    unsigned encoding:4;    // Internal encoding (raw, embstr, int, etc.)
    unsigned lru:LRU_BITS;  // LRU time, or LFU data, used by eviction policies
    int refcount;           // Reference count for object sharing and cleanup
    void *ptr;              // Pointer to the actual data structure
} robj;

The encoding field is crucial because the same data type can be stored in different underlying representations depending on the content, optimizing memory usage.
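
To see how compact this wrapper is, the struct can be mirrored with Python's ctypes module. This is an illustrative sketch, not Redis code: it assumes LRU_BITS is 24 and uses the numeric values 0 for the string type and 8 for the embstr encoding, as defined in server.h at the time of writing. The three bit fields share a single 32-bit word, so on a 64-bit platform the whole header should come to 16 bytes.

```python
import ctypes

LRU_BITS = 24              # assumed value of LRU_BITS in server.h
OBJ_STRING = 0             # assumed numeric value of the string type
OBJ_ENCODING_EMBSTR = 8    # assumed numeric value of the embstr encoding


class RedisObject(ctypes.Structure):
    """ctypes mirror of the simplified redisObject definition."""
    _fields_ = [
        ("type", ctypes.c_uint, 4),
        ("encoding", ctypes.c_uint, 4),
        ("lru", ctypes.c_uint, LRU_BITS),
        ("refcount", ctypes.c_int),
        ("ptr", ctypes.c_void_p),
    ]


obj = RedisObject(type=OBJ_STRING, encoding=OBJ_ENCODING_EMBSTR,
                  lru=0, refcount=1, ptr=None)
header_size = ctypes.sizeof(RedisObject)   # 4 + 4 + pointer size
```

Since 4 + 4 + 24 = 32 bits, the type, encoding, and lru fields cost no more than a single unsigned int would.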

Simple Dynamic Strings (SDS)

Redis does not use standard C strings (null-terminated char arrays) directly. Instead, it implements a custom library called Simple Dynamic Strings (SDS). SDS structures vary slightly by size (sdshdr5, sdshdr8, sdshdr16, etc.) but generally contain the following metadata:

len: The current length of the string.
alloc: The capacity of buf in bytes, excluding the header and the trailing null terminator.
flags: Header type indicator.
buf[]: The actual byte array storing data.
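
The fields above can be pictured as a small header followed by the payload. The sketch below packs a byte string into the layout of the smallest full header, sdshdr8 (one byte each for len, alloc, and flags), using Python's struct module. The helper names pack_sdshdr8, sds_len, and sds_bytes are hypothetical, and the type tag value 1 for sdshdr8 is taken from sds.h as an assumption.

```python
import struct

SDS_TYPE_8 = 1   # assumed header type tag, stored in the low bits of flags


def pack_sdshdr8(data: bytes, spare: int = 0) -> bytes:
    """Lay out data as: len (1 byte), alloc (1 byte), flags (1 byte), buf, NUL."""
    alloc = len(data) + spare
    if alloc > 255:
        raise ValueError("sdshdr8 can only describe up to 255 bytes")
    header = struct.pack("<BBB", len(data), alloc, SDS_TYPE_8)
    # buf holds the payload, a trailing NUL for C compatibility, then spare room.
    return header + data + b"\x00" + bytes(spare)


def sds_len(blob: bytes) -> int:
    """O(1) length lookup: read the len field instead of scanning for NUL."""
    return blob[0]


def sds_bytes(blob: bytes) -> bytes:
    """Return the payload using len, so embedded NUL bytes are preserved."""
    return blob[3:3 + sds_len(blob)]


blob = pack_sdshdr8(b"hi\x00there")
payload = sds_bytes(blob)
```

Because the payload is delimited by len rather than by the first zero byte, the embedded \x00 in the example survives intact.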

Why SDS over C Strings?

SDS offers significant advantages over standard C strings:

O(1) Length Complexity: SDS stores the length explicitly, making STRLEN instantaneous. C strings require O(N) traversal to count characters.
Binary Safety: C strings rely on the null terminator (\0) to mark the end. SDS uses the len property, allowing it to store binary data (images, video) containing null characters without being truncated.
Prevention of Buffer Overflow: SDS APIs automatically check available space before writing and expand the memory if necessary, preventing memory corruption.
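Memory Management Optimization: SDS uses space pre-allocation and lazy freeing to reduce the number of reallocations during frequent string modifications. When a string grows, extra capacity is reserved for future appends; when it shrinks, the freed space is kept for reuse rather than returned immediately.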
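
The pre-allocation idea can be sketched with a few lines of Python. The policy modelled here is the one used by the SDS library as I understand it: when a string must grow, capacity is doubled while the new length is under 1 MB, and extended by a flat 1 MB beyond that. The function names are illustrative, and the simulation counts reallocations rather than performing them.

```python
SDS_MAX_PREALLOC = 1024 * 1024   # 1 MB threshold assumed from sds.h


def grown_capacity(current_len: int, alloc: int, add_len: int) -> int:
    """Return the buffer capacity after making room for add_len more bytes."""
    if alloc - current_len >= add_len:
        return alloc                       # enough free space, no reallocation
    needed = current_len + add_len
    if needed < SDS_MAX_PREALLOC:
        return needed * 2                  # small strings: double
    return needed + SDS_MAX_PREALLOC       # large strings: add a flat 1 MB


def count_reallocations(appends: int) -> int:
    """Append one byte at a time and count how often the buffer must grow."""
    length = alloc = reallocations = 0
    for _ in range(appends):
        new_alloc = grown_capacity(length, alloc, 1)
        if new_alloc != alloc:
            reallocations += 1
            alloc = new_alloc
        length += 1
    return reallocations


reallocs_for_1000_appends = count_reallocations(1000)
```

Without pre-allocation, each of those one-byte appends could trigger its own reallocation; with it, the count grows only logarithmically for small strings.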

Internal Encodings for Strings

The String type utilizes three distinct internal encodings to balance performance and memory usage:

INT: Used when the value is an integer that fits in a signed 64-bit long (-2^63 to 2^63-1). Redis stores the integer directly in the void *ptr field of the redisObject, saving memory by avoiding a separate string allocation.
EMBSTR (Embedded String): Used for strings of 44 bytes or fewer. In this mode, the redisObject and the SDS structure are allocated in a single contiguous memory block. This reduces memory fragmentation and allocation overhead.
RAW: Used for strings longer than 44 bytes. Redis allocates the redisObject and the SDS separately, requiring two memory allocations and an extra pointer dereference.
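
The 44-byte boundary follows from simple arithmetic: a 64-byte allocation, minus a 16-byte redisObject, a 3-byte sdshdr8 header, and 1 byte for the trailing null, leaves 44 bytes for the payload. The Python sketch below encodes that arithmetic and a simplified version of the selection rule. It is a model under assumptions, not the server's code: string_encoding is a hypothetical name, the boundary is treated as inclusive, and only canonical decimal integers are classified as int.

```python
import re

ALLOCATION_SIZE = 64       # allocator size class the embstr limit targets
ROBJ_SIZE = 16             # redisObject header on a 64-bit build
SDSHDR8_SIZE = 3           # len + alloc + flags
NULL_TERMINATOR = 1
EMBSTR_LIMIT = ALLOCATION_SIZE - ROBJ_SIZE - SDSHDR8_SIZE - NULL_TERMINATOR

# Canonical decimal integers only: no leading zeros, no "+", no spaces.
_CANONICAL_INT = re.compile(rb"0|-?[1-9][0-9]*")


def string_encoding(value: bytes) -> str:
    """Pick the encoding a freshly SET string value would be expected to get."""
    if len(value) <= 20 and _CANONICAL_INT.fullmatch(value):
        if -(2**63) <= int(value) <= 2**63 - 1:
            return "int"
    return "embstr" if len(value) <= EMBSTR_LIMIT else "raw"


encodings = {
    "count": string_encoding(b"100"),
    "greeting": string_encoding(b"hello"),
    "padded": string_encoding(b"007"),
    "long": string_encoding(b"x" * 45),
}
```

On a live server, OBJECT ENCODING key reports the actual encoding and can be used to check these expectations.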

Encoding Transitions

Redis dynamically converts between these encodings based on operations:

An INT-encoded object converts to RAW when it is modified as a string (for example with APPEND or SETRANGE), even if the result still looks like a number.
An EMBSTR object is effectively read-only. Any modification (e.g., APPEND) will convert it to a RAW encoding, as the contiguous memory layout cannot be easily reallocated.

SET small_val "hello"     # Encoding: embstr
APPEND small_val " world" # Encoding: raw (modified)
SET large_val [string > 44 bytes] # Encoding: raw
SET count 100             # Encoding: int

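In practice, in-place modifications only move a value from a compact encoding to RAW; Redis does not convert a RAW string back to EMBSTR or INT after an APPEND. The encoding is chosen afresh only when the key is overwritten, for example by a new SET.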

Application Scenarios

The flexibility of String types enables various use cases:

Caching: Storing serialized objects (JSON, XML) or HTML fragments to accelerate read-heavy applications.
Counters: Using INCR and DECR for real-time statistics like video views, likes, or article reads.
Distributed Session: Centralizing user session tokens in a shared Redis instance accessible by multiple application servers.
Distributed Locks: Implementing mutual exclusion across multiple service instances using the SET ... NX pattern.
Global ID Generation: Generating unique identifiers using atomic increment operations.
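
As an illustration of the caching scenario, the sketch below shows the common cache-aside pattern in plain Python: look in the cache first, fall back to the slow source on a miss, and store the serialized result with a TTL. Everything here is a stand-in chosen for the example: TinyCache models SET ... EX and GET in process, and get_article and load_from_db are hypothetical names.

```python
import json
import time
from typing import Callable, Optional


class TinyCache:
    """In-process stand-in for SET key value EX ttl and GET."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data = {}            # key -> (value, expires_at)
        self._clock = clock

    def set_ex(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)   # drop expired entries lazily
            return None
        return entry[0]


def get_article(cache: TinyCache, article_id: int,
                load_from_db: Callable[[int], dict],
                ttl_seconds: float = 60.0) -> dict:
    """Cache-aside read: serve from cache, otherwise load and cache as JSON."""
    key = f"article:{article_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    article = load_from_db(article_id)
    cache.set_ex(key, json.dumps(article), ttl_seconds)
    return article
```

The TTL bounds how stale a cached entry can become, which is usually an acceptable trade for read-heavy data.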

Handling Complex Objects

When storing complex objects (like a user profile), developers have two primary choices:

JSON Serialization: Store the entire object as a single String key. This is simple but inefficient if you only need to update specific fields (requires reading, modifying, and writing the whole object).
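Key Naming Convention: Store each field under its own key using a structured naming scheme (for example user:1001:name). Individual fields can then be read and updated independently, at the cost of more keys to manage.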

# Storing user fields separately
MSET user:1001:name "Alice" user:1001:email "alice@example.com" user:1001:age 30
MGET user:1001:name user:1001:email
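
The trade-off between the two approaches shows up most clearly on updates. The following Python sketch uses a plain dict as a stand-in for the Redis keyspace; all function names are hypothetical. Updating one field of a JSON value requires a read, a decode, a re-encode, and a write, while the per-field layout needs a single write.

```python
import json


def save_as_json(store: dict, user_id: int, profile: dict) -> None:
    """One key per object: the whole profile is a single serialized string."""
    store[f"user:{user_id}"] = json.dumps(profile)


def update_json_field(store: dict, user_id: int, field: str, value) -> None:
    """Read-modify-write: the entire object is decoded and re-encoded."""
    key = f"user:{user_id}"
    profile = json.loads(store[key])
    profile[field] = value
    store[key] = json.dumps(profile)


def save_as_fields(store: dict, user_id: int, profile: dict) -> None:
    """One key per field, like the MSET call above; values become strings."""
    for field, value in profile.items():
        store[f"user:{user_id}:{field}"] = str(value)


def update_field(store: dict, user_id: int, field: str, value) -> None:
    """A single write touches only the field that changed."""
    store[f"user:{user_id}:{field}"] = str(value)


profile = {"name": "Alice", "email": "alice@example.com", "age": 30}
json_store, field_store = {}, {}
save_as_json(json_store, 1001, profile)
save_as_fields(field_store, 1001, profile)
update_json_field(json_store, 1001, "age", 31)
update_field(field_store, 1001, "age", 31)
```

One consequence of the per-field layout is that type information is lost: the age comes back as the string "31" and must be converted by the application.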

Tags: Redis Data Structures string SDS In-Memory Database

Posted on Sat, 23 May 2026 22:14:59 +0000 by Toy
