Core Differences Among Cache Instability Issues
| Issue Type | Primary Definition | Typical Trigger | Immediate Consequence |
|---|---|---|---|
| Cache Penetration | Queries for an identifier that exists in neither the cache nor the database, so every request falls through to the database | Malicious scrapers querying invalid IDs, application parameter errors | Database overwhelmed by fruitless queries, connection pool exhaustion |
| Cache Stampede (Breakdown) | A high-traffic key expires, and the concurrent requests for it all hit the database at once | Flash sale item TTL ending, trending leaderboard cache being evicted | Sudden load spike on the database from a single key, potential timeouts or crash |
| Cache Avalanche | Mass simultaneous TTL expirations or complete cache service failure | Identifiers configured with identical durations, Redis cluster outage | Database crushed under total request volume, system-wide outage |
Cache Penetration: Queries for Non-Existent Data
Mechanism
A client requests data using a key that exists in neither the cache nor the underlying database. Because the database returns nothing, nothing is written to the cache, so every subsequent request for that same invalid key misses the cache and hits the database directly. High-frequency malicious requests can rapidly deplete database resources.
Solutions
Approach 1: Request Validation
Filter out illegitimate queries before they reach the data retrieval layers. Reject requests with malformed identifiers, negative numbers, or impossible formats.
    public Account fetchAccountById(Long accountId) {
        // Discard requests with fundamentally invalid identifiers
        if (accountId == null || accountId <= 0 || accountId > 999999999) {
            return null;
        }
        // Proceed to standard data access logic...
    }

Approach 2: Empty Result Caching
When a database query yields no result, persist a placeholder value in the cache with a brief expiration window. This prevents the database from being repeatedly queried for the same missing entity.
    public Account fetchAccountById(Long accountId) {
        String lookupKey = "acct:" + accountId;
        String storedData = distributedCache.get(lookupKey);
        // Recognize the placeholder for a previously confirmed missing record
        if ("BLANK".equals(storedData)) {
            return null;
        }
        if (storedData != null) {
            return deserialize(storedData, Account.class);
        }
        // Cache miss, query the database
        Account record = dbRepository.findById(accountId);
        if (record == null) {
            // Persist a short-lived placeholder to block repeated database hits
            distributedCache.set(lookupKey, "BLANK", 300, TimeUnit.SECONDS);
            return null;
        }
        // Persist the valid record with a standard expiration
        distributedCache.set(lookupKey, serialize(record), 1800, TimeUnit.SECONDS);
        return record;
    }

Approach 3: Bloom Filters
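For systems managing vast datasets, a Bloom filter acts as a probabilistic pre-check. All valid identifiers are hashed into the filter during initialization, and identifiers created later must be added as well. If the filter reports that an identifier is absent, the request is rejected immediately without touching the cache or database. A Bloom filter never yields false negatives, but it can yield false positives, so a small fraction of invalid identifiers will still pass through to the cache and database.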
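    // Requires Guava: com.google.common.hash.BloomFilter and com.google.common.hash.Funnels
    // Sized for 500,000 expected identifiers with a 0.5% false-positive probability
    private BloomFilter<Long> accountPresenceFilter = BloomFilter.create(
        Funnels.longFunnel(), 500_000, 0.005
    );

    @PostConstruct
    public void initializeFilter() {
        // Load every known identifier into the filter at startup
        dbRepository.streamAllIds().forEach(accountPresenceFilter::put);
    }

    public Account fetchAccountById(Long accountId) {
        // Instant rejection if the identifier is null or definitively absent
        if (accountId == null || !accountPresenceFilter.mightContain(accountId)) {
            return null;
        }
        // Proceed to standard data access logic...
    }

Approach 4: Rate Limiting and Circuit Breaking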
Deploy throttling mechanisms (e.g., Sentinel, Resilience4j) at the API gateway or service layer. When request volumes exceed defined thresholds or database latency spikes, automatically sever the request chain to protect downstream resources.
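Libraries such as Sentinel or Resilience4j handle throttling in production, but the underlying idea can be illustrated with a small token bucket. The sketch below is only an illustration: the class name `TokenBucket`, its constructor parameters, and the injectable clock are assumptions made here, not the API of either library.

```java
import java.util.function.LongSupplier;

/** Minimal token-bucket limiter (hypothetical illustration, not a library API). */
class TokenBucket {
    private final long capacity;        // maximum burst size
    private final long nanosPerToken;   // time needed to earn one token
    private final LongSupplier clock;   // nanosecond clock, injectable for testing
    private long tokens;
    private long lastRefill;

    TokenBucket(long capacity, long nanosPerToken, LongSupplier clock) {
        this.capacity = capacity;
        this.nanosPerToken = nanosPerToken;
        this.clock = clock;
        this.tokens = capacity;             // start with a full bucket
        this.lastRefill = clock.getAsLong();
    }

    /** Returns true if the request may proceed, false if it should be rejected. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        long earned = (now - lastRefill) / nanosPerToken;
        if (earned > 0) {
            tokens = Math.min(capacity, tokens + earned);
            lastRefill += earned * nanosPerToken;   // keep the remainder for the next call
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}
```

With `new TokenBucket(100, 10_000_000L, System::nanoTime)`, a caller would be allowed bursts of up to 100 requests and a sustained rate of about 100 requests per second; a request for which `tryAcquire()` returns false would be rejected before it reaches the cache or database.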
Cache Stampede: Sudden Hot-Data Expiration
Mechanism
A disproportionately popular data point (e.g., a flash sale item, a viral post) reaches the end of its time-to-live (TTL) and expires. The massive concurrent traffic that the cache previously absorbed instantly redirects to the database, creating a sudden, extreme load spike that can incapacitate the storage layer.
Solutions
Approach 1: Logical Eternal TTL
Configure critical, high-traffic keys without a physical expiration. Instead, manage freshness asynchronously; a background task or application trigger updates the cache upon source data modification, ensuring the cache never empties passively.
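One way to picture this approach is to store a logical deadline next to each value and refresh stale entries in the background while continuing to serve the old value. The following is a minimal in-memory sketch under that assumption; `LogicalExpiryCache`, its fields, and the loader function are hypothetical names, and a `ConcurrentHashMap` stands in for the distributed cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Sketch: entries never expire physically; stale ones are refreshed asynchronously. */
class LogicalExpiryCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long freshUntilMillis;                        // logical deadline, not a physical TTL
        final AtomicBoolean refreshing = new AtomicBoolean();
        Entry(V value, long freshUntilMillis) {
            this.value = value;
            this.freshUntilMillis = freshUntilMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;        // e.g. a database read
    private final long freshMillis;
    private final Executor refreshExecutor;
    private final LongSupplier clock;           // millisecond clock, injectable for testing

    LogicalExpiryCache(Function<K, V> loader, long freshMillis,
                       Executor refreshExecutor, LongSupplier clock) {
        this.loader = loader;
        this.freshMillis = freshMillis;
        this.refreshExecutor = refreshExecutor;
        this.clock = clock;
    }

    /** Pre-loads a hot key so that reads never observe a miss. */
    void put(K key, V value) {
        store.put(key, new Entry<>(value, clock.getAsLong() + freshMillis));
    }

    /** Returns the cached value immediately, even if stale; schedules at most one refresh per entry. */
    V get(K key) {
        Entry<V> entry = store.get(key);
        if (entry == null) {
            return null;                        // hot keys are expected to be pre-loaded
        }
        if (clock.getAsLong() >= entry.freshUntilMillis
                && entry.refreshing.compareAndSet(false, true)) {
            refreshExecutor.execute(() -> {
                try {
                    put(key, loader.apply(key));        // replaces the stale entry
                } catch (RuntimeException e) {
                    entry.refreshing.set(false);        // allow a later retry if the reload fails
                }
            });
        }
        return entry.value;
    }
}
```

The trade-off in this design is that readers may briefly see stale data, in exchange for never sending a burst of requests to the database when a hot key ages out.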
Approach 2: Distributed Mutex Locking
When a high-traffic cache entry vanishes, the first request to detect the miss acquires a distributed lock (e.g., via Redis SETNX). That single thread queries the database and repopulates the cache. Concurrent requests that fail to acquire the lock simply wait, retrying the cache read after a brief interval rather than storming the database.
    public Product fetchFeaturedProduct(Long productId) throws InterruptedException {
        String dataKey = "featured_prod:" + productId;
        String mutexKey = "mutex:featured_prod:" + productId;
        // Bounded retry loop: avoids unbounded recursion while waiting for a rebuild
        for (int attempt = 0; attempt < 50; attempt++) {
            String payload = distributedCache.get(dataKey);
            if (payload != null) {
                return deserialize(payload, Product.class);
            }
            // Attempt to acquire an exclusive rebuild lock
            boolean acquired = distributedCache.setIfAbsent(mutexKey, "locked", 10, TimeUnit.SECONDS);
            if (acquired) {
                try {
                    Product entity = dbRepository.findById(productId);
                    if (entity != null) {
                        distributedCache.set(dataKey, serialize(entity), 3600, TimeUnit.SECONDS);
                    }
                    return entity;
                } finally {
                    // Caveat: if the rebuild outlives the 10-second lock TTL, this may delete
                    // a lock held by another caller; a unique token per holder avoids that
                    distributedCache.delete(mutexKey);
                }
            }
            // Wait briefly for the lock holder to finish, then retry the cache read
            Thread.sleep(100);
        }
        throw new IllegalStateException("Timed out waiting for cache rebuild: " + dataKey);
    }

Approach 3: Cache Pre-warming
Proactively refresh high-value keys before they actually expire. Scheduled tasks can monitor approaching TTLs and reload data into the cache during off-peak periods, preventing any expiration gap.
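A scheduled refresh of this kind could be sketched as follows. This is an illustrative outline only: `CachePreWarmer`, the list of hot keys, and the loader function are assumptions introduced here, and a plain `Map` stands in for the distributed cache.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Sketch of a pre-warmer that reloads known hot keys on a fixed schedule. */
class CachePreWarmer {
    private final Map<String, String> cache;          // stand-in for the distributed cache
    private final List<String> hotKeys;
    private final Function<String, String> loader;    // e.g. a database read
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    CachePreWarmer(Map<String, String> cache, List<String> hotKeys,
                   Function<String, String> loader) {
        this.cache = cache;
        this.hotKeys = hotKeys;
        this.loader = loader;
    }

    /** Reloads every hot key; one failing key does not stop the others. */
    void refreshAll() {
        for (String key : hotKeys) {
            try {
                String fresh = loader.apply(key);
                if (fresh != null) {
                    cache.put(key, fresh);
                }
            } catch (RuntimeException e) {
                // Keep the previous value and try again on the next run
            }
        }
    }

    /** Runs refreshAll immediately, then repeatedly at the given period. */
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::refreshAll, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

For the refresh to close the expiration gap, the period passed to `start` would need to be comfortably shorter than the TTL of the keys being warmed.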
Approach 4: Multi-Level Caching
Introduce an in-process local cache (e.g., Caffeine, Guava Cache) as an additional tier in front of the distributed cache. Requests check the local cache first, the distributed cache second, and the database last. Even if a hot key expires in the distributed cache, or the distributed cache fails, the local cache absorbs a significant portion of the load.
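The lookup order can be sketched with plain maps standing in for the real components. `TwoTierCache` and its fields are hypothetical names; a real local tier such as Caffeine would also bound its size and expire entries quickly, which this sketch omits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of a tiered read path: local cache, then distributed cache, then database. */
class TwoTierCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();  // stand-in for an in-process cache
    private final Map<String, String> remote;                              // stand-in for the distributed cache
    private final Function<String, String> database;                       // stand-in for a database read

    TwoTierCache(Map<String, String> remote, Function<String, String> database) {
        this.remote = remote;
        this.database = database;
    }

    String get(String key) {
        String value = local.get(key);          // tier 1: in-process
        if (value != null) {
            return value;
        }
        value = remote.get(key);                // tier 2: distributed
        if (value == null) {
            value = database.apply(key);        // tier 3: source of truth
            if (value == null) {
                return null;                    // missing record: nothing to cache in this sketch
            }
            remote.put(key, value);
        }
        local.put(key, value);                  // populate the local tier for later reads
        return value;
    }
}
```

Because each application instance holds its own local copy, the local tier is usually given a much shorter lifetime than the distributed one to limit how long instances can disagree.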
Cache Avalanche: Mass Expiration or Service Failure
Mechanism
An avalanche occurs in two primary scenarios: a large batch of keys expiring simultaneously due to identical TTL configurations, or a total collapse of the distributed cache infrastructure. The resulting deluge of unbuffered queries overwhelms the database, potentially causing cascading system failures.
Solutions
Scenario 1: Mass Simultaneous Expiration
Prevent keys from expiring in unison by injecting randomized jitter into their TTLs. A base expiration duration combined with a random offset ensures keys evaporate gradually rather than all at once.
    int coreTtl = 1800;
    int jitter = ThreadLocalRandom.current().nextInt(600);
    distributedCache.set(dataKey, payload, coreTtl + jitter, TimeUnit.SECONDS);

Scenario 2: Cache Infrastructure Failure
- High Availability Topologies: Deploy the cache with Redis Sentinel or Redis Cluster to eliminate single points of failure.
- Circuit Breaking and Degradation: Monitor cache health continuously. If the cache layer becomes unreachable, trigger circuit breakers to immediately halt downstream requests, returning graceful fallback responses instead of hammering the database.
- Request Throttling: Enforce global rate limits so the database only processes a sustainable volume of direct queries during a cache outage.
- Multi-Level Caching: Rely on local in-process caches to serve stale data temporarily while the distributed system recovers.
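The circuit-breaking behavior described in the list above can be sketched with a small state machine. This is a simplified illustration with hypothetical names (`SimpleCircuitBreaker`, `failureThreshold`, `openMillis`); production systems would more likely rely on a library such as Resilience4j or Sentinel.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal circuit breaker sketch: opens after consecutive failures, retries after a cool-down. */
class SimpleCircuitBreaker {
    private final int failureThreshold;   // consecutive failures before the circuit opens
    private final long openMillis;        // how long calls are rejected once open
    private final LongSupplier clock;     // millisecond clock, injectable for testing
    private int consecutiveFailures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /**
     * Runs the protected call, or returns the fallback while the circuit is open
     * or when the call throws. Synchronized for brevity, which serializes callers.
     */
    synchronized <T> T call(Supplier<T> protectedCall, Supplier<T> fallback) {
        long now = clock.getAsLong();
        if (open && now - openedAt < openMillis) {
            return fallback.get();                    // open: fail fast
        }
        try {
            T result = protectedCall.get();           // closed, or a trial call after the cool-down
            consecutiveFailures = 0;
            open = false;
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                open = true;                          // open (or re-open) the circuit
                openedAt = now;
            }
            return fallback.get();
        }
    }
}
```

A caller might wrap the cache or database read as the protected call and pass a default or stale response as the fallback, so that a failing dependency is left alone for the length of the cool-down.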
    public Account fetchAccountById(Long accountId) {
        try {
            String payload = distributedCache.get("acct:" + accountId);
            if (payload != null) {
                return deserialize(payload, Account.class);
            }
        } catch (CacheUnavailableException ex) {
            log.warn("Distributed cache unreachable, falling back to local", ex);
            return localCache.retrieve(accountId);
        }
        // Fallback to database if cache is empty but reachable
        return dbRepository.findById(accountId);
    }

Scenario 3: Database Layer Hardening
As an ultimate safeguard, configure the database itself to withstand bursts:
- Implement read replicas to distribute query load.
- Enforce strict connection pool limits to prevent resource exhaustion.
- Cache query results in the persistence layer (e.g., the MyBatis second-level cache) so that repeated reads do not reach the database.
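Connection pools enforce their own limits, but the effect of a strict cap on concurrent database work can be illustrated with a semaphore. The sketch below is a swapped-in illustration rather than a pool configuration: `DatabaseBulkhead` and its parameters are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Sketch of a bulkhead that caps the number of concurrent database queries. */
class DatabaseBulkhead {
    private final Semaphore permits;

    DatabaseBulkhead(int maxConcurrentQueries) {
        this.permits = new Semaphore(maxConcurrentQueries);
    }

    /** Runs the query if a permit is free; otherwise returns the fallback without touching the database. */
    <T> T query(Supplier<T> dbCall, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();          // at capacity: shed load instead of queueing
        }
        try {
            return dbCall.get();
        } finally {
            permits.release();              // always return the permit
        }
    }
}
```

Rejecting immediately keeps request threads from piling up behind a saturated database; a variant that waits a short time for a permit is possible with the timed form of `tryAcquire`.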
Core Strategy Comparison
| Issue Type | Root Cause | Primary Defense | Fallback Defense |
|---|---|---|---|
| Cache Penetration | Querying absent identifiers | Empty result caching, Bloom filters | Input validation, Rate limiting |
| Cache Stampede | High-traffic key expiration | Distributed mutex locks | Logical eternal TTL, Pre-warming |
| Cache Avalanche | Mass TTL expiration, Cache outage | TTL jittering, HA clustering | Circuit breaking, Database throttling |