Today we're going to explore an interesting load balancing algorithm from Dubbo's source code. While I encountered this in Dubbo's implementation, it's not exclusive to Dubbo—it's a general algorithm concept. You can find Go implementations in go-zero as well.
The official documentation provides two diagrams illustrating these strategies.
When service provider machines have balanced configurations with similar processing capabilities, the P2C algorithm shows significantly better throughput. However, when machines have varying configurations—some powerful, some weaker—the adaptive strategy performs notably better.
This article focuses on the adaptive strategy.
Demo
Let's start by building a demonstration. Dubbo's website now offers an Initializer module that makes setting up a demo extremely quick compared to manual Spring Boot integration.
Select Dubbo version 3.2.0 since adaptive load balancing support was introduced in this version.
I created a Single Module project with the following structure after downloading: one provider and one consumer. The consumer implements CommandLineRunner to automatically trigger a service call after startup.
Start the project directly after dependencies are loaded. The console will output from the Consumer class, completing a Dubbo RPC call.
To debug the service invocation process, place breakpoints in the service provider implementation class and examine the call stack.
But have you noticed something strange? We didn't start a registry center and still completed a Dubbo call. This is because the demo uses the injvm protocol for local calls.
For this demo, we need to modify it to use remote calls since we're debugging load balancing strategies. Add scope="remote" to the service reference configuration and set up a registry center in the configuration file. Start Zookeeper locally and use ZooInspector to verify the service provider running on port 20880.
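For reference, the registry setup might look like the following in a Spring Boot application.properties; the Zookeeper address assumes a default local install, and the reference side would carry `@DubboReference(scope = "remote")` to force the remote path:

```properties
# Point Dubbo at the local Zookeeper registry (default port 2181)
dubbo.registry.address=zookeeper://127.0.0.1:2181
# Expose the provider over the dubbo protocol on port 20880
dubbo.protocol.name=dubbo
dubbo.protocol.port=20880
```

The two additional providers would use the same configuration with ports 20881 and 20882.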
Similarly, configure providers on ports 20881 and 20882, resulting in three service providers total.
You might wonder why three is the minimum. With only one provider, load balancing isn't necessary. With exactly two, adaptive load balancing makes no sense either. Why?
Look at the description: adaptive load balancing uses the P2C algorithm to pick the node with the lower load between two randomly chosen nodes. P2C requires randomly selecting two nodes out of the pool; with only two nodes there is no random choice to make, since both are always picked. Testing adaptive load balancing therefore requires at least three service provider nodes, and more is better.
Source Code
To enable the adaptive load balancing algorithm, specify it in the service reference configuration and set breakpoints at the appropriate location:
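On the reference side, the strategy is selected via the loadbalance attribute. A sketch (the service interface name here is made up for illustration):

```java
// "adaptive" is the SPI name registered for AdaptiveLoadBalance in Dubbo 3.2+
@DubboReference(scope = "remote", loadbalance = "adaptive")
private DemoService demoService; // DemoService: the demo's service interface (assumed name)
```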
org.apache.dubbo.rpc.cluster.loadbalance.AdaptiveLoadBalance#doSelect
The method selectByP2C, which handles the initial selection logic, contains the core P2C implementation. The invokers parameter has a size of 3, corresponding to our three service providers on ports 20880, 20881, and 20882.
The core logic in doSelect has two main parts. The first part handles random selection:
int pos1 = ThreadLocalRandom.current().nextInt(length);     // first index: uniform over [0, length)
int pos2 = ThreadLocalRandom.current().nextInt(length - 1); // second index: uniform over [0, length - 1)
if (pos2 >= pos1) {
    pos2 = pos2 + 1; // shift past pos1 so the two indices are always distinct
}
Let me break down this code. In our demo, length equals the size of invokers, which is 3. The variables pos1 and pos2 represent indices into the invokers List.
Let's work through the distribution. The first call produces pos1 uniformly from [0, 3). The second call produces pos2 uniformly from [0, 2), so on its own it could never hit the last index.
The condition then does two jobs at once: whenever pos2 lands on or past pos1, it is shifted up by one. This guarantees the two indices are distinct, and it also restores the last index's chances, because after the shift pos2 is uniformly distributed over the indices other than pos1. Since pos2's pre-shift maximum is the second-to-last index, the increment can never run out of bounds.
The algorithm is logically sound: every unordered pair of invokers is selected with equal probability, so no provider is disadvantaged by its position in the list.
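To see how the selected indices are actually distributed, here is a standalone simulation of the index trick. The class and method names are mine, and a seeded java.util.Random replaces ThreadLocalRandom so the run is reproducible:

```java
import java.util.Arrays;
import java.util.Random;

public class P2CIndexPick {

    // Pick two distinct indices from [0, length) using the offset trick.
    static int[] pick(Random rnd, int length) {
        int pos1 = rnd.nextInt(length);
        int pos2 = rnd.nextInt(length - 1);
        if (pos2 >= pos1) {
            pos2 = pos2 + 1; // skip over pos1, keeping the pair distinct
        }
        return new int[] {pos1, pos2};
    }

    public static void main(String[] args) {
        int[] counts = new int[3];
        Random rnd = new Random(42); // fixed seed for reproducibility
        int trials = 60_000;
        for (int i = 0; i < trials; i++) {
            int[] pair = pick(rnd, 3);
            counts[pair[0]]++;
            counts[pair[1]]++;
        }
        // Each index is chosen with probability 2/3 per trial,
        // so each count comes out near 40,000.
        System.out.println(Arrays.toString(counts));
    }
}
```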
An alternative way to pick two distinct invokers is to remove a random element from a mutable copy of the list twice:
Object firstInvoker = invokerList.remove(ThreadLocalRandom.current().nextInt(invokerList.size()));
Object secondInvoker = invokerList.remove(ThreadLocalRandom.current().nextInt(invokerList.size()));
Note that after the first remove, size() already reflects the shrunken list, so both calls can draw from the full current size.
I notice go-zero uses the same index-shifting approach as Dubbo here. Moving on: the code above implements the P2C selection step quite simply.
In our demo, positions 2 and 1 are selected, representing two invokers. Now the question becomes: which invoker should handle the request? This logic resides in the chooseLowLoadInvoker method, which selects the lower-loaded invoker. Two critical variables, load1 and load2, are calculated using specific formulas.
When load1 equals load2, meaning both invokers are viable, one is selected based on weight. When they differ, the lower load value wins. The remaining question is how load values are computed.
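The decision rule just described can be sketched as follows. This is a simplified stand-in: the names are mine, and where Dubbo breaks ties with a weighted random choice, this sketch deterministically prefers the higher weight:

```java
public class LowLoadChooser {

    // Minimal stand-in for an invoker: a computed load plus a configured weight.
    static final class Node {
        final double load;
        final int weight;
        Node(double load, int weight) {
            this.load = load;
            this.weight = weight;
        }
    }

    // Pick the node with the lower load; on a tie, fall back to weight.
    static Node choose(Node a, Node b) {
        if (a.load == b.load) {
            return a.weight >= b.weight ? a : b; // tie broken by weight
        }
        return a.load < b.load ? a : b;
    }
}
```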
The implementation in AdaptiveMetrics#getStatus retrieves an AdaptiveMetrics object containing a ConcurrentHashMap keyed by "ip:port:method", whose values hold the various fields used for load calculation.
One key field is pickTime, used during load calculation. The algorithm subtracts pickTime from the current time; if the difference exceeds twice the timeout value, that invoker is selected directly. For example, with a 5-second timeout, if a server hasn't been selected for over 10 seconds, it returns 0, indicating no load.
The pickTime value is set in doSelect after selectByP2C chooses a server, updating the ConcurrentHashMap. Simultaneously, a startTime is stored in the context representing when the request execution began.
The AdaptiveLoadBalanceFilter class handles response processing:
org.apache.dubbo.rpc.filter.AdaptiveLoadBalanceFilter#onResponse
When a response arrives, onResponse retrieves startTime to calculate the rt (response time) value. It also updates AdaptiveMetrics with consumerSuccess (successful requests) and errorReq (failed requests).
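The bookkeeping described above can be sketched roughly like this; the field and method names are illustrative, not Dubbo's verbatim source:

```java
import java.util.concurrent.atomic.LongAdder;

public class ConsumerMetricsSketch {
    final LongAdder consumerSuccess = new LongAdder(); // successful requests
    final LongAdder errorReq = new LongAdder();        // failed requests
    volatile long lastRt;                              // last observed response time, ms

    // Called when a response arrives; startTime was stored when the request was sent.
    void onResponse(long startTime, boolean success) {
        lastRt = System.currentTimeMillis() - startTime; // rt = now - startTime
        if (success) {
            consumerSuccess.increment();
        } else {
            errorReq.increment();
        }
    }
}
```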
Additional code processes the ADAPTIVE_LOADBALANCE_ATTACHMENT_KEY field. The value being set is "mem,load", indicating the client requests memory and load metrics from the server. The server's ProfilerServerFilter receives this request:
org.apache.dubbo.rpc.filter.ProfilerServerFilter#onResponse
Here, the filter checks if the attachment key exists, performs some processing, and assigns a new StringBuilder containing curTime and load. The implementation doesn't actually process memory metrics or use the load value meaningfully—it only checks for the key's existence. This appears to be a design choice rather than a bug.
After tracing through AdaptiveLoadBalanceFilter#onResponse, the metricsMap contains only curTime, load, and rt. The setProviderMetrics method in AdaptiveMetrics uses these three values to populate other fields:
org.apache.dubbo.rpc.AdaptiveMetrics#setProviderMetrics
The final two lines implement a mathematical formula: V_t = β * V_{t-1} + (1 - β) * θ_t
This is the exponentially weighted moving average (EWMA) technique, which estimates local mean values and allows updates to depend on historical data over a period.
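A minimal EWMA update makes the formula concrete: each new observation θ is blended with the previous estimate, with β controlling how much history is retained. The latency values here are purely illustrative:

```java
public class Ewma {

    // V_t = beta * V_{t-1} + (1 - beta) * theta_t
    static double update(double previous, double observation, double beta) {
        return beta * previous + (1 - beta) * observation;
    }

    public static void main(String[] args) {
        double v = 100.0;                 // initial latency estimate, ms
        double[] samples = {120, 80, 90}; // observed latencies, ms
        for (double theta : samples) {
            v = update(v, theta, 0.5);    // beta = 0.5: history and new sample weigh equally
        }
        System.out.println(v); // prints 92.5
    }
}
```

A larger β makes the estimate smoother but slower to react; a smaller β tracks recent latency more aggressively.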
Returning to the load calculation logic in AdaptiveMetrics#getLoad, the final computation multiplies providerCPULoad by (sqrt(ewma) + 1), then by (inflight + 1), and divides by a weighted success-rate factor plus one. The providerCPULoad value derives from calculations in ProfilerServerFilter's onResponse method.
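Putting the description together, the load computation can be sketched like this. The parameter names and the exact shape of the success-rate term are assumptions based on the description above, not Dubbo's verbatim source:

```java
public class LoadSketch {

    // Roughly: load = cpuLoad * (sqrt(ewma) + 1) * (inflight + 1) / (successRateFactor + 1),
    // with a fast path returning 0 when the node has not been picked for over 2 * timeout.
    static double load(long now, long pickTime, long timeoutMs,
                       double cpuLoad, double ewma, long inflight, double successRateFactor) {
        if (now - pickTime > 2 * timeoutMs) {
            return 0; // not picked recently: treat as unloaded and select it directly
        }
        return cpuLoad * (Math.sqrt(ewma) + 1) * (inflight + 1) / (successRateFactor + 1);
    }
}
```

Every factor pushes in an intuitive direction: higher CPU load, higher smoothed latency, or more in-flight requests raise the load value, while a better success rate lowers it.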
The ewma value updates in setProviderMetrics using lastLatency, which is calculated as the response time in ProfilerServerFilter. The inflight counter tracks active requests on each provider, while consumerSuccess and consumerReq record successful and total invocations respectively. Weight comes from the provider's configuration.
Each service provider receives a computed load value—lower values indicate lighter loads and make better targets for routing requests. Since load recalculates continuously, it reflects the server's actual capacity at any moment. The load balancing logic uses these values to determine which provider should handle incoming requests, which is why this approach qualifies as adaptive load balancing.
One might wonder why not calculate load for every provider and select the one with minimum load instead. The answer lies in complexity: random selection of two providers takes constant time, whereas evaluating all providers scales linearly with their count. P2C maintains predictable performance characteristics.
Adaptive Throttling
The adaptive load balancing discussion focused on the client side deciding which provider receives requests. But providers need agency too. When all providers carry high load values, P2C might still route requests to overloaded machines, which could degrade performance. Providers should have the ability to reject requests when approaching capacity limits.
The challenge lies in determining when to reject—ideally a dynamic threshold that adapts to service capabilities. Dubbo implements adaptive throttling to address this need.
In theory, service providers have finite processing capacity. When requests surge, unprocessed requests accumulate, causing overload. This creates cascading problems: requests wait longer, potentially crashing the entire service, and sustained overload risks server failure.
This topic involves considerable complexity. The corresponding pull request remains open. For those interested in deeper exploration, this competition provides additional context on the implementation.