Service Registration and Discovery: From Theory to Practice
1. What is Service Registration and Discovery?
Let's start by understanding what service registration and discovery means.
Service Registration is the process of registering information about a module that provides a specific service (typically its IP address and port) with a common infrastructure component (such as ZooKeeper, Consul, or etcd).
Service Discovery lets consumers find registered service modules promptly. Discovery is automatic and reflects both additions and removals of service instances.
You can think of it as:
// Service Registration
NameServer->register(newServer);
// Service Discovery
NameServer->getAllServer();
Why is this necessary? Before answering that question, let's take a look at the evolution of data request models.
2. Web 1.0 Data Request Model Architecture
In traditional data request architectures, there was no concept of service registration and discovery because the request model was simple enough. Below is a diagram of the traditional service request model:
[Image: Traditional service request model]
Clients directly request a single server, and all business logic resides within that server. This architecture is suitable for small services because it is stable and simple. Server updates and maintenance are also straightforward.
3. Web 2.0 Data Request Model Architecture
As user numbers grow, a single server may not be able to handle the load. This leads to the adoption of load balancing techniques, adding multiple servers to distribute the load. The backend database can also use a master-slave architecture to increase concurrency, as shown in the diagram below:
[Image: Load balanced architecture with master-slave database]
At this stage, service registration and discovery are still not needed because the architecture remains relatively simple and clear. The overall stability can be ensured by continually increasing the number of backend servers. Server updates and maintenance are still manageable.
4. Service Management in the Microservices Era
The need for service registration and discovery arises in the distributed microservices era.
In the microservices era, all services are decomposed into the smallest possible granularity. Instead of having all logic in one monolithic server, the system is split into N service modules based on functionality or domain objects. The advantage is deep decoupling—each module focuses on its own responsibility, enabling rapid iteration. The disadvantages include increased complexity in service management and control, higher manual maintenance difficulty, and potential performance degradation due to network overhead during inter-service calls.
For example, the previous model architecture evolves into something like this in the microservices era:
[Image: Complex microservices architecture with many interconnected services]
Each microservice is independent, potentially running on multiple machines or instances, and they are interconnected in complex ways.
In the diagram above, the single server has been broken into User Service, Order Service, Goods Service, Search Service, etc. Each service may have N machines or instances, and they are interrelated. This complex network significantly increases maintenance difficulty.
Without service registration, how would you manage this complex relationship network? The answer: hardcoding! You would hardcode the IP and port of other modules in your configuration files or even directly in your code. Adding or removing a service instance would then require every dependent service to change its configuration. The result is repeated configuration updates across projects, periodic bulk IP changes, and careful coordination whenever a machine is decommissioned, which is a very painful process.
In the microservices era, with cloud, Kubernetes, and Docker, a service can be created and deployed very frequently. The dependencies of each interface can change dynamically. Maintaining configurations manually is a disaster for both operations and development teams.
Service registration and discovery were invented to solve this problem, automating management and freeing developers and operators from manual work.
5. Service Registration
Let's revisit the microservices example and see how service registration and discovery change the network request model. First, let's look at how service registration works.
[Image: Services registering with a name service cluster]
Every machine or instance belonging to a service registers itself with the name service cluster when it starts. For example, if User Service has 6 Docker instances, each instance registers its information with the name service module upon startup. The same applies to Order Service.
Pseudo-code can represent this as:
// Request a unique name for the User service
UserNameServer = NameServer->apply('User');
// Each of the 6 Docker instances of User Service registers itself
UserServer1 = {ip: '192.178.1.1', port: 3445};
UserNameServer->register(UserServer1);
...
UserServer6 = {ip: '192.178.1.6', port: 3445};
UserNameServer->register(UserServer6);
// Request a unique name for the Order service
OrderNameServer = NameServer->apply('Order');
// Start registration
OrderServer1 = {ip: '192.178.1.1', port: 3446};
OrderNameServer->register(OrderServer1);
// Request a unique name for the Search service
SearchNameServer = NameServer->apply('Search');
// Start registration
SearchServer1 = {ip: '192.178.1.1', port: 3447};
SearchNameServer->register(SearchServer1);
This way, each machine instance completes registration after startup. Different name service software supports various registration methods (HTTP interfaces, RPC, JSON-based configuration tables, etc.), but the result is the same.
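As a rough illustration, the registration flow above could be modelled with a small in-memory registry. This is only a sketch: the NameServer class and its register, unregister, and get_all_servers methods are hypothetical names that mirror the pseudo-code, not the API of ZooKeeper, Consul, or etcd.

```python
from collections import defaultdict


class NameServer:
    """Hypothetical in-memory registry; a real one is a replicated cluster."""

    def __init__(self):
        # service name -> list of {"ip": ..., "port": ...} instances
        self._services = defaultdict(list)

    def register(self, name, ip, port):
        instance = {"ip": ip, "port": port}
        # Ignore duplicates so a restarted instance can safely re-register.
        if instance not in self._services[name]:
            self._services[name].append(instance)

    def unregister(self, name, ip, port):
        instance = {"ip": ip, "port": port}
        if instance in self._services[name]:
            self._services[name].remove(instance)

    def get_all_servers(self, name):
        # Return copies so callers cannot mutate the registry's own records.
        return [dict(instance) for instance in self._services[name]]


registry = NameServer()
for i in range(1, 7):  # the six User Service instances
    registry.register("User", f"192.178.1.{i}", 3445)
registry.register("Order", "192.178.1.1", 3446)
registry.register("Search", "192.178.1.1", 3447)
```

Keying instances by service name is what lets a consumer later ask for "User" without knowing any address in advance.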
6. Service Discovery
After registering the machine instances of each service with the name server, the next step is service discovery.
Let's see how service discovery works:
[Image: A service discovering another service via the registry]
In the diagram above, when the Order service wants to get information about the User service, it sends a request to the registry cluster and receives the relevant information about the User service.
Pseudo-code:
// Service discovery: get the list of User services
list = NameServer->getAllServer('User');
// Content of list
[
{
"ip": "192.178.1.1",
"port": 3445
},
{
"ip": "192.178.1.2",
"port": 3445
},
......
{
"ip": "192.178.1.6",
"port": 3445
}
]
// Service discovery: get the list of Goods services
list = NameServer->getAllServer('Goods');
// Content of list
[
{
"ip": "192.178.1.1",
"port": 3788
},
{
"ip": "192.178.1.2",
"port": 3788
},
......
{
"ip": "192.178.1.4",
"port": 3788
}
]
After obtaining the list of all IPs for the User module, we can use a load balancing algorithm, or simply pick one IP randomly, to make the call.
Some registry software also provides DNS resolution or load balancing functionality, returning a single usable IP directly, so you don't need to implement your own selection logic.
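When the registry returns the full list, the client has to choose an instance itself. The sketch below shows two simple strategies, random choice and round-robin. The function and class names are illustrative assumptions, and the instance list stands in for the result of a discovery call.

```python
import itertools
import random


def pick_random(instances):
    """Pick any instance; simple and stateless."""
    return random.choice(instances)


class RoundRobin:
    """Cycle through the instances in order, one per call."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(list(instances))

    def pick(self):
        return next(self._cycle)


# Assumed result of a discovery call for the User service.
user_instances = [
    {"ip": "192.178.1.1", "port": 3445},
    {"ip": "192.178.1.2", "port": 3445},
    {"ip": "192.178.1.6", "port": 3445},
]

balancer = RoundRobin(user_instances)
first = balancer.pick()
second = balancer.pick()
target = pick_random(user_instances)
```

In this sketch the round-robin balancer snapshots the list when it is created, so it would need to be rebuilt whenever discovery returns a different set of instances.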
Once we have the service's IP information, we can proceed with the call, as shown in the diagram:
[Image: Client calling a service after discovery]
Different name service software provides different discovery methods. Some require the client to periodically poll an HTTP interface and update its local configuration if changes are detected. Others support automatic service discovery through a real-time pub/sub mechanism: when the subscribed service content is updated, the client's configuration is updated in real time. RPC is another common method. Although the methods differ, the result is the same.
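The difference between polling and subscription can be sketched with a toy registry that supports both. Everything here (WatchableRegistry, subscribe, the callback shape) is a hypothetical design for illustration, not the interface of any particular product.

```python
class WatchableRegistry:
    """Hypothetical registry that pushes changes to subscribers."""

    def __init__(self):
        self._instances = {}   # service name -> list of "ip:port" strings
        self._listeners = {}   # service name -> list of callbacks

    def subscribe(self, name, callback):
        self._listeners.setdefault(name, []).append(callback)
        # Deliver the current state right away so the subscriber starts in sync.
        callback(list(self._instances.get(name, [])))

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)
        self._notify(name)

    def unregister(self, name, address):
        addresses = self._instances.get(name, [])
        if address in addresses:
            addresses.remove(address)
            self._notify(name)

    def get_all_servers(self, name):
        # Polling clients call this on a timer and compare with their cache.
        return list(self._instances.get(name, []))

    def _notify(self, name):
        snapshot = list(self._instances.get(name, []))
        for callback in self._listeners.get(name, []):
            callback(snapshot)


registry = WatchableRegistry()
local_cache = {}

# Push style: the callback keeps the client's local cache current.
registry.subscribe("User", lambda addrs: local_cache.update(User=addrs))
registry.register("User", "192.178.1.1:3445")
registry.register("User", "192.178.1.2:3445")
registry.unregister("User", "192.178.1.1:3445")
```

With subscription the client's cache changes as soon as the registry does; with polling it lags by up to one polling interval, in exchange for a simpler protocol.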
Service registration and discovery allow us to maintain the IP list of each service dynamically. Modules only need to query the registry for a service's IP list, eliminating hardcoded IPs. This makes service management much easier and truly automates the process!
7. Health Checks
You might think that adding this intermediary is a detour—is it only for dynamically obtaining IP addresses? Certainly not!
Service registration and discovery not only solve the problem of hardcoded IPs and messy management but also provide health checks to manage server availability.
Many name service software packages offer health checks. When a machine in a registered service group goes down or the service process dies, the registry marks that instance as faulty or removes it entirely. This enables automated monitoring and management.
Health checks can be implemented in various ways. For example, a heartbeat request might be sent every few seconds; if the HTTP status code returned is not 200, the service is considered unavailable and is marked accordingly. Another approach is executing a shell script and evaluating the result.
[Image: Health check mechanism with heartbeat]
In the diagram above, heartbeats are used to check health status. If a machine becomes unhealthy, its status is reflected in the results the next time we perform service discovery.
Pseudo-code:
// Service discovery: get the list of User services
list = NameServer->getAllServer('User');
// Content of list
[
{
"ip": "192.178.1.1",
"port": 3445,
"status": "success"
},
{
"ip": "192.178.1.2",
"port": 3445,
"status": "success"
},
......
{
"ip": "192.178.1.6",
"port": 3445,
"status": "error" // Faulty
}
]
We check the status field to verify if a service is available. Some name service software with DNS resolution will automatically exclude problematic machines, so the list you get contains only healthy services.
When a service becomes unavailable, some name service software can also send emails or alerts to notify you promptly. This helps us avoid issues and reduce impact.
When the faulty service is fixed and restarted, the health check will pass, and the machine will be marked as healthy again, allowing service discovery to find it once more.
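The check, mark, and recover cycle described in this section might look roughly like the following. It is a sketch under assumptions: run_health_checks and healthy are made-up helper names, and the probe is a stand-in callable that returns an HTTP status code, where a real checker would issue requests on a timer.

```python
def run_health_checks(instances, probe):
    """Update each instance's status from one round of checks.

    `probe` is an assumed callable taking (ip, port) and returning an HTTP
    status code; a real checker would issue the request on a timer.
    """
    for instance in instances:
        try:
            code = probe(instance["ip"], instance["port"])
        except OSError:
            code = None  # connection refused, timeout, and so on
        instance["status"] = "success" if code == 200 else "error"


def healthy(instances):
    """Return only the instances that passed their last check."""
    return [i for i in instances if i.get("status") == "success"]


users = [
    {"ip": "192.178.1.1", "port": 3445},
    {"ip": "192.178.1.6", "port": 3445},
]

# Simulated probe: instance .6 is down at first, then recovers.
down = {"192.178.1.6"}


def fake_probe(ip, port):
    if ip in down:
        raise ConnectionRefusedError(ip)
    return 200


run_health_checks(users, fake_probe)
before_fix = [i["ip"] for i in healthy(users)]

down.clear()  # the faulty service is fixed and restarted
run_health_checks(users, fake_probe)
after_fix = [i["ip"] for i in healthy(users)]
```

Here the faulty instance is kept in the list with an "error" status rather than removed, which matches the status field shown in the pseudo-code above.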
This completes the lifecycle of service registration and discovery.
8. Challenges of Service Registration and Discovery
From the examples above, we've seen the benefits. But if you were to build your own service registration and discovery software, how hard would it be?
Answer: Very, very hard!
Consider the required features:
- Clustering: The registry must form a cluster to avoid a single point of failure.
- Data Synchronization: Data must be synchronized across the cluster so that registration information is visible everywhere.
- Strong Consistency: Data must be strongly consistent to ensure no conflicts.
- High Concurrency and High Availability: The system must remain usable under heavy load.
- Election Mechanism: A master node must be elected to coordinate operations in a cluster, requiring a fair and stable election algorithm.
- Distribution: With microservices in the cloud, machines may be geographically distributed, requiring reliable communication across environments.
- Ease of Installation: Simplicity of installation and debugging significantly impacts adoption.
Developing a robust registry is a significant challenge.
9. Industry Solutions
Several mature solutions for service registration and discovery exist, including ZooKeeper, Consul, and etcd. They are powerful, secure, stable, and offer high concurrency, high availability, and strong consistency.
Here's a comparison of these three popular options:
[Table: Comparison of ZooKeeper, Consul, and etcd. See original article for details.]
Consul is a newer player, gaining popularity due to its ease of installation, powerful features (health checks, web UI, multi-datacenter support), and convenient HTTP APIs. A notable limitation is that it has no push-style pub/sub mechanism: clients must poll for changes, typically with blocking (long-polling) HTTP queries.
A future article will cover Consul's installation and usage in more detail.
10. A Case Study: L5 (Tencent's Solution)
Tencent has its own internal solution, L5, which is widely used within the company for service discovery.
How does L5 work?
Step 1: Create an SID (Service ID). An SID consists of two numbers, like 13232323:5332323232. This SID acts as the service name for registration and discovery.
Pseudo-code:
// Generic name server
UserNameServer = NameServer->apply('User');
// L5 name server
UserNameServer = L5->apply('User');
// UserNameServer => 13232323:5332323232
Step 2: Service Registration. Register the machine IP and port with the SID. This can be done via an API or through the CL5 platform's UI.
[Image: screenshot of CL5 platform interface for registering services]
Step 3: Service Discovery. In code, use the language-specific L5 extension function to discover a service. L5 also includes built-in load balancing. Instead of returning a full list of IPs, it uses a load balancing algorithm to directly return a single usable IP and port.
Here's a simplified PHP example:
$l5Info = [
'modId' => $modId,
'cmdId' => $cmdId,
];
$ret = L5ApiGetRoute($l5Info, 0.2);
// Obtain IP and port
$ip = $l5Info['hostIp'];
$port = $l5Info['hostPort'];
// Other business logic and reporting omitted
This concludes the overview of service registration and discovery, illustrating its necessity and the productivity gains it provides.
Deep Dive into RPC: Service Registration and Discovery
In previous analyses of RPC principles, we focused mainly on the Client and Server roles. However, a mature service governance framework typically includes a third role: the Registry (or Service Registry). The following diagram illustrates the main responsibilities of a registry.
[Image: Diagram showing the roles of Registry, Server, and Client in an RPC framework]
- Registry: Used by service providers to register remote services and by service consumers to discover them.
- Server (Provider): Exposes backend services and registers its service information with the registry.
- Client (Consumer): Obtains the registration information of remote services from the registry and makes remote procedure calls.
Open-source frameworks like ZooKeeper, Eureka, Consul, and etcd are commonly used to implement registries. Some internet companies also develop their own solutions (e.g., Meituan's MNS, Sina Weibo's Vintage).
This section assumes the reader has a basic understanding of registries and will explore the concept from an abstract perspective.
Abstracting the Registry
Core interfaces from open-source frameworks help us understand the registry at a higher level. For example, consider the following interfaces from the Motan framework:
Service Registration Interface
public interface RegistryService {
// Register a service with the registry
void register(URL url);
// Remove a service from the registry
void unregister(URL url);
// Mark a service as available for clients
void available(URL url);
// Mark a service as unavailable (clients cannot discover it)
void unavailable(URL url);
// Get all registered service URLs
Collection<URL> getRegisteredServiceUrls();
}
Service Discovery Interface
public interface DiscoveryService {
// Subscribe to a service
void subscribe(URL url, NotifyListener listener);
// Unsubscribe from a service
void unsubscribe(URL url, NotifyListener listener);
// Discover service instances
List<URL> discover(URL url);
}
The key methods are RegistryService#register(URL) and DiscoveryService#discover(URL). The URL class is central, encapsulating all necessary information.
public class URL {
private String protocol; // Protocol name (e.g., motan)
private String host;
private int port;
private String path; // Interface name (service path)
private Map<String, String> parameters;
private volatile transient Map<String, Number> numbers;
}
Essentially, a registry provides a storage medium where providers and consumers connect, and the primary stored information is these URL objects. Let's see what a URL actually looks like in practice.
A Practical Look at Registered Information
Using ZooKeeper as an example, let's explore what information is stored and how it persists the URL.
I created a demo RPC service interface, com.sinosoft.student.api.DemoApi, exposed its implementation on port 6666 (the provider), and ran a client on port 6667. Both connect to a local ZooKeeper instance. My local IP is 192.168.150.1.
Using zkCli.sh to connect to ZooKeeper:
[zk: localhost:2181(CONNECTED) 1] ls /motan/demo_group/com.sinosoft.student.api.DemoApi
> [client, server, unavailableServer]
ZooKeeper uses a hierarchical namespace. /motan/demo_group/com.sinosoft.student.api.DemoApi has the structure /framework_identifier/group_name/interface_name. The client, server, and unavailableServer nodes are key to registration and discovery.
Let's examine their contents.
Server node:
[zk: localhost:2181(CONNECTED) 2] ls /motan/demo_group/com.sinosoft.student.api.DemoApi/server
> [192.168.150.1:6666]
[zk: localhost:2181(CONNECTED) 3] get /motan/demo_group/com.sinosoft.student.api.DemoApi/server/192.168.150.1:6666
> motan://192.168.150.1:6666/com.sinosoft.student.api.DemoApi?serialization=hessian2&protocol=motan&isDefault=true&maxContentLength=1548576&shareChannel=true&refreshTimestamp=1515122649835&id=motanServerBasicConfig&nodeType=service&export=motan:6666&requestTimeout=9000000&accessLog=false&group=demo_group&
Client node:
[zk: localhost:2181(CONNECTED) 4] ls /motan/demo_group/com.sinosoft.student.api.DemoApi/client
> [192.168.150.1]
[zk: localhost:2181(CONNECTED) 5] get /motan/demo_group/com.sinosoft.student.api.DemoApi/client/192.168.150.1
> motan://192.168.150.1:0/com.sinosoft.student.api.DemoApi?singleton=true&maxContentLength=1548576&check=false&nodeType=service&version=1.0&throwException=true&accessLog=false&serialization=hessian2&retries=0&protocol=motan&isDefault=true&refreshTimestamp=1515122631758&id=motanClientBasicConfig&requestTimeout=9000&group=demo_group&
unavailableServer is a transitional node and typically contains no data under normal circumstances. Its purpose is explained below.
From this data, we can see that one responsibility of the registry is storing service call information. The provider registers on the server node, while the consumer, using the same interface path, accesses the same location to discover providers. The consumer also registers itself under the client node.
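The path layout seen in the zkCli output can be captured in a small helper. This is only an illustration of the naming scheme: node_path is a hypothetical function, not part of Motan or ZooKeeper.

```python
def node_path(group, interface, node_type, address=None, framework="motan"):
    """Build a registry path following the layout shown in the zkCli output.

    node_type is one of "server", "client", or "unavailableServer".
    """
    if node_type not in ("server", "client", "unavailableServer"):
        raise ValueError(f"unknown node type: {node_type}")
    parts = ["", framework, group, interface, node_type]
    if address is not None:
        parts.append(address)
    return "/".join(parts)


interface = "com.sinosoft.student.api.DemoApi"

# Consumers list the children of this node to find providers.
service_root = node_path("demo_group", interface, "server")

# A provider creates its own child node under the same parent.
provider = node_path("demo_group", interface, "server", "192.168.150.1:6666")
```

Because both sides derive the path from the same group and interface name, they meet at the same node without any other coordination.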
Detailed Explanation of Registered Information
Server Node
The server node is crucial. It is created by the service provider and consumed by service consumers to locate the provider. In my demo, only one instance exists (192.168.150.1:6666). Getting the node reveals detailed information:
motan://192.168.150.1:6666/com.sinosoft.student.api.DemoApi?serialization=hessian2&protocol=motan&isDefault=true&maxContentLength=1548576&shareChannel=true&refreshTimestamp=1515122649835&id=motanServerBasicConfig&nodeType=service&export=motan:6666&requestTimeout=9000000&accessLog=false&group=demo_group&
The value looks like a URL with the scheme motan://. It contains all the information needed for a client to call the service:
- serialization: Serialization method (e.g., hessian2).
- protocol: Communication protocol (e.g., motan).
- maxContentLength: Maximum message body size.
- shareChannel: Transport layer parameter.
- group: Service group for isolation.
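A registered value like the one above can be split back into the fields of the URL class with ordinary URL parsing. The sketch below uses Python's standard urllib.parse on a shortened copy of the value (several parameters are left out for brevity); parse_registered_url is a hypothetical helper, not Motan code.

```python
from urllib.parse import parse_qsl, urlsplit

# Shortened copy of the registered value; some parameters are omitted.
registered = (
    "motan://192.168.150.1:6666/com.sinosoft.student.api.DemoApi"
    "?serialization=hessian2&protocol=motan&maxContentLength=1548576"
    "&shareChannel=true&requestTimeout=9000000&group=demo_group&"
)


def parse_registered_url(value):
    """Split a registered value into fields resembling the URL class."""
    parts = urlsplit(value)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path.lstrip("/"),
        # parse_qsl skips the empty pair left by the trailing "&".
        "parameters": dict(parse_qsl(parts.query)),
    }


url = parse_registered_url(registered)
```

The host and port tell the consumer where to connect, while the parameters tell it how to talk to the provider once connected.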
Client Node
When using ZooKeeper in Motan, the client registers itself upon subscribing to a service. This is primarily for management and statistics but is not strictly necessary (e.g., Consul doesn't do this).
motan://192.168.150.1:0/com.sinosoft.student.api.DemoApi?singleton=true&maxContentLength=1548576&check=false&nodeType=service&version=1.0&throwException=true&accessLog=false&serialization=hessian2&retries=0&protocol=motan&isDefault=true&refreshTimestamp=1515122631758&id=motanClientBasicConfig&requestTimeout=9000&group=demo_group&
Notice the retries parameter, indicating retry counts, and check, which determines whether to check for the existence of a provider before invoking.
UnavailableServer Node
This node exists for graceful startup and shutdown:
- Graceful startup: The server first registers with unavailableServer, enters a 'warm-up' phase, and after readiness, moves to the server node (or changes state) to start accepting traffic.
- Graceful shutdown: During maintenance, the server first stops accepting new traffic (e.g., by moving to unavailableServer or disabling heartbeats), allowing clients to drain existing requests before the server is finally shut down.
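The two transitions can be sketched as moves between two sets. The method names below are borrowed from the RegistryService interface shown earlier, but the GroupNodes class and its behaviour are an assumption made for illustration, not Motan's actual implementation.

```python
class GroupNodes:
    """In-memory stand-in for the server and unavailableServer nodes."""

    def __init__(self):
        self.server = set()
        self.unavailable_server = set()

    def register(self, address):
        # New providers start out hidden from consumers.
        self.unavailable_server.add(address)

    def available(self, address):
        # Warm-up finished: move the provider where consumers can see it.
        self.unavailable_server.discard(address)
        self.server.add(address)

    def unavailable(self, address):
        # Stop new traffic, but keep the provider registered while it drains.
        self.server.discard(address)
        self.unavailable_server.add(address)

    def unregister(self, address):
        self.server.discard(address)
        self.unavailable_server.discard(address)

    def discover(self):
        return sorted(self.server)


nodes = GroupNodes()
provider = "192.168.150.1:6666"

nodes.register(provider)            # startup: registered but not discoverable
during_warmup = nodes.discover()
nodes.available(provider)           # ready: consumers can now find it
while_serving = nodes.discover()
nodes.unavailable(provider)         # shutdown begins: no new consumers
while_draining = nodes.discover()
nodes.unregister(provider)          # in-flight requests done, node removed
```

Consumers only ever read the server set, so a provider in warm-up or draining receives no new traffic even though the registry still knows about it.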
Detecting Service Downtime
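Registration is an explicit action taken by the provider at startup. Deregistration is harder, because a provider that crashes or loses its network connection cannot announce its own departure.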
Ephemeral Nodes + Long Connections
ZooKeeper offers ephemeral nodes, which are deleted automatically when the session of the client that created them ends. This suits service registration well: if a provider crashes or loses its network connection, its session expires after the session timeout, the ephemeral node disappears, and consumers watching the parent node are notified of the change. This approach is widely adopted.
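The behaviour can be imitated with a toy model in which every node belongs to a session and disappears with it. FakeZooKeeper and its methods are invented for this sketch and are not the real ZooKeeper client API; the second provider address is also made up.

```python
class FakeZooKeeper:
    """Toy model of ephemeral nodes; not the real ZooKeeper client API."""

    def __init__(self):
        self._nodes = {}      # path -> owning session id
        self._watchers = []   # callbacks invoked with the list of live paths

    def create_ephemeral(self, session_id, path):
        self._nodes[path] = session_id
        self._fire()

    def watch_children(self, callback):
        self._watchers.append(callback)

    def expire_session(self, session_id):
        # The server does this itself once a session times out, for example
        # after the provider crashes and stops sending keep-alive pings.
        for path in [p for p, s in self._nodes.items() if s == session_id]:
            del self._nodes[path]
        self._fire()

    def _fire(self):
        for callback in self._watchers:
            callback(sorted(self._nodes))


zk = FakeZooKeeper()
seen = []  # consumer's view of the provider list after each change
zk.watch_children(seen.append)

zk.create_ephemeral("session-a", "/server/192.168.150.1:6666")
zk.create_ephemeral("session-b", "/server/192.168.150.2:6666")  # hypothetical
zk.expire_session("session-a")  # provider A crashed
```

The crashed provider never sends a deregistration request; the registry removes its node on its behalf. One simplification to be aware of: ZooKeeper's standard watches fire once and must be set again, whereas the watcher in this model stays registered.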
Active Deregistration + Heartbeat
Some registries (like Eureka) don't have ephemeral nodes. Instead, they rely on active deregistration: the provider notifies the registry when it shuts down. To handle crashes, a heartbeat mechanism is used. In Eureka's case, each registered provider periodically sends a heartbeat to the registry. If the registry stops receiving heartbeats from a provider (e.g., after 3 missed heartbeats), the provider is considered dead and removed from the registry.
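Heartbeat-based eviction can be sketched as a table of last-seen timestamps. In this sketch providers report in and the registry evicts the ones that have gone quiet. LeaseRegistry, the 10-second interval, and the addresses are all assumptions made for illustration, and the current time is passed in explicitly to keep the example deterministic.

```python
class LeaseRegistry:
    """Hypothetical registry that evicts providers whose heartbeats stop."""

    def __init__(self, interval=10.0, max_missed=3):
        self.interval = interval        # expected seconds between heartbeats
        self.max_missed = max_missed    # missed beats tolerated before eviction
        self._last_seen = {}            # address -> time of last heartbeat

    def heartbeat(self, address, now):
        # The first heartbeat doubles as registration.
        self._last_seen[address] = now

    def deregister(self, address):
        # Active deregistration on a clean shutdown.
        self._last_seen.pop(address, None)

    def evict_expired(self, now):
        limit = self.interval * self.max_missed
        dead = [a for a, t in self._last_seen.items() if now - t >= limit]
        for address in dead:
            del self._last_seen[address]
        return dead

    def alive(self):
        return sorted(self._last_seen)


leases = LeaseRegistry(interval=10.0, max_missed=3)
leases.heartbeat("192.168.150.1:6666", now=0.0)
leases.heartbeat("192.168.150.2:6666", now=0.0)

leases.heartbeat("192.168.150.1:6666", now=20.0)  # .1 keeps renewing
evicted = leases.evict_expired(now=30.0)          # .2 has been silent for 30s
```

Tolerating a few missed heartbeats trades detection speed for stability: a single dropped packet does not remove a healthy provider, but a dead one stays listed until the window runs out.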
Registry Comparison
[Table comparing Consul, ZooKeeper, etcd, and Eureka. See original article for details. Key points include:
- Health checking mechanisms
- Consistency models (CAP theorem)
- Multi-datacenter support
- APIs (HTTP, DNS, gRPC)
- Watch/long polling support
- Integration with Spring Cloud
- Security]
Different registries suit different scenarios. ZooKeeper is a distributed coordination service, not designed purely for service discovery. Some argue its strong consistency (CP in CAP) is overkill for registration, as eventual consistency is often sufficient. Consul excels in multi-datacenter environments. Eureka is AP (prioritizing availability) and is commonly used with Spring Cloud.
Summary
A registry decouples service callers from service locations, a fundamental challenge in distributed systems. This article has covered the core concepts, practical examples, and key considerations for choosing a registry.