Advanced Service Design: Beyond Best Practices

Introduction

In today's rapidly evolving technology landscape, service design is not just about following common design specifications and best practices. It also means making sure that services can adapt flexibly to future changes and meet user expectations while still adhering to those standards. This article explores the key factors to consider in service design beyond the common specifications, and we hope it opens a discussion on how to make APIs easy to integrate, easy to understand, and easy to extend.

System Overview

JD Enterprise Business VOP (Smart Procurement): Provides JD's supply chain capabilities to customers in the form of APIs. VOP offers hundreds of standard API services, providing underlying capabilities for thousands of customers to build their own procurement malls.

Service Design Tips

1. Service Path and Module

  • The service path should start with a unified first-level path after the domain name, such as domain/api/. The benefit of this is that all API requests are centralized under a clear path, simplifying the implementation of security controls. For example, at the Nginx or API Gateway level, specific security policies such as authentication, authorization, logging, rate limiting, and monitoring can be applied to this specific path, thereby enhancing API security.
  • Below this prefix comes the business grouping, for example /api/order for orders and /api/product for products. This makes it clear to the caller which business module each interface belongs to.
  • Also consider the granularity of each service. For example, should the product price be a separate service, or should it be returned directly by the product details interface? Should order logistics information be part of the order details service? The current business scenarios should drive these decisions. Taking VOP as an example: VOP publishes customer-level price change messages, so after pulling a price change message the caller can query the price service alone to update its locally stored prices, without fetching the full product information. Decoupling price from product details is therefore the better choice. As long as the business scenarios are fully covered, give callers services they can assemble as flexibly as possible.
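As a minimal illustration of the first two points, the sketch below shows how a single prefix rule identifies every API request and how the business module falls out of the next path segment. ApiPaths and its methods are hypothetical names, not part of VOP; in practice this rule usually lives in the Nginx or API gateway configuration rather than in Java code.

```java
import java.util.Optional;

// Hypothetical helper mirroring the path conventions: every API lives under /api/,
// and the next segment names the business module (order, product, ...).
// Assumes the path has already been normalized (no "..", no duplicate slashes).
final class ApiPaths {
    static final String API_PREFIX = "/api/";

    // One prefix check is enough to decide whether authentication, rate limiting,
    // logging, and monitoring apply to a request.
    static boolean isApiPath(String path) {
        return path != null && path.startsWith(API_PREFIX);
    }

    // "/api/order/detail" -> "order"; empty for non-API paths or a bare "/api/".
    static Optional<String> module(String path) {
        if (!isApiPath(path)) {
            return Optional.empty();
        }
        String rest = path.substring(API_PREFIX.length());
        int slash = rest.indexOf('/');
        String module = slash < 0 ? rest : rest.substring(0, slash);
        return module.isEmpty() ? Optional.empty() : Optional.of(module);
    }
}
```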

2. Service Request Method

For most services, support AJAX requests (the client calls the API and processes the response itself) and specify a fixed data format such as JSON or XML. This keeps the interface easy to extend and fits the way most customers in the industry integrate. Some special scenarios, such as checkout pages and login/logout, require non-AJAX requests, where the browser navigates to the service and the server side handles the subsequent forwarding and redirects.

3. Service Input and Output Parameters

  • When defining the input/output parameters of an API service, especially an externally facing one, consider extensibility first. Otherwise future changes may force you to add new interfaces, leading to redundant development, unclear interfaces, and multiple similar versions of the same interface. Second, for the definition and description of each field, the common specifications need no elaboration here, but the following points deserve emphasis:
  • For extensibility, and to shield callers from internal service changes, design certain numeric fields as Long or even directly as strings. This spares callers rework when the internal implementation changes (e.g., when an Integer field can no longer hold the value and must become a Long). Strings also avoid precision loss for JavaScript callers, which cannot represent integers above 2^53 exactly.
  • Sometimes a field must distinguish between "not set" and "zero". In that case, design the field as a wrapper type (for example, Integer rather than int in Java): null then means "not set", while zero values (0, 0.0) keep their specific meanings.
  • Field selection: Do not expose output fields unnecessarily. Minimize field exposure while satisfying the interface functional requirements. For sensitive fields, such as customer addresses, phone numbers, etc., carefully consider whether to expose the field. If necessary, also consider field masking and encryption/decryption.
  • Field definition and description: Detailed and compliant field definitions and descriptions can reduce the caller's cost of understanding service fields, ensuring data consistency and accuracy. Common descriptions include the following:
    • Field length limitations
    • Range for numeric type fields
    • Consistency of field names with the same meaning across business domains
    • Field format description: For fields with special formats like dates and times, specify their format clearly
    • Enumeration description: For enumeration fields, list all possible enumeration values and explain the meaning of each value
  • Adding new input/output parameters to a published service: consider the impact on callers, such as serialization/deserialization and field processing. If a caller deserializes the interface's output in strict mode (rejecting unknown fields), a new return field added by the server will make that caller's deserialization fail and break its business. One remedy is an extension field in the service's input parameters: the service adds the new output parameters to the response body only when the caller passes the corresponding value in that field.
  • Unified output format: external services should return a fixed format that clearly indicates the processing result, status code, and business data, so that callers can handle both business results and exceptions unambiguously, for example:
{
    "success": true,
    "resultMessage": "success",
    "resultCode": "0000",
    "result": {}
}
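The field rules above and the unified output format can be combined in a short Java sketch. ApiResult, OrderView, the masking rule (keep the first 3 and last 4 characters), and the extFields extension parameter are all assumptions for illustration; only the four envelope fields and the "0000" success code come from the JSON example.

```java
import java.util.Collections;
import java.util.Set;

// Unified response envelope with the same four fields as the JSON example.
final class ApiResult<T> {
    final boolean success;
    final String resultMessage;
    final String resultCode;
    final T result;

    private ApiResult(boolean success, String resultMessage, String resultCode, T result) {
        this.success = success;
        this.resultMessage = resultMessage;
        this.resultCode = resultCode;
        this.result = result;
    }

    static <T> ApiResult<T> ok(T result) {
        return new ApiResult<>(true, "success", "0000", result);
    }

    static <T> ApiResult<T> fail(String resultCode, String resultMessage) {
        return new ApiResult<>(false, resultMessage, resultCode, null);
    }
}

// Hypothetical output DTO illustrating the field rules.
final class OrderView {
    String orderId;        // String, not int: the internal ID type can change without breaking callers
    Integer couponAmount;  // wrapper type: null = "not set", 0 = "explicitly zero"
    String receiverPhone;  // sensitive: only ever exposed masked
    String deliveryTime;   // added after publication: filled only when the caller asks for it
}

final class OrderViews {
    private static String stars(int n) {
        return String.join("", Collections.nCopies(n, "*"));
    }

    // Keeps the first 3 and last 4 characters: "13812345678" -> "138****5678".
    // Values too short to mask partially are masked completely.
    static String maskPhone(String phone) {
        if (phone == null) {
            return null;
        }
        if (phone.length() < 8) {
            return stars(phone.length());
        }
        return phone.substring(0, 3) + stars(phone.length() - 7) + phone.substring(phone.length() - 4);
    }

    // extFields is the hypothetical extension input parameter: new output fields are
    // populated only for callers that explicitly request them.
    static ApiResult<OrderView> build(long internalId, Integer coupon, String phone,
                                      String deliveryTime, Set<String> extFields) {
        OrderView view = new OrderView();
        view.orderId = String.valueOf(internalId);
        view.couponAmount = coupon;
        view.receiverPhone = maskPhone(phone);
        if (extFields.contains("deliveryTime")) {
            view.deliveryTime = deliveryTime;
        }
        return ApiResult.ok(view);
    }
}
```

A caller that does not put deliveryTime in extFields never receives the new field, assuming null fields are omitted during serialization (for example with Jackson's @JsonInclude(JsonInclude.Include.NON_NULL)), so a strict deserializer on its side keeps working.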

4. Suggestions for Business Processing within Services

  • Where possible, design service logic to support batch processing. Callers can then focus on their data processing logic instead of on how to call the service with high concurrency to process the data, and the load on the service side drops accordingly. Examples include order/product queries and invoice issuance services.
  • Prevent horizontal privilege escalation: callers must not be able to query data that does not belong to them. How strict the check is depends on the business. For example, for most query services VOP allows different pins (accounts) under the same contract to query each other's data (orders, invoices, etc.).
  • When critical business logic needs multi-threaded processing, give it a dedicated thread pool, so that other workloads cannot occupy the pool's threads and delay the important business.
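The sketch below puts the three suggestions together under stated assumptions: a batch query capped at a hypothetical MAX_BATCH of 100, an ownership check that keeps only orders belonging to the caller's contract (modelled loosely on the VOP rule that pins under the same contract may see each other's data), and a dedicated, bounded thread pool for this critical path. The in-memory order-to-contract map stands in for the real data source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class BatchOrderQuery {
    static final int MAX_BATCH = 100; // hypothetical upper bound per request
    private static final AtomicInteger THREAD_SEQ = new AtomicInteger();

    // Dedicated, bounded pool used only by this critical path; named threads make it
    // easy to spot in thread dumps.
    private final ExecutorService orderPool = new ThreadPoolExecutor(
            4, 8, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(200),
            r -> {
                Thread t = new Thread(r, "order-query-" + THREAD_SEQ.incrementAndGet());
                t.setDaemon(true); // daemon only so this demo JVM can exit
                return t;
            },
            new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when the queue is full

    private final Map<Long, String> orderContract; // orderId -> owning contract (stand-in data source)

    BatchOrderQuery(Map<Long, String> orderContract) {
        this.orderContract = orderContract;
    }

    // Looks up a batch of orders and drops IDs owned by another contract, so a caller
    // cannot read other customers' orders by guessing IDs.
    List<Long> query(String callerContract, List<Long> orderIds) {
        if (orderIds.isEmpty() || orderIds.size() > MAX_BATCH) {
            throw new IllegalArgumentException("batch size must be between 1 and " + MAX_BATCH);
        }
        List<Future<Long>> futures = new ArrayList<>();
        for (Long id : orderIds) {
            futures.add(orderPool.submit(() -> callerContract.equals(orderContract.get(id)) ? id : null));
        }
        List<Long> visible = new ArrayList<>();
        for (Future<Long> future : futures) {
            try {
                Long id = future.get(1, TimeUnit.SECONDS);
                if (id != null) {
                    visible.add(id);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted", e);
            } catch (ExecutionException | TimeoutException e) {
                throw new IllegalStateException("order lookup failed", e);
            }
        }
        return visible;
    }

    void shutdown() {
        orderPool.shutdown();
    }
}
```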

5. Exception Handling

  • For open services, especially write operations, clearly defined exception codes are essential. The caller's system needs accurate error codes and messages to handle call results automatically: retry, raise an alarm, or ignore; interrupt or continue the business flow. Most of us have met interfaces with no documented error codes at all. The caller then has no idea which codes or messages the interface may return, and cannot decide whether a given error can be shown to the customer (for example, technical messages such as 'Exception calling XX interface' or 'XX data is empty').
  • As noted in point 3, the service's output format must be unified. So when a business exception occurs, wrap an appropriate exception code and message in that format and return it to the caller. Also wrap the output when unknown exceptions occur (null pointers, timeouts when calling dependent interfaces, etc.) instead of letting the exception propagate to the caller. An interceptor at the outermost layer can serve as the fallback that converts unknown exceptions into the unified output.
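A framework-free sketch of both points: documented error codes that tell the caller whether a retry makes sense, and an outermost fallback that converts any exception into the unified output format. In Spring MVC this fallback is commonly written as a @RestControllerAdvice with @ExceptionHandler methods; the plain wrapper here only stands in for it. All codes and messages are hypothetical.

```java
import java.util.function.Supplier;

// Documented error codes (hypothetical values): each tells the caller whether retrying makes sense.
enum ErrorCode {
    SUCCESS("0000", "success", false),
    PARAM_INVALID("1001", "Invalid parameter", false),
    STOCK_SHORTAGE("2001", "Insufficient stock", false),
    UNKNOWN("9999", "System busy, please try again later", true);

    final String code;
    final String message; // written to be safe to show to end customers
    final boolean retryable;

    ErrorCode(String code, String message, boolean retryable) {
        this.code = code;
        this.message = message;
        this.retryable = retryable;
    }
}

// A business exception always carries one of the documented codes.
class BizException extends RuntimeException {
    final ErrorCode errorCode;

    BizException(ErrorCode errorCode) {
        super(errorCode.message);
        this.errorCode = errorCode;
    }
}

// Minimal version of the unified envelope.
final class ApiResult<T> {
    final boolean success;
    final String resultCode;
    final String resultMessage;
    final T result;

    ApiResult(ErrorCode code, T result) {
        this.success = code == ErrorCode.SUCCESS;
        this.resultCode = code.code;
        this.resultMessage = code.message;
        this.result = result;
    }
}

final class ApiGuard {
    // Outermost fallback: a BizException maps to its own code; anything unexpected
    // (NullPointerException, dependency timeout, ...) becomes UNKNOWN instead of
    // leaking a stack trace or internal message to the caller.
    static <T> ApiResult<T> call(Supplier<T> action) {
        try {
            return new ApiResult<>(ErrorCode.SUCCESS, action.get());
        } catch (BizException e) {
            return new ApiResult<>(e.errorCode, null);
        } catch (RuntimeException e) {
            // log e with its stack trace here for internal troubleshooting
            return new ApiResult<>(ErrorCode.UNKNOWN, null);
        }
    }
}
```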

6. Strong and Weak Dependencies of Services

When building a complex external service, it usually depends on multiple external interfaces. These dependencies fall into two categories: strong and weak. A strongly depended-on interface is indispensable to the service's operation. If such an interface fails, the usual strategy is to break the flow (fail fast): immediately interrupt the current operation to prevent further error propagation or data inconsistency.

Weak dependencies, on the other hand, have relatively little impact on the main flow, so a certain degree of exceptions or failures can be tolerated. If a weakly depended-on interface fails, we can ignore the exception and continue the core flow of the service. At the same time, to keep the service stable and problems traceable, we add a business alerting mechanism so that these abnormal situations are detected and handled promptly. This keeps the service available as a whole while making sure problems do not go unnoticed.
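A minimal sketch of the two policies, using the order details example from point 1: order data is treated as a strong dependency and logistics data as a weak one. The class name, the placeholder value, and the in-memory alert list (standing in for a real alerting channel) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical order-detail assembly: order data is a strong dependency,
// logistics data is a weak one.
final class OrderDetailAssembler {
    private final List<String> alerts = new ArrayList<>(); // stand-in for a real alerting channel

    String assemble(Supplier<String> orderService, Supplier<String> logisticsService) {
        // Strong dependency: no try/catch, so a failure aborts the whole call (fail fast)
        // and the outermost handler turns it into an error response.
        String order = orderService.get();

        // Weak dependency: tolerate the failure, fall back to a placeholder, and alert.
        String logistics;
        try {
            logistics = logisticsService.get();
        } catch (RuntimeException e) {
            alerts.add("logistics dependency failed: " + e.getMessage());
            logistics = "unavailable";
        }
        return order + " | " + logistics;
    }

    List<String> alerts() {
        return alerts;
    }
}
```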

7. Service Monitoring and Logging

  • Logging: For an external service (especially one directly facing external customers), in addition to the common technical monitoring in the system, recording input and output parameter logs is a crucial part, especially for various types of write operations involving funds, orders, payments, etc. On one hand, we can monitor which fields are used by each caller in important services, thereby providing risk assessment data for adding, deprecating, or modifying certain fields later. On the other hand, we can analyze business data from the logged input and output parameters, providing data support for subsequent business decisions and business alerts. Furthermore, when data inconsistency between the caller and the service provider leads to serious consequences, these logs are also crucial information for identifying the problem point.
  • Call Data Collection: Statistics and analysis of traffic details for an open platform are important prerequisites for traffic governance and ensuring service security. By analyzing the calls from different callers, we can learn about their call frequency, patterns, and request reasonableness. This allows for effective identification of illegal calls (like fraudulent orders, scraping), and enables us to calculate the average call volume of callers, thereby providing data support for service rate limiting and security protection.
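The sketch below wraps each call to record the caller, API, input, output, and latency, and keeps a per-caller, per-API counter that rate limiting or anomaly detection could read. CallRecorder is a hypothetical name, and System.out stands in for a real log pipeline; sensitive fields should be masked before anything is logged.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

// Hypothetical wrapper that logs every call and counts calls per caller and API.
final class CallRecorder {
    private final Map<String, LongAdder> callCounts = new ConcurrentHashMap<>();

    <I, O> O record(String caller, String api, I input, Function<I, O> handler) {
        long start = System.nanoTime();
        O output = null;
        try {
            output = handler.apply(input);
            return output;
        } finally {
            long micros = (System.nanoTime() - start) / 1_000;
            callCounts.computeIfAbsent(caller + "|" + api, k -> new LongAdder()).increment();
            // One structured line per call; a real system would ship this to a log pipeline.
            // output is null here if the handler threw.
            System.out.printf("caller=%s api=%s latencyUs=%d input=%s output=%s%n",
                    caller, api, micros, input, output);
        }
    }

    long count(String caller, String api) {
        LongAdder adder = callCounts.get(caller + "|" + api);
        return adder == null ? 0 : adder.sum();
    }
}
```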

8. Reasonably Setting Degradation Points

When building complex services, especially in the context of distributed systems and microservice architectures, we typically need to rely on many third-party services or data. In this case, service degradation strategies become key to ensuring overall system stability. Service degradation mainly refers to actively reducing or disabling certain functions of some service modules when they encounter performance bottlenecks or failures, in order to ensure the core operation of the entire system.

To effectively implement service degradation, it is necessary to identify potential risk points during development and design, and accordingly divide the services into critical and non-critical modules. Embedding degradation logic at the code level of non-critical modules allows us to perform degradation operations manually or automatically when problems occur, thus ensuring the continuous availability of critical services. This design not only helps improve the robustness of the system but also effectively reduces the impact on users during system failures.
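A minimal degradation point, assuming the non-critical module can return a fixed fallback value: it can be switched off manually (for example from a configuration center) or trips automatically after a number of consecutive failures. The threshold-based trip is just one simple policy; libraries such as Resilience4j offer richer circuit breakers.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Degradation point around a non-critical module (e.g. recommendations on a product page).
final class DegradableFeature<T> {
    private final AtomicBoolean degraded = new AtomicBoolean(false);
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final T fallback;
    private final int failureThreshold;

    DegradableFeature(T fallback, int failureThreshold) {
        this.fallback = fallback;
        this.failureThreshold = failureThreshold;
    }

    // Manual switch, e.g. driven by a configuration center during an incident.
    void setDegraded(boolean on) {
        degraded.set(on);
        if (!on) {
            consecutiveFailures.set(0);
        }
    }

    boolean isDegraded() {
        return degraded.get();
    }

    T get(Supplier<T> call) {
        if (degraded.get()) {
            return fallback; // degraded: do not touch the dependency at all
        }
        try {
            T value = call.get();
            consecutiveFailures.set(0);
            return value;
        } catch (RuntimeException e) {
            // Automatic degradation after N consecutive failures; recovery is manual here.
            if (consecutiveFailures.incrementAndGet() >= failureThreshold) {
                degraded.set(true);
            }
            return fallback;
        }
    }
}
```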

9. Handling Deprecated Services

In every system, many services were originally designed for simple business scenarios. As the business grows, many of them can no longer satisfy new scenarios. When an original service cannot be extended, we generally add a new service that covers the capabilities of the old one. Over time this leaves behind deprecated services that customers almost never use, which complicate troubleshooting and can expose loopholes in business scenarios. There are two ways to handle such services:

  • If the logic is compatible, route the old service to the new service.
  • Handle via deprecation: rather than deleting the code directly, intercept all deprecated paths in one place and return a unified error code and prompt message to the caller, raising an alert whenever the interception is hit. When an interception alert fires or a caller reports a problem, lift the interception first, then agree on a follow-up plan with the caller.
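A sketch of the centralized interception, assuming a hypothetical error code "4100" and an alert callback. The deprecated code itself stays in place, and a path can be released again without a redeploy.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical central guard for deprecated paths: the old code stays in place,
// but requests get a unified error code and raise an alert.
final class DeprecatedPathGuard {
    static final String DEPRECATED_CODE = "4100"; // assumed error code

    private final Set<String> intercepted = ConcurrentHashMap.newKeySet();
    private final Consumer<String> alert;

    DeprecatedPathGuard(Set<String> deprecatedPaths, Consumer<String> alert) {
        this.intercepted.addAll(deprecatedPaths);
        this.alert = alert;
    }

    // Returns the error code to send back, or empty if the request may proceed.
    Optional<String> check(String caller, String path) {
        if (!intercepted.contains(path)) {
            return Optional.empty();
        }
        alert.accept("deprecated path " + path + " called by " + caller);
        return Optional.of(DEPRECATED_CODE);
    }

    // After an alert or caller feedback: stop intercepting first, then agree on a plan.
    void release(String path) {
        intercepted.remove(path);
    }
}
```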

10. Trust No Caller and No Dependent Third Party

During service design and programming, maintain a premise of distrust for all dependent third-party services. Meticulously handle their returned data, service timeouts, exception returns, etc. Consider every possible problem point to prevent a problem in one dependent service from affecting the whole.

Similarly, do not trust the caller. Whatever has been promised or agreed beforehand, there is no guarantee that the caller will behave as expected. Before the service goes live, therefore, build multiple layers of defense, including but not limited to effective rate limiting, strict input validation, and fine-grained permission controls. Such a design improves the stability and security of the service and keeps the system resilient in the face of unpredictable external factors.
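Rate limiting is one of the defenses named above. The sketch below is a per-caller token bucket written from scratch for illustration; the capacity, refill rate, and injectable clock are assumptions, and in practice a library such as Guava's RateLimiter or a gateway-level limiter is often used instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Per-caller token bucket: bursts of up to `capacity` calls, refilled at `ratePerSecond`.
final class CallerRateLimiter {
    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long now) {
            this.tokens = tokens;
            this.lastRefillNanos = now;
        }
    }

    private final double capacity;
    private final double ratePerSecond;
    private final LongSupplier nanoClock; // System::nanoTime in production, a fake clock in tests
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    CallerRateLimiter(double capacity, double ratePerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.nanoClock = nanoClock;
    }

    // Returns true if the caller may proceed, false if the request should be rejected.
    boolean tryAcquire(String caller) {
        Bucket bucket = buckets.computeIfAbsent(caller, k -> new Bucket(capacity, nanoClock.getAsLong()));
        synchronized (bucket) {
            long now = nanoClock.getAsLong();
            long elapsed = Math.max(0L, now - bucket.lastRefillNanos);
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * ratePerSecond / 1_000_000_000.0);
            bucket.lastRefillNanos = now;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

In a service this would be constructed as, for example, new CallerRateLimiter(20, 10, System::nanoTime), keyed by the authenticated customer identity rather than anything the caller can choose freely.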

Measures to Ensure Service and Information Security

(1) Sensitive Fields

In B2B scenarios, enterprises set strict requirements on sensitive user data such as phone numbers, usernames, and addresses, and expect the service provider to encrypt such data in interface output. Encryption prevents unauthorized third parties from intercepting and reading this information in transit, protecting the privacy and interests of both customers and the enterprise.

Encryption can be implemented at different levels:

  • Transport Layer Encryption: Encrypts the entire communication between the client and server using protocols like SSL/TLS. This means all transmitted data, including sensitive fields, is encrypted, ensuring data security during transmission.
  • Application Layer Encryption: encrypts specific sensitive fields at the application layer before the data is sent, usually with standard algorithms such as AES or RSA. Even when transport layer encryption is in place, it adds a further layer of protection, because the fields stay encrypted after TLS terminates (for example, at a proxy or in logs).
  • Database Layer Encryption: encrypts sensitive fields stored in the database. This approach is less widely used. Passwords in particular should not be stored with reversible encryption at all, but as salted hashes computed with a slow algorithm such as bcrypt or PBKDF2.

When encrypting sensitive fields, pay attention to the following:

  • Choose appropriate encryption standards and algorithms that are commonly used and proven in the industry.
  • Manage encryption keys securely. The generation, storage, exchange, and destruction of keys should be robust and reliable.
  • Consider the performance impact. Encryption and decryption add computational overhead, so make sure service performance is not significantly affected.
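As an example of application layer encryption with a proven algorithm, the sketch below encrypts a single field with AES-256 in GCM mode using only the JDK's javax.crypto API. The output layout (IV prepended to the ciphertext, Base64-encoded) is an assumption, and the in-memory key generation is for demonstration only; real keys belong in a key management system.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Application-layer encryption of one sensitive field with AES-256-GCM.
// Output format (an assumption): Base64(12-byte IV || ciphertext with 16-byte tag).
final class FieldCipher {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKey key;

    FieldCipher(SecretKey key) {
        this.key = key;
    }

    // Demo only: real keys come from a key management service, never from code.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    String encrypt(String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // a fresh IV per value; never reuse an IV with the same key
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + sealed.length).put(iv).put(sealed).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Throws if the value was tampered with: GCM authenticates as well as encrypts.
    String decrypt(String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Because GCM authenticates the ciphertext, a modified value fails decryption instead of silently producing garbage.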

(2) System Access Control

Some customers with high system security requirements may request the service provider to intercept illegal callers, preventing unauthorized third parties from using customer identities to make calls after obtaining identity information, thereby causing financial losses or more data leaks. In this situation, we can use IP blacklist/whitelist mechanisms for interception, restricting a certain identity to only be accessed from specified IPs or preventing specified IPs from accessing.

IP Whitelist: only IP addresses on the whitelist may access the system or interface; all others are rejected. Advantages: a high level of security. At the customer level, only pre-approved IP addresses can call with that customer's identity, which effectively blocks unauthorized access attempts; in some specific scenarios the IP itself can even represent the customer identity, removing the need for a separate authorization check. Disadvantages: management can be cumbersome, especially when legitimate users' IP addresses change frequently, and it is unfriendly to users with dynamic IP addresses. Applicable scenarios: environments with a limited number of callers whose IP addresses are fixed or stable.

IP Blacklist: IP addresses on the blacklist may not access the system or interface; all others can. Advantages: simple to manage, since only known malicious IP addresses need to be listed, and access for most legitimate users is unaffected. Disadvantages: lower security, because new, unknown attackers can still get in, and malicious users can bypass the blacklist by changing their IP addresses. Applicable scenarios: highly open environments, or as a supplement to other security measures. Such scenarios are rare in practice; currently no VOP customer has chosen this type of IP verification.
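A sketch of the per-customer IP whitelist, assuming IPv4 only and rules given as exact addresses or CIDR ranges. Customers without a configured list are not restricted, since the check is opt-in. The IP being checked should be the TCP peer address or one set by a trusted proxy, never a header the client can forge. A blacklist would reuse the same matcher with the result inverted.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-customer IPv4 whitelist supporting exact IPs ("203.0.113.10") and CIDR ranges ("10.0.0.0/24").
final class IpWhitelist {
    private final Map<String, List<String>> rulesByCustomer = new ConcurrentHashMap<>();

    void allow(String customer, List<String> rules) {
        rulesByCustomer.put(customer, rules);
    }

    // Customers without a configured whitelist are not restricted (the check is opt-in).
    boolean permits(String customer, String ip) {
        List<String> rules = rulesByCustomer.get(customer);
        if (rules == null) {
            return true;
        }
        long addr = toLong(ip);
        for (String rule : rules) {
            String[] parts = rule.split("/");
            int prefix = parts.length == 2 ? Integer.parseInt(parts[1]) : 32;
            long mask = (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL; // top `prefix` bits set
            if ((addr & mask) == (toLong(parts[0]) & mask)) {
                return true;
            }
        }
        return false;
    }

    // Parses a dotted-quad IPv4 address into an unsigned 32-bit value held in a long.
    static long toLong(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String octet : octets) {
            int b = Integer.parseInt(octet);
            if (b < 0 || b > 255) {
                throw new IllegalArgumentException("bad octet in: " + ip);
            }
            value = (value << 8) | b;
        }
        return value;
    }
}
```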

(3) Interface Input/Output Tamper Prevention

Ensure that data sent via the interface has not been modified during transmission. This is critical for ensuring the integrity and security of data for both parties. Common tamper prevention methods include:

a. Use HTTPS

Ensure all data transmission is over HTTPS, encrypting data in transit and reducing the risk of interception and modification.

b. Digital Signatures

Before sending data, the caller signs it with its private key. The receiver verifies the signature with the corresponding public key to confirm that the data has not been modified since it was signed.

Digital signature steps:

The sender uses a private key to sign the data (which could be a hash of the data or parameters concatenated into a string according to rules). The sender sends both the data and the signature to the receiver. Upon receiving the data, the receiver verifies the signature using the sender's public key. If signature verification is successful, the data has not been tampered with; if it fails, the data might have been tampered with during transmission.
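The steps above map directly onto the JDK's java.security.Signature API. In this sketch the rule for building the signed string is an assumption (parameters sorted by name and joined as k=v with &), and SHA256withRSA is one common algorithm choice; it hashes the data internally, so the sender does not hash separately.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class RequestSigner {
    // Assumed canonical rule: sort by parameter name, join as k1=v1&k2=v2.
    // A real protocol should also URL-encode values so '&' and '=' cannot be ambiguous.
    static String canonical(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        new TreeMap<>(params).forEach((k, v) -> joiner.add(k + "=" + v));
        return joiner.toString();
    }

    // Sender side: sign the canonical string with the private key.
    static String sign(Map<String, String> params, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(canonical(params).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: rebuild the canonical string and verify with the sender's public key.
    static boolean verify(Map<String, String> params, String signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(canonical(params).getBytes(StandardCharsets.UTF_8));
            return verifier.verify(Base64.getDecoder().decode(signature));
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    // Demo key pair; in practice each side keeps its private key and publishes the public key.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```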

c. Message Authentication Code (MAC)

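Similar to a digital signature, a MAC is a technique for ensuring message integrity. The sender computes a MAC value from a shared secret key and the message content; the receiver computes a MAC value over the received message with the same key and compares it with the one the sender provided. Because both sides hold the same key, a MAC proves that the message came from a key holder but, unlike a digital signature, cannot prove which of the two sides produced it.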
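A corresponding MAC sketch with HmacSHA256 from javax.crypto. The shared key and message format are placeholders; the receiver compares the values with MessageDigest.isEqual, a constant-time comparison, so the check does not leak timing information.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class HmacSigner {
    // Both sender and receiver compute this with the same shared secret.
    static String mac(String message, byte[] sharedSecret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(sharedSecret, "HmacSHA256"));
            byte[] tag = mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver: recompute the MAC and compare in constant time.
    static boolean verify(String message, String receivedMac, byte[] sharedSecret) {
        byte[] expected = mac(message, sharedSecret).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, receivedMac.getBytes(StandardCharsets.UTF_8));
    }
}
```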

d. Use API Keys and Timestamps

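Combining API keys with timestamps adds another layer of security. The API key ensures that only authorized callers can send requests; a timestamp that is included in the signed content and checked against a short validity window limits replay attacks, and a one-time nonce prevents replays inside that window as well.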
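A sketch of the replay check, assuming the appKey, timestamp, and a one-time nonce are all covered by the digital signature or MAC described above so they cannot be altered. The validity window is a constructor parameter, and the in-memory nonce map stands in for a shared cache with expiry, such as Redis, in a clustered deployment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical replay guard, run after the signature has been verified.
final class ReplayGuard {
    private final long windowMillis;
    private final Map<String, Long> seenNonces = new ConcurrentHashMap<>(); // appKey:nonce -> timestamp

    ReplayGuard(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    boolean accept(String appKey, long timestampMillis, String nonce, long nowMillis) {
        if (Math.abs(nowMillis - timestampMillis) > windowMillis) {
            return false; // too old or too far in the future
        }
        // Nonces older than the window can be forgotten: their requests fail the timestamp check anyway.
        seenNonces.values().removeIf(t -> nowMillis - t > windowMillis);
        // First use of this nonce for this appKey wins; any repeat is a replay.
        return seenNonces.putIfAbsent(appKey + ":" + nonce, timestampMillis) == null;
    }
}
```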

(4) Input/Output Encryption and Decryption

Input/output encryption and decryption refer to the process of encrypting and decrypting data when a client sends a request to the server (input decryption) and when the server returns a response to the client (output encryption). This protects data security during transmission, preventing sensitive information from being stolen or tampered with.

Some customers require all input and output parameters to be transmitted as encrypted data. This requirement typically comes from banks or state-owned enterprises that place a high value on data security. It is an optional interface data security measure and does not apply to all customers. Customers who require encrypted transmission usually specify the encryption method, which may be a standard industry algorithm or an internal custom JAR package.

For the latter, we built an ECI platform that can load encryption SDKs from different customers. Through this platform, the encryption SDK provided by each customer is loaded in isolation, ensuring that a problem with one customer's SDK does not affect all customers. After uploading the SDK and loading it on the platform, an external service interface is provided that supports both encryption and decryption operations at the customer level. We only need to configure the corresponding encryption/decryption method for each customer dimension, and handle the encryption/decryption operations uniformly at the interceptor level, which is completely non-intrusive to the actual business code.

Pseudo-code example:

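import javax.servlet.http.HttpServletRequest; // jakarta.servlet.http in Spring 6 / Spring Boot 3
import javax.servlet.http.HttpServletResponse;

import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import org.springframework.web.servlet.ModelAndView;

/**
 * @description Interceptor for encrypting/decrypting input/output parameters for specific customer interfaces.
 * Assumes a servlet filter has already wrapped the request in DecryptionRequestWrapper and the
 * response in a buffering wrapper, so that both bodies can be read and replaced here.
 */
@Component
public class EncryptInterceptor implements HandlerInterceptor { // HandlerInterceptorAdapter is deprecated since Spring 5.3

    // Decrypt before the request reaches business processing
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        if (!(request instanceof DecryptionRequestWrapper)) {
            return true; // not wrapped by the filter: nothing to decrypt
        }
        // Get customer authorization info to identify the current customer
        try {
            // Determine if this customer has interface encryption/decryption enabled
            // Organize parameters, get the original input, perform decryption
            // After decryption, put the decrypted parameters back into the wrapper's parameter map
            return true; // continue processing
        } catch (Throwable throwable) {
            // Exception handling: write an error in the unified output format and stop the chain,
            // rather than passing undecrypted input on to the business code
            return false;
        }
    }

    // Encrypt before the response is returned
    @Override
    public void postHandle(HttpServletRequest request, HttpServletResponse response, Object handler,
            ModelAndView modelAndView) throws Exception {
        // Get customer identity identifier
        try {
            // Determine if this customer has interface encryption/decryption enabled
            // For @ResponseBody handlers the body has already been written at this point,
            // so read it from the buffering response wrapper rather than from modelAndView
            // Call the encryption method to get the encrypted data
            // Reset the wrapper's buffer and write the encrypted data back to the response
        } catch (Throwable throwable) {
            // Exception handling: never fall back to returning plaintext
        }
    }
}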

Conclusion

Building a robust service and system is not only the ultimate pursuit of technical professionals but also key to maintaining competitiveness in the face of evolving business needs and technical challenges. Beyond adhering to widely recognized design specifications, we must further explore and optimize to ensure our services can not only comprehensively cover various business scenarios but also excel in security, extensibility, scalability, and degradation capability.

This pursuit requires us to deeply consider potential risks and challenges at the design stage and also to continuously evaluate and optimize the service throughout the development process. Such a system will become a solid foundation supporting the continuous growth and innovation of the enterprise business, helping us stay ahead in the ever-changing technology and business environment!

Read this, and evolve your service design!

Tags: Service Design API Design Best Practices System Architecture Security

Posted on Sat, 04 Jul 2026 16:56:57 +0000 by muinej