Distributed Cron Scheduling with ElasticJob: Sharding, High Availability, and Runtime Rescaling

Consider a typical microservice deployment where an order processing service runs across multiple instances. A common requirement is to daily scan for successfully processed orders from the previous day and notify downstream analytics services.

A naive implemantation using Spring’s @Scheduled annotation leads to duplicate execution in production:

@Scheduled(cron = "0 0 10 * * ?")
public void sendOrder() {
    List<Order> orders = orderRepository.findSuccessOrdersFromYesterday();
    orders.forEach(this::notifyAnalyticsService);
}

In a two-instance setup, both nodes trigger this job at 10:00 AM — resulting in double notifications and potential data inconsistency.

A basic distributed lock approach (e.g., Redis SETNX or ZooKeeper ephemeral nodes) ensures only one instance executes the logic:

@Scheduled(cron = "0 0 10 * * ?")
public void sendOrder() {
    if (redisLock.tryLock("order-job-lock", Duration.ofMinutes(5))) {
        try {
            notifyAnalyticsService(orderRepository.findSuccessOrdersFromYesterday());
        } finally {
            redisLock.unlock();
        }
    }
}

This solves duplication but introduces a bottleneck: all 1 million orders are handled by a single node, underutilizing available capacity.

ElasticJob addresses both concerns via job sharding, enabling horizontal scaling of scheduled work. It partitions the dataset across running instances dynamically — without modifying business logic.

Here's a minimal working configuration using elasticjob-lite-spring-boot-starter:

pom.xml

<dependency>
    <groupId>org.apache.shardingsphere.elasticjob</groupId>
    <artifactId>elasticjob-lite-spring-boot-starter</artifactId>
    <version>3.0.1</version>
</dependency>

Custom Job Implementtaion

@Component
public class OrderNotificationJob implements SimpleJob {

    @Override
    public void execute(ShardingContext context) {
        int shardIndex = context.getShardingItem();
        int totalShards = context.getShardingTotalCount();
        String shardParam = context.getShardingParameter();

        // Partition logic: process orders where id % totalShards == shardIndex
        List<Order> batch = orderRepository.findSuccessOrdersFromYesterday(
            shardIndex, 
            totalShards
        );
        batch.forEach(this::notifyAnalyticsService);
    }
}

application.yml

elasticjob:
  regcenter:
    serverlists: 127.0.0.1:2181
    namespace: elasticjob-demo
  jobs:
    orderJob:
      elasticJobClass: com.example.OrderNotificationJob
      cron: "0 0 10 * * ?"
      shardingTotalCount: 2
      shardingItemParameters: "0=shard-0,1=shard-1"

When deployed on two instances, ElasticJob automatically assigns each node a unique shardingItem: one receives 0, the other 1. Each uses its shard index to compute wich subset of the full dataset it processes — e.g., orders with id % 2 == 0 vs id % 2 == 1.

Scaling to three instances requires only changing shardingTotalCount: 3 and updating shardingItemParameters to "0=shard-0,1=shard-1,2=shard-2". ElasticJob detects the new topology via ZooKeeper’s /instances ephemeral nodes and triggers automatic rebalancing through its leader-elected /leader/sharding coordination path.

The framework relies on ZooKeeper for:

  • Instance discovery: /instances/{ip:pid} (ephemeral)
  • Shard assignment: /sharding/{item}/instance
  • Leader election: /leader/election
  • Runtime configuration: /config (modifiable at runtime)

Rebalancing occurs when:

  • New instances join or existing ones exit
  • Shard count changes
  • Leader fails over

Internally, ShardingService.shardingIfNecessary() runs on the elected leader only. It reads current /instances, computes optimal shard distribution, and writes assignments to /sharding/{n}/instance. Non-leader instances skip re-sharding and simply read their assigned shard index before executing.

This design ensures:

  • Exactly-once execution per logical shard
  • Automatic failover (if instance A dies, its shards migrate to surviving nodes)
  • Zero-downtime rescaling (change config → ZooKeeper notifies all nodes → next execution uses new topology)
  • No shared database locks or custom coordination code

For large datasets like 1M orders, sharding distributes load linearly: N instances handle ~1/N of the workload — improving throughput while maintaining correctness.

Tags: distributed-systems scheduling elasticjob ZooKeeper microservices

Posted on Mon, 01 Jun 2026 16:08:38 +0000 by phillyrob