Optimizing High-Volume Financial Reconciliation Systems

Developing a robust financial reconciliation tool requires addressing significant performance bottlenecks associated with large datasets. Initial attempts often involve directly comparing uploaded files against database records. However, once volumes grow to hundreds of thousands or millions of rows, both the backend service and the database connection struggle to remain stable.

Initial Architecture and Memory Constraints

The first approach involved loading file data directly into backend memory or a Redis cache using a key-value structure, where the reconciliation key served as the map key. Database records were fetched in batches and compared against the cached set. Matches were removed from the cache, leaving discrepancies for display.
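
A minimal sketch of this first approach, assuming a Jedis client; the key name, record shape, and class names are illustrative rather than taken from the actual implementation:

import redis.clients.jedis.Jedis;
import java.util.List;
import java.util.Map;

public class CacheReconciler {

    private final Jedis jedis = new Jedis("localhost", 6379);
    private static final String CACHE_KEY = "recon:imported";

    // Load the uploaded file rows into a Redis hash keyed by the reconciliation key.
    public void cacheImportedRecords(Map<String, String> recordsByRefId) {
        jedis.hset(CACHE_KEY, recordsByRefId);
    }

    // Compare a batch of database records against the cache and remove matches,
    // leaving only discrepancies behind.
    public void reconcileBatch(List<String> dbRefIds) {
        for (String refId : dbRefIds) {
            if (jedis.hexists(CACHE_KEY, refId)) {
                jedis.hdel(CACHE_KEY, refId);
            }
        }
    }

    // Whatever remains in the hash exists in the file but not in the database.
    public Map<String, String> remainingDiscrepancies() {
        return jedis.hgetAll(CACHE_KEY);
    }
}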

Challenges encountered included:

  1. File Size Limits: Uploading massive files caused timeouts. Manual segmentation was required, limiting chunks to approximately 200,000 records or 10MB per file to prevent connection drops.
  2. Requirement Ambiguity: Development proceeded while specifications were still evolving, necessitating a flexible design capable of accommodating changing details.
  3. Backend Memory Pressure: Holding tens of thousands of records in application memory proved unstable. Offloading the cached set to Redis relieved the application's memory pressure but introduced latency during the read/write phase.

Database-Centric Comparison Strategy

To avoid real-time timeout issues, the process was shifted to persist imported data into temporary database tables first. Differences were identified using SQL joins.

SELECT * FROM source_table AS A
LEFT JOIN target_table AS B ON A.ref_id = B.ref_id
WHERE B.ref_id IS NULL
UNION
SELECT * FROM source_table AS A
RIGHT JOIN target_table AS B ON A.ref_id = B.ref_id
WHERE A.ref_id IS NULL
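
The join above assumes the uploaded rows have already been persisted into the temporary table. A minimal sketch of that load step with plain JDBC batching; the table and column names are assumptions based on the query above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class TempTableLoader {

    private static final int BATCH_SIZE = 1000;

    // Insert parsed file rows into the temporary comparison table, flushing
    // every BATCH_SIZE rows to keep each round trip small.
    public void load(Connection conn, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO source_table (ref_id, amount, trans_date) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);  // reconciliation key
                ps.setString(2, row[1]);  // amount, as parsed from the file
                ps.setString(3, row[2]);  // transaction date, as parsed from the file
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();
                }
            }
            ps.executeBatch();  // flush the final partial batch
        }
    }
}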

While logically sound, the join-based comparison degraded sharply once the tables reached millions of rows. Paginating the result set did not help, because the full join still had to be computed before any page could be returned.

Asynchronous Batch Processing

The final optimization involved decoupling the reconciliation from the user request cycle. A reconciliation task is created with specific parameters, and the system calculates the required batches based on total record count. A scheduled job processes these batches asynchronously.
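
A sketch of how a task might be split into batches; the ReconTask fields and the batch size are assumptions for illustration, not the original design:

import java.util.ArrayList;
import java.util.List;

public class ReconBatchPlanner {

    private static final int BATCH_SIZE = 5_000;

    // A persisted reconciliation task created from the request parameters.
    public record ReconTask(long taskId, int channelId, int transType,
                            int startDate, int endDate, int totalRecords) {}

    // One unit of work for the scheduled job: a page of the imported records.
    public record ReconBatch(long taskId, int offset, int limit) {}

    // Split the task into fixed-size batches based on the total record count
    // so the scheduled job can work through them asynchronously.
    public List<ReconBatch> planBatches(ReconTask task) {
        List<ReconBatch> batches = new ArrayList<>();
        for (int offset = 0; offset < task.totalRecords(); offset += BATCH_SIZE) {
            int limit = Math.min(BATCH_SIZE, task.totalRecords() - offset);
            batches.add(new ReconBatch(task.taskId(), offset, limit));
        }
        return batches;
    }
}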

Instead of comparing full records, the system extracts key identifiers into a list. These identifiers are used in an IN clause against indexed columns in the main tables. A status flag is updated upon successful matching. Proper indexing on comparison columns is critical to prevent full table scans. Once all batches are processed, the task status is marked as complete.

<!-- Mark one batch of matched records as reconciled in both tables;
     referenceIds holds the keys extracted for the current batch. -->
<update id="markReconciledRecords">
    UPDATE core_transactions AS txn
    INNER JOIN imported_records AS bill ON txn.ref_id = bill.ref_id
    SET txn.verified_flag = 1, bill.verified_flag = 1
    WHERE txn.ref_id IN
    <foreach item="id" collection="referenceIds" open="(" separator="," close=")">
        #{id, jdbcType=VARCHAR}
    </foreach>
    AND txn.process_date BETWEEN #{startDate, jdbcType=INTEGER} AND #{endDate, jdbcType=INTEGER}
    AND txn.channel_code = #{channelId, jdbcType=TINYINT}
    AND txn.type_code = #{transType, jdbcType=TINYINT}
</update>
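
On the Java side, the scheduled job can page through the imported records, collect their reference IDs, and hand each page to the statement above. The mapper interface and paging query below are assumptions sketched for illustration:

import java.util.List;
import org.apache.ibatis.annotations.Param;

public class ReconciliationJob {

    // Hypothetical MyBatis mapper; markReconciledRecords maps to the XML above.
    public interface ReconMapper {
        List<String> selectImportedRefIds(@Param("taskId") long taskId,
                                          @Param("offset") int offset,
                                          @Param("limit") int limit);

        int markReconciledRecords(@Param("referenceIds") List<String> referenceIds,
                                  @Param("startDate") int startDate,
                                  @Param("endDate") int endDate,
                                  @Param("channelId") int channelId,
                                  @Param("transType") int transType);
    }

    private final ReconMapper mapper;

    public ReconciliationJob(ReconMapper mapper) {
        this.mapper = mapper;
    }

    // Process one batch: page out the imported keys, then run the indexed
    // IN-clause update that sets verified_flag in both tables.
    public void processBatch(long taskId, int offset, int limit, int startDate,
                             int endDate, int channelId, int transType) {
        List<String> refIds = mapper.selectImportedRefIds(taskId, offset, limit);
        if (!refIds.isEmpty()) {
            mapper.markReconciledRecords(refIds, startDate, endDate, channelId, transType);
        }
    }
}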

Future Integration Improvements

Manual file uploads introduce unnecessary latency and human error. A more robust solution involves integrating directly with banking APIs. Scheduled tasks can pull transaction data automatically, defaulting to the previous business day. Parameters for credit or debit flows can be passed dynamically, allowing the same interface to support both automated back-end jobs and manual triggers via the UI for specific date ranges.
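
A sketch of such a pull as a Spring @Scheduled job; the BankApiClient interface, cron expression, and flow-type values are assumptions standing in for the real banking API:

import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class BankStatementPuller {

    // Stand-in for a client wrapping the bank's transaction-query API.
    public interface BankApiClient {
        List<String> fetchTransactions(LocalDate from, LocalDate to, String flowType);
    }

    private final BankApiClient bankClient;

    public BankStatementPuller(BankApiClient bankClient) {
        this.bankClient = bankClient;
    }

    // Nightly job: pull the previous business day's credits and debits automatically.
    @Scheduled(cron = "0 30 2 * * *")
    public void pullDaily() {
        LocalDate day = previousBusinessDay(LocalDate.now());
        pull(day, day, "CREDIT");
        pull(day, day, "DEBIT");
    }

    // Shared entry point, also callable from a UI trigger with an explicit date range.
    public void pull(LocalDate from, LocalDate to, String flowType) {
        List<String> transactions = bankClient.fetchTransactions(from, to, flowType);
        // ...persist the pulled transactions and enqueue a reconciliation task...
    }

    private LocalDate previousBusinessDay(LocalDate today) {
        LocalDate day = today.minusDays(1);
        while (day.getDayOfWeek() == DayOfWeek.SATURDAY || day.getDayOfWeek() == DayOfWeek.SUNDAY) {
            day = day.minusDays(1);
        }
        return day;
    }
}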

Tags: finance reconciliation performance optimization java sql

Posted on Fri, 08 May 2026 04:45:20 +0000 by jcanker