Apache Flink 1.14.0 Highlights
Core Features
- Checkpointing for Bounded Streams.
- Mixed DataStream and Table/SQL Applications in Batch Execution Mode.
- Introduction of the Hybrid Source for seamless reading across multiple sources.
- Buffer Debloating to minimize checkpoint latency.
- Fine-Grained Resource Management for dynamic Slot sizing.
- New Pulsar Connector for the DataStream API.
Unified Stream & Batch Processing
Bounded Stream Checkpointing
Enables checkpoints to be taken after some tasks have finished, plus a final checkpoint at the end of the stream to ensure all output is committed. Activate by setting execution.checkpointing.checkpoints-after-tasks-finish.enabled: true.
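A minimal flink-conf.yaml sketch of the relevant settings (the interval value here is only an example):

```yaml
# flink-conf.yaml (Flink 1.14): keep checkpointing after some tasks finish
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
# a regular checkpoint interval is still required for periodic checkpoints
execution.checkpointing.interval: 60s
```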
Hybrid Source Sequentially reads from multiple sources (e.g., S3 for historical data, then Kafka for new data) as a single logical stream.
Operational Enhancements
Buffer Debloating Automatically adjusts network buffer usage to ensure high throughput while minimizing buffered data for low checkpoint overhead.
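A sketch of the configuration keys introduced for this feature in 1.14 (the 1s target is an example value, not a recommendation):

```yaml
# flink-conf.yaml: enable buffer debloating and bound in-flight data
taskmanager.network.memory.buffer-debloat.enabled: true
# target: buffer at most roughly this much in-flight time per subtask
taskmanager.network.memory.buffer-debloat.target: 1s
```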
Fine-Grained Resource Management Enables dynamic resizing of Slots within a TaskManager.
PyFlink Improvements
- Performance gains via chaining of Python functions, similar to Java API operator chaining.
- Loopback debug mode allows execution of Python UDFs in the client process, enabling IDE breakpoint debugging.
- Support for YARN Application mode and .tgz Python archives.
Deprecations
- Legacy SQL engine and Mesos support were removed.
Apache Flink 1.15 Highlights
Operations and Management
Incremental Savepoints Savepoints become incremental when using native format with RocksDB state backend.
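Triggering a savepoint in the native format from the CLI, per the 1.15 announcement (job ID and directory are placeholders; flags may vary by version):

```shell
# trigger a savepoint in the state backend's native format;
# with the RocksDB state backend this makes the savepoint incremental
bin/flink savepoint --type native <jobId> <targetDirectory>
```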
Reactive Mode & Adaptive Scheduler Improved Reactive Mode with working job-level metrics. The Adaptive Scheduler now records failure history and speeds up scale-down operations.
Adaptive Batch Scheduler Automatically determines optimal parallelism for batch job vertices based on their input data volume. Key benefits include ease of use, adaptability to changing data, and per-vertex granularity.
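A hedged flink-conf.yaml sketch of how the scheduler was enabled in 1.15 (these key names were later renamed; the bounds and volume are example values):

```yaml
# flink-conf.yaml (1.15): opt in to the adaptive batch scheduler
jobmanager.scheduler: AdaptiveBatch
# bounds for the automatically derived per-vertex parallelism
jobmanager.adaptive-batch-scheduler.min-parallelism: 1
jobmanager.adaptive-batch-scheduler.max-parallelism: 128
# target amount of data each task should process
jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task: 1gb
```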
Cross-Source Watermark Alignment Sources using the new Source API can be grouped. If one source's watermark advances too far ahead, its consumption can be paused to align with others.
SQL Version Compatibility Ensures identical SQL queries produce the same topology after a Flink version upgrade, maintaining snapshot compatibility.
Changelog State Backend Continuously uploads state changes to enable shorter end-to-end latency, predictable checkpoint intervals, and less work on recovery.
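A configuration sketch based on the 1.15 docs (the S3 path is a hypothetical placeholder):

```yaml
# flink-conf.yaml (1.15): layer the changelog state backend on top of the
# configured state backend and point it at durable storage
state.backend.changelog.enabled: true
state.backend.changelog.storage: filesystem
# assumption: an S3 bucket is used for the changelog files
dstl.dfs.base-path: s3://<bucket>/changelog
```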
Repeatable Cleanup Retries cleanup operations on failure to avoid residual job data.
OpenAPI Support REST API now follows the OpenAPI standard.
Application Mode Improvements
Enhanced support for stop-with-savepoint on job termination and improved recovery/cleanup with local state metadata persistence.
Unified Processing
Final Checkpoint Before Job End Enabled by default, ensuring a checkpoint completes before job termination.
Window Table-Valued Functions (TVF) in Batch Window TVFs are now supported in batch execution mode.
SQL Enhancements
- Failed CAST operations now throw an error by default instead of returning null.
- New JSON processing functions.
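A short Flink SQL sketch of the new behavior; TRY_CAST and JSON_VALUE are from the 1.15 SQL function set, while the literals are arbitrary examples:

```sql
-- CAST now fails on bad input; TRY_CAST restores the old null-returning behavior
SELECT TRY_CAST('not-a-number' AS INT);   -- returns NULL instead of raising an error

-- one of the new JSON processing functions: extract a value by JSON path
SELECT JSON_VALUE('{"user": {"id": 42}}', '$.user.id');
```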
Community Contributions
- New Elasticsearch Sink based on the latest Sink API, providing async writes and end-to-end consistency.
- Flink's Java API can be used with any Scala version (including Scala 3).
- PyFlink introduces a "thread" execution mode where Python UDFs run as threads within the JVM.
- Support for the CSV format and small-file compaction in the unified FileSink.
Apache Flink 1.16 Highlights
Batch Processing
SQL Gateway Extends SQL Client with multi-tenancy and pluggable Endpoints (REST API, HiveServer2). Enables submitting stream/batch/OLAP jobs via standard tools.
Hive Syntax Compatibility Enhanced Hive dialect support. Using HiveServer2 protocol with SQL Gateway auto-registers Hive Catalog, switches dialect, and uses batch mode.
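Starting the gateway with the HiveServer2 endpoint, sketched after the 1.16 docs (the Hive conf path is a placeholder; property names may differ by version):

```shell
# start the SQL Gateway with the HiveServer2 endpoint so that Hive clients
# such as Beeline can connect using standard tooling
./bin/sql-gateway.sh start \
    -Dsql-gateway.endpoint.type=hiveserver2 \
    -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=/path/to/hive/conf
```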
Join Hints Allow users to specify join strategies to override the optimizer's choice.
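For example (hint names follow the 1.16 SQL docs; the fact and dim tables are hypothetical):

```sql
-- force a broadcast join, overriding the optimizer's cost-based choice
SELECT /*+ BROADCAST(dim) */ *
FROM fact
JOIN dim ON fact.key = dim.key;

-- other supported hints: SHUFFLE_HASH, SHUFFLE_MERGE, NEST_LOOP
```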
Adaptive Hash Join Hash Join operators can automatically fall back to Sort-Merge Join at the task level if they fail, improving job stability.
Speculative Execution for Batch Detects slow tasks on problematic machines, blacklists those machines, and launches duplicate task instances elsewhere. The first instance to finish wins.
Hybrid Shuffle (Experimental) Combines the advantages of blocking and pipelined shuffle. Allows downstream tasks to start before upstream tasks finish, without requiring simultaneous execution.
Blocking Shuffle Optimizations Improvements include adaptive network buffer allocation, sequential I/O, and result partition reuse for multiple consumers.
Dynamic Partition Pruning Reduces reading of invalid partitions from partitioned tables at runtime based on data from related tables.
Stream Processing
Generalized Incremental Checkpoint Enhances Changelog State Backend with support for state migration, local recovery, file caching, switching from checkpoints, and improved monitoring.
Improved RocksDB Rescaling Uses RocksDB's range deletion for faster rescaling, improving recovery speed 2-10x for operators with large state.
State Backend Monitoring RocksDB logs now reside in Flink's log directory by default. Added DB-level statistics (e.g., block cache hit/miss counts).
Overdraft Buffers A backpressured subtask can request extra network buffers (default 5), reducing checkpoint alignment time, especially with Unaligned Checkpoints.
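The overdraft limit is configurable; a sketch based on the 1.16 configuration reference:

```yaml
# flink-conf.yaml (1.16): extra buffers a subtask may borrow per gate
# while backpressured (default 5; setting 0 disables the feature)
taskmanager.network.memory.max-overdraft-buffers-per-gate: 5
```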
Aligned Checkpoint Timeout
Checkpoints start as Aligned (AC) but switch to Unaligned (UC) if the global checkpoint duration exceeds execution.checkpointing.aligned-checkpoint-timeout. Upstream subtasks can now switch to UC independently to propagate barriers.
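A configuration sketch for this AC-to-UC switch, with key names as of 1.16 (the 30s timeout is an example value):

```yaml
# flink-conf.yaml: enable unaligned checkpoints, but start each checkpoint
# aligned and only switch to unaligned after the timeout elapses
execution.checkpointing.unaligned: true
execution.checkpointing.aligned-checkpoint-timeout: 30s
```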
Streaming Nondeterminism Detection Pre-runtime detection of potential correctness issues in complex streaming jobs, with suggestions for SQL adjustments.
Dimension Table Lookup Enhancements
Generic caching, configurable async mode (ALLOW_UNORDERED), and retry mechanisms for delayed dimension data updates.
Async I/O Retry Built-in retry mechanism for Async I/O operations.
PyFlink
"Thread" mode execution is now supported for Python DataStream API and Python Table-Valued Functions, running UDFs via JNI in the JVM.
Other Features
New SQL Syntax
USING JAR for dynamic UDF JAR loading. CREATE TABLE AS SELECT. ANALYZE TABLE for manual statistics generation.
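Hedged examples of the three statements per the 1.16 SQL docs (all identifiers and paths here are hypothetical):

```sql
-- register a UDF whose implementation is loaded from a remote JAR
CREATE FUNCTION my_lower AS 'com.example.MyLower'
    USING JAR 'hdfs:///udfs/my-udfs.jar';

-- create and populate a table in one statement
CREATE TABLE filtered_orders AS SELECT * FROM orders WHERE amount > 0;

-- collect table statistics for the cost-based optimizer
ANALYZE TABLE orders COMPUTE STATISTICS;
```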
DataStream Cache
DataStream#cache() caches transformation results in batch mode, useful for ML and interactive Python programming.
History Server & Job Info Enhanced WebUI with detailed execution time metrics, aggregated subtask statistics, environment info, and external log archive browsing.
Protobuf Format Support for Protobuf in Table API/SQL applications.
Configurable RateLimitingStrategy for Async Sink
Customizable behavior for async sinks on request failure, defaulting to AIMDScalingStrategy.
Apache Flink 1.17 Highlights
Batch Processing
Speculative Execution Now supports Sink operators. Improved detection of slow tasks.
Adaptive Batch Scheduler Now the default batch scheduler. Configuration simplified; auto-parallelism no longer requires setting global parallelism to -1. Enhanced data distribution and removed power-of-two restriction for derived parallelism.
Hybrid Shuffle Optimizations Supports adaptive batch scheduler and speculative execution. Allows reuse of intermediate data. Improved stability for large-scale deployments.
SQL Client/Gateway SQL Client supports gateway mode to submit queries to SQL Gateway. SQL statements can manage job lifecycle (e.g., show info, stop jobs).
SQL API for Batch
New Delete and Update APIs for batch mode, enabling connector implementations for row-level modifications. Extended ALTER TABLE syntax for adding/modifying/dropping columns, primary keys, and watermarks.
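A Flink SQL sketch of these batch-mode statements (table and column names are hypothetical; the target connector must implement the new row-level modification interfaces):

```sql
-- row-level modifications in batch execution mode
DELETE FROM orders WHERE status = 'CANCELLED';
UPDATE orders SET status = 'SHIPPED' WHERE order_id = 100;

-- extended ALTER TABLE syntax for schema changes
ALTER TABLE orders ADD shipped_at TIMESTAMP(3);
ALTER TABLE orders DROP legacy_flag;
```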
Hive Connector Optimizations
Automatic file compaction now works in both stream and batch modes for Hive writes, reducing small files. Native Hive aggregate functions (SUM, COUNT, etc.) in HiveModule use hash-based aggregation for performance gains.
TPC-DS & Performance
- Dynamic programming join-reorder algorithm (experimental, off by default).
- Dynamic local hash aggregation based on data distribution.
- Removal of unnecessary virtual function calls for faster execution.
Stream Processing
Streaming SQL Semantic Improvements
Experimental PLAN_ADVICE feature detects potential correctness risks (e.g., Non-Deterministic Updates) and performance optimization opportunities, providing specific suggestions.
Enhanced Watermark Alignment Addresses data skew in event-time jobs by aligning data emission across splits and partitions within a Source operator based on watermark boundaries, preventing excessive downstream buffering.
Extended StreamingFileSink Functional enhancements for file-based streaming outputs.
Unaligned Checkpoint (UC) File Management Mitigates the issue of UC creating excessive small files on HDFS. Provides REST API to manually trigger checkpoints with custom types during job runtime.
Upgrades The embedded RocksDB (used by RocksDBStateBackend) and Calcite dependencies were upgraded.
Observability & Performance
- Performance regression monitoring via Slack channel reports.
- Support for Task-level flame graphs.
- Support for a generalized delegation token framework.