From Static Tables to Continuous Streams: The Evolution of Streaming SQL
Modern data architectures are shifting from batch-oriented processing to real-time analysis. Traditional systems store data in static tables and query it at a specific point in time, but sensors, logs, and transactions now generate information continuously. To handle this, engineers are moving be ...
Posted on Sun, 24 May 2026 20:00:35 +0000 by firemankurt
Setting Up a Flink Cluster in Standalone and YARN Modes
Configuring TaskManager Hostnames
In each TaskManager's flink-conf.yaml, set taskmanager.host to that node's own hostname. On hadoop103:
taskmanager.host: hadoop103
On hadoop104:
taskmanager.host: hadoop104
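Editing this key by hand on every node is easy to get wrong, so the per-node override can be scripted. The following is a minimal sketch, not part of Flink: the helper name set_taskmanager_host is hypothetical, and the base configuration text (including the jobmanager.rpc.address entry pointing at hadoop102) is an assumption made for illustration.

```python
def set_taskmanager_host(conf_text: str, hostname: str) -> str:
    """Return conf_text with taskmanager.host set to hostname.

    Hypothetical helper: replaces an active taskmanager.host entry,
    or appends one if the key is missing or only present as a comment.
    """
    key = "taskmanager.host"
    new_line = f"{key}: {hostname}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        # Match only an uncommented entry for this exact key.
        if line.split(":", 1)[0].strip() == key:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Assumed base config shared by all nodes (illustrative values only).
base = "jobmanager.rpc.address: hadoop102\ntaskmanager.host: localhost\n"
for host in ["hadoop103", "hadoop104"]:
    print(f"--- {host} ---")
    print(set_taskmanager_host(base, host), end="")
```

Each node still has to receive its own rendered file. How the files are distributed (scp, a config management tool, and so on) is outside this sketch.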
Starting and Stopping a Standalone Cluster
From the JobManager node (hadoop102):
# Start cluster
bin/start-cluster.sh
# Stop cluster
bin/stop-cluster.s ...
Posted on Wed, 20 May 2026 05:09:43 +0000 by quark76
Key Features and Enhancements in Apache Flink 1.14 to 1.17
Apache Flink 1.14.0 Highlights
Core Features
Checkpointing for Bounded Streams.
Mixed DataStream and Table/SQL Applications in Batch Execution Mode.
Introduction of the Hybrid Source for seamless reading across multiple sources.
Buffer Debloating to reduce in-flight data and shorten checkpoint times under backpressure.
Fine-Grained Resource Management for dynamic slot sizing.
New Puls ...
Posted on Thu, 07 May 2026 19:17:24 +0000 by kade119
Apache Flink Checkpoint Configuration Guide
Prerequisites
Exactly-Once Processing
For exactly-once semantics to work properly:
Source systems: Must support replaying data after a failure (e.g., message queues like Kafka, distributed file systems like HDFS)
Sink systems: Must support idempotent or transactional writes (e.g., Doris supports deduplication)
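The two requirements above work together: after a failure, the source replays everything since the last checkpoint, and the sink has to absorb the resulting duplicate writes. The toy simulation below sketches that interaction in plain Python. It does not use any Flink API, and the class names, the checkpoint interval of two records, and the sample records are all assumptions made for illustration.

```python
class ReplayableSource:
    """Stand-in for a source that can re-read from a saved offset."""

    def __init__(self, records):
        self._records = list(records)

    def read_from(self, offset):
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


class IdempotentSink:
    """Stand-in for a keyed sink where rewriting a key is harmless."""

    def __init__(self):
        self.rows = {}
        self.write_count = 0

    def upsert(self, key, value):
        self.write_count += 1
        self.rows[key] = value  # same key overwrites, never duplicates


def run(source, sink, start_offset, fail_at=None):
    """Process records from start_offset.

    Returns (last_checkpointed_offset, finished). A checkpoint is taken
    after every second record; fail_at simulates a crash before that
    offset is processed.
    """
    checkpoint = start_offset
    for offset, (key, value) in source.read_from(start_offset):
        if fail_at is not None and offset == fail_at:
            return checkpoint, False
        sink.upsert(key, value)
        if (offset + 1) % 2 == 0:
            checkpoint = offset + 1
    return checkpoint, True


records = [("order-1", 10), ("order-2", 20), ("order-3", 30),
           ("order-4", 40), ("order-5", 50)]
source = ReplayableSource(records)
sink = IdempotentSink()

# First attempt crashes before offset 3; the last checkpoint is offset 2.
checkpoint, finished = run(source, sink, start_offset=0, fail_at=3)
first_checkpoint = checkpoint

# Recovery: replay from the checkpoint, so order-3 is written a second time.
checkpoint, finished = run(source, sink, start_offset=checkpoint)

print(sink.write_count)  # 6 writes were issued for 5 records
print(len(sink.rows))    # 5 rows are stored
```

The sink receives six writes for five records, yet the stored result matches the input exactly, because the replayed write for order-3 overwrites the earlier row instead of adding a new one. With a sink that simply appends, the same replay would leave a duplicate.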
At-Least-Once Processing
For at-least-once semantics:
S ...
Posted on Thu, 07 May 2026 06:14:41 +0000 by smpdawg