Engineering-Grade Serial Protocol Parser: Resolving Frame Stick and Frame Split in Embedded Systems

Serial communication remains a cornerstone in embedded and IoT systems due to its simplicity, reliability, and hardware-level support. However, developers new to serial protocols often face a persistent challenge: even when the sender transmits data in structured packets, the receiver frequently ingests incomplete, concatenated, or fragmented data streams. This phenomenon — known as frame stick (multiple frames merged) or frame split (a single frame fragmented across multiple reads) — is inherent to the byte-stream nature of serial transport.

1. Root Causes of Frame Stick and Frame Split

Unlike message-oriented transports such as UDP, which preserve datagram boundaries, a serial link exposes the raw byte stream directly to the application (TCP streams behave the same way). Without explicit framing, there is no built-in mechanism to delimit individual messages.

  • Frame stick occurs when two or more frames are received in a single read, e.g., 0xAA 0x01 0x02 0x55 0xAA 0x03 0x04 0x55 appears as one continuous chunk. Common in high-frequency transmission or delayed read loops.
  • Frame split happens when a single frame spans multiple reads — e.g., 0xAA 0x01 in one event, 0x02 0x55 in the next. Prevalent at low baud rates or with large payloads.
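Both effects can be reproduced without hardware. The following is a minimal sketch, not a driver model: read_in_chunks is a hypothetical helper that slices one transmitted stream into the pieces a receiver might get from successive reads.

```python
def read_in_chunks(stream: bytes, sizes: list) -> list:
    """Split one byte stream into the chunks a receiver might see per read."""
    chunks, pos = [], 0
    for size in sizes:
        chunks.append(stream[pos:pos + size])
        pos += size
    return chunks

# Two 4-byte frames sent back to back by the sender
stream = bytes([0xAA, 0x01, 0x02, 0x55, 0xAA, 0x03, 0x04, 0x55])

stick = read_in_chunks(stream, [8])        # frame stick: both frames in one read
split = read_in_chunks(stream, [2, 2, 4])  # frame split: first frame spans two reads

print([c.hex(" ") for c in stick])  # ['aa 01 02 55 aa 03 04 55']
print([c.hex(" ") for c in split])  # ['aa 01', '02 55', 'aa 03 04 55']
```

The bytes on the wire are identical in both cases; only the read boundaries differ, which is why the parser, not the read call, has to recover the frames.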

Contributing factors include:

  • Hardware FIFO buffering behavior
  • OS scheduling granularity
  • Asynchronous sender/receiver timing
  • Absence of application-layer framing rules

In a real-world project involving multi-sensor environmental monitoring, unhandled framing led to misaligned readings: humidity values were overwritten by adjacent sensor IDs, and temperature readings drifted due to payload misassociation.


2. Framing Strategies for Robust Parsing

A reliable protocol parser must synchronize on well-defined frame boundaries. Below are three battle-tested framing schemes, each with its own trade-offs in flexibility and complexity.

2.1 Fixed-Length Frames

Best for constrained, static-message systems (e.g., periodic sensor telemetry with known structure).

Frame layout:

  • 1-byte sync header (0xAA)
  • 2-byte sensor ID
  • 2-byte temperature (scaled ×10)
  • 2-byte humidity (scaled ×10)
  • 1-byte checksum
  • 1-byte footer (0x55)

Total fixed size: 9 bytes

Implementation sketch (C++-style parser logic):

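#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <span>

struct Frame {
    uint8_t  sync;     // 0xAA
    uint16_t id;
    int16_t  temp;     // scaled x10
    int16_t  hum;      // scaled x10
    uint8_t  cksum;    // XOR of the 7 bytes before it
    uint8_t  end;      // 0x55
} __attribute__((packed));  // GCC/Clang; multi-byte fields are read in host byte order

// Consumes bytes from the front of `buffer`. Returns a frame once one
// validates; returns std::nullopt when more data is needed.
std::optional<Frame> parseFrame(std::span<const uint8_t>& buffer) {
    while (buffer.size() >= sizeof(Frame)) {
        const uint8_t* p = buffer.data();

        // XOR checksum over every byte before the checksum field
        uint8_t calcCksum = 0;
        for (std::size_t i = 0; i < offsetof(Frame, cksum); ++i)
            calcCksum ^= p[i];

        if (p[0] == 0xAA && p[sizeof(Frame) - 1] == 0x55 &&
            calcCksum == p[offsetof(Frame, cksum)]) {
            Frame f;
            std::memcpy(&f, p, sizeof(Frame));
            buffer = buffer.subspan(sizeof(Frame));  // Consume the frame
            return f;
        }
        buffer = buffer.subspan(1);  // Discard one byte and resync
    }
    return std::nullopt;  // Incomplete frame (frame split): wait for more data
}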

Pros: Minimal CPU overhead, deterministic parsing, ideal for 8-bit MCUs. Cons: Inflexible; padding wastes bandwidth for variable-length payloads.
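To exercise a fixed-length parser from a host machine, the 9-byte layout above can be built and validated with Python's struct module. This is a sketch under stated assumptions: little-endian multi-byte fields (the layout does not specify a byte order) and a checksum that XORs the seven bytes preceding it. build_frame and parse_frame are illustrative names, not part of any library.

```python
import struct

SYNC, END = 0xAA, 0x55
BODY = struct.Struct("<BHhh")  # assumption: little-endian fields, no padding (7 bytes)

def xor_checksum(data: bytes) -> int:
    cksum = 0
    for b in data:
        cksum ^= b
    return cksum

def build_frame(sensor_id: int, temp_x10: int, hum_x10: int) -> bytes:
    body = BODY.pack(SYNC, sensor_id, temp_x10, hum_x10)
    return body + bytes([xor_checksum(body), END])

def parse_frame(frame: bytes):
    """Return (sensor_id, temp_x10, hum_x10), or None if validation fails."""
    if len(frame) != 9 or frame[0] != SYNC or frame[8] != END:
        return None
    if xor_checksum(frame[:7]) != frame[7]:
        return None
    _, sensor_id, temp_x10, hum_x10 = BODY.unpack(frame[:7])
    return sensor_id, temp_x10, hum_x10

frame = build_frame(0x0010, 235, 610)   # 23.5 degrees, 61.0 % RH
print(frame.hex(" "))                   # aa 10 00 eb 00 62 02 31 55
print(parse_frame(frame))               # (16, 235, 610)
```

Flipping any single byte of the frame makes parse_frame return None, which is a quick way to confirm that the sync, footer, and checksum checks all take effect.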


2.2 Length-Prefix Framing

Ideal for command/response or variable-length telemetry (e.g., firmware update packets, embedded logs).

Frame layout:

  • 2-byte sync: 0xAA 0x55
  • 1-byte length L (bytes following the length field, excluding checksum)
  • L-byte payload
  • 2-byte CRC16-CCITT

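Example frame: 0xAA 0x55 0x03 0x12 0x34 0x56 0x78 0x9A (sync 0xAA 0x55, length 0x03, payload 0x12 0x34 0x56, CRC field 0x78 0x9A shown with illustrative values)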

Design considerations:

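  • L must encode only the payload length (sync, length, and CRC bytes are not included)
  • Cap L at a safe upper bound and size the internal buffer to match (e.g., L ≤ 255 with a 1 KB internal buffer)
  • Fix the byte order of multi-byte fields explicitly (e.g., big-endian for the CRC) so the protocol is portable across endianness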
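The sender side of this layout can be sketched in a few lines. The CRC variant and coverage are assumptions here: the layout only says "CRC16-CCITT", so this sketch uses the CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF), computes the CRC over the payload only, and transmits it big-endian. encode_frame is a hypothetical helper name.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF); variant is an assumption."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def encode_frame(payload: bytes) -> bytes:
    """Build sync + length + payload + CRC (CRC over payload only, big-endian)."""
    if len(payload) > 255:
        raise ValueError("payload does not fit a 1-byte length field")
    crc = crc16_ccitt(payload)
    return b"\xAA\x55" + bytes([len(payload)]) + payload + crc.to_bytes(2, "big")

frame = encode_frame(bytes([0x12, 0x34, 0x56]))
print(frame[:3].hex(" "))   # aa 55 03
print(len(frame))           # 8
```

Whatever variant a real device uses, sender and receiver must agree on the polynomial, the initial value, and which bytes the CRC covers; a mismatch in any of the three looks exactly like constant corruption on the line.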

Parser skeleton (state machine):

class LengthPrefixParser:
    HEADER = bytes([0xAA, 0x55])
    HEADER_LEN = len(HEADER) + 1   # sync + length byte
    CRC_LEN = 2
    MAX_PAYLOAD = 255              # upper bound accepted for L

    def __init__(self):
        self._buf = bytearray()

    @staticmethod
    def _crc16_ccitt(data: bytes) -> int:
        # Assumption: CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF), payload only
        crc = 0xFFFF
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
        return crc

    def _verify_crc(self, payload: bytes, crc: bytes) -> bool:
        # CRC is transmitted big-endian
        return self._crc16_ccitt(payload) == int.from_bytes(crc, "big")

    def push(self, data: bytes) -> list:
        self._buf.extend(data)
        frames = []
        while len(self._buf) >= self.HEADER_LEN:
            # Locate the header, discarding one non-sync byte at a time
            if self._buf[:2] != self.HEADER:
                del self._buf[0]
                continue
            length = self._buf[2]
            if length > self.MAX_PAYLOAD:
                del self._buf[0]  # Implausible length: treat as a false sync
                continue
            total = self.HEADER_LEN + length + self.CRC_LEN
            if len(self._buf) < total:
                break  # Frame split: wait for more data
            payload = bytes(self._buf[self.HEADER_LEN : total - self.CRC_LEN])
            crc = bytes(self._buf[total - self.CRC_LEN : total])
            if self._verify_crc(payload, crc):
                frames.append(payload)
                del self._buf[:total]  # Consume the whole frame
            else:
                del self._buf[0]  # Bad CRC: drop the sync byte and rescan
        return frames

Why it’s preferred industrially: Supports variable payloads, decouples frame size from transmission rate, and avoids reliance on end markers (which may appear in data).


2.3 Delimiter-Based Framing (e.g., ASCII/Text Protocols)

Best for debug interfaces, JSON/CLI-style protocols, or human-readable logs.

Example:

$ID=0x10|T=235|H=61#<CR><LF>
$ID=0x11|T=237|H=59#<CR><LF>

  • Start delimiter: $
  • End delimiter: # + optional <CR><LF>
  • Delimiter escape handling: a literal # inside the payload is written as \#

Typical parsing strategy:

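#include <string>
#include <string_view>
#include <vector>

// Extracts complete "$...#" messages from the front of `data` and advances
// `data` past what was consumed. An incomplete tail stays for the next call.
std::vector<std::string> extractLines(std::string_view& data) {
    std::vector<std::string> msgs;
    while (true) {
        auto start = data.find('$');
        if (start == std::string_view::npos) {
            data.remove_prefix(data.size());  // No start delimiter: all noise
            break;
        }
        data.remove_prefix(start);  // Drop CR/LF and noise before '$'

        // Find the first '#' that is not escaped as "\#"
        auto end = data.find('#');
        while (end != std::string_view::npos && data[end - 1] == '\\')
            end = data.find('#', end + 1);
        if (end == std::string_view::npos) break;  // Frame split: wait for more

        msgs.emplace_back(data.substr(0, end + 1));  // Keeps '$' and '#'
        data.remove_prefix(end + 1);
    }
    return msgs;
}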

Caveats: Requires escaping special characters in payloads or limiting character set to avoid ambiguity. Not suitable for binary/raw data unless escaping or base64 is applied.
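A minimal sketch of the escaping side, under assumptions the example above does not spell out: the backslash is the escape character, and the sender escapes the start delimiter and the backslash itself in addition to #. The function names are illustrative.

```python
ESCAPE = "\\"
SPECIAL = {"$", "#", ESCAPE}  # assumption: '$' and '\' are escaped as well as '#'

def escape_payload(text: str) -> str:
    """Prefix every special character with the escape character."""
    return "".join(ESCAPE + ch if ch in SPECIAL else ch for ch in text)

def unescape_payload(text: str) -> str:
    """Reverse escape_payload: the character after an escape is taken literally."""
    out, chars = [], iter(text)
    for ch in chars:
        out.append(next(chars, "") if ch == ESCAPE else ch)
    return "".join(out)

def wrap(text: str) -> str:
    """Build one delimited message: '$' + escaped payload + '#' + CR LF."""
    return "$" + escape_payload(text) + "#\r\n"

print(wrap("NOTE=tank#2"))                              # $NOTE=tank\#2# followed by CR LF
print(unescape_payload(escape_payload("a#b$c\\d")))     # a#b$c\d
```

Escaping the backslash itself matters: without it, a payload that legitimately ends in a backslash would make the closing # look escaped to the receiver.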


3. Robustness: Error Resilience & Edge Cases

Beyond framing, production-grade parsers must handle:

  • Out-of-sync starts: Header detection with partial matches (e.g., 0xAA alone, then later 0x55)
  • Clock drift in timeouts: Use monotonic timestamps per byte instead of polling intervals
  • Buffer thrashing: Reset state on malformed CRC/frame length to prevent cascading errors
  • Memory pressure: Preallocate bounded ring buffers; never call malloc in the hot parsing loop
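The timeout point can be sketched as follows. This is an illustration, not a complete receiver: InterByteTimeout is a hypothetical class, the timestamp is taken per received chunk rather than per byte, and the clock is injectable so the behavior can be tested without sleeping. time.monotonic is used because it never jumps backwards when the wall clock is adjusted.

```python
import time

class InterByteTimeout:
    """Drops a stale partial frame when the gap between chunks exceeds timeout_s."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock          # injectable for tests
        self._last_rx = None
        self.buffer = bytearray()

    def feed(self, data: bytes) -> None:
        now = self._clock()
        if self._last_rx is not None and now - self._last_rx > self._timeout_s:
            self.buffer.clear()      # stale partial frame: discard and resync
        self._last_rx = now
        self.buffer.extend(data)

# Usage with a fake clock: a 50 ms gap limit
fake_now = [0.0]
rx = InterByteTimeout(0.05, clock=lambda: fake_now[0])
rx.feed(b"\xaa\x01")
fake_now[0] = 0.01
rx.feed(b"\x02")                     # arrives in time: appended
in_time = bytes(rx.buffer)
fake_now[0] = 1.0
rx.feed(b"\xaa")                     # arrives late: old bytes are dropped first
after_gap = bytes(rx.buffer)
print(in_time.hex(" "), "|", after_gap.hex(" "))   # aa 01 02 | aa
```

The bytes accumulated in buffer would then be handed to whichever framing parser is in use; the timeout only decides when leftover bytes are too old to belong to the frame now arriving.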

In high-throughput industrial gateways, mixing framing schemes (e.g., protocol header uses length-prefix, payload contains delimited sub-frames) provides both throughput and introspectability.


Tags: serial-communication embedded-systems protocol-parser framing industrial-iot

Posted on Thu, 25 Jun 2026 17:08:10 +0000 by sandbudd