Choosing and Optimizing Packet Transmission Protocols Based on Performance Benchmarks

Scenario: In a local network, multiple machines capture packets via their network interfaces and need to synchronize these packets to a single machine.

Original Approach: Use tcpdump -w to write packets into files, then periodically use rsync to transfer them.

Revised Approach: Rewrite the capture and synchronization logic in Go so that captured packets are sent directly over the network to a server, eliminating an unnecessary disk I/O step.

Creating a pcap file is straightforward: write a pcap file header, then write each packet preceded by its metadata.

The pcapgo library can help achieve this. The raw packet data is stored in p.buffer[:ci.CaptureLength].

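ci := gopacket.CaptureInfo{
	CaptureLength: int(n),
	Length:        int(n),
	Timestamp:     time.Now(),
}
// Never claim more captured bytes than the buffer actually holds.
if ci.CaptureLength > len(p.buffer) {
	ci.CaptureLength = len(p.buffer)
}
// WritePacket returns an error; the return below assumes the enclosing function returns one too.
if err := w.WritePacket(ci, p.buffer[:ci.CaptureLength]); err != nil {
	return err
}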
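
To make the file layout concrete, here is a minimal sketch of the classic pcap format written by hand with encoding/binary, independent of pcapgo. The function names appendPcapHeader and appendPcapRecord are hypothetical, and the sketch assumes little-endian byte order with microsecond timestamps.

```go
package main

import "encoding/binary"

// appendPcapHeader appends the 24-byte global header of the classic pcap format.
// snapLen is the maximum captured length per packet; linkType 1 means Ethernet.
func appendPcapHeader(dst []byte, snapLen, linkType uint32) []byte {
	dst = binary.LittleEndian.AppendUint32(dst, 0xa1b2c3d4) // magic: microsecond timestamps
	dst = binary.LittleEndian.AppendUint16(dst, 2)          // major version
	dst = binary.LittleEndian.AppendUint16(dst, 4)          // minor version
	dst = binary.LittleEndian.AppendUint32(dst, 0)          // timezone offset (unused)
	dst = binary.LittleEndian.AppendUint32(dst, 0)          // timestamp accuracy (unused)
	dst = binary.LittleEndian.AppendUint32(dst, snapLen)
	return binary.LittleEndian.AppendUint32(dst, linkType)
}

// appendPcapRecord appends one 16-byte record header followed by the captured bytes.
// tsMicro is the capture time in microseconds since the Unix epoch;
// origLen is the packet's length on the wire, which may exceed len(data).
func appendPcapRecord(dst []byte, tsMicro int64, origLen uint32, data []byte) []byte {
	dst = binary.LittleEndian.AppendUint32(dst, uint32(tsMicro/1_000_000)) // seconds
	dst = binary.LittleEndian.AppendUint32(dst, uint32(tsMicro%1_000_000)) // microseconds
	dst = binary.LittleEndian.AppendUint32(dst, uint32(len(data)))         // captured length
	dst = binary.LittleEndian.AppendUint32(dst, origLen)                   // original length
	return append(dst, data...)
}
```

Calling appendPcapHeader once and then appendPcapRecord for every packet produces a byte stream in the classic pcap layout. The two lengths in the record header correspond to CaptureLength and Length in gopacket.CaptureInfo.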

To distinguish packets from different machines, an identifier needs to be included. The structure includes metadata and raw packet data as follows:

// from github.com/google/gopacket
type CaptureInfo struct {
	// Timestamp is the time the packet was captured, if that is known.
	Timestamp time.Time `json:"ts" msgpack:"ts"`
	// CaptureLength is the total number of bytes read off of the wire.
	CaptureLength int `json:"cap_len" msgpack:"cap_len"`
	// Length is the size of the original packet. Should always be >= CaptureLength.
	Length int `json:"len" msgpack:"len"`
	// InterfaceIndex
	InterfaceIndex int `json:"iface_idx" msgpack:"iface_idx"`
}

type CapturePacket struct {
	CaptureInfo
	Id   uint32 `json:"id" msgpack:"id"`
	Data []byte `json:"data" msgpack:"data"`
}

A key question remains: what format should be used to transmit the packet data? JSON, MessagePack, or a custom binary protocol?

JSON and MessagePack have well-defined standards, offer broad compatibility, and reduce bugs due to their simplicity. However, they sacrifice some performance. A custom binary protocol allows for more control by removing unnecessary fields and keys, reducing memory allocations and GC pressure.

Optimization strategies for the custom binary protocol:

Compact layout: encode fixed-size fields such as CaptureInfo and Id in a fixed byte layout. For example, CaptureLength and Length can each be encoded in two bytes if packets never exceed 65535 bytes, and Id can fit in one byte if its range is limited.
Memory reuse:
Encoding should not allocate internally; it writes directly into a buffer supplied by the caller. If the caller handles packets synchronously and reuses that buffer, encoding performs zero allocations.
Similarly, decoding should not allocate internally; it should only parse the metadata and copy the Data slice. If the caller consumes each packet synchronously, this also results in zero allocations.
For asynchronous processing, each Data slice must be copied wherever it outlives the receive buffer. Use sync.Pool to manage these copies, with four pools for payload sizes of 128, 1024, 8192, and 65536 bytes.
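
The strategies above can be sketched as follows. This is only an illustration under assumptions: the field widths (8 + 4 + 4 + 2 + 4 bytes) are a hypothetical layout chosen to total 22 bytes, the overhead reported for Binary Pack in the size comparison; the actual layout used in the benchmark repository is not shown here. The names packet, appendPacket, and decodePacket are likewise made up for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// metaLen is the fixed header size of this hypothetical layout:
// timestamp (8) + capture length (4) + length (4) + interface index (2) + id (4).
const metaLen = 22

type packet struct {
	TimestampMicro int64 // microseconds since the Unix epoch
	CaptureLength  uint32
	Length         uint32
	InterfaceIndex uint16
	ID             uint32
	Data           []byte
}

var errShort = errors.New("buffer too short")

// appendPacket appends header and payload to dst and returns the extended slice.
// The capture length on the wire is always len(p.Data).
// If dst already has enough capacity, no allocation takes place.
func appendPacket(dst []byte, p *packet) []byte {
	dst = binary.BigEndian.AppendUint64(dst, uint64(p.TimestampMicro))
	dst = binary.BigEndian.AppendUint32(dst, uint32(len(p.Data)))
	dst = binary.BigEndian.AppendUint32(dst, p.Length)
	dst = binary.BigEndian.AppendUint16(dst, p.InterfaceIndex)
	dst = binary.BigEndian.AppendUint32(dst, p.ID)
	return append(dst, p.Data...)
}

// decodePacket parses one packet from b into p without allocating.
// p.Data aliases b, so an asynchronous consumer must copy it before b is reused.
func decodePacket(b []byte, p *packet) error {
	if len(b) < metaLen {
		return errShort
	}
	p.TimestampMicro = int64(binary.BigEndian.Uint64(b[0:]))
	p.CaptureLength = binary.BigEndian.Uint32(b[8:])
	p.Length = binary.BigEndian.Uint32(b[12:])
	p.InterfaceIndex = binary.BigEndian.Uint16(b[16:])
	p.ID = binary.BigEndian.Uint32(b[18:])
	if uint64(len(b)-metaLen) < uint64(p.CaptureLength) {
		return errShort
	}
	p.Data = b[metaLen : metaLen+int(p.CaptureLength)]
	return nil
}
```

A synchronous sender can keep one buffer and call appendPacket(buf[:0], &p) for every packet, which is what makes the zero-allocation path possible.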

Key optimizations of sync.Pool:

In asynchronous scenarios, each Packet.Data needs its own memory and cannot share the receive buffer; sync.Pool recycles these buffers.
The fixed-length buffer used to serialize metadata also comes from a pool, so it does not add garbage-collection pressure.

func acquirePacketBuf(n int) ([]byte, func()) {
	var (
		buf   []byte
		putfn func()
	)
	if n <= CapturePacketMetaLen+128 {
		smallBuf := smallBufPool.Get().(*[CapturePacketMetaLen + 128]byte)
		buf = smallBuf[:0]
		putfn = func() { smallBufPool.Put(smallBuf) }
	} else if n <= CapturePacketMetaLen+1024 {
		midBuf := midBufPool.Get().(*[CapturePacketMetaLen + 1024]byte)
		buf = midBuf[:0]
		putfn = func() { midBufPool.Put(midBuf) }
	} else if n <= CapturePacketMetaLen+8192 {
		largeBuf := largeBufPool.Get().(*[CapturePacketMetaLen + 8192]byte)
		buf = largeBuf[:0]
		putfn = func() { largeBufPool.Put(largeBuf) }
	} else {
		xlargeBuf := xlargeBufPool.Get().(*[CapturePacketMetaLen + 65536]byte)
		buf = xlargeBuf[:0]
		putfn = func() { xlargeBufPool.Put(xlargeBuf) }
	}
	return buf, putfn
}

func (binaryPack) EncodeTo(p *CapturePacket, w io.Writer) (int, error) {
	buf := metaBufPool.Get().(*[CapturePacketMetaLen]byte)
	defer metaBufPool.Put(buf)

	binary.BigEndian.PutUint64(buf[0:], uint64(p.Timestamp.UnixMicro()))
    ...
	return nm + nd, err
}
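
The pool declarations are not part of the excerpt above. A minimal sketch of how one such pool might be declared and used for an asynchronous copy is given below; smallSize, smallPool, and copyPooled are hypothetical names, and only the smallest size class is shown.

```go
package main

import "sync"

// smallSize mirrors the smallest payload size class mentioned above.
const smallSize = 128

// smallPool hands out *[smallSize]byte values. Pooling a pointer to an array,
// rather than a slice, avoids an extra allocation when the value is stored
// in the interface that Put accepts.
var smallPool = sync.Pool{
	New: func() any { return new([smallSize]byte) },
}

// copyPooled copies data into a pooled buffer and returns the copy together
// with a release function that gives the buffer back to the pool.
// ok is false when data does not fit this size class.
func copyPooled(data []byte) (buf []byte, release func(), ok bool) {
	if len(data) > smallSize {
		return nil, nil, false
	}
	arr := smallPool.Get().(*[smallSize]byte)
	buf = arr[:len(data)]
	copy(buf, data)
	return buf, func() { smallPool.Put(arr) }, true
}
```

The consumer calls release once it has finished with the copy; using buf after that point would race with the next borrower of the same array.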

Packet Size Comparison (By Tongyi Qianwen)

Method               Original Size (bytes)   Encoded Size (bytes)   Size Increase (bytes)
Binary Pack          72                      94                     +22
Binary Pack          1024                    1046                   +22
Binary Pack          16384                   16406                  +22
MsgPack              72                      150                    +78
MsgPack              1024                    1103                   +79
MsgPack              16384                   16463                  +79
Json Pack            72                      191                    +119
Json Pack            1024                    1467                   +443
Json Pack            16384                   21949                  +5565
Json Compress Pack   72                      195                    +123
Json Compress Pack   1024                    1114                   +90
Json Compress Pack   16384                   15504                  -120

Analysis

Binary Pack:
Small packets (72 bytes): 22-byte overhead.
Large packets (16384 bytes): 22-byte overhead.
Efficient overall with minimal extra size.
MsgPack:
Small packets (72 bytes): 78-byte overhead.
Large packets (16384 bytes): 79-byte overhead.
Overhead is nearly constant (78 to 79 bytes), so it is costly relative to small packets and negligible for large ones.
Json Pack:
Small packets (72 bytes): 119-byte overhead.
Large packets (16384 bytes): 5565-byte overhead.
Inefficient, and the overhead grows with payload size because the binary Data field is written as base64 text.
Json Compress Pack:
Small packets (72 bytes): 123-byte overhead, slightly worse than plain JSON.
Large packets (16384 bytes): the table reports -120 bytes, meaning the encoded output is smaller than the input.
Compression only pays off for large payloads.
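
The Json Pack figures can be cross-checked with a little arithmetic. The sketch below assumes the JSON packer relies on the standard encoding/json package, which marshals a []byte field as padded base64; base64Growth and jsonDataLen are hypothetical helper names.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
)

// base64Growth returns how many extra bytes padded standard base64 adds
// to an n-byte payload.
func base64Growth(n int) int {
	return base64.StdEncoding.EncodedLen(n) - n
}

// jsonDataLen marshals a struct holding only an n-byte Data field and
// returns the length of the resulting JSON document.
func jsonDataLen(n int) (int, error) {
	out, err := json.Marshal(struct {
		Data []byte `json:"data"`
	}{Data: make([]byte, n)})
	return len(out), err
}
```

base64Growth yields 24, 344, and 5464 bytes for payloads of 72, 1024, and 16384 bytes. Subtracting these from the measured increases of 119, 443, and 5565 bytes leaves 95, 99, and 101 bytes, which is the roughly constant cost of the metadata keys and values.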

Performance Benchmarks

JSON

Reusing the output buffer gives a significant improvement, mainly from reduced memory allocation: at 16384 bytes, encoding drops from 34289 to 8472 ns/op and from 24754 to 128 B/op.

BenchmarkJsonPack/encode#72-20                    17315143        647.1 ns/op         320 B/op      3 allocs/op
BenchmarkJsonPack/encode#1024-20                   4616841         2835 ns/op        1666 B/op      3 allocs/op
BenchmarkJsonPack/encode#16384-20                   365313        34289 ns/op       24754 B/op      3 allocs/op
BenchmarkJsonPack/encode_with_buf#72-20           24820188        447.4 ns/op         128 B/op      2 allocs/op
BenchmarkJsonPack/encode_with_buf#1024-20         13139395        910.6 ns/op         128 B/op      2 allocs/op
BenchmarkJsonPack/encode_with_buf#16384-20         1414260         8472 ns/op         128 B/op      2 allocs/op
BenchmarkJsonPack/decode#72-20                     8699952         1364 ns/op         304 B/op      8 allocs/op
BenchmarkJsonPack/decode#1024-20                   2103712         5605 ns/op        1384 B/op      8 allocs/op
BenchmarkJsonPack/decode#16384-20                   159140        73101 ns/op       18664 B/op      8 allocs/op

MessagePack

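Buffer reuse helps here as well. Without a reused buffer, MessagePack encodes 72-byte payloads more slowly than JSON (1199 vs. 647.1 ns/op) but is faster from 1024 bytes upward (2132 vs. 2835 ns/op). With a reused buffer, both its time and its memory usage stay nearly flat across payload sizes.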

BenchmarkMsgPack/encode#72-20                     10466427         1199 ns/op         688 B/op      8 allocs/op
BenchmarkMsgPack/encode#1024-20                    6599528         2132 ns/op        1585 B/op      8 allocs/op
BenchmarkMsgPack/encode#16384-20                   1478127         8806 ns/op       18879 B/op      8 allocs/op
BenchmarkMsgPack/encode_with_buf#72-20            26677507        388.2 ns/op         192 B/op      4 allocs/op
BenchmarkMsgPack/encode_with_buf#1024-20          31426809        400.2 ns/op         192 B/op      4 allocs/op
BenchmarkMsgPack/encode_with_buf#16384-20         22588560        494.5 ns/op         192 B/op      4 allocs/op
BenchmarkMsgPack/decode#72-20                     19894509        654.2 ns/op         280 B/op     10 allocs/op
BenchmarkMsgPack/decode#1024-20                   18211321        664.0 ns/op         280 B/op     10 allocs/op
BenchmarkMsgPack/decode#16384-20                  13755824        769.1 ns/op         280 B/op     10 allocs/op

JSON Compression

In a LAN environment, bandwidth is usually not a concern, making compression benchmarks less relevant.

BenchmarkJsonCompressPack/encode#72-20               19934       709224 ns/op     1208429 B/op     26 allocs/op
BenchmarkJsonCompressPack/encode#1024-20             17577       766349 ns/op     1212782 B/op     26 allocs/op
BenchmarkJsonCompressPack/encode#16384-20            11757       860371 ns/op     1253975 B/op     25 allocs/op
BenchmarkJsonCompressPack/decode#72-20              490164        28972 ns/op       42048 B/op     15 allocs/op
BenchmarkJsonCompressPack/decode#1024-20            187113        71612 ns/op       47640 B/op     23 allocs/op
BenchmarkJsonCompressPack/decode#16384-20            35790       346580 ns/op      173352 B/op     30 allocs/op

Custom Binary Protocol

With memory reuse, serialization and deserialization improve markedly. In synchronous contexts, zero allocations are achievable (encode_to and decode_meta). In asynchronous cases, sync.Pool keeps the allocation per operation constant regardless of payload size (64 B/op for encode_with_pool, 16 B/op for decode_with_pool).

BenchmarkBinaryPack/encode#72-20                  72744334        187.1 ns/op         144 B/op      2 allocs/op
BenchmarkBinaryPack/encode#1024-20                17048832        660.6 ns/op        1200 B/op      2 allocs/op
BenchmarkBinaryPack/encode#16384-20                2085050         6280 ns/op       18495 B/op      2 allocs/op
BenchmarkBinaryPack/encode_with_pool#72-20        34700313        109.2 ns/op          64 B/op      2 allocs/op
BenchmarkBinaryPack/encode_with_pool#1024-20      39370662        101.1 ns/op          64 B/op      2 allocs/op
BenchmarkBinaryPack/encode_with_pool#16384-20     18445262        177.2 ns/op          64 B/op      2 allocs/op
BenchmarkBinaryPack/encode_to#72-20              705428736        16.96 ns/op           0 B/op      0 allocs/op
BenchmarkBinaryPack/encode_to#1024-20            575312358        20.78 ns/op           0 B/op      0 allocs/op
BenchmarkBinaryPack/encode_to#16384-20           100000000        113.4 ns/op           0 B/op      0 allocs/op
BenchmarkBinaryPack/decode_meta#72-20           1000000000        2.887 ns/op           0 B/op      0 allocs/op
BenchmarkBinaryPack/decode_meta#1024-20         1000000000        2.882 ns/op           0 B/op      0 allocs/op
BenchmarkBinaryPack/decode_meta#16384-20        1000000000        2.876 ns/op           0 B/op      0 allocs/op
BenchmarkBinaryPack/decode#72-20                 100000000        85.63 ns/op          80 B/op      1 allocs/op
BenchmarkBinaryPack/decode#1024-20                 7252350        445.4 ns/op        1024 B/op      1 allocs/op
BenchmarkBinaryPack/decode#16384-20                 554329         5499 ns/op       16384 B/op      1 allocs/op
BenchmarkBinaryPack/decode_with_pool#72-20       109352595        33.97 ns/op          16 B/op      1 allocs/op
BenchmarkBinaryPack/decode_with_pool#1024-20      85589674        36.27 ns/op          16 B/op      1 allocs/op
BenchmarkBinaryPack/decode_with_pool#16384-20     26163607        140.4 ns/op          16 B/op      1 allocs/op
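
A 0 allocs/op result like the one for encode_to can also be checked outside a benchmark run with testing.AllocsPerRun. The following is a minimal sketch, not the benchmark code itself: encodeInto is a simplified stand-in that writes only a timestamp and the payload, and both function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"testing"
)

// encodeInto writes an 8-byte big-endian timestamp followed by data into buf
// and returns the number of bytes written. buf must hold at least 8+len(data) bytes.
func encodeInto(buf []byte, tsMicro int64, data []byte) int {
	binary.BigEndian.PutUint64(buf[0:], uint64(tsMicro))
	return 8 + copy(buf[8:], data)
}

// allocsPerEncode reports the average number of heap allocations per call
// to encodeInto when the destination buffer is reused between calls.
func allocsPerEncode(payloadSize int) float64 {
	data := make([]byte, payloadSize)
	buf := make([]byte, 8+payloadSize)
	return testing.AllocsPerRun(100, func() {
		encodeInto(buf, 1700000000000000, data)
	})
}
```

Because both slices are allocated once before the measurement starts, the encode call itself has nothing left to allocate, which is the property the encode_to rows show.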

Summary

Tongyi Qianwen's Findings

Binary Pack:

encode_to: Fastest, with zero allocations; best for high-performance scenarios.
encode_with_pool: Uses memory pool optimization, reduces both time and memory overhead, suitable for most use cases.
encode: Standard method, higher resource consumption.

MsgPack:

encode_with_buf: Preallocated buffer improves performance, suitable for most scenarios.
encode: Standard method, higher resource consumption.
decode: Decoding performance is average; memory usage stays constant at 280 B/op across payload sizes.

Json Pack:

encode_with_buf: Preallocated buffer improves performance, suitable for most scenarios.
encode: Standard method, higher resource consumption.
decode: Poor decoding performance, high memory usage.

Json Compress Pack:

encode: Very high resource usage, not recommended for performance-sensitive tasks.
decode: Poor decoding performance, high memory usage.

Personal Summary

In LAN environments, bandwidth is typically not a bottleneck, so compression isn't necessary. As shown in benchmarks, compression consumes substantial resources. For high-volume packet transmission (e.g., pcap), a custom binary protocol may be preferable due to its predictable metadata parsing and lower memory footprint compared to JSON or MessagePack.

References

Benchmark results for packet construction: https://github.com/zxhio/benchmark/tree/main/pack

Tags: Golang Performance benchmark packet capture

Posted on Fri, 15 May 2026 17:54:37 +0000 by TPerez
