Choosing and Optimizing Packet Transmission Protocols Based on Performance Benchmarks
Scenario: In a local network, multiple machines capture packets via their network interfaces and need to synchronize these packets to a single machine.
Original Approach: Use tcpdump -w to write packets into files, then periodically use rsync to transfer them.
Revised Approach: Rewrite the capture and synchronization logic in Go to send captured packets directly over the network to a server, eliminating an unnecessary disk I/O step.
Creating a pcap file is straightforward: write a pcap file header, then each packet preceded by its per-packet metadata.
The pcapgo library can help achieve this. The raw packet data is stored in p.buffer[:ci.CaptureLength].
ci := gopacket.CaptureInfo{
	CaptureLength: int(n),
	Length:        int(n),
	Timestamp:     time.Now(),
}
if ci.CaptureLength > len(p.buffer) {
	ci.CaptureLength = len(p.buffer)
}
w.WritePacket(ci, p.buffer[:ci.CaptureLength])
To distinguish packets from different machines, an identifier needs to be included. The structure includes metadata and raw packet data as follows:
// from github.com/google/gopacket
type CaptureInfo struct {
	// Timestamp is the time the packet was captured, if that is known.
	Timestamp time.Time `json:"ts" msgpack:"ts"`
	// CaptureLength is the total number of bytes read off of the wire.
	CaptureLength int `json:"cap_len" msgpack:"cap_len"`
	// Length is the size of the original packet. Should always be >= CaptureLength.
	Length int `json:"len" msgpack:"len"`
	// InterfaceIndex
	InterfaceIndex int `json:"iface_idx" msgpack:"iface_idx"`
}

type CapturePacket struct {
	CaptureInfo
	Id   uint32 `json:"id" msgpack:"id"`
	Data []byte `json:"data" msgpack:"data"`
}
A key question remains: what format should be used to transmit the packet data? JSON, MessagePack, or a custom binary protocol?
JSON and MessagePack have well-defined standards and broad library support, and their simplicity leaves less room for bugs. The trade-off is performance. A custom binary protocol offers more control: unnecessary fields and keys can be dropped, which reduces memory allocations and GC pressure.
Optimization strategies for the custom binary protocol:
- Represent fixed-size fields like CaptureInfo and Id in a compact byte layout. For example, CaptureLength and Length can be encoded in two bytes, and Id can be represented in one byte if its range is limited.
- Memory reuse:
- Avoid allocating inside the encoder by writing directly into a caller-supplied buffer; used synchronously, encoding is zero-allocation.
- Likewise, decoding should not allocate internally: parse the metadata in place and reference the Data bytes in the input buffer rather than copying them. Used synchronously, this is also zero-allocation.
- For asynchronous operations, Data slices should be copied where necessary. Use sync.Pool to optimize memory management, using four pools for sizes 128, 1024, 8192, and 65536 bytes.
Key optimizations of sync.Pool:
- In asynchronous scenarios, each Packet.Data requires its own memory space and cannot be reused; use sync.Pool to manage this.
- Fixed-length buffers for metadata serialization should avoid triggering garbage collection.
func acquirePacketBuf(n int) ([]byte, func()) {
	var (
		buf   []byte
		putfn func()
	)
	if n <= CapturePacketMetaLen+128 {
		smallBuf := smallBufPool.Get().(*[CapturePacketMetaLen + 128]byte)
		buf = smallBuf[:0]
		putfn = func() { smallBufPool.Put(smallBuf) }
	} else if n <= CapturePacketMetaLen+1024 {
		midBuf := midBufPool.Get().(*[CapturePacketMetaLen + 1024]byte)
		buf = midBuf[:0]
		putfn = func() { midBufPool.Put(midBuf) }
	} else if n <= CapturePacketMetaLen+8192 {
		largeBuf := largeBufPool.Get().(*[CapturePacketMetaLen + 8192]byte)
		buf = largeBuf[:0]
		putfn = func() { largeBufPool.Put(largeBuf) }
	} else {
		xlargeBuf := xlargeBufPool.Get().(*[CapturePacketMetaLen + 65536]byte)
		buf = xlargeBuf[:0]
		putfn = func() { xlargeBufPool.Put(xlargeBuf) }
	}
	return buf, putfn
}
func (binaryPack) EncodeTo(p *CapturePacket, w io.Writer) (int, error) {
	buf := metaBufPool.Get().(*[CapturePacketMetaLen]byte)
	defer metaBufPool.Put(buf)

	binary.BigEndian.PutUint64(buf[0:], uint64(p.Timestamp.UnixMicro()))
	...
	return nm + nd, err
}
Packet Size Comparison (By Tongyi Qianwen)
| Method | Original Size (bytes) | Encoded Size (bytes) | Size Delta (bytes) |
|---|---|---|---|
| Binary Pack | 72 | 94 | +22 |
| Binary Pack | 1024 | 1046 | +22 |
| Binary Pack | 16384 | 16406 | +22 |
| MsgPack | 72 | 150 | +78 |
| MsgPack | 1024 | 1103 | +79 |
| MsgPack | 16384 | 16463 | +79 |
| Json Pack | 72 | 191 | +119 |
| Json Pack | 1024 | 1467 | +443 |
| Json Pack | 16384 | 21949 | +5565 |
| Json Compress Pack | 72 | 195 | +123 |
| Json Compress Pack | 1024 | 1114 | +90 |
| Json Compress Pack | 16384 | 15504 | -120 |
Analysis
- Binary Pack:
- Small packets (72 bytes): 22-byte overhead.
- Large packets (16384 bytes): 22-byte overhead.
- Efficient overall with minimal extra size.
- MsgPack:
- Small packets (72 bytes): 78-byte overhead.
- Large packets (16384 bytes): 79-byte overhead.
- Less efficient for small data but consistent for large.
- Json Pack:
- Small packets (72 bytes): 119-byte overhead.
- Large packets (16384 bytes): 5565-byte overhead.
- Inefficient, especially for large data.
- Json Compress Pack:
- Small packets (72 bytes): 123-byte overhead.
- Large packets (16384 bytes): 120 bytes smaller than the original.
- Compression only pays for itself on large payloads; small packets still carry a fixed overhead.
Performance Benchmarks
JSON
Buffer reuse shows significant improvement, mainly due to reduced memory allocation.
BenchmarkJsonPack/encode#72-20 17315143 647.1 ns/op 320 B/op 3 allocs/op
BenchmarkJsonPack/encode#1024-20 4616841 2835 ns/op 1666 B/op 3 allocs/op
BenchmarkJsonPack/encode#16384-20 365313 34289 ns/op 24754 B/op 3 allocs/op
BenchmarkJsonPack/encode_with_buf#72-20 24820188 447.4 ns/op 128 B/op 2 allocs/op
BenchmarkJsonPack/encode_with_buf#1024-20 13139395 910.6 ns/op 128 B/op 2 allocs/op
BenchmarkJsonPack/encode_with_buf#16384-20 1414260 8472 ns/op 128 B/op 2 allocs/op
BenchmarkJsonPack/decode#72-20 8699952 1364 ns/op 304 B/op 8 allocs/op
BenchmarkJsonPack/decode#1024-20 2103712 5605 ns/op 1384 B/op 8 allocs/op
BenchmarkJsonPack/decode#16384-20 159140 73101 ns/op 18664 B/op 8 allocs/op
MessagePack
Similar benefits from buffer reuse. MessagePack overtakes JSON at around the 1024-byte mark, and its encode/decode cost and memory usage stay nearly flat as payloads grow.
BenchmarkMsgPack/encode#72-20 10466427 1199 ns/op 688 B/op 8 allocs/op
BenchmarkMsgPack/encode#1024-20 6599528 2132 ns/op 1585 B/op 8 allocs/op
BenchmarkMsgPack/encode#16384-20 1478127 8806 ns/op 18879 B/op 8 allocs/op
BenchmarkMsgPack/encode_with_buf#72-20 26677507 388.2 ns/op 192 B/op 4 allocs/op
BenchmarkMsgPack/encode_with_buf#1024-20 31426809 400.2 ns/op 192 B/op 4 allocs/op
BenchmarkMsgPack/encode_with_buf#16384-20 22588560 494.5 ns/op 192 B/op 4 allocs/op
BenchmarkMsgPack/decode#72-20 19894509 654.2 ns/op 280 B/op 10 allocs/op
BenchmarkMsgPack/decode#1024-20 18211321 664.0 ns/op 280 B/op 10 allocs/op
BenchmarkMsgPack/decode#16384-20 13755824 769.1 ns/op 280 B/op 10 allocs/op
JSON Compression
In a LAN environment, bandwidth is usually not a concern, making compression benchmarks less relevant.
BenchmarkJsonCompressPack/encode#72-20 19934 709224 ns/op 1208429 B/op 26 allocs/op
BenchmarkJsonCompressPack/encode#1024-20 17577 766349 ns/op 1212782 B/op 26 allocs/op
BenchmarkJsonCompressPack/encode#16384-20 11757 860371 ns/op 1253975 B/op 25 allocs/op
BenchmarkJsonCompressPack/decode#72-20 490164 28972 ns/op 42048 B/op 15 allocs/op
BenchmarkJsonCompressPack/decode#1024-20 187113 71612 ns/op 47640 B/op 23 allocs/op
BenchmarkJsonCompressPack/decode#16384-20 35790 346580 ns/op 173352 B/op 30 allocs/op
Custom Binary Protocol
After memory reuse, serialization and deserialization show marked performance improvements. Synchronous paths achieve zero allocations; asynchronous paths keep allocations bounded to fixed-size pooled buffers via sync.Pool.
BenchmarkBinaryPack/encode#72-20 72744334 187.1 ns/op 144 B/op 2 allocs/op
BenchmarkBinaryPack/encode#1024-20 17048832 660.6 ns/op 1200 B/op 2 allocs/op
BenchmarkBinaryPack/encode#16384-20 2085050 6280 ns/op 18495 B/op 2 allocs/op
BenchmarkBinaryPack/encode_with_pool#72-20 34700313 109.2 ns/op 64 B/op 2 allocs/op
BenchmarkBinaryPack/encode_with_pool#1024-20 39370662 101.1 ns/op 64 B/op 2 allocs/op
BenchmarkBinaryPack/encode_with_pool#16384-20 18445262 177.2 ns/op 64 B/op 2 allocs/op
BenchmarkBinaryPack/encode_to#72-20 705428736 16.96 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/encode_to#1024-20 575312358 20.78 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/encode_to#16384-20 100000000 113.4 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/decode_meta#72-20 1000000000 2.887 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/decode_meta#1024-20 1000000000 2.882 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/decode_meta#16384-20 1000000000 2.876 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/decode#72-20 100000000 85.63 ns/op 80 B/op 1 allocs/op
BenchmarkBinaryPack/decode#1024-20 7252350 445.4 ns/op 1024 B/op 1 allocs/op
BenchmarkBinaryPack/decode#16384-20 554329 5499 ns/op 16384 B/op 1 allocs/op
BenchmarkBinaryPack/decode_with_pool#72-20 109352595 33.97 ns/op 16 B/op 1 allocs/op
BenchmarkBinaryPack/decode_with_pool#1024-20 85589674 36.27 ns/op 16 B/op 1 allocs/op
BenchmarkBinaryPack/decode_with_pool#16384-20 26163607 140.4 ns/op 16 B/op 1 allocs/op
Summary
Tongyi Qianwen's Findings
Binary Pack:
- encode_to: Fastest, zero allocations, best for high-performance scenarios.
- encode_with_pool: Uses memory pool optimization, reduces both time and memory overhead, suitable for most use cases.
- encode: Standard method, higher resource consumption.
MsgPack:
- encode_with_buf: Preallocated buffer improves performance, suitable for most scenarios.
- encode: Standard method, higher resource consumption.
- decode: Decoding performance is average, memory usage increases with data size.
Json Pack:
- encode_with_buf: Preallocated buffer improves performance, suitable for most scenarios.
- encode: Standard method, higher resource consumption.
- decode: Poor decoding performance, high memory usage.
Json Compress Pack:
- encode: Very high resource usage, not recommended for performance-sensitive tasks.
- decode: Poor decoding performance, high memory usage.
Personal Summary
In LAN environments, bandwidth is typically not a bottleneck, so compression isn't necessary. As shown in benchmarks, compression consumes substantial resources. For high-volume packet transmission (e.g., pcap), a custom binary protocol may be preferable due to its predictable metadata parsing and lower memory footprint compared to JSON or MessagePack.
References
- Benchmark results for packet construction: https://github.com/zxhio/benchmark/tree/main/pack