Choosing and Optimizing Packet Transmission Protocols Based on Performance Benchmarks
Scenario: In a local network, multiple machines capture packets via their network interfaces and need to synchronize these packets to a single machine.
Original Approach: Use tcpdump -w to write packets into files, then periodically use rsync to transfer them.
Revised Approach: Rewrite the capture and synchronization logic in Go to send captured packets directly over the network to a server, eliminating an unnecessary disk I/O step.
Creating a pcap file is straightforward: write a pcap file header, then write each packet preceded by its per-packet metadata (timestamp and lengths).
The pcapgo library can handle this. In the snippet below, the raw packet data is stored in p.buffer[:ci.CaptureLength].
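```go
// n is the number of bytes just read into p.buffer.
ci := gopacket.CaptureInfo{
	CaptureLength: int(n),
	Length:        int(n),
	Timestamp:     time.Now(),
}
// Never claim more captured bytes than the buffer actually holds.
if ci.CaptureLength > len(p.buffer) {
	ci.CaptureLength = len(p.buffer)
}
// w is a *pcapgo.Writer whose file header has already been written.
if err := w.WritePacket(ci, p.buffer[:ci.CaptureLength]); err != nil {
	return err
}
```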
To distinguish packets from different machines, each packet must also carry a sender identifier (Id). The transmitted structure combines the capture metadata with the raw packet data:
```go
// Adapted from gopacket.CaptureInfo (github.com/google/gopacket),
// with serialization tags added.
type CaptureInfo struct {
	// Timestamp is the time the packet was captured, if that is known.
	Timestamp time.Time `json:"ts" msgpack:"ts"`
	// CaptureLength is the total number of bytes read off of the wire.
	CaptureLength int `json:"cap_len" msgpack:"cap_len"`
	// Length is the size of the original packet. Should always be >= CaptureLength.
	Length int `json:"len" msgpack:"len"`
	// InterfaceIndex is the index of the interface the packet was captured on.
	InterfaceIndex int `json:"iface_idx" msgpack:"iface_idx"`
}

// CapturePacket is the unit sent to the server.
type CapturePacket struct {
	CaptureInfo
	Id   uint32 `json:"id" msgpack:"id"`     // identifies the sending machine
	Data []byte `json:"data" msgpack:"data"` // raw packet bytes
}
```
A key question remains: what format should be used to transmit the packet data? JSON, MessagePack, or a custom binary protocol?
JSON and MessagePack have well-defined specifications and broad compatibility, and their simplicity reduces the risk of bugs. However, they sacrifice some performance. A custom binary protocol offers more control: unnecessary fields and keys can be dropped, which reduces payload size, memory allocations, and GC pressure.
Optimization strategies for the custom binary protocol:
- Represent fixed-size fields such as CaptureInfo and Id in a compact byte layout. For example, CaptureLength and Length each fit in two bytes (a uint16 holds values up to 65,535), and Id fits in one byte if there are at most 256 machines.
- Memory reuse:
  - Encoding should not allocate internally; it should write directly into a caller-supplied buffer. When encoding is synchronous and that buffer is reused for every packet, encoding makes zero allocations.
  - Likewise, decoding should not allocate internally: it parses the metadata in place and copies Data into a caller-supplied buffer (or references the input directly). In the synchronous case this also results in zero allocations.
  - For asynchronous processing, where a packet outlives the buffer it was read from, each Data slice must be copied into memory of its own. Use sync.Pool to manage these copies, with four pools for payloads of up to 128, 1024, 8192, and 65536 bytes.
Key optimizations of sync.Pool:
- In asynchronous scenarios, each Packet.Data needs its own memory and cannot share one reused buffer; sync.Pool recycles these per-packet buffers.
- The fixed-length buffers used to serialize metadata are pooled as well, so encoding metadata does not produce garbage that triggers garbage collection.
```go
// acquirePacketBuf returns an empty slice with capacity for at least n bytes
// (metadata header plus payload) and a function that returns the backing array
// to its pool. The four pools and CapturePacketMetaLen are defined elsewhere.
func acquirePacketBuf(n int) ([]byte, func()) {
	var (
		buf   []byte
		putfn func()
	)
	if n <= CapturePacketMetaLen+128 {
		smallBuf := smallBufPool.Get().(*[CapturePacketMetaLen + 128]byte)
		buf = smallBuf[:0]
		putfn = func() { smallBufPool.Put(smallBuf) }
	} else if n <= CapturePacketMetaLen+1024 {
		midBuf := midBufPool.Get().(*[CapturePacketMetaLen + 1024]byte)
		buf = midBuf[:0]
		putfn = func() { midBufPool.Put(midBuf) }
	} else if n <= CapturePacketMetaLen+8192 {
		largeBuf := largeBufPool.Get().(*[CapturePacketMetaLen + 8192]byte)
		buf = largeBuf[:0]
		putfn = func() { largeBufPool.Put(largeBuf) }
	} else {
		// Largest class. If n exceeds its capacity, appending past it
		// allocates a new array outside the pool.
		xlargeBuf := xlargeBufPool.Get().(*[CapturePacketMetaLen + 65536]byte)
		buf = xlargeBuf[:0]
		putfn = func() { xlargeBufPool.Put(xlargeBuf) }
	}
	return buf, putfn
}
```
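acquirePacketBuf depends on pool declarations that are not shown. A self-contained sketch of the same size-class idea, using hypothetical names (sizeClasses, pools, acquire) and *[]byte instead of array pointers, with sizes referring to the payload only. It also shows the asynchronous hand-off: the reader copies each packet out of its reused read buffer into a pooled buffer, and the consumer goroutine releases it when done.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical payload size classes matching the four pools described above.
var sizeClasses = [...]int{128, 1024, 8192, 65536}

var pools [len(sizeClasses)]sync.Pool

func init() {
	for i := range sizeClasses {
		size := sizeClasses[i]
		pools[i].New = func() any {
			b := make([]byte, size)
			return &b // store a pointer so Put does not allocate
		}
	}
}

// acquire returns an empty slice with capacity >= n from the smallest fitting
// class, plus a func that returns it to the pool. Oversized requests fall back
// to a plain allocation that is simply left to the GC.
func acquire(n int) ([]byte, func()) {
	for i, size := range sizeClasses {
		if n <= size {
			p := &pools[i]
			bp := p.Get().(*[]byte)
			return (*bp)[:0], func() { p.Put(bp) }
		}
	}
	return make([]byte, 0, n), func() {}
}

type packet struct {
	Data    []byte
	release func() // must be called once the consumer is done with Data
}

func main() {
	ch := make(chan packet, 16)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // consumer, e.g. the goroutine that writes to the network
		defer wg.Done()
		for p := range ch {
			fmt.Println(len(p.Data), cap(p.Data))
			p.release()
		}
	}()

	readBuf := make([]byte, 2048) // overwritten by every read, so it cannot be shared
	for _, n := range []int{60, 1500} {
		data, release := acquire(n)
		data = append(data, readBuf[:n]...) // copy out of the read buffer
		ch <- packet{Data: data, release: release}
	}
	close(ch)
	wg.Wait() // prints "60 128" then "1500 8192"
}
```

The release function doubles as the ownership rule: whoever calls it last owns the buffer, so the reader never touches Data after sending it.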
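```go
// EncodeTo writes p's metadata header and then p.Data to w, returning the
// total number of bytes written.
func (binaryPack) EncodeTo(p *CapturePacket, w io.Writer) (int, error) {
	// Borrow a fixed-size metadata buffer from a pool instead of allocating one.
	buf := metaBufPool.Get().(*[CapturePacketMetaLen]byte)
	defer metaBufPool.Put(buf)
	// Timestamp: microseconds since the Unix epoch, big-endian.
	binary.BigEndian.PutUint64(buf[0:], uint64(p.Timestamp.UnixMicro()))
	...
	return nm + nd, err
}
```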
Packet Size Comparison (By Tongyi Qianwen)
| Method | Original Size (bytes) | Encoded Size (bytes) | Size Change (bytes) |
|---|---|---|---|
| **Binary Pack** | 72 | 94 | +22 |
| **Binary Pack** | 1024 | 1046 | +22 |
| **Binary Pack** | 16384 | 16406 | +22 |
| **MsgPack** | 72 | 150 | +78 |
| **MsgPack** | 1024 | 1103 | +79 |
| **MsgPack** | 16384 | 16463 | +79 |
| **Json Pack** | 72 | 191 | +119 |
| **Json Pack** | 1024 | 1467 | +443 |
| **Json Pack** | 16384 | 21949 | +5565 |
| **Json Compress Pack** | 72 | 195 | +123 |
| **Json Compress Pack** | 1024 | 1114 | +90 |
| **Json Compress Pack** | 16384 | 15504 | -120 |
Analysis
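- Binary Pack:
  - Small packets (72 bytes): 22-byte overhead.
  - Large packets (16384 bytes): 22-byte overhead.
  - Constant, minimal overhead: only the fixed-size metadata is added.
- MsgPack:
  - Small packets (72 bytes): 78-byte overhead.
  - Large packets (16384 bytes): 79-byte overhead.
  - The overhead is nearly constant as well, but roughly 3.5 times that of Binary Pack, so it is proportionally expensive for small packets and negligible for large ones.
- Json Pack:
  - Small packets (72 bytes): 119-byte overhead.
  - Large packets (16384 bytes): 5565-byte overhead.
  - Overhead grows with payload size because Go's encoding/json encodes []byte fields as base64, which expands the data by a factor of 4/3.
- Json Compress Pack:
  - Small packets (72 bytes): 123-byte overhead.
  - Large packets (16384 bytes): 120 bytes smaller than the original.
  - For large payloads, compression more than offsets the base64 expansion, although the result depends on how compressible the packet data is.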
Performance Benchmarks
JSON
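Buffer reuse yields a large improvement, mainly from fewer memory allocations: at 16384 bytes, encode_with_buf is about 4x faster than encode (8472 vs 34289 ns/op) and allocates a constant 128 B/op instead of 24754 B/op.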
```
BenchmarkJsonPack/encode#72-20 17315143 647.1 ns/op 320 B/op 3 allocs/op
BenchmarkJsonPack/encode#1024-20 4616841 2835 ns/op 1666 B/op 3 allocs/op
BenchmarkJsonPack/encode#16384-20 365313 34289 ns/op 24754 B/op 3 allocs/op
BenchmarkJsonPack/encode_with_buf#72-20 24820188 447.4 ns/op 128 B/op 2 allocs/op
BenchmarkJsonPack/encode_with_buf#1024-20 13139395 910.6 ns/op 128 B/op 2 allocs/op
BenchmarkJsonPack/encode_with_buf#16384-20 1414260 8472 ns/op 128 B/op 2 allocs/op
BenchmarkJsonPack/decode#72-20 8699952 1364 ns/op 304 B/op 8 allocs/op
BenchmarkJsonPack/decode#1024-20 2103712 5605 ns/op 1384 B/op 8 allocs/op
BenchmarkJsonPack/decode#16384-20 159140 73101 ns/op 18664 B/op 8 allocs/op
```
MessagePack
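Buffer reuse brings similar benefits: encode_with_buf stays between roughly 390 and 495 ns/op with a constant 192 B/op across all payload sizes. Compared with JSON, MessagePack encoding is slower for 72-byte packets (1199 vs 647.1 ns/op), but by 1024 bytes it is faster (2132 vs 2835 ns/op) and the gap widens for larger payloads. Memory usage also stays flat: encode_with_buf allocates 192 B/op and decode 280 B/op regardless of payload size.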
```
BenchmarkMsgPack/encode#72-20 10466427 1199 ns/op 688 B/op 8 allocs/op
BenchmarkMsgPack/encode#1024-20 6599528 2132 ns/op 1585 B/op 8 allocs/op
BenchmarkMsgPack/encode#16384-20 1478127 8806 ns/op 18879 B/op 8 allocs/op
BenchmarkMsgPack/encode_with_buf#72-20 26677507 388.2 ns/op 192 B/op 4 allocs/op
BenchmarkMsgPack/encode_with_buf#1024-20 31426809 400.2 ns/op 192 B/op 4 allocs/op
BenchmarkMsgPack/encode_with_buf#16384-20 22588560 494.5 ns/op 192 B/op 4 allocs/op
BenchmarkMsgPack/decode#72-20 19894509 654.2 ns/op 280 B/op 10 allocs/op
BenchmarkMsgPack/decode#1024-20 18211321 664.0 ns/op 280 B/op 10 allocs/op
BenchmarkMsgPack/decode#16384-20 13755824 769.1 ns/op 280 B/op 10 allocs/op
```
JSON Compression
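In a LAN environment, bandwidth is usually not the bottleneck, which makes compression less relevant. Its cost is also steep: encoding takes roughly 0.71 to 0.86 ms and allocates about 1.2 MB per packet.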
```
BenchmarkJsonCompressPack/encode#72-20 19934 709224 ns/op 1208429 B/op 26 allocs/op
BenchmarkJsonCompressPack/encode#1024-20 17577 766349 ns/op 1212782 B/op 26 allocs/op
BenchmarkJsonCompressPack/encode#16384-20 11757 860371 ns/op 1253975 B/op 25 allocs/op
BenchmarkJsonCompressPack/decode#72-20 490164 28972 ns/op 42048 B/op 15 allocs/op
BenchmarkJsonCompressPack/decode#1024-20 187113 71612 ns/op 47640 B/op 23 allocs/op
BenchmarkJsonCompressPack/decode#16384-20 35790 346580 ns/op 173352 B/op 30 allocs/op
```
Custom Binary Protocol
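With memory reuse, serialization and deserialization improve markedly. For 16384-byte packets, encoding drops from 6280 ns/op (encode) to 177.2 ns/op (encode_with_pool) and 113.4 ns/op (encode_to), and decoding drops from 5499 ns/op (decode) to 140.4 ns/op (decode_with_pool). In synchronous contexts, zero allocations are achievable (encode_to, decode_meta). In asynchronous cases, sync.Pool keeps the per-packet allocation small and constant (64 B/op for encoding, 16 B/op for decoding) regardless of payload size.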
```
BenchmarkBinaryPack/encode#72-20 72744334 187.1 ns/op 144 B/op 2 allocs/op
BenchmarkBinaryPack/encode#1024-20 17048832 660.6 ns/op 1200 B/op 2 allocs/op
BenchmarkBinaryPack/encode#16384-20 2085050 6280 ns/op 18495 B/op 2 allocs/op
BenchmarkBinaryPack/encode_with_pool#72-20 34700313 109.2 ns/op 64 B/op 2 allocs/op
BenchmarkBinaryPack/encode_with_pool#1024-20 39370662 101.1 ns/op 64 B/op 2 allocs/op
BenchmarkBinaryPack/encode_with_pool#16384-20 18445262 177.2 ns/op 64 B/op 2 allocs/op
BenchmarkBinaryPack/encode_to#72-20 705428736 16.96 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/encode_to#1024-20 575312358 20.78 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/encode_to#16384-20 100000000 113.4 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/decode_meta#72-20 1000000000 2.887 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/decode_meta#1024-20 1000000000 2.882 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/decode_meta#16384-20 1000000000 2.876 ns/op 0 B/op 0 allocs/op
BenchmarkBinaryPack/decode#72-20 100000000 85.63 ns/op 80 B/op 1 allocs/op
BenchmarkBinaryPack/decode#1024-20 7252350 445.4 ns/op 1024 B/op 1 allocs/op
BenchmarkBinaryPack/decode#16384-20 554329 5499 ns/op 16384 B/op 1 allocs/op
BenchmarkBinaryPack/decode_with_pool#72-20 109352595 33.97 ns/op 16 B/op 1 allocs/op
BenchmarkBinaryPack/decode_with_pool#1024-20 85589674 36.27 ns/op 16 B/op 1 allocs/op
BenchmarkBinaryPack/decode_with_pool#16384-20 26163607 140.4 ns/op 16 B/op 1 allocs/op
```
Summary
Tongyi Qianwen's Findings
Binary Pack:
- encode_to: Fastest, with zero allocations (0 B/op, 0 allocs/op); best for high-performance scenarios.
- encode_with_pool: Uses memory pool optimization, reduces both time and memory overhead, suitable for most use cases.
- encode: Standard method, higher resource consumption.
MsgPack:
- encode_with_buf: Preallocated buffer improves performance, suitable for most scenarios.
- encode: Standard method, higher resource consumption.
- decode: Roughly constant cost regardless of payload size (about 650 to 770 ns/op and 280 B/op).
Json Pack:
- encode_with_buf: Preallocated buffer improves performance, suitable for most scenarios.
- encode: Standard method, higher resource consumption.
- decode: Poor decoding performance, high memory usage.
Json Compress Pack:
- encode: Very high resource usage, not recommended for performance-sensitive tasks.
- decode: Poor decoding performance, high memory usage.
Personal Summary
In LAN environments, bandwidth is typically not a bottleneck, so compression isn't necessary. As shown in benchmarks, compression consumes substantial resources. For high-volume packet transmission (e.g., pcap), a custom binary protocol may be preferable due to its predictable metadata parsing and lower memory footprint compared to JSON or MessagePack.
References
- Benchmark results for packet construction: https://github.com/zxhio/benchmark/tree/main/pack