The Namespace Binding Challenge
When proxying traffic across multiple network namespaces, a common misconception is that each namespace requires a dedicated OS thread. This implies calling setns(2) on a new thread for every namespace, resulting in a 1:1 ratio between namespaces and threads. This approach consumes excessive system resources and complicates thread management. Offloading this to a separate C process introduces further overhead in terms of inter-process communication and protocol parsing.
The key realization, however, is that the network namespace is only relevant at the moment the socket is created. Once the file descriptor is instantiated within the target namespace, the thread can safely revert to the default namespace. This allows a single process to monitor ports across numerous namespaces without spawning a thread for each one.
Go Implementation Strategy
To ensure a socket is born inside a specific namespace, the Go runtime must be prevented from migrating the goroutine to a different OS thread mid-operation. The procedure for establishing a TCP listener is:
- Acquire and store a handle to the root (default) network namespace.
- Acquire a global lock and pin the goroutine to its current OS thread using runtime.LockOSThread() to prevent scheduling disruptions.
- Retrieve the target namespace handle and switch to it via setns.
- Invoke net.Listen to instantiate the socket.
- Restore the root namespace context.
- Release the thread lock and the global mutex.
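The steps above can be sketched as a small helper that is independent of any namespace library. This is a minimal illustration, not the article's implementation: withPinnedThread, switchMu, and the enter/leave callbacks are hypothetical names, and the callbacks stand in for the setns calls into the target namespace and back to the root namespace. No-op callbacks are used here so the sketch runs without privileges.

```go
package main

import (
	"fmt"
	"net"
	"runtime"
	"sync"
)

// switchMu serializes namespace switches across goroutines.
var switchMu sync.Mutex

// withPinnedThread runs create between enter and leave on one OS thread.
// enter and leave are hypothetical callbacks standing in for the setns
// calls into the target namespace and back to the root namespace.
func withPinnedThread(enter, leave, create func() error) error {
	switchMu.Lock()
	defer switchMu.Unlock()
	runtime.LockOSThread()

	if err := enter(); err != nil {
		// The switch did not happen, so the thread can be released.
		runtime.UnlockOSThread()
		return fmt.Errorf("enter namespace: %w", err)
	}
	createErr := create()
	if err := leave(); err != nil {
		// The thread may still be in the target namespace. It stays locked
		// so the scheduler does not run other goroutines on it.
		return fmt.Errorf("leave namespace: %w", err)
	}
	runtime.UnlockOSThread()
	return createErr
}

func main() {
	// No-op callbacks keep the sketch runnable without privileges: the
	// socket is simply created in the current namespace.
	noop := func() error { return nil }
	var ln net.Listener
	err := withPinnedThread(noop, noop, func() (err error) {
		ln, err = net.Listen("tcp", "127.0.0.1:0")
		return err
	})
	if err != nil {
		fmt.Println("bind failed:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```

Keeping the thread locked when the return to the root namespace fails is a deliberate choice in this sketch: a thread left in the wrong namespace should not be handed back to the scheduler.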
TCP Listener Example
Using the github.com/vishvananda/netns package, we can listen on port 8080 across the root namespace, ns-alpha, and ns-beta. Prepare the namespaces with:
ip netns add ns-alpha
ip netns add ns-beta
The Go implementation demonstrates how to switch contexts safely:
package main

import (
	"net"
	"runtime"
	"sync"

	"github.com/pkg/errors"
	"github.com/sirupsen/logrus"
	"github.com/vishvananda/netns"
)

var (
	rootNsHandle netns.NsHandle // namespace the process started in
	nsLock       sync.Mutex     // serializes namespace switches
)

// initRootNamespace records the namespace of the calling thread. It must run
// before any namespace switch takes place.
func initRootNamespace() {
	handle, err := netns.Get()
	if err != nil {
		panic(err)
	}
	rootNsHandle = handle
}

// bindSocketInNamespace creates a listener inside targetNs and then returns
// the thread to the root namespace. An empty targetNs means the root
// namespace, where no switch is needed.
func bindSocketInNamespace(targetNs, proto, addr string) (net.Listener, error) {
	if targetNs == "" {
		return net.Listen(proto, addr)
	}
	var switched bool
	// Pin the goroutine so that setns and the socket creation run on the
	// same OS thread.
	nsLock.Lock()
	runtime.LockOSThread()
	defer func() {
		defer nsLock.Unlock()
		if switched {
			if err := netns.Set(rootNsHandle); err != nil {
				// The thread is stuck in targetNs. Keep it locked to this
				// goroutine so it is not reused for unrelated work.
				logrus.WithError(err).Warn("Failed to revert to root namespace")
				return
			}
		}
		runtime.UnlockOSThread()
	}()
	nsHandle, err := netns.GetFromName(targetNs)
	if err != nil {
		return nil, errors.Wrap(err, "failed to get namespace handle")
	}
	defer nsHandle.Close()
	if err = netns.Set(nsHandle); err != nil {
		return nil, errors.Wrap(err, "failed to switch namespace")
	}
	switched = true
	return net.Listen(proto, addr)
}

// handleConnections accepts connections until the listener fails, replies
// with a short acknowledgement, and closes each connection.
func handleConnections(listener net.Listener) {
	for {
		conn, err := listener.Accept()
		if err != nil {
			logrus.WithError(err).Error("Accept error")
			return
		}
		logrus.WithFields(logrus.Fields{"local": conn.LocalAddr(), "remote": conn.RemoteAddr()}).Info("Connection established")
		if _, err := conn.Write([]byte("ack")); err != nil {
			logrus.WithError(err).Warn("Write error")
		}
		conn.Close()
	}
}

func main() {
	initRootNamespace()
	// The empty string selects the root namespace.
	targets := []string{"", "ns-alpha", "ns-beta"}
	var wg sync.WaitGroup
	for _, ns := range targets {
		wg.Add(1)
		go func(namespace string) {
			defer wg.Done()
			listener, err := bindSocketInNamespace(namespace, "tcp", ":8080")
			if err != nil {
				panic(err)
			}
			logrus.WithFields(logrus.Fields{"netns": namespace, "addr": listener.Addr()}).Info("Listening")
			handleConnections(listener)
		}(ns)
	}
	wg.Wait()
}
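When debugging this kind of code, it can help to check which network namespace a thread is in, for instance before and after the bind call. The following is a minimal sketch using only the standard library. It assumes Linux with procfs mounted at /proc and the thread-self link (kernel 3.17 or later); threadNetnsID is a hypothetical helper name and is not part of the netns package.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// threadNetnsID returns the identity of the calling thread's network
// namespace as a string of the form "net:[<inode>]". It assumes procfs is
// mounted at /proc and provides the thread-self link.
func threadNetnsID() (string, error) {
	// Pin the goroutine so the link is resolved for the thread we report on.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	return os.Readlink("/proc/thread-self/ns/net")
}

func main() {
	id, err := threadNetnsID()
	if err != nil {
		fmt.Println("cannot read namespace identity:", err)
		return
	}
	fmt.Println("current thread is in", id)
}
```

Two threads are in the same network namespace exactly when this string matches, so comparing the value before and after a bind is a cheap sanity check that the thread returned to where it started.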
Handling UDP and SCTP
UDP sockets follow the same pattern and do not cause additional threads to be spawned: Go's net package puts the descriptor into non-blocking mode and waits on it through the runtime's network poller, so a blocked read parks the goroutine rather than an OS thread. SCTP, however, presents a distinct challenge. Libraries like github.com/ishidawataru/sctp provide basic file descriptor wrappers in which Accept() is a blocking syscall. When it is invoked in a loop, each blocked call ties up an OS thread, and the Go runtime starts additional threads to keep other goroutines running, defeating our thread-conservation goal.
The remedy is to bypass the library's blocking accept loop and manage the file descriptor manually:
- Configure the SCTP socket to non-blocking mode.
- Implement a custom epoll-based event loop. (Avoid select or poll, as their performance degrades severely under high file descriptor counts.)
To extract the file descriptor and apply non-blocking settings during creation:
// nonBlockingSctpListener keeps the raw descriptor next to the library
// listener so that it can be registered with epoll.
type nonBlockingSctpListener struct {
	*sctp.SCTPListener
	rawFd int
}

func createNonBlockingSctpListener(network, addr string) (*nonBlockingSctpListener, error) {
	parsedAddr, err := parseSctpAddress(addr)
	if err != nil {
		return nil, err
	}
	capturedFd := -1
	cfg := sctp.SocketConfig{
		InitMsg: sctp.InitMsg{NumOstreams: sctp.SCTP_MAX_STREAM},
		Control: func(_, _ string, rawConn syscall.RawConn) error {
			var ctrlErr error
			err := rawConn.Control(func(fd uintptr) {
				// Record a failure so that Listen fails instead of
				// continuing with an unusable descriptor.
				if ctrlErr = syscall.SetNonblock(int(fd), true); ctrlErr != nil {
					return
				}
				capturedFd = int(fd)
			})
			if err != nil {
				return err
			}
			return ctrlErr
		},
	}
	listener, err := cfg.Listen(network, parsedAddr)
	if err != nil {
		return nil, err
	}
	return &nonBlockingSctpListener{SCTPListener: listener, rawFd: capturedFd}, nil
}
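With the descriptor in non-blocking mode, readiness can be delivered by epoll. The following is a minimal sketch of such a loop, not the production implementation. To stay self-contained it uses only the standard library syscall package on Linux, and a non-blocking TCP socket stands in for the SCTP rawFd. The names newNonBlockingListener, acceptReady, and runOnce are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// newNonBlockingListener opens a non-blocking TCP listening socket on a
// loopback port chosen by the kernel. It stands in for the SCTP rawFd.
func newNonBlockingListener() (fd, port int, err error) {
	fd, err = syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM|syscall.SOCK_NONBLOCK, 0)
	if err != nil {
		return -1, 0, err
	}
	addr := &syscall.SockaddrInet4{Addr: [4]byte{127, 0, 0, 1}}
	if err = syscall.Bind(fd, addr); err == nil {
		err = syscall.Listen(fd, 16)
	}
	var sa syscall.Sockaddr
	if err == nil {
		sa, err = syscall.Getsockname(fd)
	}
	if err != nil {
		syscall.Close(fd)
		return -1, 0, err
	}
	return fd, sa.(*syscall.SockaddrInet4).Port, nil
}

// acceptReady waits up to timeoutMs for registered listeners to become
// readable, then accepts on each one until it reports EAGAIN.
func acceptReady(epfd, timeoutMs int) ([]int, error) {
	events := make([]syscall.EpollEvent, 16)
	n, err := syscall.EpollWait(epfd, events, timeoutMs)
	if err == syscall.EINTR {
		return nil, nil // interrupted by a signal; the caller can retry
	}
	if err != nil {
		return nil, err
	}
	var conns []int
	for _, ev := range events[:n] {
		for {
			cfd, _, err := syscall.Accept4(int(ev.Fd), syscall.SOCK_NONBLOCK)
			if err == syscall.EAGAIN {
				break // backlog drained for this listener
			}
			if err != nil {
				return conns, err
			}
			conns = append(conns, cfd)
		}
	}
	return conns, nil
}

// runOnce registers one listener, connects a client to it, and returns the
// number of connections accepted by a single pass of the event loop.
func runOnce() (int, error) {
	lfd, port, err := newNonBlockingListener()
	if err != nil {
		return 0, err
	}
	defer syscall.Close(lfd)
	epfd, err := syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(epfd)
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(lfd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, lfd, &ev); err != nil {
		return 0, err
	}
	client, err := net.Dial("tcp", fmt.Sprintf("127.0.0.1:%d", port))
	if err != nil {
		return 0, err
	}
	defer client.Close()
	conns, err := acceptReady(epfd, 1000)
	for _, cfd := range conns {
		syscall.Close(cfd)
	}
	return len(conns), err
}

func main() {
	n, err := runOnce()
	fmt.Println("accepted:", n, "error:", err)
}
```

The goroutine that calls EpollWait still blocks an OS thread, but it is one thread for every registered listener rather than one per blocking Accept() call, which is what keeps the thread count flat as listeners are added.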
Production Metrics
In a production deployment on a 4-core machine, this approach yields significant resource savings. The process manages over 1200 file descriptors (encompassing TCP, UDP, and SCTPv6 sockets across various namespaces), yet operates with merely 14 OS threads. This confirms the viability of sharing threads across multiple namespace contexts rather than dedicating a thread per namespace.
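Figures like these can be read from procfs from inside the process. The sketch below is one possible way to do so and is not necessarily how the numbers above were collected. It assumes Linux with procfs mounted at /proc; countEntries is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
)

// countEntries returns the number of entries in a directory.
func countEntries(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	// Each OS thread of the process appears as one entry under task.
	threads, err := countEntries("/proc/self/task")
	if err != nil {
		fmt.Println("cannot count threads:", err)
		return
	}
	// Each open descriptor appears as one entry under fd. The count includes
	// the descriptor opened temporarily to read the directory itself.
	fds, err := countEntries("/proc/self/fd")
	if err != nil {
		fmt.Println("cannot count descriptors:", err)
		return
	}
	fmt.Printf("threads=%d fds=%d\n", threads, fds)
}
```

Logging both values periodically makes it easy to see whether the thread count stays flat while the number of sockets grows.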