Introduction to Capabilities
Traditional UNIX implementations distinguish two categories of processes for the purpose of permission checks: privileged processes (effective user ID 0, known as superuser or root) and unprivileged processes (non-zero effective UID). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on their credentials (usually the effective UID, effective GID, and supplementary group list).
Starting with kernel 2.2, Linux divides privileges traditionally associated with superuser into distinct units called capabilities, which can be independently enabled and disabled. Capabilities are per-thread attributes.
Requirements for Full Capability Implementation
Complete capability implementation requires:
- The kernel must check if a thread has required capabilities in its effective set for all privileged operations.
- The kernel must provide system calls to allow changing and retrieving thread capability sets.
- The filesystem must support attaching capabilities to executable files so processes gain those capabilities when executing files.
Thread Capability Sets
Each thread has the following capability sets containing zero or more capabilities:
Permitted Set - A limiting superset for the effective capabilities that the thread may assume. It is also a limiting superset for the capabilities that may be added to the inheritable set by a thread that does not have CAP_SETPCAP in its effective set.
Inheritable Set - This set is preserved across execve(2). Inheritable capabilities remain inheritable when executing any program, and they are added to the permitted set when executing a program that has the corresponding bits set in its file inheritable set.
Effective Set - The set of capabilities the kernel uses to perform permission checks for the thread.
Bounding Set (per-thread since Linux 2.6.25) - Limits the capabilities that can be gained during execve(2).
Ambient Set (since Linux 4.3) - This set is preserved across an execve(2) of a program that is not privileged. The ambient set obeys the invariant that no capability can be ambient unless it is both permitted and inheritable.
Child processes inherit copies of their parent's capability sets via fork(2).
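On Linux, these sets are exposed as hexadecimal masks in the CapInh, CapPrm, CapEff, CapBnd, and CapAmb fields of /proc/[pid]/status. The following is a minimal sketch of parsing those fields; the function name parse_cap_sets and the sample text are illustrative assumptions, not part of any library.

```python
def parse_cap_sets(status_text):
    """Extract the Cap* hex masks from the text of a /proc/<pid>/status file."""
    fields = {
        "CapInh": "inheritable",
        "CapPrm": "permitted",
        "CapEff": "effective",
        "CapBnd": "bounding",
        "CapAmb": "ambient",
    }
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in fields:
            # Each value is a hexadecimal bit mask, one bit per capability.
            sets[fields[key]] = int(value.strip(), 16)
    return sets


# Hypothetical sample resembling an unprivileged shell (not captured output).
SAMPLE = """Name:\tbash
CapInh:\t0000000000000000
CapPrm:\t0000000000000000
CapEff:\t0000000000000000
CapBnd:\t000001ffffffffff
CapAmb:\t0000000000000000
"""

sets = parse_cap_sets(SAMPLE)
print(hex(sets["bounding"]))  # 0x1ffffffffff
```

On a Linux system, passing the contents of /proc/self/status to parse_cap_sets should report the sets of the calling process's main thread.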
File Capabilities
Since kernel 2.6.24, the kernel supports associating capability sets with an executable file using setcap(8). The file capability sets are stored in an extended attribute named security.capability. Writing to this attribute requires the CAP_SETFCAP capability.
Three file capability sets exist:
Permitted (formerly forced) - These capabilities are automatically permitted to the thread, regardless of the thread's inheritable capabilities.
Inheritable (formerly allowed) - This set is ANDed with the thread's inheritable set to determine which inheritable capabilities are enabled in the thread's permitted set after execve(2).
Effective - This is a single bit rather than a set. If it is set, all of the thread's new permitted capabilities are also raised in the effective set during execve(2).
Capability Transformation During execve()
During execve(2), the kernel calculates new process capabilities using:
P'(ambient) = (file is privileged) ? 0 : P(ambient)
P'(permitted) = (P(inheritable) & F(inheritable)) |
                (F(permitted) & P(bounding)) | P'(ambient)
P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
P'(inheritable) = P(inheritable)
P'(bounding) = P(bounding)
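Where P() denotes the value of a thread capability set before execve(2), P'() denotes the value after execve(2), and F() denotes a file capability set. A file is privileged if it has file capabilities or has the set-user-ID or set-group-ID bit set.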
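The transformation above can be sketched as plain bit-mask arithmetic. This is an illustrative model only: the ThreadCaps and FileCaps classes and the caps_after_execve function are hypothetical names, and the sketch deliberately leaves out the special handling for UID 0.

```python
from dataclasses import dataclass

CAP_NET_RAW = 1 << 13  # capability number 13 in <linux/capability.h>


@dataclass(frozen=True)
class ThreadCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: int = 0
    bounding: int = 0
    ambient: int = 0


@dataclass(frozen=True)
class FileCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: bool = False  # a single bit, not a set


def caps_after_execve(p, f, file_is_privileged):
    """Apply the execve(2) formulas to thread sets p and file sets f."""
    ambient = 0 if file_is_privileged else p.ambient
    permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding) | ambient
    effective = permitted if f.effective else ambient
    return ThreadCaps(
        permitted=permitted,
        inheritable=p.inheritable,  # unchanged
        effective=effective,
        bounding=p.bounding,        # unchanged
        ambient=ambient,
    )


# An unprivileged thread with a full bounding set (assumed 41 capabilities)
# executes a file marked cap_net_raw+ep.
before = ThreadCaps(bounding=(1 << 41) - 1)
ping = FileCaps(permitted=CAP_NET_RAW, effective=True)
after = caps_after_execve(before, ping, file_is_privileged=True)
print(hex(after.permitted), hex(after.effective))  # 0x2000 0x2000
```

Because the file carries capabilities it counts as privileged, so any ambient capabilities the thread held would be dropped in this example.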
Safety Checking for Capability-Unaware Binaries
A capability-unaware binary is an application that has been marked with file capabilities but has not been converted to use the libcap(3) API to manipulate its capabilities. For such applications, the file effective bit is set so that the file permitted capabilities are automatically enabled in the process effective set when the file is executed.
When executing a capability-unaware binary, the kernel checks whether the process obtained all capabilities specified in the file permitted set. If it did not, execve(2) fails with EPERM. The typical cause is a capability bounding set that masks out some of the capabilities in the file permitted set.
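The check described above can be modelled in a few lines. This is a sketch under stated assumptions: exec_safety_check is a hypothetical helper, and the new permitted set is computed here only from the file permitted set and the bounding set.

```python
import errno

CAP_NET_ADMIN = 1 << 12
CAP_NET_RAW = 1 << 13


def exec_safety_check(file_permitted, file_effective_bit, new_permitted):
    """Return 0 if execve() may proceed, or errno.EPERM if a file with the
    effective bit set did not obtain every file permitted capability."""
    missing = file_permitted & ~new_permitted
    if file_effective_bit and missing:
        return errno.EPERM
    return 0


# The file asks for two capabilities, but the bounding set allows only one.
file_permitted = CAP_NET_ADMIN | CAP_NET_RAW
bounding = CAP_NET_RAW
new_permitted = file_permitted & bounding
print(exec_safety_check(file_permitted, True, new_permitted) == errno.EPERM)  # True
```

With the effective bit clear the same call returns 0, since the check applies only to binaries treated as capability-unaware.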
Capabilities and Root Program Execution
To preserve traditional UNIX semantics, the kernel treats file capabilities specially when a process with UID 0 (root) executes a program, or when a set-user-ID-root program is executed:
- If the real or effective user ID of the process is 0, the file inheritable and permitted sets are ignored and are instead notionally considered to be all ones (all capabilities enabled).
- If the effective user ID of the process is 0, or the file effective bit is in fact enabled, the file effective bit is notionally defined to be 1.
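The effect of these two rules on the file sets that feed the execve(2) calculation might be sketched as follows. The helper notional_file_caps is hypothetical, ALL_CAPS assumes 41 defined capabilities (numbers 0 to 40), and effective_uid means the effective UID after any set-user-ID bit has been applied.

```python
ALL_CAPS = (1 << 41) - 1  # assumption: capabilities 0..40 are defined


def notional_file_caps(real_uid, effective_uid, f_permitted, f_inheritable, f_effective):
    """Return the file sets the kernel notionally uses when root is involved."""
    if real_uid == 0 or effective_uid == 0:
        # The real file sets are ignored and treated as all ones.
        f_permitted = f_inheritable = ALL_CAPS
    if effective_uid == 0:
        # The effective bit is treated as set.
        f_effective = True
    return f_permitted, f_inheritable, f_effective


# Root executes a file that carries no file capabilities at all.
p_inheritable, p_bounding = 0, ALL_CAPS
fp, fi, fe = notional_file_caps(0, 0, 0, 0, False)
permitted = (p_inheritable & fi) | (fp & p_bounding)
effective = permitted if fe else 0
print(permitted == ALL_CAPS, effective == ALL_CAPS)  # True True
```

In this model a root process ends up with every capability that its bounding set allows, which matches the traditional all-powerful superuser unless the bounding set has been reduced.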
Capability Bounding Set
The capability bounding set restricts the capabilities that can be gained during execve(2). During an execve(2), the bounding set is ANDed with the file permitted capability set, and the result is assigned to the thread's permitted capability set.
Since Linux 2.6.25, the bounding set is a per-thread attribute inherited from parent at fork(2) and preserved across execve(2).
Effect of User ID Changes on Capabilities
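To preserve the traditional semantics for transitions between 0 and non-zero user IDs, the kernel adjusts a thread's capability sets when its user IDs change (for example, via setuid(2) or setresuid(2)):
- If one or more of the real, effective, or saved set user IDs was previously 0, and as a result of the change all of these IDs are non-zero, all capabilities are cleared from the permitted, effective, and ambient sets.
- If the effective user ID changes from 0 to non-zero, all capabilities are cleared from the effective set.
- If the effective user ID changes from non-zero to 0, the permitted set is copied to the effective set.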
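The three rules above can be modelled as a small function. This is a sketch only: caps_after_uid_change is a hypothetical name, UIDs are passed as (real, effective, saved) tuples, and the model assumes no securebits flags are set.

```python
def caps_after_uid_change(old_uids, new_uids, permitted, effective, ambient):
    """Model the kernel's capability fixups for a change of user IDs.

    old_uids and new_uids are (real, effective, saved) tuples.
    Returns the new (permitted, effective, ambient) masks.
    """
    old_euid, new_euid = old_uids[1], new_uids[1]
    if 0 in old_uids and 0 not in new_uids:
        # At least one ID was 0 and now none is: drop everything.
        permitted = effective = ambient = 0
    if old_euid == 0 and new_euid != 0:
        effective = 0
    if old_euid != 0 and new_euid == 0:
        effective = permitted
    return permitted, effective, ambient


CAPS = 0x2000  # CAP_NET_RAW, used as an example mask

# Permanent drop: all three IDs leave 0.
print(caps_after_uid_change((0, 0, 0), (1000, 1000, 1000), CAPS, CAPS, 0))  # (0, 0, 0)
# Temporary drop: only the effective UID leaves 0.
print(caps_after_uid_change((0, 0, 0), (0, 1000, 0), CAPS, CAPS, 0))  # (8192, 0, 0)
# Regaining effective UID 0 restores the effective set from the permitted set.
print(caps_after_uid_change((0, 1000, 0), (0, 0, 0), CAPS, 0, 0))  # (8192, 8192, 0)
```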
Programmatically Adjusting Capability Sets
Threads can use capget(2) and capset(2) system calls to retrieve and change their permitted, effective, and inheritable capability sets. The libcap package provides more convenient interfaces.
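The following rules govern changes to the thread capability sets:
- If the caller does not have CAP_SETPCAP, the new inheritable set must be a subset of the union of the existing inheritable and permitted sets.
- (Since Linux 2.6.25) The new inheritable set must be a subset of the union of the existing inheritable set and the bounding set.
- The new permitted set must be a subset of the existing permitted set; a thread cannot acquire permitted capabilities it does not already have.
- The new effective set must be a subset of the new permitted set.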
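The rules above amount to a handful of subset tests on bit masks. The following sketch checks a proposed change against them; capset_allowed is a hypothetical helper rather than a binding to capset(2), and has_setpcap stands for CAP_SETPCAP being in the caller's effective set.

```python
def is_subset(a, b):
    """True if every bit set in mask a is also set in mask b."""
    return a & ~b == 0


def capset_allowed(cur_permitted, cur_inheritable, bounding, has_setpcap,
                   new_permitted, new_inheritable, new_effective):
    """Return True if the proposed sets satisfy all four rules."""
    if not has_setpcap and not is_subset(new_inheritable, cur_inheritable | cur_permitted):
        return False
    if not is_subset(new_inheritable, cur_inheritable | bounding):
        return False
    if not is_subset(new_permitted, cur_permitted):
        return False
    if not is_subset(new_effective, new_permitted):
        return False
    return True


NET_ADMIN, NET_RAW = 1 << 12, 1 << 13
full = NET_ADMIN | NET_RAW

# Dropping CAP_NET_ADMIN from permitted and effective is allowed.
print(capset_allowed(full, 0, full, False, NET_RAW, 0, NET_RAW))  # True
# Keeping CAP_NET_ADMIN effective after dropping it from permitted is not.
print(capset_allowed(full, 0, full, False, NET_RAW, 0, full))  # False
# Adding a capability to the permitted set is never allowed.
print(capset_allowed(NET_RAW, 0, full, True, full, 0, NET_RAW))  # False
```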
Securebits Flags
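Since kernel 2.6.26, Linux implements a set of per-thread securebits flags that can be used to disable the special handling of capabilities for UID 0 (root):
SECBIT_KEEP_CAPS - Allows a thread that has one or more UIDs equal to 0 to retain capabilities in its permitted set when it switches all of its UIDs to non-zero values.
SECBIT_NO_SETUID_FIXUP - Stops the kernel from adjusting the thread's capability sets when its UIDs are switched between 0 and non-zero values.
SECBIT_NOROOT - Stops the kernel from granting capabilities when a set-user-ID-root program is executed or when a process with a real or effective UID of 0 calls execve(2).
SECBIT_NO_CAP_AMBIENT_RAISE - Disallows raising ambient capabilities via the prctl(2) PR_CAP_AMBIENT_RAISE operation.
Each of these base flags has a companion locked flag (for example, SECBIT_NOROOT_LOCKED). Setting a locked flag is irreversible and prevents further changes to the corresponding base flag.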
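A securebits value is a small bit mask in which each base flag is immediately followed by its locked counterpart. The sketch below decodes such a value; the bit positions are the ones I understand <linux/securebits.h> to define, so verify them against your kernel headers, and decode_securebits is a hypothetical helper.

```python
# Bit positions as understood from <linux/securebits.h> (verify locally).
SECBITS = {
    "SECBIT_NOROOT": 1 << 0,
    "SECBIT_NOROOT_LOCKED": 1 << 1,
    "SECBIT_NO_SETUID_FIXUP": 1 << 2,
    "SECBIT_NO_SETUID_FIXUP_LOCKED": 1 << 3,
    "SECBIT_KEEP_CAPS": 1 << 4,
    "SECBIT_KEEP_CAPS_LOCKED": 1 << 5,
    "SECBIT_NO_CAP_AMBIENT_RAISE": 1 << 6,
    "SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED": 1 << 7,
}


def decode_securebits(value):
    """Return the names of the flags set in a securebits mask."""
    return [name for name, bit in SECBITS.items() if value & bit]


# Example mask: NOROOT and NO_SETUID_FIXUP set and locked, KEEP_CAPS locked off.
for name in decode_securebits(0x2F):
    print(name)
```

A mask like 0x2f is the kind of value a program might pass to prctl(2) with PR_SET_SECUREBITS to lock itself into a pure capability-based environment.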
Capability List
Key Linux capabilities include:
CAP_CHOWN - Make arbitrary changes to file UIDs and GIDs.
CAP_DAC_OVERRIDE - Bypass file read, write, and execute permission checks.
CAP_DAC_READ_SEARCH - Bypass file read permission checks and directory read and execute permission checks.
CAP_FOWNER - Bypass permission checks on operations that normally require the filesystem UID of the process to match the UID of the file.
CAP_SETPCAP - Add capabilities from the bounding set to the inheritable set, drop capabilities from the bounding set, and change securebits flags.
CAP_SETUID - Make arbitrary manipulations of process UIDs.
CAP_SYS_ADMIN - Perform a wide range of system administration operations.
CAP_SYS_MODULE - Load and unload kernel modules.
CAP_SYS_PTRACE - Trace arbitrary processes using ptrace(2).
CAP_NET_ADMIN - Perform various network-related operations.
CAP_NET_RAW - Use RAW and PACKET sockets.
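Each capability has a number, and capability sets are bit masks in which bit N stands for capability number N. The sketch below maps a mask back to names for the capabilities listed above; the numbers are those I understand <linux/capability.h> to assign, the table is intentionally partial, and mask_to_names is a hypothetical helper.

```python
# Capability numbers for the capabilities listed above (partial table).
CAP_NUMBERS = {
    "CAP_CHOWN": 0,
    "CAP_DAC_OVERRIDE": 1,
    "CAP_DAC_READ_SEARCH": 2,
    "CAP_FOWNER": 3,
    "CAP_SETUID": 7,
    "CAP_SETPCAP": 8,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
}


def mask_to_names(mask):
    """Translate a capability bit mask into a list of names.

    Bits not covered by the partial table are reported as one hex remainder.
    """
    by_number = sorted(CAP_NUMBERS.items(), key=lambda item: item[1])
    names = [name for name, number in by_number if mask & (1 << number)]
    known = sum(1 << number for number in CAP_NUMBERS.values())
    unknown = mask & ~known
    if unknown:
        names.append(hex(unknown))
    return names


print(mask_to_names(0x3000))  # ['CAP_NET_ADMIN', 'CAP_NET_RAW']
```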
Practical Examples
Displaying File Capabilities
$ getcap /usr/bin/ping
/usr/bin/ping = cap_net_raw+ep
$ getfattr -d -m "^security\\." /usr/bin/ping
# file: usr/bin/ping
security.capability=0sAQAAAgAgAAAAAAAAAAAAAAAAAAA=
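The base64 value printed by getfattr can be decoded by hand to confirm what getcap reports. The following sketch assumes the revision 2 or 3 on-disk layout (a 32-bit magic word followed by two little-endian pairs of permitted and inheritable words); decode_security_capability is a hypothetical helper, and the constants mirror what I understand <linux/capability.h> to define.

```python
import base64
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001


def decode_security_capability(getfattr_value):
    """Decode a security.capability value as printed by getfattr ('0s' + base64)."""
    if not getfattr_value.startswith("0s"):
        raise ValueError("expected a base64 value with the getfattr '0s' prefix")
    raw = base64.b64decode(getfattr_value[2:])
    if len(raw) < 20:
        raise ValueError("too short for a revision 2 or 3 attribute")
    # Layout: magic, permitted[0], inheritable[0], permitted[1], inheritable[1]
    magic, perm_lo, inh_lo, perm_hi, inh_hi = struct.unpack("<5I", raw[:20])
    return {
        "revision": (magic & VFS_CAP_REVISION_MASK) >> 24,
        "effective": bool(magic & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": perm_lo | (perm_hi << 32),
        "inheritable": inh_lo | (inh_hi << 32),
    }


# The value shown above, split into 4-character base64 groups for readability.
VALUE = ("0s" "AQAA" "AgAg" "AAAA" "AAAA" "AAAA" "AAAA" "AAA=")
info = decode_security_capability(VALUE)
print(info["revision"], info["effective"], hex(info["permitted"]))  # 2 True 0x2000
```

Bit 13 (0x2000) is CAP_NET_RAW, and the effective flag is set, which agrees with the cap_net_raw+ep reported by getcap.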
Setting Capabilities
$ sudo setcap cap_net_raw+ep /usr/bin/iftop
$ sudo setcap cap_net_admin+ep /usr/bin/mii-tool
$ sudo setcap 'cap_net_bind_service=+ep' /usr/bin/nginx
Finding Files with Capabilities
$ getcap -r / 2>/dev/null
/usr/bin/ping = cap_net_raw+ep
/usr/bin/nethogs = cap_net_admin,cap_net_raw+ep
/usr/bin/iftop = cap_net_raw+ep
Finding Setuid/Setgid Files
$ find /usr/bin /usr/lib -perm /4000 -user root
$ find /usr/bin /usr/lib -perm /2000 -group root
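The same search can be scripted when the results need further processing. This is a rough sketch rather than an exact equivalent of the find commands: find_special_files is a hypothetical helper, it reports regular files only, and it filters by owning UID but not by group.

```python
import os
import stat


def find_special_files(roots, mode_bit=stat.S_ISUID, owner_uid=None):
    """Return regular files under roots that have mode_bit set.

    mode_bit is stat.S_ISUID (set-user-ID) or stat.S_ISGID (set-group-ID).
    If owner_uid is given, only files owned by that UID are reported.
    """
    matches = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # do not follow symbolic links
                except OSError:
                    continue  # file vanished or is unreadable
                if not stat.S_ISREG(st.st_mode):
                    continue
                if st.st_mode & mode_bit and (owner_uid is None or st.st_uid == owner_uid):
                    matches.append(path)
    return sorted(matches)
```

For example, find_special_files(["/usr/bin", "/usr/lib"], owner_uid=0) should list the set-user-ID-root programs in those trees, which are the usual candidates for replacement with file capabilities.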