- Virtual File System Abstraction
Linux accommodates a wide variety of storage formats, including ext4, XFS, FAT, NTFS, and iso9660. Despite the differences in on-disk structures and underlying hardware, the operating system presents a unified directory tree to applications. Operations like directory listing, reading, and writing go through the same system calls regardless of the specific storage backend. This uniformity is achieved through an abstraction layer inside the kernel known as the Virtual File System (VFS). VFS standardizes core concepts such as files, directories, inodes, and file descriptors, allowing diverse filesystem implementations to register their operations against a common interface.
1.1 Duplicating File Descriptors
The dup() and dup2() system calls create copies of an existing file descriptor. Unlike invoking open() twice on the same path (which generates two independent struct file instances with separate file offsets and status flags), duplication causes both descriptors to reference the exact same underlying file object. Consequently, the file offset and the file status flags (such as O_APPEND) are shared, and the kernel increments the reference count on the file structure. Per-descriptor flags such as FD_CLOEXEC are not shared.
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    int target_fd = open("session.log", O_RDWR | O_CREAT, 0644);
    if (target_fd < 0) {
        perror("open");
        exit(1);
    }
    /* Keep a copy of the current stdout so it can be restored later */
    int preserved_stdout = dup(STDOUT_FILENO);
    if (preserved_stdout < 0) {
        perror("dup");
        close(target_fd);
        exit(1);
    }
    /* Redirect standard output to the target file */
    if (dup2(target_fd, STDOUT_FILENO) == -1) {
        perror("dup2");
        close(target_fd);
        close(preserved_stdout);
        exit(1);
    }
    close(target_fd); /* descriptor 1 still refers to the same open file */
    const char *payload = "System stream redirected.\n";
    if (write(STDOUT_FILENO, payload, strlen(payload)) == -1) {
        perror("write");
    }
    /* Restore original standard output */
    if (dup2(preserved_stdout, STDOUT_FILENO) == -1) {
        perror("restore");
        exit(1);
    }
    close(preserved_stdout);
    if (write(STDOUT_FILENO, payload, strlen(payload)) == -1) {
        perror("write");
    }
    return 0;
}
- Core File Metadata and Attribute Operations
2.1 Inspecting Inode Information
The stat() family retrieves metadata stored in an inode. Three variants exist:
stat(): Follows symbolic links and inspects the target.
lstat(): Examines the symbolic link itself without following it.
fstat(): Operates on an already open file descriptor.
The returned struct stat contains critical fields:
dev_t st_dev; /* Device ID */
ino_t st_ino; /* Inode number */
mode_t st_mode; /* File type and permissions */
nlink_t st_nlink; /* Hard link count */
uid_t st_uid; /* Owner UID */
gid_t st_gid; /* Group GID */
off_t st_size; /* File size in bytes */
time_t st_atime; /* Last access time */
time_t st_mtime; /* Last content modification time */
time_t st_ctime; /* Last metadata change time */
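The time attributes distinguish between read operations (atime), data writes (mtime), and changes to the inode itself, such as permission or ownership updates (ctime). A data write updates ctime as well as mtime, and whether atime is updated on every read depends on mount options such as relatime or noatime.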
2.2 Permission and Ownership Verification
access() checks whether the calling process may access a file, using the real user and group IDs and following symbolic links. Its mode argument is either F_OK (existence only) or a bitwise OR of R_OK, W_OK, and X_OK. Operations such as open() and exec are checked against the effective IDs instead, so the two answers can differ. This is particularly relevant in setuid/setgid binaries such as sudo.
2.3 Modifying Attributes
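chmod() and fchmod() alter file permission bits. Ownership is modified with chown() or fchown(); changing the owner typically demands superuser privileges, although a file's owner can usually change its group to another group they belong to. Timestamps can be set manually with utime() or with the newer, nanosecond-precision utimensat() and futimens(). truncate() and ftruncate() set a file's length to a specified byte count: shrinking discards the data past the new end, and extending pads the file with zero bytes.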
- File Linking and Path Management
3.1 Hard Links vs Symbolic Links
link() creates a hard link by pointing a new directory entry to an existing inode. Hard links share the same data blocks and must reside on the same filesystem as the inode they reference. Removing a file with rm actually invokes unlink(), which decrements the inode's link count. The underlying data is only reclaimed when the count reaches zero and no processes hold the file open. This behavior enables a secure pattern for temporary files: create the file, immediately unlink() it, and rely on the kernel to clean it up once the descriptor closes.
symlink() generates an independent file containing a path string. readlink() extracts this path without accessing the target. Symbolic links can cross filesystem boundaries and point to directories, but they do not share inode metadata or data blocks.
3.2 Atomic Renaming and Directory Navigation
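rename() atomically moves or renames a file or directory within a single filesystem. If the destination already names a file, it is replaced in one step, so other processes see either the old file or the new one. A rename across filesystems fails with EXDEV and must be done as a copy followed by a delete. Working directory management relies on chdir() (changes the current process directory) and getcwd() (retrieves the absolute path). Runtime filesystem limits can be queried via pathconf() and fpathconf().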
- Directory I/O and Traversal
Directory manipulation uses a dedicated set of functions. mkdir() creates new directories with specified permissions, while rmdir() removes empty ones. Iteration requires opendir() to establish a DIR* stream, followed by repeated readdir() calls that yield struct dirent entries containing filenames and types. The stream position can be tracked with telldir() and reset using seekdir() or rewinddir(). Finally, closedir() releases the stream resources.
4.1 Recursive Directory Walker Implementation
The following implementation demonstrates a callback-driven recursive scanner. It builds each path with a bounds-checked snprintf() call, skipping entries whose path would not fit in the buffer, and uses lstat() to inspect entry types without following symbolic links.
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define PATH_LIMIT 2048

typedef void (*entry_processor_t)(const char *);

/* Call processor on every entry of root_path except "." and ".." */
void scan_directory(const char *root_path, entry_processor_t processor) {
    DIR *stream = opendir(root_path);
    if (!stream) {
        fprintf(stderr, "Cannot open: %s\n", root_path);
        return;
    }
    struct dirent *record;
    while ((record = readdir(stream)) != NULL) {
        if (strcmp(record->d_name, ".") == 0 || strcmp(record->d_name, "..") == 0) {
            continue;
        }
        char resolved_path[PATH_LIMIT];
        int written = snprintf(resolved_path, sizeof(resolved_path), "%s/%s", root_path, record->d_name);
        /* snprintf reports the length it needed; skip paths that were truncated */
        if (written < 0 || (size_t)written >= sizeof(resolved_path)) {
            fprintf(stderr, "Path too long, skipping: %s/%s\n", root_path, record->d_name);
            continue;
        }
        processor(resolved_path);
    }
    closedir(stream);
}

/* Recurse into directories first, then print the size and path of the entry */
void process_item(const char *path) {
    struct stat info;
    if (lstat(path, &info) == -1) {
        perror("lstat");
        return;
    }
    /* lstat() reports a symlink as a link, so linked directories are not entered */
    if (S_ISDIR(info.st_mode)) {
        scan_directory(path, process_item);
    }
    printf("%10lld %s\n", (long long)info.st_size, path);
}

int main(int argc, char *argv[]) {
    const char *target = (argc > 1) ? argv[1] : ".";
    process_item(target);
    return 0;
}
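Note that because lstat() reports a symbolic link as a link rather than as its target, this walker never descends through symbolic links, so circular links cannot trap it. The risk of an infinite loop appears when a scanner follows links, for example by using stat() instead. Robust production scanners that follow links typically maintain a cache of visited device/inode pairs, or use nftw() or the fts functions, to avoid revisiting a directory. The recursion here also keeps one directory stream open per nesting level, so very deep trees can exhaust the process's file descriptors.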