This article delves into fundamental Linux file operations, the creation and utilization of static and dynamic libraries, and essential system concepts.
C Language File I/O Review
Before diving into system-level operations, a brief recap of C language file I/O is provided. The standard streams stdin, stdout, and stderr are discussed, which are treated as files by the C language. Examples demonstrate writing to and reading from files using functions like fopen, fprintf, fscanf, and fclose. The modes 'w' (write) and 'a' (append) are explained, highlighting how 'w' overwrites existing content while 'a' appends to it.
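The difference between the two modes can be sketched as follows. This is a minimal illustration, not the article's own code: the helper names write_text and count_words are hypothetical, and any file path passed in is the caller's choice.

```c
#include <assert.h>
#include <stdio.h>

/* Write `text` to `path` with the given fopen mode ("w" or "a").
 * Returns 0 on success, -1 on failure. Hypothetical helper. */
int write_text(const char *path, const char *mode, const char *text)
{
    assert(path != NULL && mode != NULL && text != NULL);
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%s", text) >= 0;
    if (fclose(fp) != 0)        /* fclose flushes buffered data to the file */
        ok = 0;
    return ok ? 0 : -1;
}

/* Count whitespace-separated words in `path` using fscanf.
 * Returns -1 if the file cannot be opened. Hypothetical helper. */
int count_words(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char word[64];
    int words = 0;
    while (fscanf(fp, "%63s", word) == 1)   /* width limit guards the buffer */
        words++;
    fclose(fp);
    return words;
}
```

Writing once with "w" and once with "a" leaves both pieces of text in the file, while a further "w" write leaves only the newest text.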
The distinction between stdout and stderr is clarified, particularly in the context of output redirection. While both typically output to the console, their behavior under redirection differs.
System File I/O and System Calls
The core of the article shifts to system-level file I/O, bypassing C's standard library wrappers to interact directly with operating system system calls. This approach offers a deeper understanding of how file operations are managed at the kernel level.
The open system call is introduced as the primary function for opening files. Its parameters, pathname (file path) and flags (access modes like O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_APPEND), are explained. The optional mode parameter for creating new files and setting their permissions is also detailed.
The write system call writes data to a file descriptor, and read reads data from one. Examples demonstrate their usage, emphasizing that write transfers exactly the number of bytes requested, so the null terminator ('\0') should be left out of the count: it is a C string convention, not part of the file's content.
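A minimal sketch of these three calls working together is shown here. The helper names save_message and load_message are hypothetical, and the permission bits 0644 are an assumed choice rather than something the article prescribes.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) `path` and write `msg` to it without the '\0'.
 * Returns the number of bytes written, or -1 on error. */
ssize_t save_message(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    /* The third argument only matters because O_CREAT is present. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* strlen excludes the terminator, so it never reaches the file.
     * A robust program would also loop on short writes. */
    ssize_t n = write(fd, msg, strlen(msg));
    close(fd);
    return n;
}

/* Read at most cap-1 bytes from `path` and terminate the buffer ourselves,
 * because read returns raw bytes, not a C string. */
ssize_t load_message(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL && cap > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, cap - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

Replacing O_TRUNC with O_APPEND in the open call would give the system-call counterpart of the C library's append mode.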
The concept of a file descriptor (fd) is explained. It is a small non-negative integer returned by open that identifies an open file within a process. Standard streams stdin, stdout, and stderr correspond to file descriptors 0, 1, and 2, respectively. Because open returns the lowest unused descriptor, the first file a process opens itself typically receives fd 3.
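The allocation rule can be observed with a short experiment. This sketch assumes a single-threaded process (another thread could grab the descriptor in between), and the helper name lowest_fd_is_reused is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open `path`, close it, and open it again.
 * Returns 1 if the same descriptor number came back, 0 if not, -1 on error. */
int lowest_fd_is_reused(const char *path)
{
    assert(path != NULL);
    int first = open(path, O_RDONLY);
    if (first < 0)
        return -1;
    close(first);                       /* the slot becomes free again */
    int second = open(path, O_RDONLY);  /* lowest free slot is handed out */
    if (second < 0)
        return -1;
    close(second);
    return first == second;
}
```

The macros STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO from unistd.h name descriptors 0, 1, and 2.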
File Management in the Kernel
The article explains how the operating system manages open files using a "describe then organize" approach, similar to process management. Each open file is represented by a kernel-level struct file, containing file attributes. These struct file instances are organized, typically in a doubly linked list. A process's file descriptor table, managed by struct files_struct, maps file descriptors to these struct file objects.
The concept of polymorphism is touched upon via struct file_operations, which contains function pointers (read, write) allowing generic file handling code to interact with different underlying file types and devices through their specific driver implementations.
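The dispatch idea can be imitated in user space. The sketch below is a heavily simplified model, not the kernel's real definitions: every demo_ name, the two toy "drivers", and the 8-slot table are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_file;

/* Stand-in for struct file_operations: one function pointer per operation. */
struct demo_file_operations {
    long (*read)(struct demo_file *f, char *buf, size_t len);
    long (*write)(struct demo_file *f, const char *buf, size_t len);
};

/* Stand-in for struct file: some state plus a pointer to its operations. */
struct demo_file {
    const struct demo_file_operations *ops;
    char data[64];
    size_t size;
};

/* Stand-in for the per-process table that maps descriptors to files. */
struct demo_files_struct {
    struct demo_file *fd_array[8];
};

/* Toy driver 1: a memory-backed file that keeps the last write. */
static long mem_read(struct demo_file *f, char *buf, size_t len)
{
    size_t n = len < f->size ? len : f->size;
    memcpy(buf, f->data, n);
    return (long)n;
}

static long mem_write(struct demo_file *f, const char *buf, size_t len)
{
    size_t n = len < sizeof f->data ? len : sizeof f->data;
    memcpy(f->data, buf, n);
    f->size = n;
    return (long)n;
}

/* Toy driver 2: a null device that discards writes and reads as end-of-file. */
static long null_read(struct demo_file *f, char *buf, size_t len)
{
    (void)f; (void)buf; (void)len;
    return 0;
}

static long null_write(struct demo_file *f, const char *buf, size_t len)
{
    (void)f; (void)buf;
    return (long)len;
}

const struct demo_file_operations mem_ops = { mem_read, mem_write };
const struct demo_file_operations null_ops = { null_read, null_write };

/* Generic layer: look up the descriptor, then call through the pointer.
 * This code never needs to know which driver is behind the file. */
long demo_write(struct demo_files_struct *t, int fd, const char *buf, size_t len)
{
    assert(t != NULL && fd >= 0 && fd < 8 && t->fd_array[fd] != NULL);
    return t->fd_array[fd]->ops->write(t->fd_array[fd], buf, len);
}

long demo_read(struct demo_files_struct *t, int fd, char *buf, size_t len)
{
    assert(t != NULL && fd >= 0 && fd < 8 && t->fd_array[fd] != NULL);
    return t->fd_array[fd]->ops->read(t->fd_array[fd], buf, len);
}
```

The same demo_write call stores data when the slot holds a memory-backed file and drops it when the slot holds the null device, which is the behavior the function-pointer table is meant to enable.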
Redirection and File Descriptors
Redirection is demonstrated by manipulating file descriptors. Closing standard output (fd 1) and then opening a file makes the new file take over fd 1, since the kernel hands out the lowest free descriptor; subsequent output operations (like printf) are then directed to that file. This is explained in the context of C's standard streams (stdout) and their underlying system file descriptors.
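The close-then-open trick might look like the sketch below. The helper name print_into_file is hypothetical. Saving the original descriptor with dup and restoring it at the end is an addition made so the calling program keeps a working stdout; it is not part of the technique itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Send one printf call into `path` by letting the file take over fd 1.
 * Returns 0 on success, -1 on failure. */
int print_into_file(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    fflush(stdout);                      /* empty the C buffer before switching */
    int saved = dup(STDOUT_FILENO);      /* extra handle on the original stdout */
    if (saved < 0)
        return -1;

    close(STDOUT_FILENO);                /* fd 1 is now the lowest free slot */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int ok = (fd == STDOUT_FILENO);      /* holds if fd 0 is still open */
    if (ok) {
        printf("%s", text);              /* stdout still writes to fd 1 */
        fflush(stdout);                  /* push the data out before restoring */
    }

    if (fd >= 0 && fd != STDOUT_FILENO)
        close(fd);
    dup2(saved, STDOUT_FILENO);          /* put the original stdout back */
    close(saved);
    return ok ? 0 : -1;
}
```

Output sent to stderr during the redirected window would still reach the console, because only fd 1 was replaced.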
The dup2 system call is introduced as a more flexible method for redirection. A call dup2(oldfd, newfd) makes newfd refer to the same open file description as oldfd, closing newfd first if it was already open. This enables input, output, and append redirection without explicitly closing standard streams.
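Append redirection with dup2 could be sketched as follows. The helper name append_via_stdout is hypothetical, and restoring the original stdout afterwards is a convenience added for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Append `text` to `path` by pointing fd 1 at the file with dup2.
 * Returns 0 on success, -1 on failure. */
int append_via_stdout(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    fflush(stdout);
    int saved = dup(STDOUT_FILENO);          /* remember the original stdout */
    if (saved < 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }

    int rc = dup2(fd, STDOUT_FILENO);        /* fd 1 now refers to the file */
    close(fd);                               /* fd 1 keeps the file open */
    if (rc == STDOUT_FILENO) {
        printf("%s", text);
        fflush(stdout);
    }

    dup2(saved, STDOUT_FILENO);              /* restore the original stdout */
    close(saved);
    return rc == STDOUT_FILENO ? 0 : -1;
}
```

Opening with O_RDONLY and duplicating onto STDIN_FILENO would give input redirection in the same way.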
Buffering
The concept of buffering is explained, distinguishing between user-level (C standard library) buffers and kernel-level buffers. Data written via C library functions (like printf) first goes into a C buffer, then is flushed to the kernel buffer, and finally written to the actual device. Different buffering strategies (no buffering, line buffering, full buffering) apply depending on the target device (console vs. disk).
The behavior of file operations (printf, write) during redirection and process creation (fork) is analyzed in light of these buffering mechanisms, explaining why output might appear in different places or be duplicated.
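The duplicated-output effect can be reproduced with the sketch below. The helper name fork_buffer_demo is hypothetical. The whole experiment runs in a child process so the caller's own stdout is left alone, and full buffering is requested explicitly with setvbuf. That call is a precaution: stdio normally selects full buffering by itself when stdout is not a terminal.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Redirect stdout to `path`, call printf and write, then fork.
 * Returns 0 if the experiment ran, -1 otherwise. */
int fork_buffer_demo(const char *path)
{
    assert(path != NULL);
    fflush(NULL);                     /* do not carry old buffered data along */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(1);
        close(fd);
        setvbuf(stdout, NULL, _IOFBF, BUFSIZ);   /* full buffering, as for a file */

        printf("from printf\n");                 /* stays in the C buffer */
        if (write(STDOUT_FILENO, "from write\n", 11) != 11)
            _exit(1);                            /* write bypasses the C buffer */

        pid_t g = fork();             /* the unflushed C buffer is copied too */
        if (g < 0)
            _exit(1);
        if (g > 0)
            waitpid(g, NULL, 0);
        exit(0);                      /* exit flushes stdio in both processes */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The resulting file holds the write line once and the printf line twice, since each process flushes its own copy of the buffer when it exits. Calling fflush(stdout) before the second fork would remove the duplicate.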
File System Structure
The underlying structure of a file system on disk is described. This includes concepts like partitions, block groups, superblocks, group descriptor tables, inode bitmaps, block bitmaps, inodes, and data blocks. An inode stores file attributes, while data blocks store the file's content. Filenames are mappings to inodes, which in turn point to the data blocks. Directories themselves are special files whose data blocks hold filename-to-inode-number mappings.
The article touches on file timestamps: Access time (atime), Modify time (mtime), and Change time (ctime), explaining their meanings and how they are updated (or optimized not to be updated immediately for atime).
Soft Links vs. Hard Links
The concepts of symbolic (soft) links and hard links are explained using the ln command.
- Soft links are independent files with their own inodes that store the path to the target file. They are analogous to shortcuts. Deleting the original file breaks the soft link.
- Hard links are additional directory entries that point to the same inode as the original file. They do not have their own inodes. Deleting a hard link only decrements a link count associated with the inode; the file data is only truly removed when the link count reaches zero.
The stat command is used to inspect file attributes, including inode numbers and link counts.
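The same information is available to programs through the stat and lstat system calls. The sketch below uses hypothetical helper names (make_file, link_count, same_inode, is_symlink); links themselves would be created with the link and symlink calls, the programmatic counterparts of ln and ln -s.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty file. Returns 0 on success, -1 on failure. */
int make_file(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Hard-link count of the inode behind `path`, or -1 if stat fails.
 * stat follows symbolic links, so a dangling soft link yields -1. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* 1 if both names resolve to the same inode on the same device,
 * 0 if they do not, -1 on error. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* 1 if `path` itself is a symbolic link, 0 if not, -1 on error.
 * lstat inspects the link rather than its target. */
int is_symlink(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

After link(orig, hard) both names report a link count of 2 and the same inode. After unlink(orig) the hard link still works with a count of 1, while a soft link to orig can no longer be followed.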
Static Library Creation and Usage
The process of creating a static library is detailed. Source files are compiled into object files (.o), typically with the -c flag. These object files are then bundled into a single archive file (e.g., .a) using the ar utility. This archive is the static library.
To use a static library:
- The header files (.h) must be accessible during compilation (e.g., via the -I flag or system include paths).
- The library file (.a) must be linked during the linking phase, typically using the -L (library path) and -l (library name, e.g., -lhello for libhello.a) flags.
When a program is linked statically, the library's code is copied directly into the executable, resulting in larger executable files.
Dynamic Library Creation and Usage
Dynamic libraries (shared libraries, .so files) are created similarly but with key differences:
- Object files are compiled with the -fPIC (Position Independent Code) flag, allowing the library code to be loaded at any address in the process's memory space.
- The object files are then linked into a shared object file using the -shared flag with gcc.
To use a dynamic library:
- Header files are included as with static libraries (-I).
- The library path is specified (-L).
- The library name is specified (-l).
Unlike static libraries, the code from dynamic libraries is not copied into the executable. Instead, the executable contains information about how to load and link the library at runtime. This allows multiple programs to share a single copy of the library in memory, saving space. However, the system must be able to locate the dynamic library file at runtime.
Methods for making dynamic libraries discoverable at runtime include:
- Copying them to standard system library paths (e.g., /lib64, /usr/lib64).
- Setting the LD_LIBRARY_PATH environment variable.
- Configuring the dynamic linker using files in /etc/ld.so.conf.d/ and running ldconfig.
- Creating symbolic links in standard system library paths.
The article also explains GCC's linking preference: it defaults to dynamic linking if both static and dynamic versions of a library are available. The -static flag can be used to force static linking.
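The static and dynamic build flows can be summarized around one tiny library. Everything named here is illustrative rather than taken from the article: the function hello_format, the files hello.c, hello.h, and main.c, and the ./include and ./lib directories are assumptions. The build commands appear as comments so the block stays a single C source file.

```c
/* Build commands, with illustrative file and directory names:
 *
 *   Static:   gcc -c hello.c -o hello.o
 *             ar rcs libhello.a hello.o
 *   Dynamic:  gcc -fPIC -c hello.c -o hello.o
 *             gcc -shared hello.o -o libhello.so
 *   Link:     gcc main.c -I./include -L./lib -lhello -o app
 *             gcc -static main.c -I./include -L./lib -lhello -o app_static
 *   Run:      LD_LIBRARY_PATH=./lib ./app
 */
#include <assert.h>
#include <stdio.h>

/* This declaration would live in include/hello.h. */
int hello_format(char *buf, size_t cap, const char *name);

/* This definition would live in hello.c and end up inside the library. */
int hello_format(char *buf, size_t cap, const char *name)
{
    assert(buf != NULL && name != NULL && cap > 0);
    /* Returns the length the full greeting needs, as snprintf does. */
    return snprintf(buf, cap, "hello, %s", name);
}
```

A main.c that includes hello.h and calls hello_format is built with the same -I, -L, and -l flags in both cases. The differences are whether the library code ends up inside the executable and whether libhello.so must be found when the program starts.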
Process Replacement and File Handles
The behavior of open file descriptors during process replacement (e.g., using exec) is discussed. File descriptors are generally preserved across exec calls, unless they are marked close-on-exec (FD_CLOEXEC), because they are managed by the kernel and belong to the process's file descriptor table, not to the program image being replaced.
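A small experiment can show a descriptor surviving exec. In this sketch the helper name exec_keeps_fd is hypothetical, and an echo program is assumed to be available on the PATH. The child points fd 1 at a file and then replaces itself with echo, whose output can only reach the file if the descriptor was preserved.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `echo` in a child whose stdout was redirected to `path` before exec.
 * Returns 0 if the child exited successfully, -1 otherwise. */
int exec_keeps_fd(const char *path)
{
    assert(path != NULL);
    fflush(NULL);                         /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(127);
        close(fd);
        /* The program image is replaced; the descriptor table is not. */
        execlp("echo", "echo", "written after exec", (char *)NULL);
        _exit(127);                       /* reached only if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

This fork, redirect, exec sequence is also how a shell implements output redirection for the commands it launches.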
Code Examples
Throughout the article, practical C code examples are provided to demonstrate system calls, redirection, library creation, and usage. These examples include creating files, writing/reading data, manipulating file descriptors, making static and dynamic libraries, and linking programs against these libraries.