Binary File Copying Utility
Binary mode is essential when working with non-text files such as images or executables, or whenever the bytes must be copied without any encoding or newline translation. The following exercise demonstrates a file copying script. Instead of reading line by line, which is unreliable for binary data because newline bytes may be rare or absent, it reads fixed-size chunks so that memory use stays bounded.
def copy_binary_data(src_path, dst_path):
    """
    Copies content from source to destination in binary mode using chunks.
    """
    try:
        with open(src_path, 'rb') as source_stream, \
             open(dst_path, 'wb') as dest_stream:
            while True:
                # Read in 4 KB chunks
                buffer = source_stream.read(4096)
                if not buffer:
                    # An empty bytes object means end of file
                    break
                dest_stream.write(buffer)
        print(f"Operation completed: {src_path} -> {dst_path}")
    except FileNotFoundError:
        print(f"Error: The source file '{src_path}' was not found.")

# User input handling
origin = input("Enter source file path: ").strip()
target = input("Enter target file path: ").strip()
copy_binary_data(origin, target)
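The standard library's shutil.copyfileobj() runs the same read-and-write loop internally. The following is a minimal sketch rather than a replacement for the exercise; the function name copy_with_shutil and the 64 KB default chunk size are assumptions chosen for illustration.

```python
import shutil

def copy_with_shutil(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path in binary mode, chunk_size bytes at a time.

    copy_with_shutil and its default chunk size are illustrative choices.
    """
    with open(src_path, 'rb') as source_stream, \
         open(dst_path, 'wb') as dest_stream:
        # copyfileobj reads and writes chunks until the source is exhausted
        shutil.copyfileobj(source_stream, dest_stream, chunk_size)
```

Larger chunks mean fewer read and write calls at the cost of a bigger buffer; the best value depends on the workload.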
Controlling the File Pointer
In Python, file operations rely on a pointer (cursor) that tracks the current position. Reading and writing move this pointer automatically, but you can also control it manually using seek(). It is crucial to understand how text mode and binary mode differ in their units of measurement:
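- Text Mode ('t'): Read counts are measured in characters. A seek() offset should be 0 or a value previously returned by tell(), because a character may occupy more than one byte.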
- Binary Mode ('b'): The offset and read counts are measured in bytes.
The seek() method syntax is file.seek(offset, whence). The whence parameter determines the reference point:
- 0: Absolute positioning from the beginning of the file (default).
- 1: Relative positioning from the current cursor position.
- 2: Relative positioning from the end of the file.
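Note: In text mode, modes 1 and 2 accept only an offset of 0 (for example, seek(0, 2) to jump to the end). Nonzero relative offsets require binary mode, since a byte offset could land in the middle of a multi-byte character.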
# Text mode example: reads 5 characters
with open('data.txt', mode='rt', encoding='utf-8') as f:
    chunk = f.read(5)

# Binary mode example: reads 5 bytes
with open('data.txt', mode='rb') as f:
    chunk = f.read(5)

# Manual pointer manipulation
with open('data.txt', 'rb') as f:
    f.seek(0, 2)     # Jump to end of file
    f.seek(-5, 2)    # Move 5 bytes back from the end
    print(f.read())  # Read last 5 bytes
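The difference between characters and bytes only becomes visible when the file contains multi-byte characters. The sketch below assumes a hypothetical sample file written to a temporary directory; the file name and its contents are invented for illustration.

```python
import io
import os
import tempfile

# Hypothetical sample text: '\u00e9' and '\u00f6' each take two bytes in UTF-8
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('h\u00e9llo w\u00f6rld')

with open(path, 'rt', encoding='utf-8') as f:
    text_chunk = f.read(5)       # five characters

with open(path, 'rb') as f:
    byte_chunk = f.read(5)       # five bytes, which is only four characters here
    f.seek(-5, os.SEEK_END)      # os.SEEK_END is the named constant for 2
    tail = f.read()              # the last five bytes of the file

with open(path, 'rt', encoding='utf-8') as f:
    try:
        f.seek(-5, os.SEEK_END)  # nonzero end-relative seek in text mode
        text_seek_allowed = True
    except io.UnsupportedOperation:
        text_seek_allowed = False
```

Using os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END instead of 0, 1, and 2 makes the intent of each seek() call easier to read.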
Simulating Real-time Log Monitoring
A common system administration task is monitoring a log file for new entries in real time, similar to the Linux tail -f command. By using the file pointer, we can jump to the end of the file and wait for new data to arrive.
import time

def tail_log(file_path):
    """
    Continuously monitors a file for new lines.
    """
    try:
        with open(file_path, mode='rb') as stream:
            # Move pointer to the very end of the file
            stream.seek(0, 2)
            while True:
                line = stream.readline()
                if not line:
                    # If no new data, wait briefly to save CPU cycles
                    time.sleep(0.5)
                else:
                    # Decode bytes and print
                    print(line.decode('utf-8'), end='')
    except KeyboardInterrupt:
        print("\nMonitoring stopped.")

tail_log('application.log')
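One possible variation, sketched here under stated assumptions, turns the loop into a generator so the caller decides what to do with each line. It also holds back a line until its trailing newline has arrived, in case the writer has flushed only part of it. The name follow and the max_idle_polls parameter are hypothetical; the latter exists only so the loop can end without a keyboard interrupt.

```python
import time

def follow(stream, poll_interval=0.5, max_idle_polls=None):
    """Yield complete lines appended to an open binary stream.

    max_idle_polls is a hypothetical safety valve: stop after that many
    consecutive empty reads (None means keep polling indefinitely).
    """
    pending = b''
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        data = stream.readline()
        if not data:
            # Nothing new yet: count the idle poll and wait
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        pending += data
        if pending.endswith(b'\n'):
            # Hand back a line only once its newline has arrived
            yield pending.decode('utf-8')
            pending = b''
```

A caller would open the file in 'rb' mode, call stream.seek(0, 2), and then iterate with for line in follow(stream).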
Strategies for Modifying File Content
Data on disk cannot be safely modified in place when the size of the content changes, because bytes cannot be inserted into or removed from the middle of a file without rewriting everything that follows. Standard file modification therefore uses one of two strategies: loading everything into memory (for smaller files) or writing a new file and swapping it in (for larger files).
Method 1: In-Memory Modification
This method reads the entire file into a variable, performs string replacements, and overwrites the original file. It is simple but consumes memory proportional to the file size.
with open('config.ini', 'r', encoding='utf-8') as f:
    content = f.read()

# Modify the data
updated_content = content.replace('debug=True', 'debug=False')

# Write back to disk
with open('config.ini', 'w', encoding='utf-8') as f:
    f.write(updated_content)
Method 2: File Swap (Efficient for Large Files)
This approach reads line by line and writes to a temporary file. Once complete, the temporary file replaces the original. Only one line is held in memory at a time, so memory use stays low regardless of file size.
import os

source_file = 'users.txt'
temp_file = 'users.tmp'

with open(source_file, 'r', encoding='utf-8') as read_f, \
     open(temp_file, 'w', encoding='utf-8') as write_f:
    for line in read_f:
        # Modify specific parts of each line
        new_line = line.replace('active', 'inactive')
        write_f.write(new_line)

# Replace the old file with the new one in a single step;
# os.replace overwrites the destination, so no separate os.remove is needed
os.replace(temp_file, source_file)
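A fixed temporary name such as users.tmp can collide with an existing file, and a failure midway leaves the temporary file behind. The sketch below is one way to address both, assuming a hypothetical helper named rewrite_lines that takes a per-line transform function. It creates a uniquely named temporary file in the same directory and cleans it up on error.

```python
import os
import tempfile

def rewrite_lines(path, transform, encoding='utf-8'):
    """Rewrite path line by line through transform, then swap the files.

    rewrite_lines and transform are illustrative names, not a standard API.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # os.replace stays on one filesystem
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding=encoding) as write_f, \
             open(path, 'r', encoding=encoding) as read_f:
            for line in read_f:
                write_f.write(transform(line))
        # Overwrites the original in one step
        os.replace(temp_path, path)
    except BaseException:
        # Something failed before the swap: discard the partial file
        os.remove(temp_path)
        raise
```

For example, rewrite_lines('users.txt', lambda line: line.replace('active', 'inactive')) would perform the same substitution as the script above.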
Introduction to Python Functions
Functions are reusable blocks of code designed to perform a specific task. They help organize code, reduce repetition, and improve readability.
Syntax Structure
def function_name(param1, param2):
    """
    Docstring: Explains what the function does.
    """
    # Function body (logic)
    result = param1 + param2
    return result
- def: Keyword to define a function.
- Parameters: Inputs required by the function (optional).
- Docstring: Documentation describing usage (optional but recommended).
- return: Sends a value back to the caller (optional).
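The parts listed above can be seen working together in a short sketch. The function names add_numbers and log_message are hypothetical and used only for illustration.

```python
def add_numbers(param1, param2):
    """Return the sum of param1 and param2."""
    result = param1 + param2
    return result

total = add_numbers(2, 3)        # the returned value is bound to total
summary = add_numbers.__doc__    # the docstring is stored on the function object

def log_message(text):
    """Print text; this function has no return statement."""
    print(text)

nothing = log_message("saved")   # a function without return gives back None
```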
Definition vs. Calling
Defining a function does not execute it; it merely registers the logic in memory. The code inside the function only runs when the function is explicitly called.
# 1. Definition Phase
def greet(name):
print(f"Hello, {name}!")
# 2. Call Phase
greet("Alice")
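One way to observe that a function body is not executed at definition time is to place a reference to a nonexistent name inside it. In this sketch, record_visit and undefined_helper are hypothetical names; the error appears only when the function is called.

```python
calls = []

def record_visit(name):
    # Nothing in this body runs while the function is being defined
    calls.append(name)
    return undefined_helper(name)    # hypothetical name that does not exist

count_after_definition = len(calls)  # the body has not run yet

try:
    record_visit("Alice")            # the body runs now and hits the bad name
except NameError:
    failed_on_call = True
else:
    failed_on_call = False
```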
Categories of Functions
- Built-in Functions: Provided by Python (e.g., len(), print(), max()).
- Custom Functions: Defined by the programmer.
  - No arguments: Requires no input data.
  - With arguments: Requires specific inputs to operate.
  - Empty function: Contains only a pass statement, used as a placeholder during development.
# Empty function placeholder
def future_feature():
pass
# Function with arguments
def calculate_area(width, height):
return width * height