Overview
Data can be stored in databases or in files. Databases enforce data integrity and relationships and offer stronger security. File-based storage is simpler and suits uncomplicated data structures, because it requires no database management system.
Python provides several modules for file manipulation, including os, os.path, shutil, and pickle.
Understanding Files
Files are collections of data stored on external media such as hard drives, USB devices, or flash memory. A filename typically follows the format name[.extension], for example report.txt. Files can also be grouped by content: program files contain code, while data files store information.
File Classification
By Storage Medium
- Regular files: Data stored on external storage devices
- Device files: System interfaces like monitors and keyboards
By Access Method
- Sequential access files: Data accessed in order, from beginning to end
- Random access files: Structured data with fixed record lengths allowing direct access to specific records
By Storage Format
- Text files (e.g. ASCII or UTF-8 encoded): Human-readable, character-based files
- Binary files: Machine-readable format not easily interpreted by humans
By Operating System Access
- Buffered file systems: Use memory buffers for improved performance
- Unbuffered file systems: Direct disk operations
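Python offers both buffered I/O, through the built-in open() function, and unbuffered low-level system I/O, through functions such as os.open(), os.read(), and os.write() that operate on integer file descriptors.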
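As a rough illustration of the two levels, the sketch below writes once through the low-level os functions and once through a buffered file object returned by open(). The temporary file path is an assumption made only so the example is self-contained.

```python
import os
import tempfile

# Hypothetical throwaway location for the demo file
path = os.path.join(tempfile.mkdtemp(), 'raw.bin')

# Low-level system I/O: os.open() returns an integer file descriptor
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
try:
    os.write(fd, b'written with os.write\n')  # passed straight to the OS; bytes only
finally:
    os.close(fd)

# Buffered standard I/O: open() returns a file object with an internal buffer
with open(path, 'a') as f:
    f.write('written with open()\n')  # buffered; flushed when the file is closed

with open(path) as f:
    text = f.read()
print(text)
```

In everyday code the buffered open() interface is the usual choice; the descriptor-based functions are mainly useful when you need OS-level flags or control.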
Common File Operations
File handling involves three primary steps:
- Opening files - Establishes a connection between the disk file and a program object
- Processing files - Reading, writing, copying, deleting, positioning
- Closing files - Terminates connection and flushes buffer to disk
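The three steps can be sketched explicitly with try/finally, which guarantees the close step runs even if processing fails. The temporary path is an assumption for the demo.

```python
import os
import tempfile

# Hypothetical demo file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'steps.txt')

# 1. Open: connect the disk file to a file object
f = open(path, 'w')
try:
    # 2. Process: write (or read, seek, ...) through the file object
    f.write('first line\n')
finally:
    # 3. Close: flush buffered data to disk and release the handle
    f.close()

# A with statement performs step 3 automatically, even if an exception occurs
with open(path) as f:
    content = f.read()
print(content)
```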
Creating and Opening Files
The open() function handles file creation and opening with specified modes:
# Basic syntax
file_object = open(filename, mode, buffering)
# Context manager approach
with open(filename, mode, buffering) as file_object:
    # File operations here
    pass
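Parameters:
- filename: Path of the target file
- mode: Access mode (default is 'r', reading in text mode)
- buffering: Buffer policy (0 = no buffering, binary mode only; 1 = line buffering, text mode only; an integer greater than 1 = buffer size in bytes; -1 = default)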
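Common file modes include read ('r', the default), write ('w', which creates the file or truncates an existing one), append ('a', which writes at the end), and exclusive creation ('x', which fails if the file exists). Adding 'b' selects binary mode (e.g. 'rb'), and '+' opens the file for both reading and writing (e.g. 'r+').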
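A small sketch of how the modes differ in practice, assuming a scratch file in a temporary directory:

```python
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'w') as f:   # 'w': create the file, or truncate it if it exists
    f.write('one\n')
with open(path, 'w') as f:   # truncates again, so 'one' is discarded
    f.write('two\n')
with open(path, 'a') as f:   # 'a': keep existing content and write at the end
    f.write('three\n')

with open(path, 'r') as f:   # 'r': read only; raises FileNotFoundError if missing
    text = f.read()
print(text)                  # 'two' and 'three' on separate lines

refused = False
try:
    with open(path, 'x') as f:   # 'x': exclusive creation; fails if the file exists
        f.write('never written\n')
except FileExistsError:
    refused = True

with open(path, 'rb') as f:  # 'b' combines with another mode and yields bytes
    raw = f.read()
```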
File Reading Methods
Three primary reading methods exist:
Line-by-Line Reading (readline())
with open('data.txt', 'r') as f:
    line = f.readline()
    while line:
        print(line.strip())
        line = f.readline()
Multiple Lines (readlines())
with open('data.txt', 'r') as f:
    lines = f.readlines()
    for line in lines:
        print(line.strip())
Complete Content Reading (read())
with open('data.txt', 'r') as f:
    content = f.read()
    print(content)
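For large files, pass a size to read(): f.read(1024) returns at most 1024 characters in text mode, or 1024 bytes in binary mode, and returns an empty result at end of file.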
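A minimal sketch of chunked reading, assuming a generated 5000-byte sample file in place of a genuinely large one:

```python
import os
import tempfile

# Hypothetical sample file standing in for a large one
path = os.path.join(tempfile.mkdtemp(), 'large.bin')
with open(path, 'wb') as f:
    f.write(b'x' * 5000)

# Read fixed-size chunks so only one chunk is held in memory at a time
total = 0
chunks = 0
with open(path, 'rb') as f:
    while True:
        chunk = f.read(1024)   # at most 1024 bytes; b'' at end of file
        if not chunk:
            break
        total += len(chunk)
        chunks += 1

print(total, chunks)  # 5000 5
```

Four full 1024-byte chunks plus a final 904-byte chunk account for the 5000 bytes.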
Writing to Files
Writing operations use write() and writelines() methods:
with open('output.txt', 'w') as f:
    f.write('Hello World')
# Writing multiple strings from a list
with open('output.txt', 'w') as f:
    lines = ['Line 1\n', 'Line 2\n', 'Line 3\n']
    f.writelines(lines)
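Note that writelines() does not add line separators, which is why each string in the list above ends with '\n'. The sketch below, using a hypothetical temporary file, shows both behaviors:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # hypothetical demo file
items = ['alpha', 'beta', 'gamma']

# writelines() writes the strings back to back; it does not insert '\n'
with open(path, 'w') as f:
    f.writelines(items)
with open(path) as f:
    joined = f.read()
print(joined)  # alphabetagamma

# Add the separators yourself, for example with a generator expression
with open(path, 'w') as f:
    f.writelines(item + '\n' for item in items)
with open(path) as f:
    separated = f.read().splitlines()
print(separated)  # ['alpha', 'beta', 'gamma']
```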
To prepend content to existing files:
with open('existing.txt', 'r') as f:
    content = f.read()
with open('existing.txt', 'w') as f:
    f.write('New prefix\n' + content)
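The same prepend can also be done with a single open call in 'r+' mode, using seek() to reposition before writing. This is a sketch of an alternative, assuming a temporary sample file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'existing.txt')  # hypothetical sample file
with open(path, 'w') as f:
    f.write('original content\n')

# 'r+' opens for reading and writing without truncating the file
with open(path, 'r+') as f:
    content = f.read()              # the file position is now at the end
    f.seek(0)                       # move back to the start before writing
    f.write('New prefix\n' + content)
    f.truncate()                    # would drop leftover text if the new content were shorter

with open(path) as f:
    result = f.read()
print(result)
```

Either way, the whole file is read into memory, so this approach suits small to moderate files.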
File Management Operations
File Deletion
Use the os module for file removal:
import os
if os.path.exists('unwanted_file.txt'):
    os.remove('unwanted_file.txt')
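Checking exists() and then calling remove() are two separate steps, so the file could disappear in between. A common alternative is to attempt the removal and handle FileNotFoundError. The helper name remove_if_present is hypothetical, and the file lives in a temporary directory for the demo:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'unwanted_file.txt')
open(path, 'w').close()  # create an empty file to delete

def remove_if_present(file_path):
    """Hypothetical helper: delete file_path, returning False if it does not exist."""
    try:
        os.remove(file_path)
    except FileNotFoundError:
        return False
    return True

print(remove_if_present(path))  # True: the file existed and was removed
print(remove_if_present(path))  # False: nothing left to remove
```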
File Copying
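Basic copying reads the source and writes its contents to the destination. Opening both files in binary mode ('rb'/'wb') copies any file type byte for byte:
with open('source.txt', 'rb') as src, open('destination.txt', 'wb') as dest:
    dest.write(src.read())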
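The shutil module provides dedicated functions:
import shutil
# Copy the file contents only (permissions and timestamps are not copied)
shutil.copyfile('source.txt', 'dest.txt')
# Move (or rename) a file, even across directories or file systems
shutil.move('old_path.txt', 'new_path.txt')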
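Copying with a single read() loads the whole source into memory. A sketch of two alternatives from shutil, assuming generated sample data in a temporary directory: copyfileobj() copies between open file objects in chunks, and copy2() copies the data while also trying to preserve metadata such as the modification time.

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()  # hypothetical working directory
src_path = os.path.join(folder, 'source.bin')
dest_path = os.path.join(folder, 'destination.bin')
with open(src_path, 'wb') as f:
    f.write(bytes(range(256)) * 100)  # 25,600 bytes of sample binary data

# Chunked copy (64 KiB per read), so large files never need to fit in memory
with open(src_path, 'rb') as src, open(dest_path, 'wb') as dest:
    shutil.copyfileobj(src, dest, 64 * 1024)

# Data plus metadata; copy2() returns the destination path
backup_path = shutil.copy2(src_path, os.path.join(folder, 'backup.bin'))
```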
File Renaming
import os
os.rename('old_name.txt', 'new_name.txt')
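For batch extension changes, compare the actual extension so that only the suffix is changed:
import os
for filename in os.listdir('.'):
    stem, ext = os.path.splitext(filename)
    # Match the final extension only; str.replace() would also alter '.old_ext' elsewhere in the name
    if ext == '.old_ext':
        os.rename(filename, stem + '.new_ext')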
Content Search and Replace
String operations handle content modification:
with open('input.txt', 'r') as f:
    content = f.read()
# Count the occurrences, then replace every one of them
count = content.count('old_text')
print(f'{count} occurrence(s) found')
modified = content.replace('old_text', 'new_text')
with open('output.txt', 'w') as f:
    f.write(modified)
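To report where a term occurs rather than only how often, the file can be scanned line by line. A sketch with a hypothetical helper find_lines and a generated sample file:

```python
import os
import tempfile

def find_lines(path, term):
    """Hypothetical helper: return (line_number, line) pairs for lines containing term."""
    matches = []
    with open(path, 'r', encoding='utf-8') as f:
        for number, line in enumerate(f, start=1):  # reads lazily, one line at a time
            if term in line:
                matches.append((number, line.rstrip('\n')))
    return matches

path = os.path.join(tempfile.mkdtemp(), 'input.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\nold_text here\nnothing\nmore old_text\n')

print(find_lines(path, 'old_text'))  # [(2, 'old_text here'), (4, 'more old_text')]
```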
Binary File Processing
Using struct Module
The struct module handles binary data conversion:
import struct
# Packing data
with open('binary_data.bin', 'wb') as f:
    packed_data = struct.pack('ifi', 123, 3.14, 456)
    f.write(packed_data)
# Unpacking data
with open('binary_data.bin', 'rb') as f:
    data = f.read()
unpacked = struct.unpack('ifi', data)
print(unpacked)  # (123, 3.140000104904175, 456)
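Because struct produces records of a fixed length, it also fits the random-access files described earlier: the byte offset of record n is n times the record size, so seek() can jump straight to it. A sketch, assuming a temporary file, sample records, and a hypothetical read_record helper; the '<' prefix fixes little-endian byte order and standard sizes so the layout does not depend on the platform:

```python
import os
import struct
import tempfile

RECORD = struct.Struct('<ifi')  # int32, float32, int32 = 12 bytes, no padding
path = os.path.join(tempfile.mkdtemp(), 'records.bin')

# Write three fixed-length records back to back
with open(path, 'wb') as f:
    for record in [(1, 0.5, 10), (2, 1.5, 20), (3, 2.5, 30)]:
        f.write(RECORD.pack(*record))

def read_record(file_path, index):
    """Hypothetical helper: read record `index` (0-based) without reading earlier ones."""
    with open(file_path, 'rb') as f:
        f.seek(index * RECORD.size)  # offset = index * record length
        return RECORD.unpack(f.read(RECORD.size))

print(RECORD.size)           # 12
print(read_record(path, 2))  # (3, 2.5, 30)
```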
Using pickle Module
For complex Python objects (only unpickle data from trusted sources, because loading a pickle can execute arbitrary code):
import pickle
# Serializing objects
data = {'name': 'John', 'age': 30, 'scores': [85, 92, 78]}
with open('objects.pkl', 'wb') as f:
    pickle.dump(data, f)
# Deserializing objects
with open('objects.pkl', 'rb') as f:
    loaded_data = pickle.load(f)
print(loaded_data)
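A single file can hold several pickled objects: each dump() call appends one, and each load() call reads the next until EOFError signals the end. A sketch using a temporary file and sample objects:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'objects.pkl')  # hypothetical demo file

# Several dump() calls write several independent pickles to one file
with open(path, 'wb') as f:
    pickle.dump({'name': 'John'}, f)
    pickle.dump([85, 92, 78], f)
    pickle.dump(('a', 1), f)

# load() returns them one at a time; EOFError means none are left
loaded = []
with open(path, 'rb') as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break

print(loaded)  # [{'name': 'John'}, [85, 92, 78], ('a', 1)]
```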
Advanced Persistence Modules
Shelve Module
Provides dictionary-like persistent storage:
import shelve
# Create and store data
with shelve.open('storage.db') as shelf:
    shelf['key1'] = {'name': 'Alice', 'age': 25}
    shelf['key2'] = [1, 2, 3, 4, 5]
# Retrieve data
with shelve.open('storage.db') as shelf:
    print(shelf['key1'])
    print(shelf['key2'])
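One behavior worth knowing: by default, reading a shelf entry returns a fresh copy, so mutating it in place does not update the stored value. The sketch below, assuming a temporary shelf file, shows the pitfall and two fixes: assigning the value back, or opening with writeback=True.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'storage')  # hypothetical shelf location

with shelve.open(path) as shelf:
    shelf['scores'] = [1, 2, 3]

# Without writeback, shelf['scores'] returns a new unpickled copy,
# so appending to it changes only that copy
with shelve.open(path) as shelf:
    shelf['scores'].append(4)
with shelve.open(path) as shelf:
    unchanged = shelf['scores']
print(unchanged)  # [1, 2, 3]

# Fix 1: read, modify, and assign the value back explicitly
with shelve.open(path) as shelf:
    scores = shelf['scores']
    scores.append(4)
    shelf['scores'] = scores

# Fix 2: writeback=True caches accessed entries and writes them back on close
with shelve.open(path, writeback=True) as shelf:
    shelf['scores'].append(5)

with shelve.open(path) as shelf:
    final = shelf['scores']
print(final)  # [1, 2, 3, 4, 5]
```

writeback=True trades memory and slower closing for convenience, since every accessed entry is cached until the shelf is closed.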
CSV Module
Handles comma-separated values files:
import csv
# Reading CSV (newline='' lets the csv module handle line endings itself)
with open('data.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# Writing CSV
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age', 'City'])
    writer.writerow(['John', 30, 'NYC'])
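When rows have named columns, csv.DictWriter and csv.DictReader map dictionary keys to columns. A sketch with sample rows and a temporary file; note that values read back are always strings:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'people.csv')  # hypothetical demo file
rows = [
    {'Name': 'John', 'Age': 30, 'City': 'NYC'},
    {'Name': 'Alice', 'Age': 25, 'City': 'Boston'},
]

# DictWriter maps keys to columns; writeheader() emits the field names
with open(path, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['Name', 'Age', 'City'])
    writer.writeheader()
    writer.writerows(rows)

# DictReader uses the header row as keys for every following row
with open(path, 'r', newline='') as f:
    people = list(csv.DictReader(f))

print(people[0])  # {'Name': 'John', 'Age': '30', 'City': 'NYC'}
```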
OpenPyXL Module
For Excel file processing:
from openpyxl import Workbook, load_workbook
# Create new workbook
wb = Workbook()
ws = wb.active
ws['A1'] = 'Header'
ws['B1'] = 'Value'
wb.save('example.xlsx')
# Read existing workbook
wb = load_workbook('example.xlsx')
ws = wb.active
print(ws['A1'].value)
OS Module Operations
The os and os.path modules provide system-level file operations:
import os
# Directory listing
files = os.listdir('.')
# Path operations
abs_path = os.path.abspath('relative/path')
base_name = os.path.basename('/path/to/file.txt')
# Recursive directory traversal
for root, dirs, files in os.walk('/starting/directory'):
    for file in files:
        full_path = os.path.join(root, file)
        print(full_path)
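os.walk combines naturally with os.path helpers to filter a tree. A sketch, assuming a small generated directory tree and a hypothetical generator files_with_extension, that finds every .txt file and reports its size:

```python
import os
import tempfile

def files_with_extension(top, extension):
    """Hypothetical helper: yield (path, size_in_bytes) for files under top with extension."""
    for root, dirs, files in os.walk(top):
        for name in files:
            if os.path.splitext(name)[1] == extension:
                full_path = os.path.join(root, name)
                yield full_path, os.path.getsize(full_path)

# Build a sample tree: top/a.txt, top/b.log, top/sub/c.txt
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'sub'))
samples = [('a.txt', 'hello'), ('b.log', 'ignored'), (os.path.join('sub', 'c.txt'), 'hi')]
for relative, text in samples:
    with open(os.path.join(top, relative), 'w') as f:
        f.write(text)

# Relative paths keep the output independent of where the tree lives
found = sorted((os.path.relpath(p, top), size) for p, size in files_with_extension(top, '.txt'))
print(found)
```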
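Relative paths are resolved against the current working directory (os.getcwd()), so they keep a project portable when it moves between machines or directories. Absolute paths identify one exact location regardless of the working directory.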