Understanding File Operations
File operations in Python rely on the operating system. Modern operating systems do not let ordinary programs access the disk directly, so every file operation starts with a request to the OS to open a file descriptor; the program then reads and writes through that descriptor rather than manipulating disk storage itself.
The open() function creates a file object that provides methods for reading from or writing to the file.
open(filename, mode, buffering)
Parameters:
filename: String containing the file path.
mode: Specifies the file opening mode (read, write, append, etc.). Defaults to read-only text mode ('r').
buffering: Controls file buffering behavior. 0 disables buffering (allowed only in binary mode), 1 enables line buffering (text mode only), and values greater than 1 set the buffer size in bytes. A negative value (the default, -1) uses the system default.
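The parameters above can be tried out with a short sketch. The scratch path below is a hypothetical temporary file created only for illustration, and the three arguments are passed positionally in the order shown in the signature.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode with line buffering (buffering=1): output is flushed at each newline.
with open(path, "w", 1) as handle:
    handle.write("first line\n")

# Binary mode with buffering disabled (buffering=0 is only valid in binary mode).
with open(path, "rb", 0) as handle:
    raw = handle.read()

# With mode and buffering omitted, the file opens read-only with default buffering.
with open(path) as handle:
    text = handle.read()

print(type(raw).__name__, type(text).__name__)
```

Binary mode returns bytes objects and text mode returns str objects, which is why the two reads above produce different types for the same file.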
File Mode Reference
| Mode | Description |
|---|---|
| r | Read-only, pointer at beginning |
| rb | Binary read-only, pointer at beginning |
| r+ | Read and write, pointer at beginning |
| rb+ | Binary read/write, pointer at beginning |
| w | Write-only, creates new file or truncates existing |
| wb | Binary write-only, creates new file or truncates existing |
| w+ | Read and write, creates new file or truncates existing |
| wb+ | Binary read/write, creates new file or truncates existing |
| a | Append mode, pointer at end |
| ab | Binary append mode, pointer at end |
| a+ | Read and append, pointer at end |
| ab+ | Binary read and append, pointer at end |
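A small sketch of how a few of the modes in the table differ in practice. It assumes a hypothetical scratch file in a temporary directory and covers only 'w', 'a', 'r+', and 'r'.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as handle:    # 'w' creates the file
    handle.write("one")

with open(path, "a") as handle:    # 'a' starts with the pointer at the end
    handle.write("two")

with open(path, "r+") as handle:   # 'r+' starts at the beginning, no truncation
    handle.write("ONE")            # overwrites the first three characters

with open(path, "r") as handle:
    after_update = handle.read()

with open(path, "w") as handle:    # 'w' on an existing file truncates it first
    handle.write("new")

with open(path, "r") as handle:
    after_truncate = handle.read()

print(after_update)    # ONEtwo
print(after_truncate)  # new
```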
File Object Methods
file.read(size): Reads the entire file, or at most size characters (bytes in binary mode) when size is given. Returns an empty string at EOF.
file.readline(): Returns a single line from the file, including its trailing newline.
file.readlines(size): Returns a list of lines; if size is given, stops once the lines read so far total roughly that many characters.
file.write(string): Writes a string to the file (convert non-string data first).
file.tell(): Returns the current cursor position as an integer.
file.seek(offset, reference): Moves the cursor. Reference: 0=beginning, 1=current, 2=end.
file.close(): Closes the file and releases system resources.
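To make the cursor methods concrete, here is a minimal sketch using tell() and seek() on a hypothetical ten-byte scratch file. It uses binary mode because text-mode files do not allow non-zero seeks relative to the current position or the end.

```python
import os
import tempfile

# Hypothetical scratch file containing ten known bytes.
path = os.path.join(tempfile.mkdtemp(), "cursor.bin")
with open(path, "wb") as handle:
    handle.write(b"0123456789")

with open(path, "rb") as handle:
    start = handle.tell()      # cursor starts at 0
    first = handle.read(4)     # reads b"0123"
    middle = handle.tell()     # cursor is now at 4
    handle.seek(-3, 2)         # reference 2: three bytes before the end
    tail = handle.read()       # reads b"789"
    at_eof = handle.read()     # nothing left, so this is empty
    handle.seek(0)             # reference defaults to 0: back to the beginning
    handle.seek(2, 1)          # reference 1: two bytes forward from current
    skipped = handle.read(2)   # reads b"23"

print(start, first, middle, tail, at_eof, skipped)
```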
Reading Files
# Opening a file for reading
handle = open('/home/user/document.txt', 'r')
If the file does not exist, open() raises FileNotFoundError, a subclass of OSError (IOError is an alias of OSError in Python 3):
>>> handle = open('/home/user/missing.txt', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/missing.txt'
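One way to deal with this error is to catch it and fall back to a default value. The helper name read_text_or_default and the temporary path below are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path, "r") as handle:
            return handle.read()
    except FileNotFoundError:
        # Only the "missing file" case is handled here; other OSError
        # subclasses (for example PermissionError) still propagate.
        return default


# Hypothetical path that is guaranteed not to exist yet.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")
result = read_text_or_default(missing, default="fallback")
print(result)  # fallback
```

Catching the specific exception rather than a bare except keeps unrelated failures visible.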
Once opened, use the read() method to retrieve all content:
>>> content = handle.read()
>>> content
'Sample file content'
Always close files after use to free system resources:
>>> handle.close()
Using try...finally ensures the file closes even if errors occur:
handle = None
try:
    handle = open('/path/to/file', 'r')
    data = handle.read()
    print(data)
finally:
    # handle is still None if open() itself failed
    if handle:
        handle.close()
Python's with statement handles this automatically:
with open('/path/to/file', 'r') as handle:
    data = handle.read()
    print(data)
This approach automatically calls close() when the block exits, whether it finishes normally or raises an exception.
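The claim about exceptions can be checked with a short sketch. It assumes a hypothetical scratch file and simulates a failure inside the with block, then inspects the file object's closed attribute.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(path, "w") as handle:
    handle.write("data")

try:
    with open(path, "r") as handle:
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

# The with statement closed the file even though the block raised.
closed_after_error = handle.closed
print(closed_after_error)  # True
```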
Reading Strategies
Choose the appropriate method based on file size and requirements:
# For small files - read everything at once
content = handle.read()
# For large files - read in chunks
while True:
    chunk = handle.read(8192)
    if not chunk:
        break
    process(chunk)
# Read line by line
for line in handle:
    print(line.strip())
# Read all lines into a list
lines = handle.readlines()
for line in lines:
    print(line.strip())
Method comparison:
read(): Best for small files, loads the entire content into memory.
readlines(): Returns a list of all lines, convenient for iteration but also loads the entire file.
readline(): Memory-efficient for very large files, reads one line at a time.
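For large files, the chunked strategy can be wrapped in a generator so callers never hold more than one chunk in memory. This is a sketch rather than a definitive implementation: the name iter_chunks, the scratch file, and its 20000-byte size are assumptions made for illustration.

```python
import os
import tempfile


def iter_chunks(path, chunk_size=8192):
    """Yield successive blocks of at most chunk_size bytes from a file."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            yield chunk


# Hypothetical scratch file of 20000 bytes.
path = os.path.join(tempfile.mkdtemp(), "large.bin")
with open(path, "wb") as handle:
    handle.write(b"x" * 20000)

sizes = [len(chunk) for chunk in iter_chunks(path)]
print(sizes)  # [8192, 8192, 3616]
```

Only the final chunk is shorter than chunk_size, so a caller can process the data incrementally without knowing the file size in advance.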
Counting File Lines
# Simple approach - loads entire file into memory
with open(filepath, 'r') as handle:
    total_lines = len(handle.readlines())
# Memory-efficient approach - iterates one line at a time
total_lines = 0
with open(filepath, 'r') as handle:
    for line in handle:
        total_lines += 1
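As an alternative to line iteration, lines can also be counted by scanning fixed-size binary chunks for newline bytes. This is a different technique from the two approaches above, sketched under the assumption that lines end with "\n"; the function name count_lines_chunked and the scratch files are hypothetical.

```python
import os
import tempfile


def count_lines_chunked(path, chunk_size=65536):
    """Count lines by counting newline bytes, one chunk at a time."""
    newlines = 0
    last_byte = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A non-empty file whose last line lacks a trailing newline
    # still contains that final line.
    if last_byte and last_byte != b"\n":
        newlines += 1
    return newlines


folder = tempfile.mkdtemp()
samples = {"no_trailing.txt": b"a\nb\nc", "trailing.txt": b"a\nb\n", "empty.txt": b""}
counts = {}
for name, payload in samples.items():
    sample_path = os.path.join(folder, name)
    with open(sample_path, "wb") as handle:
        handle.write(payload)
    counts[name] = count_lines_chunked(sample_path)

print(counts)
```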
Writing Files
Write operations use 'w' for text or 'wb' for binary mode:
handle = open('/Users/user/output.txt', 'w')
handle.write('Output text here')
handle.close()
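Python and the operating system may buffer written data in memory before flushing it to disk, so data can be lost if the file is never closed. Call close() to ensure all data is written, or use a with statement, which closes the file for you:
with open('/Users/user/output.txt', 'w') as handle:
    handle.write('Output text here')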
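Because write() accepts only strings in text mode, other values have to be converted first. The sketch below assumes a hypothetical list of numeric readings and a scratch file, and formats each value as one line of text.

```python
import os
import tempfile

# Hypothetical sample values and scratch file.
readings = [3.5, 4.25, 5]
path = os.path.join(tempfile.mkdtemp(), "readings.txt")

with open(path, "w") as handle:
    for value in readings:
        # write() does not add a newline, so it is included explicitly.
        handle.write(f"{value}\n")

# The with block has closed the file, so everything is on disk by now.
with open(path, "r") as handle:
    lines = handle.read().splitlines()

print(lines)  # ['3.5', '4.25', '5']
```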
Handling Encodings
Specify encoding when working with non-UTF-8 files:
# Reading GBK encoded file
handle = open('/Users/user/gbk_file.txt', 'r', encoding='gbk')
content = handle.read()
handle.close()
To tolerate malformed bytes, pass errors='ignore', which silently drops any bytes that cannot be decoded:
handle = open('/Users/user/gbk_file.txt', 'r', encoding='gbk', errors='ignore')
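The effect of the encoding and errors arguments can be seen in a short round trip. The scratch file and the sample text are assumptions for illustration; the file is written as GBK and then read back with the right codec, the wrong codec, and the wrong codec with errors='ignore'.

```python
import os
import tempfile

# Hypothetical scratch file written in GBK.
path = os.path.join(tempfile.mkdtemp(), "gbk_file.txt")
with open(path, "w", encoding="gbk") as handle:
    handle.write("你好")

# Correct codec: the text round-trips unchanged.
with open(path, "r", encoding="gbk") as handle:
    decoded = handle.read()

# Wrong codec with the default strict error handling: decoding fails.
try:
    with open(path, "r", encoding="utf-8") as handle:
        handle.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Wrong codec with errors='ignore': no exception, but data is lost.
with open(path, "r", encoding="utf-8", errors="ignore") as handle:
    ignored = handle.read()

print(decoded == "你好", strict_failed, ignored == decoded)
```

Since errors='ignore' hides the mismatch rather than fixing it, it is usually better to find the correct encoding when the content matters.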
Reading Binary Data
Open files in binary mode ('rb') for images, videos, or other non-text content:
handle = open('/Users/user/image.jpg', 'rb')
binary_data = handle.read()
handle.close()
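Binary mode returns raw bytes, which makes it possible to inspect a file's leading bytes. As a hedged illustration, the sketch below checks for the three-byte sequence that JPEG files start with; the helper name looks_like_jpeg and the scratch files are hypothetical, and the stand-in file is not a valid image.

```python
import os
import tempfile

JPEG_SIGNATURE = b"\xff\xd8\xff"  # leading bytes of a JPEG file


def looks_like_jpeg(path):
    """Return True if the file begins with the JPEG signature bytes."""
    with open(path, "rb") as handle:
        header = handle.read(len(JPEG_SIGNATURE))
    return header == JPEG_SIGNATURE


folder = tempfile.mkdtemp()

# Stand-in for an image: correct signature followed by filler bytes.
fake_image = os.path.join(folder, "image.jpg")
with open(fake_image, "wb") as handle:
    handle.write(JPEG_SIGNATURE + b"\xe0" + bytes(16))

text_file = os.path.join(folder, "notes.txt")
with open(text_file, "wb") as handle:
    handle.write(b"plain text")

image_check = looks_like_jpeg(fake_image)
text_check = looks_like_jpeg(text_file)
print(image_check, text_check)  # True False
```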
Handling Multiple Files
Process multiple files either nested or one after another:
# Nested approach - both files are open at the same time
with open('data/input.txt', 'r') as input_file:
    with open('data/output.txt', 'w') as output_file:
        output_file.write(input_file.read())
# Independent files - each is closed before the next is opened
with open('data/file1.txt', 'r') as f1:
    content1 = f1.read()
with open('data/file2.txt', 'r') as f2:
    content2 = f2.read()
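Two files can also be opened in a single with statement by separating the open() calls with a comma, which avoids the extra indentation level. The sketch below assumes hypothetical scratch files and copies line by line instead of reading the whole input at once.

```python
import os
import tempfile

# Hypothetical scratch files standing in for the input and output.
folder = tempfile.mkdtemp()
src = os.path.join(folder, "input.txt")
dst = os.path.join(folder, "output.txt")

with open(src, "w") as handle:
    handle.write("line 1\nline 2\n")

# Both files are opened together and both are closed when the block exits.
with open(src, "r") as input_file, open(dst, "w") as output_file:
    for line in input_file:
        output_file.write(line)

with open(dst, "r") as handle:
    copied = handle.read()

print(copied == "line 1\nline 2\n")  # True
```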
Best Practice: Always use the with statement for file operations to ensure proper resource management and avoid data loss from unclosed files.