Common Data Persistence Methods in Python

Python offers a variety of mechanisms to persist data, each suited for different formats, use cases, and performance requirements. Below is a comprehensive overview of the most widely used approaches, with practical examples.

CSV Files

CSV is a lightweight, human-readable format ideal for tabular data. The standard library provides csv, while pandas simplifies operations on structured data.

import csv

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['product', 'price', 'category'])
    writer.writerow(['Laptop', 999, 'Electronics'])
    writer.writerow(['Book', 25, 'Education'])

Using pandas for direct DataFrame export:

import pandas as pd

df = pd.DataFrame({
    'product': ['Laptop', 'Book', 'Pen'],
    'price': [999, 25, 2],
    'category': ['Electronics', 'Education', 'Stationery']
})

df.to_csv('products.csv', index=False)

Excel Files

Excel (.xlsx) is preferred for reports and human inspection. Pandas supports writing to multiple sheets within a single workbook.

import pandas as pd

sales = pd.DataFrame({'month': ['Jan', 'Feb', 'Mar'], 'revenue': [12000, 15000, 18000]})
inventory = pd.DataFrame({'item': ['A', 'B', 'C'], 'stock': [45, 30, 12]})

with pd.ExcelWriter('report.xlsx', engine='xlsxwriter') as writer:
    sales.to_excel(writer, sheet_name='Sales', index=False)
    inventory.to_excel(writer, sheet_name='Inventory', index=False)

To append data to an existing file without overwriting:

import pandas as pd

# Append new rows to an existing sheet
new_data = pd.DataFrame({'month': ['Apr'], 'revenue': [21000]})
existing = pd.read_excel('report.xlsx', sheet_name='Sales')

combined = pd.concat([existing, new_data], ignore_index=True)

with pd.ExcelWriter('report.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    combined.to_excel(writer, sheet_name='Sales', index=False)

Image Files

Images can be generated or saved using various libraries depending on the source data.

Using PIL (Pillow) for synthetic images:

from PIL import Image
import numpy as np

# Create a 100x100 red image
img_array = np.full((100, 100, 3), [255, 0, 0], dtype=np.uint8)
img = Image.fromarray(img_array)
img.save('red_image.jpg', 'JPEG')

With OpenCV (BGR format):

import cv2
import numpy as np

img = np.zeros((100, 100, 3), dtype=np.uint8)
img[:] = [0, 0, 255]  # BGR: red
cv2.imwrite('red_image_cv2.png', img)

Using matplotlib to save plots:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(50, 50)
plt.imshow(data, cmap='viridis')
plt.axis('off')
plt.savefig('heatmap.png', dpi=150, bbox_inches='tight', pad_inches=0)

Text Files

Plain text is useful for logs or simple key-value storage.

lines = [
    "2024-05-01 INFO: System started",
    "2024-05-01 WARN: Low memory",
    "2024-05-01 INFO: Process completed"
]

with open('log.txt', 'w') as f:
    f.write('\n'.join(lines))

Save NumPy arrays as formatted text:

import numpy as np

matrix = np.array([[1.1, 2.2], [3.3, 4.4]])
np.savetxt('matrix.txt', matrix, fmt='%.2f', delimiter='\t')

JSON Files

JSON is ideal for structured, hierarchical data and API communication.

import json

config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "timeout": 30
    },
    "features": ["auth", "logging", "backup"]
}

with open('config.json', 'w') as f:
    json.dump(config, f, indent=2)

Pandas can export DataFrames as JSON arrays:

import pandas as pd

records = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "score": [88.5, 92.0, 76.5]
})

records.to_json('records.json', orient='records', lines=True)

SQL Databases

SQLite is built into Python and suitable for lightwieght applications.

import sqlite3
import pandas as pd

data = pd.DataFrame({
    "user_id": [101, 102, 103],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "joined": ["2024-01-15", "2024-02-20", "2024-03-10"]
})

conn = sqlite3.connect('users.db')
data.to_sql('users', conn, if_exists='replace', index=False)
conn.close()

For remote databases like MySQL, use SQLAlchemy with a connector:

from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://user:pass@localhost/mydb')
data.to_sql('users', engine, if_exists='replace', index=False)

Binary Files

For raw data or serialized objects, binary formats offer efficiency and compactness.

# Save raw bytes
raw_bytes = bytes([0x10, 0x20, 0x30, 0x40])
with open('data.bin', 'wb') as f:
    f.write(raw_bytes)

NumPy’s native format preserves array structure and dtype:

import numpy as np

arr = np.random.randint(0, 100, size=(1000,))
np.save('array_data.npy', arr)

# Load back
loaded = np.load('array_data.npy')

Pickle Files

Pickle serializes Python objects, including custom classes, making it powerful but not portable across languages.

import pickle

model = {
    'algorithm': 'RandomForest',
    'parameters': {'n_estimators': 100, 'max_depth': 5},
    'accuracy': 0.94
}

with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)

Parquet Files

Parquet is a columnar storage format optimized for analytics and large datasets, commonly used in data engineering.

import pandas as pd

large_dataset = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=10000, freq='min'),
    'value': np.random.randn(10000),
    'category': np.random.choice(['A', 'B', 'C'], 10000)
})

large_dataset.to_parquet('analytics.parquet')

Tags: CSV Excel JSON sql Parquet

Posted on Wed, 20 May 2026 18:30:47 +0000 by vikramjeet.singla