Methods for Extracting Various Archive Formats in Python

Introduction

This article discusses using Python to extract five common archive formats:

  • .gz
  • .tar
  • .tgz
  • .zip
  • .rar

Format Overview

gz (gzip): Typically compresses a single file. Often used with tar to first bundle files, then compress.

tar: A bundling tool on Linux systems that packages files into a single archive without compressing them.

tgz (tar.gz): Created by first bundling with tar, then compressing with gzip.

zip: Unlike gzip, it can bundle and compress multiple files, but it compresses each file individually. The compression ratio is generally lower than that of tar.gz, which compresses the whole bundle as one stream.

rar: A bundling and compression format originally for DOS, now primarily used on Windows. Offers higher compression than zip but slower processing and random access speeds.
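
File extensions are sometimes wrong, so it can help to check the content before choosing an extractor. The sketch below is one possible approach using standard-library checks. The function name detect_archive_format is hypothetical, and RAR is recognised only by its leading signature bytes, which assumes that a light check is sufficient.

```python
import tarfile
import zipfile

GZIP_MAGIC = b"\x1f\x8b"
RAR_MAGIC = b"Rar!\x1a\x07"  # prefix shared by RAR 4 and RAR 5 signatures

def detect_archive_format(path):
    """Return 'rar', 'tar', 'zip', 'gz', or None based on file content."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(RAR_MAGIC):
        return "rar"
    # is_tarfile() also accepts compressed tar files such as .tar.gz / .tgz
    if tarfile.is_tarfile(path):
        return "tar"
    if zipfile.is_zipfile(path):
        return "zip"
    # A gzip stream that is not a tar archive: a single compressed file
    if head.startswith(GZIP_MAGIC):
        return "gz"
    return None
```

Checking for tar before gzip means a .tgz file is reported as 'tar', which matches how tarfile opens it directly.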

Extracting gz Files

Since gz typically contains a single file, extraction involves reading that file:

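import gzip

def extract_gz(archive_path):
    """Extract a gzip-compressed file next to the archive"""
    # Strip only a trailing ".gz"; str.replace would remove every occurrence
    if archive_path.endswith(".gz"):
        output_name = archive_path[:-len(".gz")]
    else:
        output_name = archive_path + ".out"

    # Read the decompressed bytes (the whole file is held in memory)
    with gzip.open(archive_path, 'rb') as compressed_file:
        file_content = compressed_file.read()

    with open(output_name, 'wb') as output_file:
        output_file.write(file_content)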
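
Reading the whole file at once is fine for small inputs. For large files, a streaming variant may be preferable. The following is a minimal sketch, assuming the caller supplies the output path; the name extract_gz_streaming and the default chunk size are illustrative choices.

```python
import gzip
import shutil

def extract_gz_streaming(archive_path, output_path, chunk_size=1024 * 1024):
    """Decompress archive_path to output_path in fixed-size chunks."""
    with gzip.open(archive_path, "rb") as src, open(output_path, "wb") as dst:
        # copyfileobj reads and writes chunk_size bytes at a time,
        # so memory use stays bounded regardless of file size
        shutil.copyfileobj(src, dst, chunk_size)
```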

Extracting tar Files

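A .tar.gz file is a tar archive wrapped in gzip, so decompressing it with gzip yields a .tar archive that still has to be extracted. The tarfile module can also open a .tar.gz file directly, because mode 'r' detects the compression. Note: .tgz is equivalent to .tar.gz.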

import tarfile
import os

def extract_tar(archive_path):
    """Extract a tar archive"""
    output_dir = archive_path + "_extracted"
    
    # exist_ok=True makes this a no-op if the directory already exists
    os.makedirs(output_dir, exist_ok=True)
    
    with tarfile.open(archive_path, 'r') as archive:
        archive.extractall(path=output_dir)
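
Calling extractall on an untrusted archive can write files outside the target directory if member names contain absolute paths or "..". The sketch below shows one way to guard against that. It assumes extraction filters are available (Python 3.12 and the security releases they were backported to) and otherwise falls back to a simple manual path check that does not inspect link targets. The function name extract_tar_safely is hypothetical.

```python
import os
import tarfile

def extract_tar_safely(archive_path, output_dir):
    """Extract a tar archive, rejecting unsafe members where supported."""
    os.makedirs(output_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Built-in filter: blocks absolute paths, "..", and similar
            archive.extractall(path=output_dir, filter="data")
        else:
            # Fallback for older interpreters: verify each member path
            base = os.path.realpath(output_dir)
            for member in archive.getmembers():
                target = os.path.realpath(os.path.join(base, member.name))
                if os.path.commonpath([base, target]) != base:
                    raise ValueError(f"Unsafe path in archive: {member.name}")
            archive.extractall(path=output_dir)
```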

Extracting zip Files

Similar to tar extraction:

import zipfile
import os

def extract_zip(archive_path):
    """Extract a zip archive"""
    output_dir = archive_path + "_extracted"
    
    # exist_ok=True makes this a no-op if the directory already exists
    os.makedirs(output_dir, exist_ok=True)
    
    with zipfile.ZipFile(archive_path, 'r') as archive:
        archive.extractall(path=output_dir)
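
Because zip compresses each file separately, a single member can be pulled out without unpacking the rest. The following is a small sketch of that idea; the function name extract_single_member is hypothetical, and running testzip first is an optional integrity check rather than a requirement.

```python
import zipfile

def extract_single_member(archive_path, member_name, output_dir):
    """Extract one named member and return the path it was written to."""
    with zipfile.ZipFile(archive_path, "r") as archive:
        # testzip() returns the first corrupt member name, or None
        bad_member = archive.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member: {bad_member}")
        if member_name not in archive.namelist():
            raise KeyError(f"{member_name} not found in {archive_path}")
        return archive.extract(member_name, path=output_dir)
```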

Extracting rar Files

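RAR extraction requires the third-party rarfile package, which in turn relies on an external extraction tool such as unrar being installed on the system. Install the package via pip:

pip install rarfile

The extraction code then follows the same pattern as for zip:
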
import rarfile
import os

def extract_rar(archive_path):
    """Extract a rar archive"""
    output_dir = archive_path + "_extracted"
    
    # exist_ok=True makes this a no-op if the directory already exists
    os.makedirs(output_dir, exist_ok=True)
    
    with rarfile.RarFile(archive_path) as archive:
        archive.extractall(path=output_dir)

Creating tar Archives

When adding files to a tar archive, the arcname parameter sets the name stored inside the archive, so entries do not have to carry the full filesystem path:

import tarfile
import os
import time

def create_tar_archive(source_dir, output_path):
    """Create a tar archive from a directory"""
    start_time = time.time()
    
    with tarfile.open(output_path, 'w') as archive:
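        for root, dirs, files in os.walk(source_dir):
            for file in files:
                full_path = os.path.join(root, file)
                # Store paths relative to source_dir; using the bare file
                # name would make same-named files in subfolders collide
                arcname = os.path.relpath(full_path, source_dir)
                archive.add(full_path, arcname=arcname)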
    
    elapsed = time.time() - start_time
    print(f"Archive created in {elapsed:.2f} seconds")
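
As an alternative to walking the tree manually, TarFile.add accepts a directory and adds its contents recursively. The sketch below relies on that behaviour and stores everything under the directory's own base name. The function name create_tar_from_dir is hypothetical, and it assumes output_path lies outside source_dir.

```python
import os
import tarfile

def create_tar_from_dir(source_dir, output_path):
    """Archive source_dir recursively and return the stored member names."""
    # normpath drops a trailing separator so basename is never empty
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(output_path, "w") as archive:
        archive.add(source_dir, arcname=top_level)
    # Reopen the archive to report what was actually stored
    with tarfile.open(output_path, "r") as archive:
        return archive.getnames()
```

The returned names use forward slashes and start with the top-level directory, which makes it easy to confirm that the folder structure was preserved.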

To create a compressed tar archive (e.g., gzip):

with tarfile.open('/path/to/archive.tar.gz', 'w:gz') as archive:
    # Add files; the path below is a placeholder
    archive.add('/path/to/file.txt', arcname='file.txt')

Common tarfile modes:

Mode            Description
'r' or 'r:*'    Read with transparent compression (recommended)
'r:'            Read without compression
'r:gz'          Read with gzip compression
'r:bz2'         Read with bzip2 compression
'a' or 'a:'     Append without compression (creates if missing)
'w' or 'w:'     Write without compression
'w:gz'          Write with gzip compression
'w:bz2'         Write with bzip2 compression
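
To see how the read modes differ, the following sketch builds a small archive in memory and tries to open it with several modes. The helper names make_archive and can_open, and the member name hello.txt, are illustrative only.

```python
import io
import tarfile

def make_archive(mode):
    """Build a one-file tar archive in memory using the given write mode."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as archive:
        data = b"hello"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

def can_open(raw_bytes, mode):
    """Return True if raw_bytes can be read as a tar archive in this mode."""
    try:
        with tarfile.open(fileobj=io.BytesIO(raw_bytes), mode=mode) as archive:
            return archive.getnames() == ["hello.txt"]
    except tarfile.ReadError:
        return False
```

For an archive produced by make_archive('w:gz'), can_open succeeds with 'r:*' and 'r:gz' but returns False with 'r:', since that mode expects uncompressed data.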

Extracting tar Archives with Different Compression

import tarfile
import time

def extract_tar_archive(archive_path, extract_dir):
    """Extract a tar archive to specified directory"""
    start_time = time.time()
    
    # 'r:*' detects gzip, bzip2, or no compression; 'r:' would only
    # accept uncompressed archives
    with tarfile.open(archive_path, 'r:*') as archive:
        archive.extractall(path=extract_dir)
    
    elapsed = time.time() - start_time
    print(f"Extraction completed in {elapsed:.2f} seconds")

For processing files individually (useful for large archives):

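with tarfile.open('archive.tar.gz', 'r:gz') as archive:
    for member in archive:
        # extractfile() returns None for directories and other
        # non-file members, so skip them
        if not member.isfile():
            continue
        file_obj = archive.extractfile(member)
        # Process file_obj as needed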

Note: When processing archives with many files individually, be mindful of memory usage: read large members in chunks rather than loading each one completely with read().
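
As an illustration of per-member processing with bounded memory, the sketch below computes a SHA-256 digest for each regular file in an archive. The function name hash_tar_members and the default chunk size are assumptions made for this example.

```python
import hashlib
import tarfile

def hash_tar_members(archive_path, chunk_size=64 * 1024):
    """Return {member name: SHA-256 hex digest} for regular files."""
    digests = {}
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # directories and links have no file content
            file_obj = archive.extractfile(member)
            digest = hashlib.sha256()
            # Read chunk_size bytes at a time until read() returns b""
            for chunk in iter(lambda: file_obj.read(chunk_size), b""):
                digest.update(chunk)
            digests[member.name] = digest.hexdigest()
    return digests
```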

Posted on Tue, 19 May 2026 00:30:24 +0000 by Gayner