8086 Assembly Implementation of File Encryption with Letter Frequency Analysis

This program implements a file encryption utility in 8086 assembly language, fulfilling the requirements of a computer architecture assignment. It reads a text file, converts all letters to uppercase, analyzes letter frequencies, derives an encryption key based on frequency ranking, applies a substitution cipher to both letters and digits, and writes the encrypted output to cipher.txt.

Design Overview

The solution is structured into modular procedures:

  • INPUT: Reads a filename from the user and loads the file content into memory.
  • CAPITALOUTPUT: Converts lowercase letters (a–z) to uppercase (A–Z) in place and displays the result.
  • CNTOUTPUT: Counts occurrences of each uppercase letter using nested loops, then derives the key: the letter that has exactly two other letters with strictly higher frequencies. This identifies the third most frequent letter without a full sort.
  • ENCRYPTION: Applies distinct transformations:
    • Digits (0–9) are substituted using a fixed lookup table (NUMTABLE).
    • Letters are shifted forward by the key value modulo 26 (Caesar cipher).
  • SAVE: Writes the encrypted buffer to cipher.txt.
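
As an illustration of the last step in the list above, SAVE could be built on the DOS INT 21h file services (3Ch create, 40h write, 3Eh close). This is a minimal sketch, not the original procedure: OUTNAME and ARTLEN are hypothetical names, and DS is assumed to already point at the data segment.

```asm
; Sketch of SAVE using DOS INT 21h file services.
; OUTNAME and ARTLEN are hypothetical names, not taken from the original program.
; Assumed to exist in the data segment:
;   OUTNAME DB 'cipher.txt', 0      ; ASCIIZ filename
;   ARTLEN  DW ?                    ; number of valid bytes in ARTICLE

SAVE PROC NEAR
    MOV AH, 3Ch                 ; DOS: create or truncate a file
    XOR CX, CX                  ; CX = 0: normal file attributes
    MOV DX, OFFSET OUTNAME      ; DS:DX -> ASCIIZ filename
    INT 21h
    JC  SAVE_DONE               ; CF set: the call failed, AX = error code
    MOV BX, AX                  ; AX returned the file handle

    MOV AH, 40h                 ; DOS: write to the handle in BX
    MOV CX, ARTLEN              ; number of bytes to write
    MOV DX, OFFSET ARTICLE      ; DS:DX -> encrypted buffer
    INT 21h                     ; on success AX = bytes actually written

    MOV AH, 3Eh                 ; DOS: close the handle in BX
    INT 21h
SAVE_DONE:
    RET
SAVE ENDP
```

The sketch gives up silently if the file cannot be created; a fuller version would likely report the error code to the user.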

Data Structures

Key data segments include:

  • ARTICLE: Buffer holding up to 768 bytes of file content.
  • CNTBUF: Stores one three-byte record per letter A–Z: the letter followed by its count in packed BCD (e.g., 'A', count_high, count_low).
  • NUMTABLE: Digit substitution mapping: "7591368024" (i.e., '0'→'7', '1'→'5', etc.).
  • KEY: Holds the derived shift value (0–25).
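
A data segment matching this description might look like the sketch below. It uses MASM/TASM-style directives; the segment name DATA, the helper ARTLEN, and the assembly-time symbol CURLET are assumptions introduced for illustration, and the CNTBUF layout follows the three-byte record described above.

```asm
; MASM/TASM-style sketch of the data segment described above.
; DATA, ARTLEN and CURLET are hypothetical names.
DATA SEGMENT
ARTICLE  DB 768 DUP(?)          ; file content, up to 768 bytes
ARTLEN   DW 0                   ; hypothetical: number of bytes actually read
NUMTABLE DB '7591368024'        ; entry i is the substitute for digit i
KEY      DB 0                   ; derived shift value, 0..25

CNTBUF   LABEL BYTE             ; 26 records: letter, count_high, count_low
CURLET = 'A'                    ; assembly-time counter, not a memory variable
REPT 26
         DB CURLET, 0, 0        ; every count starts at zero
CURLET = CURLET + 1
ENDM
DATA ENDS
```

Generating the 26 records with REPT keeps the letter bytes and the record stride consistent; writing the 26 DB lines by hand would be equivalent.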

Algorithm Details

Letter Case Conversion

Each byte in ARTICLE is checked. If it falls in 'a'–'z', 32 is subtracted to convert to uppercase.
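
The conversion can be sketched as a single pass over the buffer. ARTLEN is a hypothetical word variable holding the number of bytes read, and DS is assumed to address the data segment; neither detail comes from the original program.

```asm
; Uppercase ARTICLE in place.
; ARTLEN (hypothetical) = number of valid bytes in ARTICLE.
    MOV SI, OFFSET ARTICLE
    MOV CX, ARTLEN
    JCXZ UP_DONE                ; nothing to do for an empty buffer
UP_LOOP:
    MOV AL, [SI]
    CMP AL, 'a'
    JB  UP_NEXT                 ; below 'a': leave unchanged
    CMP AL, 'z'
    JA  UP_NEXT                 ; above 'z': leave unchanged
    SUB AL, 20h                 ; 'a'..'z' -> 'A'..'Z' (20h = 32)
    MOV [SI], AL
UP_NEXT:
    INC SI
    LOOP UP_LOOP                ; decrement CX, repeat while CX != 0
UP_DONE:
```

JB and JA are the unsigned comparisons, which is what character codes need.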

Frequency Counting

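For each letter A–Z (outer loop), the entire buffer is scanned (inner loop). Each matching character increments that letter's counter in CNTBUF, and the DAA instruction adjusts the result so the count stays in BCD. Because DAA operates only on AL and relies on the flags left by the preceding ADD, the counter byte has to pass through AL for each increment.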
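
A possible shape for the nested loops is sketched below. It assumes each CNTBUF record is laid out as letter, count_high, count_low, with two packed BCD digits per count byte, and it uses a hypothetical ARTLEN variable for the buffer length. The original program may store or carry its counts differently.

```asm
; Count each letter A..Z in ARTICLE, keeping counts in packed BCD.
; Assumed record layout: [DI] = letter, [DI+1] = count_high, [DI+2] = count_low.
; ARTLEN (hypothetical) = number of valid bytes in ARTICLE.
    MOV DI, OFFSET CNTBUF       ; DI -> record for 'A'
    MOV DL, 26                  ; outer loop: one pass per letter
CNT_LETTER:
    MOV BL, [DI]                ; the letter this record counts
    MOV SI, OFFSET ARTICLE
    MOV CX, ARTLEN              ; inner loop: every byte in the buffer
    JCXZ CNT_NEXT
CNT_SCAN:
    CMP BL, [SI]
    JNE CNT_SKIP
    MOV AL, [DI+2]              ; low two BCD digits
    ADD AL, 1                   ; ADD rather than INC, so CF is valid for DAA
    DAA                         ; back to packed BCD; CF = 1 when 99 wraps to 00
    MOV [DI+2], AL              ; MOV leaves the flags untouched
    MOV AL, [DI+1]              ; high two BCD digits
    ADC AL, 0                   ; add the decimal carry from the low byte
    DAA
    MOV [DI+1], AL
CNT_SKIP:
    INC SI
    LOOP CNT_SCAN
CNT_NEXT:
    ADD DI, 3                   ; next 3-byte record
    DEC DL
    JNZ CNT_LETTER
```

The nested scan reads the buffer 26 times. A single pass that indexes the record by (character - 'A') * 3 would do the same work in one scan, at the cost of slightly more address arithmetic.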

Key Derivation

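The key is the letter whose frequency rank is third highest. For each letter, the algorithm counts how many other letters have strictly greater frequencies. When this count equals 2, the current letter is selected as the key. Ties affect this test: if two letters share second place, for example, no letter has exactly two letters above it, and if two letters share third place, the first one examined is chosen.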

Encryption Logic

  • Digits: Subtract '0' to get index, then use NUMTABLE for substitution.
  • Letters: Subtract 'A', add the key, take modulo 26, then add 'A' back.
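
Both rules can be sketched in one loop over the buffer. The digit branch uses XLAT for the table lookup, and the letter branch replaces a general modulo with a single conditional subtraction, which is enough because the sum never exceeds 25 + 25 = 50. ARTLEN is a hypothetical length variable, and KEY is assumed to already hold a value in 0..25.

```asm
; Encrypt ARTICLE in place: digits via NUMTABLE, letters via a shift by KEY.
; ARTLEN (hypothetical) = number of valid bytes in ARTICLE.
    MOV SI, OFFSET ARTICLE
    MOV BX, OFFSET NUMTABLE     ; XLAT reads the byte at DS:[BX+AL]
    MOV CX, ARTLEN
    JCXZ ENC_DONE
ENC_LOOP:
    MOV AL, [SI]
    CMP AL, '0'
    JB  ENC_NEXT                ; below '0': leave unchanged
    CMP AL, '9'
    JA  ENC_LETTER
    SUB AL, '0'                 ; digit -> index 0..9
    XLAT                        ; AL = NUMTABLE[AL]
    JMP SHORT ENC_STORE
ENC_LETTER:
    CMP AL, 'A'
    JB  ENC_NEXT
    CMP AL, 'Z'
    JA  ENC_NEXT                ; not an uppercase letter: leave unchanged
    SUB AL, 'A'                 ; letter -> index 0..25
    ADD AL, KEY                 ; at most 25 + 25 = 50
    CMP AL, 26
    JB  ENC_WRAP_OK
    SUB AL, 26                  ; one subtraction completes the modulo 26
ENC_WRAP_OK:
    ADD AL, 'A'                 ; index -> letter
ENC_STORE:
    MOV [SI], AL
ENC_NEXT:
    INC SI
    LOOP ENC_LOOP
ENC_DONE:
```

With NUMTABLE set to "7591368024", the digit '1' becomes index 1 and is replaced by '5', matching the mapping given for the table.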

Code Highlights

; Key derivation snippet
; DI points at the count byte of the letter under test.
LKEY:
    MOV AL, [DI]        ; Current letter's frequency
    MOV BH, 0           ; Number of letters with a higher frequency
    MOV SI, OFFSET CNTBUF + 1   ; Count byte of the first record ('A')
    MOV AH, 26          ; Inner loop counter: one comparison per letter

CMPKEY:
    CMP AL, [SI]        ; Compare with another letter's frequency (one byte)
    JAE NEXT_LETTER     ; Current >= other: not a higher frequency
    INC BH              ; Current < other: count it
NEXT_LETTER:
    ADD SI, 3           ; Records are 3 bytes apart
    DEC AH
    JNZ CMPKEY
    CMP BH, 2           ; Exactly two letters are more frequent?
    JE  KEYGET          ; If yes, use this letter as key

Execution Flow

  1. Prompt user for input filename.
  2. Load file contents into ARTICLE.
  3. Convert to uppercase and display.
  4. Count letter frequencies and compute key.
  5. Encrypt buffer contents.
  6. Write encrypted data to cipher.txt.
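
The six steps map onto the procedures from the design overview roughly as in the sketch below. The segment names CODE and DATA are assumptions, and the five procedures are taken to be defined elsewhere in the same code segment; only the call order is meant to reflect the original.

```asm
; Top-level flow (sketch). CODE and DATA are assumed segment names;
; INPUT, CAPITALOUTPUT, CNTOUTPUT, ENCRYPTION and SAVE are defined elsewhere.
CODE SEGMENT
    ASSUME CS:CODE, DS:DATA
MAIN PROC FAR
    MOV AX, DATA
    MOV DS, AX                  ; make the data segment addressable
    CALL INPUT                  ; steps 1-2: prompt for filename, load ARTICLE
    CALL CAPITALOUTPUT          ; step 3: uppercase and display
    CALL CNTOUTPUT              ; step 4: count letters, derive KEY
    CALL ENCRYPTION             ; step 5: encrypt the buffer
    CALL SAVE                   ; step 6: write cipher.txt
    MOV AX, 4C00h               ; DOS: terminate with return code 0
    INT 21h
MAIN ENDP
CODE ENDS
    END MAIN
```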

Sample Output

Given an input file containing:

AlexNet is a convolutional neural network trained on over a million images...

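The program outputs the uppercase version, the frequency table (e.g., A53B21...), and the encrypted text (e.g., NYRKARG VF N PBAIBYHGVBANY...), then saves the ciphertext to disk. In this sample every letter is shifted by 13 (A→N, L→Y, E→R), which is consistent with N being selected as the key letter.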


Posted on Sat, 09 May 2026 03:17:51 +0000 by jd57