awk Overview and Common Methods

Introduction to awk

awk is a text processing tool commonly used for data manipulation and generating reports.

The name awk is derived from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

awk Working Mode

The following diagram illustrates the basic working flow of awk:

[Figure: awk working mode]

Syntax Format

There are two common forms of awk commands:

  1. Based on file input

    awk 'BEGIN{} pattern {commands} END{}' file_name
    
  2. Based on standard input via pipe

    standard_command | awk 'BEGIN{} pattern {commands} END{}'
    

The following diagram explains the syntax components:

[Figure: awk syntax diagram]
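As a rough illustration of how the three parts fit together, the sketch below pipes two made-up lines through awk. The function name awk_flow_demo and the printed messages are placeholders chosen for this example.

```shell
# BEGIN runs once before any input is read, the middle block runs once
# per record, and END runs once after the last record.
# awk_flow_demo and the sample lines are hypothetical.
awk_flow_demo() {
  awk 'BEGIN { print "start" }
       { print "line:", $0 }
       END { print "lines read:", NR }'
}

printf 'first\nsecond\n' | awk_flow_demo
```

With the two sample lines this prints "start", then one "line: ..." entry per input line, then "lines read: 2".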

awk Built-in Variables

Here is a reference table of awk's built-in variables:

[Figure: built-in variables table]

Below are the most commonly used built-in variables:

$0          : The entire current record (line).
$1 ... $n   : The first to nth field of the current record.
NF          : Number of fields in the current record.
NR          : Number of records (lines) processed so far (cumulative).
FNR         : Number of records processed in the current file (resets per file).
FS          : Input field separator (default: a single space, which splits on runs of spaces and tabs).
RS          : Input record separator (default: newline).
OFS         : Output field separator (default: space).
ORS         : Output record separator (default: newline).
FILENAME    : Name of the current input file.
ARGC        : Number of command-line arguments.
ARGV        : Array of command-line arguments.

Examples of Working with Built-in Variables

Print the entire line

awk '{print $0}' passwd

Specify : as field separator and print the first field

awk 'BEGIN{FS=":"}{print $1}' passwd

Default field separator (space/tab). Assume a file named list with the following content:

Hadoop Spark Flume
Java Python Scala
Allen Mike Meggie

Print the first field using space separator

awk 'BEGIN{FS=" "}{print $1}' list

Print the number of fields in each line

awk '{print NF}' list

Print cumulative record number (NR) when processing multiple files

awk '{print NR}' list passwd /etc/fstab

Print file-specific record number (FNR) when processing multiple files

awk '{print FNR}' list /etc/fstab

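To see the difference between NR and FNR side by side, a small sketch along these lines may help. The function name nr_vs_fnr and the two temporary files are made up for the illustration.

```shell
# Print the file name, the cumulative record number (NR) and the
# per-file record number (FNR) for every line of two input files.
# nr_vs_fnr and the file names are hypothetical.
nr_vs_fnr() {
  awk '{ print FILENAME, NR, FNR }' "$1" "$2"
}

dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\ne\n' > "$dir/two.txt"
(cd "$dir" && nr_vs_fnr one.txt two.txt)
rm -rf "$dir"
```

NR keeps counting from 1 to 5 across both files, while FNR restarts at 1 on the first line of two.txt.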


More Advanced Examples with Custom Separators

Assume file list has the following content:

Hadoop|Spark:Flume
Java|Python:Scala:Golang
Allen|Mike:Meggie

Use | as field separator and print the second field

awk 'BEGIN{FS="|"}{print $2}' list

Use : as field separator and print the second field

awk 'BEGIN{FS=":"}{print $2}' list

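When a line mixes two separators, as in the sample above, FS can also be a regular expression. The sketch below assumes that behavior (FS longer than one character is treated as a regular expression) and uses a hypothetical function name, split_on_both.

```shell
# [|:] is a bracket expression that matches either | or :, so each
# line is split on both characters. split_on_both is a hypothetical name.
split_on_both() {
  awk 'BEGIN { FS = "[|:]" } { print $1, $2, $3 }'
}

printf 'Hadoop|Spark:Flume\nJava|Python:Scala:Golang\n' | split_on_both
# prints:
# Hadoop Spark Flume
# Java Python Scala
```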


Record Separator (RS) and Output Separators

Assume file list has content:

Hadoop|Spark|Flume--Java|Python|Scala|Golang--Allen|Mike|Meggie

Specify -- as the record separator and print whole records (gawk and mawk accept a multi-character RS; POSIX leaves the behavior unspecified)

awk 'BEGIN{RS="--"}{print $0}' list

Combine RS and FS to get structured output

awk 'BEGIN{RS="--";FS="|"}{print $3}' list

Use ORS to separate output records with &

awk 'BEGIN{RS="--";FS="|";ORS="&"}{print $3}' list

Print multiple fields with default output separator (space)

awk 'BEGIN{RS="--";FS="|";ORS="&"}{print $1,$3}' list

Use OFS to change output field separator to :

awk 'BEGIN{RS="--";FS="|";ORS="&";OFS=":"}{print $1,$3}' list

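One detail worth knowing: OFS is only used between comma-separated print arguments, or when awk rebuilds $0. A plain print $0 leaves the record as it was read. A common idiom to force the rebuild is to assign a field to itself, sketched here with a hypothetical function name, rejoin_fields.

```shell
# Assigning to any field (here $1 = $1) makes awk rebuild $0 with OFS
# between the fields. rejoin_fields is a hypothetical name.
rejoin_fields() {
  awk 'BEGIN { FS = "|"; OFS = ":" } { $1 = $1; print }'
}

printf 'Hadoop|Spark|Flume\n' | rejoin_fields
# prints: Hadoop:Spark:Flume
```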


Printing File Name (FILENAME)

awk '{print FILENAME}' list

If the file has multiple lines, FILENAME is printed for each line because awk processes line by line. For instance, with a file list containing:

Hadoop|Spark|Flume--Java|Python|Scala|Golang--Allen|Mike|Meggie
Test File
Line

The output shows the filename three times, once per input line:

list
list
list
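If the name is wanted only once per file, one possible approach is to combine FILENAME with FNR. The sketch below uses a hypothetical function name, list_inputs, and two temporary files created just for the demonstration.

```shell
# FNR is 1 on the first record of each file, so the action runs once
# per non-empty input file. list_inputs is a hypothetical name.
list_inputs() {
  awk 'FNR == 1 { print FILENAME }' "$@"
}

dir=$(mktemp -d)
printf 'x\ny\nz\n' > "$dir/list"
printf 'only line\n' > "$dir/other"
(cd "$dir" && list_inputs list other)
rm -rf "$dir"
```

Here each name is printed once, regardless of how many lines the file contains. An empty file produces no records, so it would not be listed.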


Command-Line Argument Count (ARGC)

awk '{print ARGC}' list

This prints 2 for every input line: the count includes the command name (ARGV[0]) and list (ARGV[1]), but not the program text. If you run:

awk '{print ARGC}' list /etc/fstab

The output will be 3 (three arguments).

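ARGC is usually used together with ARGV. The following sketch lists the arguments from index 1 upward; the function name show_args is made up, and index 0 is skipped because the command name stored there can differ between awk implementations.

```shell
# Walk ARGV from index 1 to ARGC-1. Everything happens in BEGIN, so
# the named files are never opened. show_args is a hypothetical name.
show_args() {
  awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' "$@"
}

show_args list /etc/fstab
# prints:
# 1 list
# 2 /etc/fstab
```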


Using NF to Access the Last Field

NF gives the total number of fields. Therefore $NF always refers to the last field.

awk 'BEGIN{FS=":"}{print $NF}' passwd

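Because NF is an ordinary number, it can also be used in an expression inside $(...). The sketch below prints the last two fields; the function name last_two and the sample record are made up.

```shell
# $(NF-1) is the second-to-last field and $NF the last one. The NF >= 2
# pattern skips lines that have fewer than two fields.
# last_two is a hypothetical name.
last_two() {
  awk 'BEGIN { FS = ":" } NF >= 2 { print $(NF-1), $NF }'
}

printf 'root:x:0:0:root:/root:/bin/bash\n' | last_two
# prints: /root /bin/bash
```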


Formatted Output with printf

Format Specifiers

Specifier   : Description
%s          : String
%d          : Decimal integer
%f          : Floating-point number
%e          : Scientific notation (lowercase)
%E          : Scientific notation (uppercase)
%x          : Hexadecimal (lowercase)
%X          : Hexadecimal (uppercase)
%o          : Octal
%%          : Print a literal %

Modifiers

Modifier    : Meaning
-           : Left-justify within the field width
+           : Always print a sign for numeric values
0           : Pad with zeros instead of spaces
width       : Minimum field width
.prec       : Number of decimal places (for %f)

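The modifiers can be combined in a single format string. The sketch below is only an illustration: the function name format_demo and the literal values are made up, and the | characters are there just to make the field boundaries visible.

```shell
# %-10s   left-justified string in a 10-character field
# %5d     right-justified integer in a 5-character field
# %08.3f  3 decimal places, zero-padded to 8 characters in total
# %+d     integer with an explicit sign
# format_demo is a hypothetical name.
format_demo() {
  awk 'BEGIN { printf "%-10s|%5d|%08.3f|%+d\n", "ab", 42, 3.14159, 7 }'
}

format_demo
# prints: ab        |   42|0003.142|+7
```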

Examples of printf

printf without newline (default behavior); the field is passed as an argument so that a % in the data is not read as a format specifier

awk 'BEGIN{FS=":"}{printf "%s",$1}' passwd

Add newline with %s\n

awk 'BEGIN{FS=":"}{printf "%s\n",$1}' passwd

Use placeholders for aligned output (right-aligned by default)

awk 'BEGIN{FS=":"}{printf "%20s %20s\n",$1,$7}' /etc/passwd

Left-align with -

awk 'BEGIN{FS=":"}{printf "%-20s %-20s\n",$1,$7}' /etc/passwd

Print strings

awk 'BEGIN{FS=":"}{printf "%s\n",$7}' passwd

Print decimal integers

awk 'BEGIN{FS=":"}{printf "%d\n",$3}' passwd

Print floating-point with 2 decimal places

awk 'BEGIN{FS=":"}{printf "%0.2f\n",$3}' passwd

Print hexadecimal

awk 'BEGIN{FS=":"}{printf "%x\n",$3}' passwd

Print octal

awk 'BEGIN{FS=":"}{printf "%o\n",$3}' passwd

Print scientific notation

awk 'BEGIN{FS=":"}{printf "%e\n",$3}' passwd

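Putting printf together with BEGIN and END gives a small report. This is a minimal sketch rather than a recommended layout: the function name passwd_report, the column widths, and the two passwd-style sample records are all made up.

```shell
# BEGIN prints a header, the main block prints one aligned row per
# record, and END prints the number of records read.
# passwd_report and the sample records are hypothetical.
passwd_report() {
  awk 'BEGIN { FS = ":"; printf "%-10s %6s %s\n", "USER", "UID", "SHELL" }
       { printf "%-10s %6d %s\n", $1, $3, $7 }
       END { printf "%d accounts\n", NR }'
}

printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000:Alice:/home/alice:/bin/sh\n' |
  passwd_report
```

The header uses the same widths as the rows (%6s instead of %6d), so the columns line up.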


Pattern Matching in awk

There are two main ways to perform pattern matching:

  1. Regular Expression Matching
  2. Operator Matching

Reference Table for Pattern Matching

[Figures: pattern matching table; pattern matching operators]


1. Regular Expression Matching

Find lines containing the string "root"

awk 'BEGIN{FS=":"}/root/{print $0}' passwd

Find lines starting with "nginx"

awk '/^nginx/{print $0}' passwd


2. Operator Matching

Available comparison operators:

  • < less than
  • > greater than
  • <= less than or equal
  • >= greater than or equal
  • == equal
  • != not equal
  • ~ matches regular expression
  • !~ does not match regular expression

Lines where the third field is less than 50

awk 'BEGIN{FS=":"}$3<50{print $0}' passwd

Lines where the third field is greater than 50

awk 'BEGIN{FS=":"}$3>50{print $0}' passwd

Lines where the seventh field equals /bin/bash

awk 'BEGIN{FS=":"}$7=="/bin/bash"{print $0}' passwd

Lines where the seventh field is NOT /bin/bash

awk 'BEGIN{FS=":"}$7!="/bin/bash"{print $0}' passwd

Lines where the third field contains three or more digits

awk 'BEGIN{FS=":"}$3 ~ /[0-9]{3,}/{print $0}' passwd

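Interval expressions such as {3,} are part of POSIX extended regular expressions, but some older awk builds reportedly do not honor them (for example gawk before 4.0 unless run with --re-interval, and older mawk releases). If the command above prints nothing unexpectedly, a portable alternative is to spell the repetition out. The function name uid_three_digits and the sample records below are made up.

```shell
# Three consecutive digits anywhere in the third field; selects the
# same lines as /[0-9]{3,}/ without using an interval expression.
# uid_three_digits is a hypothetical name.
uid_three_digits() {
  awk 'BEGIN { FS = ":" } $3 ~ /[0-9][0-9][0-9]/ { print $1, $3 }'
}

printf 'root:x:0:0::/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/bash\n' |
  uid_three_digits
# prints: alice 1000
```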


Boolean Operators in Patterns

  • || logical OR
  • && logical AND
  • ! logical NOT

Lines where the first field is "ftp" OR "mail"

awk 'BEGIN{FS=":"}$1=="ftp" || $1=="mail"{print $0}' passwd

Lines where third field < 50 AND fourth field > 50

awk 'BEGIN{FS=":"}$3<50 && $4>50{print $0}' passwd

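The ! operator negates a whole condition when it is wrapped in parentheses. The sketch below uses a hypothetical function name, non_bash_users, and made-up passwd-style records.

```shell
# !( ... ) inverts the condition: print the login name of every
# account whose shell is not /bin/bash.
# non_bash_users is a hypothetical name.
non_bash_users() {
  awk 'BEGIN { FS = ":" } !($7 == "/bin/bash") { print $1 }'
}

printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:2:2:daemon:/sbin:/sbin/nologin\n' |
  non_bash_users
# prints: daemon
```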

Lines starting with "nginx" (using regex)

awk 'BEGIN{FS=":"}/^nginx/{print $0}' passwd

Lines where UID equals 1

awk 'BEGIN{FS=":"}$3==1{print $0}' passwd

Lines where UID (third field) consists of 3 or more digits

awk 'BEGIN{FS=":"}$3~/[0-9]{3,}/{print $0}' passwd

Lines that do NOT contain /sbin/nologin

awk 'BEGIN{FS=":"}$0!~/\/sbin\/nologin/{print $0}' passwd

Lines where UID < 50 AND shell contains /bin/bash

awk 'BEGIN{FS=":"}$3<50 && $7~/\/bin\/bash/ {print $0}' passwd

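When the threshold should come from the shell rather than being hard-coded, the standard -v option can pass it in as an awk variable, and -F sets the field separator from the command line. The sketch below is one possible way to parameterize the last example; the function name low_uid_bash, the variable name max, and the sample records are made up.

```shell
# $1 is the UID limit; passwd-style records are read from standard input.
# "max + 0" forces a numeric comparison even if the value arrives as text.
# low_uid_bash and max are hypothetical names.
low_uid_bash() {
  awk -F: -v max="$1" '$3 < max + 0 && $7 ~ /\/bin\/bash$/ { print $1 }'
}

printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:2:2:daemon:/sbin:/sbin/nologin\nalice:x:1000:1000:Alice:/home/alice:/bin/bash\n' |
  low_uid_bash 50
# prints: root
```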

Tags: awk Text processing Linux programming

Posted on Wed, 13 May 2026 12:21:36 +0000 by jdashca