awk Command Cheat Sheet

awk is a powerful programming language and command-line tool for pattern scanning and text processing. Named after its creators (Aho, Weinberger, Kernighan), awk excels at processing structured text data like logs, CSV files, and tabular output.

Synopsis

awk 'pattern { action }' file
awk -F: '{ print $1 }' file
awk -f script.awk file

Description

AWK processes text line-by-line, splitting each line into fields. It matches patterns and executes actions on matching lines. It's particularly powerful for extracting columns from structured data, performing calculations, and generating reports.

Basic Syntax

awk 'pattern { action }' input_file

Pattern: Condition to match (optional)
Action: What to do with matching lines (optional)
If pattern omitted: action applies to all lines
If action omitted: print matching lines

Print Commands

Print Entire Lines

awk '{ print }' file.txt
awk '1' file.txt  # Shorthand

Print Specific Fields

awk '{ print $1 }' file.txt        # First field
awk '{ print $1, $3 }' file.txt    # First and third
awk '{ print $1, $2, $3 }' file.txt

Print Last Field

awk '{ print $NF }' file.txt

NF = number of fields, so $NF = last field.

Print All But First Field

awk '{ $1=""; print }' file.txt
awk '{ for(i=2;i<=NF;i++) printf "%s ", $i; print "" }' file.txt
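Both commands above change the spacing. `$1=""` makes awk rebuild the line with a leading separator, and the loop leaves a trailing space. The sketch below deletes the first field and the blanks after it with `sub()`, so the rest of the line keeps its original spacing. It assumes whitespace-separated input, and the function name `drop_first_field` is only illustrative.

```shell
# drop_first_field: print each stdin line without its first
# whitespace-separated field; the remaining text is left untouched.
drop_first_field() {
  awk '{ sub(/^[ \t]*[^ \t]+[ \t]*/, ""); print }'
}

printf '  alice   42  admin\n' | drop_first_field   # 42  admin
```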

Built-in Variables

Variable	Description
`$0`	Entire line
`$1, $2, ...`	First, second field, etc.
`$NF`	Last field
`NF`	Number of fields in current record
`NR`	Current record (line) number
`FNR`	Record number in current file
`FS`	Input field separator (default: whitespace)
`OFS`	Output field separator (default: space)
`RS`	Input record separator (default: newline)
`ORS`	Output record separator (default: newline)
`FILENAME`	Name of current input file
`ARGC`	Number of command-line arguments
`ARGV`	Array of command-line arguments
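The difference between `NR` and `FNR` only shows up when awk reads more than one file. The demonstration below assumes `mktemp -d` is available for a scratch directory. The file names `a.txt` and `b.txt` and the function name `show_counters` are placeholders.

```shell
# show_counters FILE...: print FILENAME, NR, FNR and NF for every record.
# NR keeps counting across files; FNR restarts at 1 for each new file.
show_counters() {
  awk '{ print FILENAME, NR, FNR, NF }' "$@"
}

dir=$(mktemp -d)
printf 'one two\nthree\n' > "$dir/a.txt"
printf 'x y z\n' > "$dir/b.txt"
(cd "$dir" && show_counters a.txt b.txt)
# a.txt 1 1 2
# a.txt 2 2 1
# b.txt 3 1 3
rm -r "$dir"
```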

Field Separators

Default (Whitespace)

awk '{ print $1 }' file.txt

Splits on runs of spaces and tabs; leading and trailing whitespace is ignored.

Custom Separator

# Using colon
awk -F: '{ print $1 }' /etc/passwd

# Using comma (CSV)
awk -F, '{ print $2 }' data.csv

# Using pipe
awk -F'|' '{ print $1, $3 }' data.txt

Multiple Separators

# Split on colon OR comma
awk -F'[,:]' '{ print $1 }' file.txt

# Split on multiple spaces/tabs
awk -F'[ \t]+' '{ print $1 }' file.txt
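A regex separator such as `[ \t]+` is close to the default, but not identical to it. With the default `FS`, leading blanks are ignored. With a regex `FS`, a line that starts with a blank gets an empty `$1`. The two helper names below are hypothetical and are used only to compare the two behaviours.

```shell
# Print the first field in brackets, using the default FS
first_default() { awk '{ print "[" $1 "]" }'; }
# Same, but with an explicit regex FS of one or more blanks/tabs
first_regex()   { awk -F'[ \t]+' '{ print "[" $1 "]" }'; }

printf '  alpha beta\n' | first_default   # [alpha]  (leading blanks skipped)
printf '  alpha beta\n' | first_regex     # []       (leading blanks form a separator)
```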

Change Output Separator

# Input: space-separated, Output: comma-separated
awk 'BEGIN { OFS="," } { print $1, $2, $3 }' file.txt
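awk applies `OFS` in two places: between the comma-separated arguments of `print`, and when it rebuilds `$0` after a field has been assigned. Printing an unmodified `$0` leaves the original separators in place. The usual idiom for forcing a rebuild is `$1 = $1`. The function name `to_csv` is illustrative.

```shell
# to_csv: print whitespace-separated stdin fields joined by commas.
# "$1 = $1" forces awk to rebuild $0, which is when OFS is applied.
to_csv() {
  awk 'BEGIN { OFS = "," } { $1 = $1; print }'
}

printf 'a b c\n' | awk 'BEGIN { OFS = "," } { print }'   # a b c  (unchanged)
printf 'a b c\n' | to_csv                                # a,b,c
```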

Pattern Matching

Regular Expression

# Lines containing "error"
awk '/error/' log.txt

# Lines starting with "Error:"
awk '/^Error:/' log.txt

# Lines ending with "fail"
awk '/fail$/' log.txt

# Case-insensitive
awk 'tolower($0) ~ /error/' log.txt

Field Matching

# Third field equals "active"
awk '$3 == "active"' file.txt

# First field matches regex
awk '$1 ~ /^user/' file.txt

# Second field doesn't match
awk '$2 !~ /test/' file.txt

Numeric Comparisons

# Third field greater than 100
awk '$3 > 100' data.txt

# Field between values
awk '$2 >= 50 && $2 <= 100' data.txt

# Either condition is true (logical OR)
awk '$1 == "admin" || $2 == "root"' file.txt

Line Number Conditions

# First line only
awk 'NR==1' file.txt

# Lines 5-10
awk 'NR>=5 && NR<=10' file.txt

# Every 5th line
awk 'NR%5==0' file.txt

# Last line
awk 'END{print}' file.txt
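`END { print }` relies on `$0` still holding the last record inside `END`. gawk, mawk and BWK awk keep it, but some other and older implementations do not. The portable sketch below saves the line explicitly and prints nothing for empty input. `last_line` is an illustrative name.

```shell
# last_line: print the final line of stdin; prints nothing for empty input.
last_line() {
  awk '{ last = $0 } END { if (NR > 0) print last }'
}

printf 'first\nsecond\nthird\n' | last_line   # third
```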

BEGIN and END

BEGIN Block

BEGIN runs before the first input line is read; END runs after the last line has been processed:

awk 'BEGIN { print "Starting..." } { print } END { print "Done" }' file.txt

Common BEGIN Uses

# Set field separator
awk 'BEGIN { FS=":" } { print $1 }' /etc/passwd

# Print header
awk 'BEGIN { print "Name\tAge\tCity" } { print }' data.txt

# Initialize variables
awk 'BEGIN { total=0 } { total+=$1 } END { print total }' numbers.txt

Arithmetic Operations

Basic Math

# Sum first column
awk '{ total += $1 } END { print total }' numbers.txt

# Average
awk '{ total += $1; count++ } END { if (count) print total/count }' numbers.txt

# Multiply fields
awk '{ print $1 * $2 }' data.txt

Calculations

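# Percentage of the column total (reads the file twice: the first pass sums)
awk 'NR==FNR { total += $1; next } { print $1, ($1/total)*100 "%" }' data.txt data.txt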

# Round to 2 decimal places
awk '{ printf "%.2f\n", $1 }' numbers.txt

# Max (seeded from line 1; use < instead of > for Min)
awk 'NR==1{max=$1} $1>max{max=$1} END{print max}' numbers.txt

String Operations

Concatenation

# Combine fields
awk '{ print $1 $2 }' file.txt       # No separator
awk '{ print $1 "_" $2 }' file.txt   # With underscore

String Functions

# Length
awk '{ print length($1) }' file.txt

# Substring
awk '{ print substr($1, 1, 3) }' file.txt  # First 3 chars

# Index (position of "x", 0 if not found)
awk '{ print index($1, "x") }' file.txt

# Replace every match in the line (sub() replaces only the first)
awk '{ gsub(/old/, "new"); print }' file.txt

# Split string
awk '{ split($1, arr, ":"); print arr[1] }' file.txt
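The string functions above can all be exercised in one run on a fixed sample string. `sub()` and `gsub()` modify their third argument in place (or `$0` when it is omitted) and return the number of replacements. `string_demo` is a hypothetical wrapper name.

```shell
# string_demo: exercise the string built-ins on one sample line.
string_demo() {
  printf 'foo-bar-baz\n' | awk '{
    n = split($0, parts, "-")
    print n, parts[1], parts[n]            # 3 foo baz
    print index($0, "bar"), index($0, "x") # 5 0  (0 means not found)
    print substr($0, 5, 3), length($0)     # bar 11
    s = $0; sub(/-/, "_", s); print s      # foo_bar-baz  (first match only)
    t = $0; c = gsub(/-/, "_", t)
    print c, t                             # 2 foo_bar_baz  (gsub returns the count)
  }'
}

string_demo
```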

Case Conversion

# Uppercase
awk '{ print toupper($0) }' file.txt

# Lowercase
awk '{ print tolower($0) }' file.txt

Conditional Statements

If-Else

awk '{ if ($3 > 100) print $1, "high"; else print $1, "low" }' data.txt

Ternary Operator

awk '{ print (($3 > 100) ? "high" : "low") }' data.txt

Multiple Conditions

awk '{
    if ($3 > 100)
        print $1, "high"
    else if ($3 > 50)
        print $1, "medium"
    else
        print $1, "low"
}' data.txt

Loops

For Loop

# Print all fields
awk '{ for (i=1; i<=NF; i++) print $i }' file.txt

# Sum all fields on each line
awk '{ total=0; for (i=1; i<=NF; i++) total+=$i; print total }' file.txt

While Loop

awk '{ i=1; while (i<=NF) { print $i; i++ } }' file.txt

Arrays

Associative Arrays

# Count occurrences
awk '{ count[$1]++ } END { for (word in count) print word, count[word] }' file.txt

# Sum by key
awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' data.txt

Array Examples

# Track unique values
awk '{ seen[$1]=1 } END { for (val in seen) print val }' file.txt

# Keep only the first line for each distinct first field
awk '!seen[$1]++' file.txt
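The order in which `for (key in array)` visits keys is unspecified and varies between awk implementations, so pipe the result through `sort` when order matters. The word-frequency sketch below shows the pattern. The function name `word_freq` is illustrative.

```shell
# word_freq: count every whitespace-separated word on stdin and list
# the counts, highest first; ties are broken alphabetically by sort.
word_freq() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print count[w], w }' |
    sort -k1,1nr -k2,2
}

printf 'b a b\nc b a\n' | word_freq
# 3 b
# 2 a
# 1 c
```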

Practical Examples

Process CSV

# Print specific columns (simple CSV only: quoted fields containing commas get split)
awk -F, '{ print $1, $3 }' data.csv

# CSV to TSV
awk -F, 'BEGIN { OFS="\t" } { print $1, $2, $3 }' data.csv

# Filter rows
awk -F, '$3 > 1000' data.csv

Log Analysis

# Count log levels (assumes the level is the 3rd field)
awk '{ count[$3]++ } END { for (level in count) print level, count[level] }' app.log

# Extract errors
awk '/ERROR/' app.log

# Show last 10 errors
awk '/ERROR/ { errors[++n]=$0 } END { for (i=(n>10 ? n-9 : 1); i<=n; i++) print errors[i] }' app.log
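To print the most recent N matching lines without storing every error, you can use a fixed-size ring buffer. The sketch below assumes error lines contain the literal text `ERROR` and defaults to 10 lines. The function name `last_errors` is hypothetical.

```shell
# last_errors [N]: print the last N lines on stdin containing "ERROR",
# oldest first, keeping at most N lines in memory.
last_errors() {
  awk -v n="${1:-10}" '
    /ERROR/ { buf[c++ % n] = $0 }     # keep only the newest n matches
    END {
      start = (c > n) ? c - n : 0     # index of the oldest kept match
      for (i = start; i < c; i++) print buf[i % n]
    }'
}

printf 'ERROR disk full\nINFO ok\nERROR timeout\nERROR refused\n' | last_errors 2
# ERROR timeout
# ERROR refused
```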

System Monitoring

# Memory usage
free -m | awk 'NR==2 { printf "Memory: %.2f%%\n", $3/$2*100 }'

# Filesystems over 80% full (skip the header, compare Use% as a number)
df -h | awk 'NR>1 && $5+0 > 80'

# Processes above 50% CPU (user, %CPU, command)
ps aux | awk '$3 > 50 { print $1, $3, $11 }'

# Count established network connections
netstat -an | awk '/ESTABLISHED/ { count++ } END { print count+0 }'
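`df` reports usage as text such as `85%`. Unless you convert it, awk compares that value as a string, so `9%` ranks above `80` and `100%` ranks below it. Adding `0` keeps the leading number (`"85%" + 0` is `85`). The sketch below filters canned `df`-style output. The sample lines and the function name `over_limit` are made up for illustration.

```shell
# over_limit [PCT]: print df-style lines whose 5th column (Use%) exceeds PCT.
# "$5 + 0" turns "85%" into 85; NR > 1 skips the header line.
over_limit() {
  awk -v limit="${1:-80}" 'NR > 1 && $5 + 0 > limit'
}

printf '%s\n' \
  'Filesystem Size Used Avail Use% Mounted' \
  '/dev/sda1  50G  45G  5G    90%  /' \
  '/dev/sdb1  50G  4G   46G   9%   /data' \
  '/dev/sdc1  50G  50G  0     100% /backup' |
  over_limit 80
# prints the /dev/sda1 and /dev/sdc1 lines
```

With real data, `df -P | over_limit 80` uses the POSIX output format, which keeps each filesystem on one line.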

File Processing

# Remove duplicates (keeping first)
awk '!seen[$0]++' file.txt

# Remove blank lines
awk 'NF' file.txt

# Number lines
awk '{ print NR, $0 }' file.txt

# Print lines longer than 80 characters
awk 'length > 80' file.txt

Data Transformation

# Swap columns
awk '{ print $2, $1 }' file.txt

# Add column
awk '{ print $0, "new_value" }' file.txt

# Column math
awk '{ $4 = $2 * $3; print }' data.txt

Advanced Techniques

Multi-File Processing

# Join: load key/value pairs from file1, then look up each first field of file2
awk 'NR==FNR { arr[$1]=$2; next } { print $1, arr[$1] }' file1.txt file2.txt
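`NR==FNR` is true only while awk reads the first file. The first file therefore fills a lookup table, and `next` stops the second rule from running on it. The sketch below shows the join with a fallback for missing keys. File names and the function name `lookup_join` are placeholders. One caveat: if the first file is empty, `NR==FNR` stays true for the second file too.

```shell
# lookup_join MAP DATA: MAP lines are "key value"; for each line of DATA,
# print its first field and the mapped value, or "NA" if the key is absent.
lookup_join() {
  awk 'NR == FNR { map[$1] = $2; next }
       { print $1, (($1 in map) ? map[$1] : "NA") }' "$1" "$2"
}

dir=$(mktemp -d)
printf 'alice admin\nbob dev\n' > "$dir/roles.txt"
printf 'bob\ncarol\nalice\n' > "$dir/users.txt"
lookup_join "$dir/roles.txt" "$dir/users.txt"
# bob dev
# carol NA
# alice admin
rm -r "$dir"
```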

Custom Functions

awk '
function square(x) {
    return x * x
}
{ print square($1) }
' numbers.txt
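awk has no `local` keyword. By convention, extra parameters that the caller never passes serve as local variables, usually set apart with extra spaces. Arrays are passed by reference and scalars by value. In the sketch below, `join_fields` and `pipe_join` are illustrative names, not built-ins.

```shell
# pipe_join: re-join the comma-separated fields of each stdin line with "|".
pipe_join() {
  awk '
  function join_fields(arr, n, sep,    i, out) {   # i and out are locals
      out = arr[1]
      for (i = 2; i <= n; i++)
          out = out sep arr[i]
      return out
  }
  { n = split($0, parts, ","); print join_fields(parts, n, "|") }'
}

printf 'a,b,c\n' | pipe_join   # a|b|c
```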

Format Output

# Printf formatting
awk '{ printf "%-10s %5d %8.2f\n", $1, $2, $3 }' data.txt

# Aligned columns
awk '{ printf "|%-20s|%10s|\n", $1, $2 }' data.txt

Multiline Records

# Records separated by blank lines
awk 'BEGIN { RS="" } { print NR, $0 }' file.txt
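With `RS=""` (paragraph mode), one or more blank lines separate records, and a newline always acts as a field separator in addition to `FS`. Also setting `FS="\n"` makes each line of a block one field, which suits address-style data. The sample input and the function name `blocks_to_lines` are made up.

```shell
# blocks_to_lines: turn blank-line-separated blocks into one line each,
# joining the lines of a block with " / ".
blocks_to_lines() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       { line = $1
         for (i = 2; i <= NF; i++) line = line " / " $i
         print NR ": " line }'
}

printf 'Alice\n1 Main St\n\nBob\n2 Oak Ave\nSpringfield\n' | blocks_to_lines
# 1: Alice / 1 Main St
# 2: Bob / 2 Oak Ave / Springfield
```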

Common One-Liners

Statistics

# Sum column
awk '{ sum += $1 } END { print sum }' numbers.txt

# Average
awk '{ sum += $1; n++ } END { if (n) print sum/n }' numbers.txt

# Min
awk 'NR==1 { min=$1 } $1<min { min=$1 } END { print min }' numbers.txt

# Max
awk 'NR==1 { max=$1 } $1>max { max=$1 } END { print max }' numbers.txt

# Count lines
awk 'END { print NR }' file.txt
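The statistics above can also be computed in a single pass. The sketch below guards against empty input, which would otherwise cause a division by zero. It seeds `min` and `max` from the first value so that all-negative data works. `col_stats` is a hypothetical name.

```shell
# col_stats: one-pass min, max and mean of the first column on stdin.
col_stats() {
  awk 'NR == 1 { min = max = $1 + 0 }     # "+ 0" forces a numeric value
       { sum += $1
         if ($1 + 0 < min) min = $1 + 0
         if ($1 + 0 > max) max = $1 + 0 }
       END { if (NR > 0) printf "min=%g max=%g avg=%.2f\n", min, max, sum / NR }'
}

printf '3\n-2\n7\n' | col_stats   # min=-2 max=7 avg=2.67
```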

Text Manipulation

# Print specific lines
awk 'NR==5' file.txt                    # Line 5
awk 'NR>=10 && NR<=20' file.txt         # Lines 10-20

# Skip header
awk 'NR>1' file.txt

# Print last field
awk '{ print $NF }' file.txt

# Print second-to-last field
awk '{ print $(NF-1) }' file.txt

Filtering

# Lines with more than 5 fields
awk 'NF > 5' file.txt

# Lines where field 3 is a non-negative integer
awk '$3 ~ /^[0-9]+$/' file.txt

# Unique lines (like sort -u, but keeps the original order)
awk '!seen[$0]++' file.txt

# Remove comments and blank lines
awk '!/^#/ && NF' file.txt

Using Variables

Pass Shell Variables

# Using -v
threshold=100
awk -v limit="$threshold" '$3 > limit' data.txt

# Multiple variables
awk -v a=10 -v b=20 '{ print a, b, $1 }' file.txt

Environment Variables

export THRESHOLD=100
awk '$3 > ENVIRON["THRESHOLD"]' data.txt
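`-v` and `ENVIRON` differ in one detail. Values assigned with `-v` go through escape-sequence processing, so `\t` becomes a tab, while `ENVIRON` values arrive verbatim. This matters when you pass backslashes, for example in regexes. The two helper names below are illustrative.

```shell
# Print the length of $1 as seen by awk through -v
len_via_v()       { awk -v v="$1" 'BEGIN { print length(v) }'; }
# Print the length of $1 as seen by awk through ENVIRON
len_via_environ() { V="$1" awk 'BEGIN { print length(ENVIRON["V"]) }'; }

value='a\tb'              # four characters: a, backslash, t, b
len_via_v "$value"        # 3  (-v turned \t into a tab)
len_via_environ "$value"  # 4  (passed through verbatim)
```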

Script Files

Create AWK Script

# script.awk
BEGIN {
    FS = ":"
    print "Username Report"
    print "================"
}

{
    print "User:", $1
    print "Shell:", $NF
    print ""
}

END {
    print "Total users:", NR
}

Run Script

awk -f script.awk /etc/passwd

Real-World Examples

Generate Report

#!/bin/bash
awk 'BEGIN {
    print "Sales Report"
    print "============"
    total = 0
}
{
    print $1 ": $" $2
    total += $2
}
END {
    print "------------"
    print "Total: $" total
}' sales.txt

Process Access Log

# Count requests by IP
awk '{ ip[$1]++ } END { for (i in ip) print ip[i], i }' access.log | sort -rn

# Count by status code
awk '{ status[$9]++ } END { for (s in status) print s, status[s] }' access.log

# Top 10 URLs
awk '{ urls[$7]++ } END { for (u in urls) print urls[u], u }' access.log | \
    sort -rn | head -10

Data Aggregation

# Sum sales by category
awk -F, '{
    sales[$2] += $3
}
END {
    for (cat in sales)
        printf "%-15s $%.2f\n", cat, sales[cat]
}' sales.csv

Tips and Best Practices

Quote AWK Programs - Use single quotes to prevent shell interpretation
Use -F for Delimiters - Clearer than setting FS in BEGIN
Test Patterns First - Test pattern matching before adding actions
Use printf for Formatting - Better control than print
Know the Defaults - Uninitialized variables act as 0 or "", so counters need no setup; seed min/max from the first record instead of relying on 0
Comment Complex Scripts - AWK supports # comments
Use Functions - Break complex logic into functions
Avoid Dynamic Regex in Loops - awk has no explicit pre-compilation; build any pattern string once, outside the loop
Use NF for Non-Empty Lines - awk 'NF' removes blank lines
Always Close Braces - Easy to miss in multiline programs

Performance Tips

Avoid External Commands - Use AWK built-ins when possible
Use Associative Arrays - Very efficient for counting/grouping
Minimize Pattern Complexity - Simple patterns are faster
Use next - Skip remaining processing early
Compile Regex Once - Use a /re/ literal; a regex held in a string variable is dynamic and may be recompiled

Common Patterns

Remove Duplicates

awk '!seen[$0]++' file.txt

Print Duplicates

awk 'seen[$0]++' file.txt
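`seen[$0]++` prints a line every time it repeats, so a line that appears three times is printed twice. To print each duplicated line exactly once, at its second occurrence, compare the old count with 1. `dups_once` is an illustrative name.

```shell
# dups_once: print each line that occurs more than once, one time only.
dups_once() {
  awk 'seen[$0]++ == 1'
}

printf 'a\nb\na\na\nc\nb\n' | dups_once
# a
# b
```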

Column Sum

awk '{ sum += $1 } END { print sum }' file.txt

Find Max Value

awk 'NR==1 || $1 > max { max = $1 } END { print max }' file.txt

Count Occurrences

awk '{ count[$1]++ } END { for (i in count) print i, count[i] }' file.txt

Debugging

Print Debug Info

awk '{ print "NF=" NF " NR=" NR " $0=" $0 }' file.txt

Trace Execution

# Print each line (to stderr) before processing
awk '{ print "Processing:", $0 > "/dev/stderr"   # your code follows
}' file.txt

Exit Status

Code	Meaning
0	Success
>0	An error occurred (gawk uses 1 for errors and 2 for fatal errors; other awks may differ)
n	Value given to an `exit n` statement

Quick Reference

Task	Command
Print column 1	`awk '{ print $1 }' file`
Print last column	`awk '{ print $NF }' file`
Print lines 5-10	`awk 'NR>=5 && NR<=10' file`
Sum column	`awk '{ sum+=$1 } END { print sum }' file`
Count lines	`awk 'END { print NR }' file`
Remove duplicates	`awk '!seen[$0]++' file`
CSV to TSV	`awk -F, 'BEGIN{OFS="\t"} {$1=$1; print}' file`
Filter by value	`awk '$3 > 100' file`
Pattern match	`awk '/error/' file`
Multiple files	`awk '{print FILENAME, $0}' *.txt`