awk Command Cheat Sheet

awk is a powerful programming language and command-line tool for pattern scanning and text processing. Named after its creators (Aho, Weinberger, Kernighan), awk excels at processing structured text data like logs, CSV files, and tabular output.


Synopsis

awk 'pattern { action }' file
awk -F: '{ print $1 }' file
awk -f script.awk file

Description

AWK processes text line-by-line, splitting each line into fields. It matches patterns and executes actions on matching lines. It's particularly powerful for extracting columns from structured data, performing calculations, and generating reports.


Basic Syntax

awk 'pattern { action }' input_file
  • Pattern: Condition to match (optional)
  • Action: What to do with matching lines (optional)
  • If pattern omitted: action applies to all lines
  • If action omitted: print matching lines

awk '{ print }' file.txt             # Print every line
awk '1' file.txt                     # Shorthand for the same
awk '{ print $1 }' file.txt          # First field
awk '{ print $1, $3 }' file.txt      # First and third
awk '{ print $1, $2, $3 }' file.txt  # First three fields
awk '{ print $NF }' file.txt         # Last field

NF = number of fields, so $NF = last field.

awk '{ $1=""; print }' file.txt    # All but the first field (leaves a leading space)
awk '{ for(i=2;i<=NF;i++) printf "%s ", $i; print "" }' file.txt    # Same, via a loop (leaves a trailing space)
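As a quick check of how fields and NF behave, here is a minimal sketch run against a made-up input line (the sample text and the shell variable names are assumptions for illustration):

```shell
# Sample line is illustrative only
line='alpha beta gamma delta'

# Last field and field count
last=$(printf '%s\n' "$line" | awk '{ print $NF }')
count=$(printf '%s\n' "$line" | awk '{ print NF }')

# Blanking $1 rebuilds the record with OFS, which leaves a leading space
rest=$(printf '%s\n' "$line" | awk '{ $1=""; print }')

printf '%s\n' "$last" "$count" "[$rest]"
```

The brackets around the last value make the leading space visible.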

Built-in Variables

Variable     Description
$0           Entire line
$1, $2, ...  First, second field, etc.
$NF          Last field
NF           Number of fields in current record
NR           Current record (line) number
FNR          Record number in current file
FS           Input field separator (default: whitespace)
OFS          Output field separator (default: space)
RS           Input record separator (default: newline)
ORS          Output record separator (default: newline)
FILENAME     Name of current input file
ARGC         Number of command-line arguments
ARGV         Array of command-line arguments
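The difference between NR and FNR only shows up with more than one input file. The sketch below uses two throwaway files (their names and contents are made up for illustration):

```shell
# Two throwaway files in a temporary directory
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file
nr_fnr=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)
printf '%s\n' "$nr_fnr"

rm -rf "$tmpdir"
```

Each output line shows the file name, the global record number, the per-file record number, and the field count.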

Field Separators

Default (Whitespace)

awk '{ print $1 }' file.txt

Splits on runs of spaces and tabs; leading and trailing whitespace is ignored.

Custom Separator

# Using colon
awk -F: '{ print $1 }' /etc/passwd

# Using comma (CSV)
awk -F, '{ print $2 }' data.csv

# Using pipe
awk -F'|' '{ print $1, $3 }' data.txt

Multiple Separators

# Split on colon OR comma
awk -F'[:,]' '{ print $1 }' file.txt

# Split on multiple spaces/tabs
awk -F'[ \t]+' '{ print $1 }' file.txt

Change Output Separator

# Input: space-separated, Output: comma-separated
awk 'BEGIN { OFS="," } { print $1, $2, $3 }' file.txt
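One detail worth seeing in action: OFS is used when fields are printed as a comma-separated list or when the record is rebuilt, not when $0 is printed untouched. A minimal sketch with made-up input:

```shell
# print with no arguments emits $0 unchanged, so OFS is not applied
unchanged=$(printf 'a b c\n' | awk 'BEGIN { OFS="," } { print }')

# Assigning any field (even to itself) rebuilds $0 using OFS
rebuilt=$(printf 'a b c\n' | awk 'BEGIN { OFS="," } { $1=$1; print }')

printf '%s\n' "$unchanged" "$rebuilt"
```

The $1=$1 idiom is handy when the number of columns is not known in advance.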

Pattern Matching

Regular Expression

# Lines containing "error"
awk '/error/' log.txt

# Lines starting with "Error:"
awk '/^Error:/' log.txt

# Lines ending with "fail"
awk '/fail$/' log.txt

# Case-insensitive
awk 'tolower($0) ~ /error/' log.txt

Field Matching

# Third field equals "active"
awk '$3 == "active"' file.txt

# First field matches regex
awk '$1 ~ /^user/' file.txt

# Second field doesn't match
awk '$2 !~ /test/' file.txt

Numeric Comparisons

# Third field greater than 100
awk '$3 > 100' data.txt

# Field between values
awk '$2 >= 50 && $2 <= 100' data.txt

# Either condition matches
awk '$1 == "admin" || $2 == "root"' file.txt

Line Number Conditions

# First line only
awk 'NR==1' file.txt

# Lines 5-10
awk 'NR>=5 && NR<=10' file.txt

# Every 5th line
awk 'NR%5==0' file.txt

# Last line (relies on $0 still holding the last record in END, as POSIX specifies)
awk 'END{print}' file.txt

BEGIN and END

BEGIN Block

BEGIN executes once before any input is read; END executes once after the last line:

awk 'BEGIN { print "Starting..." } { print } END { print "Done" }' file.txt

Common BEGIN Uses

# Set field separator
awk 'BEGIN { FS=":" } { print $1 }' /etc/passwd

# Print header
awk 'BEGIN { print "Name\tAge\tCity" } { print }' data.txt

# Initialize variables
awk 'BEGIN { total=0 } { total+=$1 } END { print total }' numbers.txt
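To make the order of execution concrete, here is a small sketch that combines all three kinds of rule on made-up numbers (the variable name report is an assumption for illustration):

```shell
report=$(printf '3\n4\n5\n' | awk '
BEGIN { print "values:" }        # runs once, before any input is read
      { total += $1; print $1 }  # runs for every input line
END   { print "total:", total }  # runs once, after the last line
')
printf '%s\n' "$report"
```

The header appears first, then one line per input record, then the total.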

Arithmetic Operations

Basic Math

# Sum first column
awk '{ total += $1 } END { print total }' numbers.txt

# Average (guarded against empty input)
awk '{ total += $1; count++ } END { if (count) print total/count }' numbers.txt

# Multiply fields
awk '{ print $1 * $2 }' data.txt

Calculations

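# Percentage of the column total (reads the file twice: first pass sums, second pass prints)
awk 'NR==FNR { total += $1; next } { print $1, ($1/total)*100 "%" }' data.txt data.txt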

# Round numbers
awk '{ printf "%.2f\n", $1 }' numbers.txt

# Max (swap > for < to get the min)
awk 'NR==1{max=$1} $1>max{max=$1} END{print max}' numbers.txt
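Minimum and maximum can be tracked in one pass by seeding both from the first record, which also keeps the result correct when every value is negative. A sketch on made-up numbers:

```shell
minmax=$(printf '%s\n' -7 -2 -15 | awk '
NR == 1  { min = max = $1 }   # seed both from the first record
$1 < min { min = $1 }
$1 > max { max = $1 }
END      { print min, max }
')
printf '%s\n' "$minmax"
```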

String Operations

Concatenation

# Combine fields
awk '{ print $1 $2 }' file.txt       # No separator
awk '{ print $1 "_" $2 }' file.txt   # With underscore

String Functions

# Length
awk '{ print length($1) }' file.txt

# Substring
awk '{ print substr($1, 1, 3) }' file.txt  # First 3 chars

# Index (position)
awk '{ print index($1, "x") }' file.txt

# Replace
awk '{ gsub(/old/, "new"); print }' file.txt

# Split string
awk '{ split($1, arr, ":"); print arr[1] }' file.txt
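Several of these functions also return a useful value. The sketch below, on a made-up input string, prints the piece count from split, the position from index, and the replacement count from gsub:

```shell
result=$(printf 'foo:bar:foo\n' | awk '{
    n     = split($1, parts, ":")   # n is the number of pieces
    where = index($1, "bar")        # 1-based position, 0 if absent
    hits  = gsub(/foo/, "X")        # edits $0 in place, returns the replacement count
    print n, parts[2], where, hits, $0
}')
printf '%s\n' "$result"
```

Note that gsub with two arguments modifies $0, so it runs last here.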

Case Conversion

# Uppercase
awk '{ print toupper($0) }' file.txt

# Lowercase
awk '{ print tolower($0) }' file.txt

Conditional Statements

If-Else

awk '{ if ($3 > 100) print $1, "high"; else print $1, "low" }' data.txt

Ternary Operator

awk '{ print (($3 > 100) ? "high" : "low") }' data.txt

Multiple Conditions

awk '{
    if ($3 > 100)
        print $1, "high"
    else if ($3 > 50)
        print $1, "medium"
    else
        print $1, "low"
}' data.txt

Loops

For Loop

# Print all fields
awk '{ for (i=1; i<=NF; i++) print $i }' file.txt

# Sum all fields
awk '{ total=0; for (i=1; i<=NF; i++) total+=$i; print total }' file.txt

While Loop

awk '{ i=1; while (i<=NF) { print $i; i++ } }' file.txt

Arrays

Associative Arrays

# Count occurrences
awk '{ count[$1]++ } END { for (word in count) print word, count[word] }' file.txt

# Sum by key
awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' data.txt

Array Examples

# Track unique values
awk '{ seen[$1]=1 } END { for (val in seen) print val }' file.txt

# First occurrence
awk '!seen[$1]++' file.txt
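The order in which for (key in array) visits keys is unspecified, so output from these one-liners can vary between awk implementations. One simple approach, sketched here on made-up input, is to pipe the result through sort:

```shell
counts=$(printf '%s\n' b a b c a b | awk '
{ count[$1]++ }
END { for (k in count) print k, count[k] }
' | sort)
printf '%s\n' "$counts"
```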

Practical Examples

Process CSV

# Print specific columns from CSV
awk -F, '{ print $1, $3 }' data.csv

# CSV to TSV
awk -F, 'BEGIN { OFS="\t" } { print $1, $2, $3 }' data.csv

# Filter rows
awk -F, '$3 > 1000' data.csv

Log Analysis

# Count log levels
awk '{ count[$3]++ } END { for (level in count) print level, count[level] }' app.log

# Extract errors
awk '/ERROR/' app.log

# Show last 10 errors (index by error count, not by line number)
awk '/ERROR/ { errors[++n]=$0 } END { for (i=(n>10 ? n-9 : 1); i<=n; i++) print errors[i] }' app.log

System Monitoring

# Memory usage
free -m | awk 'NR==2 { printf "Memory: %.2f%%\n", $3/$2*100 }'

# Disk usage above 80% (skip the header; +0 drops the % sign for a numeric comparison)
df -h | awk 'NR>1 && $5+0 > 80'

# CPU usage
ps aux | awk '$3 > 50 { print $1, $3, $11 }'

# Network connections
netstat -an | awk '/ESTABLISHED/ { count++ } END { print count+0 }'
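Percentage columns such as 90% are strings, so comparing them directly with a number falls back to string comparison. The sketch below uses made-up text shaped like df output (not real system data) to show the difference that adding 0 makes:

```shell
# Illustrative sample shaped like df output
sample='Filesystem Size Used Avail Use% Mounted
/dev/a 10G 9G 1G 90% /
/dev/b 10G 1G 9G 9% /data'

# String comparison: "9%" sorts after "80", so the 9% row is wrongly included
naive=$(printf '%s\n' "$sample" | awk 'NR > 1 && $5 > 80 { print $6 }')

# Numeric comparison: $5 + 0 converts "90%" to 90 and "9%" to 9
over=$(printf '%s\n' "$sample" | awk 'NR > 1 && $5 + 0 > 80 { print $6, $5 }')

printf '%s\n' "$naive" "$over"
```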

File Processing

# Remove duplicates (keeping first)
awk '!seen[$0]++' file.txt

# Remove blank lines
awk 'NF' file.txt

# Number lines
awk '{ print NR, $0 }' file.txt

# Print lines longer than 80 characters
awk 'length > 80' file.txt

Data Transformation

# Swap columns
awk '{ print $2, $1 }' file.txt

# Add column
awk '{ print $0, "new_value" }' file.txt

# Column math
awk '{ $4 = $2 * $3; print }' data.txt

Advanced Techniques

Multi-File Processing

# Process two files
awk 'NR==FNR { arr[$1]=$2; next } { print $1, arr[$1] }' file1.txt file2.txt
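The NR==FNR idiom is true only while the first file is being read, which makes it a compact way to build a lookup table before processing the second file. A sketch with two made-up files (names.txt and scores.txt are hypothetical):

```shell
tmpdir=$(mktemp -d)
printf 'u1 alice\nu2 bob\n'  > "$tmpdir/names.txt"
printf 'u2 42\nu1 7\nu3 9\n' > "$tmpdir/scores.txt"

joined=$(awk '
NR == FNR { name[$1] = $2; next }                       # first file: build the lookup
{ print $1, ($1 in name ? name[$1] : "unknown"), $2 }   # second file: look up each key
' "$tmpdir/names.txt" "$tmpdir/scores.txt")
printf '%s\n' "$joined"

rm -rf "$tmpdir"
```

One caveat: if the first file is empty, NR==FNR is also true for the second file, so the idiom assumes a non-empty lookup file.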

Custom Functions

awk '
function square(x) {
    return x * x
}
{ print square($1) }
' numbers.txt

Format Output

# Printf formatting
awk '{ printf "%-10s %5d %8.2f\n", $1, $2, $3 }' data.txt

# Aligned columns
awk '{ printf "|%-20s|%10s|\n", $1, $2 }' data.txt

Multiline Records

# Records separated by blank lines
awk 'BEGIN { RS="" } { print NR, $0 }' file.txt
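With RS set to the empty string, setting FS to a newline turns each line of a paragraph into its own field. A sketch on made-up records:

```shell
records=$(printf 'name: a\ncity: x\n\nname: b\ncity: y\n' | awk '
BEGIN { RS = ""; FS = "\n" }   # blank-line-separated records, one field per line
{ print NR ": " $1 " | " $2 }
')
printf '%s\n' "$records"
```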

Common One-Liners

Statistics

# Sum column
awk '{ sum += $1 } END { print sum }' numbers.txt

# Average (guarded against empty input)
awk '{ sum += $1; n++ } END { if (n) print sum/n }' numbers.txt

# Min
awk 'NR==1 { min=$1 } $1<min { min=$1 } END { print min }' numbers.txt

# Max
awk 'NR==1 { max=$1 } $1>max { max=$1 } END { print max }' numbers.txt

# Count lines
awk 'END { print NR }' file.txt

Text Manipulation

# Print specific lines
awk 'NR==5' file.txt                    # Line 5
awk 'NR>=10 && NR<=20' file.txt         # Lines 10-20

# Skip header
awk 'NR>1' file.txt

# Print last field
awk '{ print $NF }' file.txt

# Print second-to-last field
awk '{ print $(NF-1) }' file.txt

Filtering

# Lines with more than 5 fields
awk 'NF > 5' file.txt

# Lines where field 3 is numeric
awk '$3 ~ /^[0-9]+$/' file.txt

# Unique lines (unlike uniq, the input does not need to be sorted)
awk '!seen[$0]++' file.txt

# Remove comments and blank lines
awk '!/^#/ && NF' file.txt

Using Variables

Pass Shell Variables

# Using -v
threshold=100
awk -v limit="$threshold" '$3 > limit' data.txt

# Multiple variables
awk -v a=10 -v b=20 '{ print a, b, $1 }' file.txt

Environment Variables

export THRESHOLD=100
awk '$3 > ENVIRON["THRESHOLD"] + 0' data.txt
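Both routes can be tried side by side. The sketch below uses made-up input; quoting the shell variable protects against word splitting, and adding 0 to the ENVIRON value makes the numeric comparison explicit:

```shell
threshold=100
data='a x 50
b y 150'

# Shell variable passed with -v
via_v=$(printf '%s\n' "$data" | awk -v limit="$threshold" '$3 > limit { print $1 }')

# Environment variable read through ENVIRON
via_env=$(printf '%s\n' "$data" | THRESHOLD=100 awk '$3 > ENVIRON["THRESHOLD"] + 0 { print $1 }')

printf '%s\n' "$via_v" "$via_env"
```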

Script Files

Create AWK Script

# script.awk
BEGIN {
    FS = ":"
    print "Username Report"
    print "==============="
}

{
    print "User:", $1
    print "Shell:", $NF
    print ""
}

END {
    print "Total users:", NR
}

Run Script

awk -f script.awk /etc/passwd

Real-World Examples

Generate Report

#!/bin/bash
awk 'BEGIN {
    print "Sales Report"
    print "============"
    total = 0
}
{
    print $1 ": $" $2
    total += $2
}
END {
    print "------------"
    print "Total: $" total
}' sales.txt

Process Access Log

# Count requests by IP
awk '{ ip[$1]++ } END { for (i in ip) print ip[i], i }' access.log | sort -rn

# Count by status code
awk '{ status[$9]++ } END { for (s in status) print s, status[s] }' access.log

# Top 10 URLs
awk '{ urls[$7]++ } END { for (u in urls) print urls[u], u }' access.log | \
    sort -rn | head -10

Data Aggregation

# Sum sales by category
awk -F, '{
    sales[$2] += $3
}
END {
    for (cat in sales)
        printf "%-15s $%.2f\n", cat, sales[cat]
}' sales.csv
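To see the aggregation on concrete data, here is a sketch with three made-up CSV rows (id, category, amount); the column layout is an assumption, and sort is added only to make the output order stable:

```shell
totals=$(printf '%s\n' '1,books,10.50' '2,toys,5' '3,books,4.25' | awk -F, '
{ sales[$2] += $3 }    # accumulate the amount per category
END { for (cat in sales) printf "%-8s $%.2f\n", cat, sales[cat] }
' | sort)
printf '%s\n' "$totals"
```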

Tips and Best Practices

  1. Quote AWK Programs - Use single quotes to prevent shell interpretation
  2. Use -F for Delimiters - Clearer than setting FS in BEGIN
  3. Test Patterns First - Test pattern matching before adding actions
  4. Use printf for Formatting - Better control than print
  5. Initialize Variables - Optional (unset variables act as 0 or ""), but setting counters in BEGIN documents intent
  6. Comment Complex Scripts - AWK supports # comments
  7. Use Functions - Break complex logic into functions
  8. Avoid Building Regexes in Loops - awk has no explicit pre-compile step; prefer /regex/ literals over patterns assembled from strings
  9. Use NF for Non-Empty Lines - awk 'NF' removes blank lines
  10. Always Close Braces - Easy to miss in multiline programs

Performance Tips

  1. Avoid External Commands - Use AWK built-ins when possible
  2. Use Associative Arrays - Very efficient for counting/grouping
  3. Minimize Pattern Complexity - Simple patterns are faster
  4. Use next - Skip remaining processing early
  5. Prefer Regex Literals - A /regex/ constant can be compiled once, while a pattern held in a string variable may be reprocessed on each match
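The next tip can be sketched as follows: an early rule discards uninteresting records so that later rules never run for them (the input is made up):

```shell
kept=$(printf '# comment\n\nkey=1\nkey=2\n' | awk '
/^#/ || !NF { next }      # drop comments and blank lines before any other rule runs
{ n++; print n ": " $0 }
')
printf '%s\n' "$kept"
```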

Common Patterns

Remove Duplicates

awk '!seen[$0]++' file.txt    # Keep the first occurrence of each line
awk 'seen[$0]++' file.txt     # Inverse: print only the repeated occurrences

Column Sum

awk '{ sum += $1 } END { print sum }' file.txt

Find Max Value

awk 'NR==1 || $1 > max { max = $1 } END { print max }' file.txt

Count Occurrences

awk '{ count[$1]++ } END { for (i in count) print i, count[i] }' file.txt

Debugging

awk '{ print "NF=" NF " NR=" NR " $0=" $0 }' file.txt

Trace Execution

# Print each line before processing (add your own statements after the print)
awk '{ print "Processing:", $0 }' file.txt

Exit Status

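Code  Meaning
0     Success
1     Error (the usual failure status in GNU awk)
2     Fatal error (GNU awk)

POSIX only guarantees 0 for success and a value greater than 0 for errors, so exact codes vary by implementation.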
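A program can also choose its own status with the exit statement, which is useful in shell conditionals. In this sketch the value 3 and the input are arbitrary choices for illustration:

```shell
status=0
# exit inside END sets the status the shell sees; || captures it without tripping set -e
printf 'ok\nERROR boom\n' |
    awk '/ERROR/ { found = 1 } END { exit (found ? 3 : 0) }' || status=$?
echo "awk exited with $status"
```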

Quick Reference

Task               Command
Print column 1     awk '{ print $1 }' file
Print last column  awk '{ print $NF }' file
Print lines 5-10   awk 'NR>=5 && NR<=10' file
Sum column         awk '{ sum+=$1 } END { print sum }' file
Count lines        awk 'END { print NR }' file
Remove duplicates  awk '!seen[$0]++' file
CSV to TSV         awk -F, 'BEGIN{OFS="\t"} {$1=$1; print}' file
Filter by value    awk '$3 > 100' file
Pattern match      awk '/error/' file
Multiple files     awk '{print FILENAME, $0}' *.txt