awk Command Cheat Sheet
awk is a powerful programming language and command-line tool for pattern scanning and text processing. Named after its creators (Aho, Weinberger, Kernighan), awk excels at processing structured text data like logs, CSV files, and tabular output.
Synopsis
awk 'pattern { action }' file
awk -F: '{ print $1 }' file
awk -f script.awk file
Description
AWK processes text line-by-line, splitting each line into fields. It matches patterns and executes actions on matching lines. It's particularly powerful for extracting columns from structured data, performing calculations, and generating reports.
Basic Syntax
awk 'pattern { action }' input_file
- Pattern: Condition to match (optional)
- Action: What to do with matching lines (optional)
- If pattern omitted: action applies to all lines
- If action omitted: print matching lines
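The rules above can be checked on a small sample. This is a minimal sketch: the sample data and the shell variable names (sample, pattern_only, action_only, both) are made up for illustration.

```shell
# Hypothetical sample input: a name and a score on each line
sample='alice 90
bob 45
carol 72'

# Pattern only: the default action prints each matching line
pattern_only=$(printf '%s\n' "$sample" | awk '$2 > 50')

# Action only: the action runs on every line
action_only=$(printf '%s\n' "$sample" | awk '{ print $1 }')

# Pattern and action: the action runs only on matching lines
both=$(printf '%s\n' "$sample" | awk '$2 > 50 { print $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```

The first command prints the full lines for alice and carol, the second prints all three names, and the third prints only the names alice and carol.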
Print Commands
Print Entire Lines
awk '{ print }' file.txt
awk '1' file.txt # Shorthand
Print Specific Fields
awk '{ print $1 }' file.txt # First field
awk '{ print $1, $3 }' file.txt # First and third
awk '{ print $1, $2, $3 }' file.txt
Print Last Field
awk '{ print $NF }' file.txt
NF = number of fields, so $NF = last field.
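Because NF is recomputed for every line, $NF follows the last field even when lines have different lengths. A short sketch on made-up input:

```shell
# Hypothetical input with a different number of fields on each line
sample='a b c
d e
f'

# Print the field count followed by the last field of each line
out=$(printf '%s\n' "$sample" | awk '{ print NF, $NF }')
printf '%s\n' "$out"
```

This prints "3 c", "2 e", and "1 f", one per line.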
Print All But First Field
awk '{ $1=""; print }' file.txt # Leaves a leading separator
awk '{ for(i=2;i<=NF;i++) printf "%s ", $i; print "" }' file.txt
Built-in Variables
| Variable | Description |
|---|---|
| $0 | Entire line |
| $1, $2, ... | First, second field, etc. |
| $NF | Last field |
| NF | Number of fields in current record |
| NR | Current record (line) number |
| FNR | Record number in current file |
| FS | Input field separator (default: whitespace) |
| OFS | Output field separator (default: space) |
| RS | Input record separator (default: newline) |
| ORS | Output record separator (default: newline) |
| FILENAME | Name of current input file |
| ARGC | Number of command-line arguments |
| ARGV | Array of command-line arguments |
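The difference between NR and FNR only shows up with more than one input file. The sketch below assumes mktemp -d is available and uses two made-up files, one.txt and two.txt, in a temporary directory.

```shell
# Two hypothetical input files in a temporary directory
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file
out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)
printf '%s\n' "$out"

rm -rf "$tmpdir"
```

The third output line is "two.txt 3 1 c": the third record overall, but the first record of two.txt.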
Field Separators
Default (Whitespace)
awk '{ print $1 }' file.txt
Splits on runs of spaces and tabs; leading and trailing whitespace is ignored.
Custom Separator
# Using colon
awk -F: '{ print $1 }' /etc/passwd
# Using comma (CSV)
awk -F, '{ print $2 }' data.csv
# Using pipe
awk -F'|' '{ print $1, $3 }' data.txt
Multiple Separators
# Split on semicolon OR colon
awk -F'[;:]' '{ print $1 }' file.txt
# Split on multiple spaces/tabs
awk -F'[ \t]+' '{ print $1 }' file.txt
Change Output Separator
# Input: space-separated, Output: comma-separated
awk 'BEGIN { OFS="," } { print $1, $2, $3 }' file.txt
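OFS is applied when print receives comma-separated arguments, or when awk rebuilds $0 after a field assignment. A bare print of an untouched record keeps the original separators. The sketch below shows the common $1 = $1 idiom for forcing the rebuild; the input line is made up.

```shell
line='a b c'

# No field is modified, so $0 is printed exactly as it was read
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN { OFS="," } { print }')

# Assigning any field (even to itself) rebuilds $0 using OFS
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN { OFS="," } { $1 = $1; print }')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
```

The first command prints "a b c" and the second prints "a,b,c".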
Pattern Matching
Regular Expression
# Lines containing "error"
awk '/error/' log.txt
# Lines starting with "Error:"
awk '/^Error:/' log.txt
# Lines ending with "fail"
awk '/fail$/' log.txt
# Case-insensitive
awk 'tolower($0) ~ /error/' log.txt
Field Matching
# Third field equals "active"
awk '$3 == "active"' file.txt
# First field matches regex
awk '$1 ~ /^user/' file.txt
# Second field doesn't match
awk '$2 !~ /test/' file.txt
Numeric Comparisons
# Third field greater than 100
awk '$3 > 100' data.txt
# Field between values
awk '$2 >= 50 && $2 <= 100' data.txt
# Either condition is true (logical OR)
awk '$1 == "admin" || $2 == "root"' file.txt
Line Number Conditions
# First line only
awk 'NR==1' file.txt
# Lines 5-10
awk 'NR>=5 && NR<=10' file.txt
# Every 5th line
awk 'NR%5==0' file.txt
# Last line
awk 'END{print}' file.txt
BEGIN and END
BEGIN Block
BEGIN executes before any input is read; END executes after the last line has been processed:
awk 'BEGIN { print "Starting..." } { print } END { print "Done" }' file.txt
Common BEGIN Uses
# Set field separator
awk 'BEGIN { FS=":" } { print $1 }' /etc/passwd
# Print header
awk 'BEGIN { print "Name\tAge\tCity" } { print }' data.txt
# Initialize variables
awk 'BEGIN { total=0 } { total+=$1 } END { print total }' numbers.txt
Arithmetic Operations
Basic Math
# Sum first column
awk '{ total += $1 } END { print total }' numbers.txt
# Average
awk '{ total += $1; count++ } END { print total/count }' numbers.txt
# Multiply fields
awk '{ print $1 * $2 }' data.txt
Calculations
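# Percentage of the column total (reads the file twice: first pass sums, second pass prints)
awk 'NR==FNR { total += $1; next } { print $1, ($1/total)*100 "%" }' data.txt data.txt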
# Round numbers
awk '{ printf "%.2f\n", $1 }' numbers.txt
# Max (for min, flip the comparison to $1<min)
awk 'NR==1{max=$1} $1>max{max=$1} END{print max}' numbers.txt
String Operations
Concatenation
# Combine fields
awk '{ print $1 $2 }' file.txt # No separator
awk '{ print $1 "_" $2 }' file.txt # With underscore
String Functions
# Length
awk '{ print length($1) }' file.txt
# Substring
awk '{ print substr($1, 1, 3) }' file.txt # First 3 chars
# Index (position)
awk '{ print index($1, "x") }' file.txt
# Replace
awk '{ gsub(/old/, "new"); print }' file.txt
# Split string
awk '{ split($1, arr, ":"); print arr[1] }' file.txt
Case Conversion
# Uppercase
awk '{ print toupper($0) }' file.txt
# Lowercase
awk '{ print tolower($0) }' file.txt
Conditional Statements
If-Else
awk '{ if ($3 > 100) print $1, "high"; else print $1, "low" }' data.txt
Ternary Operator
awk '{ print (($3 > 100) ? "high" : "low") }' data.txt
Multiple Conditions
awk '{
if ($3 > 100)
print $1, "high"
else if ($3 > 50)
print $1, "medium"
else
print $1, "low"
}' data.txt
Loops
For Loop
# Print all fields
awk '{ for (i=1; i<=NF; i++) print $i }' file.txt
# Sum all fields
awk '{ total=0; for (i=1; i<=NF; i++) total+=$i; print total }' file.txt
While Loop
awk '{ i=1; while (i<=NF) { print $i; i++ } }' file.txt
Arrays
Associative Arrays
# Count occurrences
awk '{ count[$1]++ } END { for (word in count) print word, count[word] }' file.txt
# Sum by key
awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' data.txt
Array Examples
# Track unique values
awk '{ seen[$1]=1 } END { for (val in seen) print val }' file.txt
# First occurrence
awk '!seen[$1]++' file.txt
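The !seen[$1]++ idiom relies on the post-increment. The first time a key appears, seen[$1] is still 0, so the negation is true and the line is printed; the increment then makes the test false for every later line with the same key. Dropping the ! inverts the selection. A sketch on made-up data:

```shell
# Hypothetical input: key in column 1, value in column 2
sample='x 1
y 2
x 3
y 4
z 5'

# First line seen for each key
first=$(printf '%s\n' "$sample" | awk '!seen[$1]++')

# Every later line whose key was already seen
repeats=$(printf '%s\n' "$sample" | awk 'seen[$1]++')

printf '%s\n---\n%s\n' "$first" "$repeats"
```

The first command keeps "x 1", "y 2", and "z 5"; the second keeps "x 3" and "y 4".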
Practical Examples
Process CSV
# Print specific columns from CSV
awk -F, '{ print $1, $3 }' data.csv
# CSV to TSV
awk -F, 'BEGIN { OFS="\t" } { print $1, $2, $3 }' data.csv
# Filter rows
awk -F, '$3 > 1000' data.csv
Log Analysis
# Count log levels
awk '{ count[$3]++ } END { for (level in count) print level, count[level] }' app.log
# Extract errors
awk '/ERROR/' app.log
# Show last 10 errors
awk '/ERROR/ { errors[++n]=$0 } END { for (i = (n > 10 ? n-9 : 1); i<=n; i++) print errors[i] }' app.log
System Monitoring
# Memory usage
free -m | awk 'NR==2 { printf "Memory: %.2f%%\n", $3/$2*100 }'
# Disk usage
df -h | awk 'NR>1 && $5+0 > 80' # $5 looks like 85%; +0 makes the comparison numeric
# CPU usage
ps aux | awk 'NR>1 && $3 > 50 { print $1, $3, $11 }'
# Network connections
netstat -an | awk '/ESTABLISHED/ { count++ } END { print count+0 }'
File Processing
# Remove duplicates (keeping first)
awk '!seen[$0]++' file.txt
# Remove blank lines
awk 'NF' file.txt
# Number lines
awk '{ print NR, $0 }' file.txt
# Print lines longer than 80 characters
awk 'length > 80' file.txt
Data Transformation
# Swap columns
awk '{ print $2, $1 }' file.txt
# Add column
awk '{ print $0, "new_value" }' file.txt
# Column math
awk '{ $4 = $2 * $3; print }' data.txt
Advanced Techniques
Multi-File Processing
# Process two files
awk 'NR==FNR { arr[$1]=$2; next } { print $1, arr[$1] }' file1.txt file2.txt
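NR==FNR is true only while the first file is being read, so the first rule loads a lookup table and next skips the second rule. Once awk moves to the second file, only the second rule runs. A worked sketch, assuming mktemp -d is available and using two made-up files (names.txt and amounts.txt); the "unknown" fallback is an addition for keys missing from the lookup table.

```shell
tmpdir=$(mktemp -d)
# names.txt maps id -> name; amounts.txt holds id and amount (both hypothetical)
printf '1 alice\n2 bob\n'   > "$tmpdir/names.txt"
printf '2 50\n1 75\n3 10\n' > "$tmpdir/amounts.txt"

joined=$(awk '
    NR==FNR { name[$1] = $2; next }                           # first file: build the lookup table
    { print $1, (($1 in name) ? name[$1] : "unknown"), $2 }   # second file: look up each id
' "$tmpdir/names.txt" "$tmpdir/amounts.txt")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

Output follows the order of the second file: "2 bob 50", "1 alice 75", "3 unknown 10".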
Custom Functions
awk '
function square(x) {
return x * x
}
{ print square($1) }
' numbers.txt
Format Output
# Printf formatting
awk '{ printf "%-10s %5d %8.2f\n", $1, $2, $3 }' data.txt
# Aligned columns
awk '{ printf "|%-20s|%10s|\n", $1, $2 }' data.txt
Multiline Records
# Records separated by blank lines
awk 'BEGIN { RS="" } { print NR, $0 }' file.txt
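With RS set to the empty string, each blank-line-separated paragraph becomes one record. Setting FS to a newline as well turns every line of the paragraph into a field. A sketch on a made-up address list:

```shell
# Hypothetical input: one record per paragraph, one field per line
sample='Alice
alice@example.com

Bob
bob@example.com'

out=$(printf '%s\n' "$sample" |
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " <" $2 ">" }')
printf '%s\n' "$out"
```

This prints "1: Alice <alice@example.com>" and "2: Bob <bob@example.com>".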
Common One-Liners
Statistics
# Sum column
awk '{ sum += $1 } END { print sum }' numbers.txt
# Average
awk '{ sum += $1; n++ } END { print sum/n }' numbers.txt
# Min
awk 'NR==1 { min=$1 } $1<min { min=$1 } END { print min }' numbers.txt
# Max
awk 'NR==1 { max=$1 } $1>max { max=$1 } END { print max }' numbers.txt
# Count lines
awk 'END { print NR }' file.txt
Text Manipulation
# Print specific lines
awk 'NR==5' file.txt # Line 5
awk 'NR>=10 && NR<=20' file.txt # Lines 10-20
# Skip header
awk 'NR>1' file.txt
# Print last field
awk '{ print $NF }' file.txt
# Print second-to-last field
awk '{ print $(NF-1) }' file.txt
Filtering
# Lines with more than 5 fields
awk 'NF > 5' file.txt
# Lines where field 3 is a non-negative integer
awk '$3 ~ /^[0-9]+$/' file.txt
# Unique lines (like uniq, but also catches non-adjacent duplicates)
awk '!seen[$0]++' file.txt
# Remove comments and blank lines
awk '!/^#/ && NF' file.txt
Using Variables
Pass Shell Variables
# Using -v
threshold=100
awk -v limit="$threshold" '$3 > limit' data.txt
# Multiple variables
awk -v a=10 -v b=20 '{ print a, b, $1 }' file.txt
Environment Variables
export THRESHOLD=100
awk '$3 > ENVIRON["THRESHOLD"] + 0' data.txt
Script Files
Create AWK Script
# script.awk
BEGIN {
FS = ":"
print "Username Report"
print "==============="
}
{
print "User:", $1
print "Shell:", $NF
print ""
}
END {
print "Total users:", NR
}
Run Script
awk -f script.awk /etc/passwd
Real-World Examples
Generate Report
#!/bin/bash
awk 'BEGIN {
print "Sales Report"
print "============"
total = 0
}
{
print $1 ": $" $2
total += $2
}
END {
print "------------"
print "Total: $" total
}' sales.txt
Process Access Log
# Count requests by IP
awk '{ ip[$1]++ } END { for (i in ip) print ip[i], i }' access.log | sort -rn
# Count by status code
awk '{ status[$9]++ } END { for (s in status) print s, status[s] }' access.log
# Top 10 URLs
awk '{ urls[$7]++ } END { for (u in urls) print urls[u], u }' access.log | \
sort -rn | head -10
Data Aggregation
# Sum sales by category
awk -F, '{
sales[$2] += $3
}
END {
for (cat in sales)
printf "%-15s $%.2f\n", cat, sales[cat]
}' sales.csv
Tips and Best Practices
- Quote AWK Programs - Use single quotes to prevent shell interpretation
- Use -F for Delimiters - Clearer than setting FS in BEGIN
- Test Patterns First - Test pattern matching before adding actions
- Use printf for Formatting - Better control than print
- Initialize Variables - awk starts unset variables at 0 (or the empty string); set counters in BEGIN when you want that to be explicit
- Comment Complex Scripts - AWK supports # comments
- Use Functions - Break complex logic into functions
- Avoid Rebuilding Regexes in Loops - Use a constant /regex/ rather than assembling a pattern string on every iteration
- Use NF for Non-Empty Lines - awk 'NF' removes blank lines
- Always Close Braces - Easy to miss in multiline programs
Performance Tips
- Avoid External Commands - Use AWK built-ins when possible
- Use Associative Arrays - Very efficient for counting/grouping
- Minimize Pattern Complexity - Simple patterns are faster
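- Use next - Skip remaining processing early
- Prefer Regex Literals - awk has no explicit precompile step; a constant /regex/ can be compiled once, while a pattern held in a string may need converting each time it is used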
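As a sketch of the next tip: rules are tried in order for each line, and next abandons the current line immediately, so cheap filters placed first keep later rules from running on lines that do not need them. The sample input is made up.

```shell
# Hypothetical input with a comment line and a blank line
sample='# comment
alpha 1

beta 2'

out=$(printf '%s\n' "$sample" | awk '
    /^#/ { next }          # skip comment lines before any other rule runs
    !NF  { next }          # skip blank lines
    { print NR ": " $1 }   # only reached for the remaining lines
')
printf '%s\n' "$out"
```

Only lines 2 and 4 reach the last rule, so the output is "2: alpha" and "4: beta".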
Common Patterns
Remove Duplicates
awk '!seen[$0]++' file.txt
Print Duplicates
awk 'seen[$0]++' file.txt
Column Sum
awk '{ sum += $1 } END { print sum }' file.txt
Find Max Value
awk 'NR==1 || $1 > max { max = $1 } END { print max }' file.txt
Count Occurrences
awk '{ count[$1]++ } END { for (i in count) print i, count[i] }' file.txt
Debugging
Print Debug Info
awk '{ print "NF=" NF " NR=" NR " $0=" $0 }' file.txt
Trace Execution
# Print each line before processing
awk '{ print "Processing:", $0
       # your code
}' file.txt
Exit Status
| Code | Meaning |
|---|---|
| 0 | Success |
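| >0 | An error occurred (the only guarantee POSIX makes) |
| 1 | gawk: error (EXIT_FAILURE) |
| 2 | gawk: fatal error |
| n | Value passed to an exit n statement in the program |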
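A program can also choose its own exit status with the exit statement, which lets a shell script branch on what awk found. A minimal sketch; the status value 3 and the sample input are arbitrary choices for illustration.

```shell
# Exit with status 3 if any input line contains ERROR, 0 otherwise
if printf 'ok\nERROR disk full\n' |
    awk '/ERROR/ { found = 1 } END { exit (found ? 3 : 0) }'
then
    status=0
else
    status=$?   # status of the pipeline, i.e. of awk
fi

printf 'awk exited with %s\n' "$status"
```

With the sample input this reports an exit status of 3.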
Quick Reference
| Task | Command |
|---|---|
| Print column 1 | awk '{ print $1 }' file |
| Print last column | awk '{ print $NF }' file |
| Print lines 5-10 | awk 'NR>=5 && NR<=10' file |
| Sum column | awk '{ sum+=$1 } END { print sum }' file |
| Count lines | awk 'END { print NR }' file |
| Remove duplicates | awk '!seen[$0]++' file |
| CSV to TSV | awk -F, 'BEGIN{OFS="\t"} {$1=$1; print}' file |
| Filter by value | awk '$3 > 100' file |
| Pattern match | awk '/error/' file |
| Multiple files | awk '{print FILENAME, $0}' *.txt |