AWK command tutorial in Linux/Unix with examples and use cases

What is AWK?

AWK is a language for processing text files and a powerful text analysis tool.

AWK is named after the initials of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

AWK is an interpreted scripting language for processing data and generating reports. An awk program can use variables, built-in functions, user-defined functions, and logical operators, and it runs without a separate compilation step.

What can AWK do?

  1. AWK Operations:
    (a) Scans a file line by line
    (b) Splits each input line into fields
    (c) Compares input line/fields to pattern
    (d) Performs action(s) on matched lines
  2. Useful For:
    (a) Transform data files
    (b) Produce formatted reports
    (c) Batch operation
  3. Programming Constructs:
    (a) Format output lines
    (b) Arithmetic and string operations
    (c) Conditionals and loops
    (d) User-defined function

AWK syntax

Basic format

awk [options] 'pattern{ commands }' file

Full format

awk [-F|-f|-v] 'BEGIN{ commands }  pattern{ commands }  END{ commands }' file

example:

$ ls -l /usr/bin | awk '
    BEGIN {
        print "Directory Report"
        print "================"
    }

    NF > 9 {
        print $9, "is a symbolic link to", $NF
    }

    END {
        print "============="
        print "End Of Report"
    }
'

Options

  • -F : Specifies the input field delimiter
  • -f : Reads the awk program from a script file
  • -v : Defines a variable before the program starts, in the form var=value
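
As a small sketch of the three options listed above (the sample input, the variable name prefix, and the temporary script file are made up for illustration):

```shell
# -F sets the input field separator; -v assigns a variable before the program runs.
out=$(printf 'alice:x:1000\nbob:x:1001\n' | awk -F: -v prefix="user=" '{ print prefix $1 }')
echo "$out"

# -f reads the awk program from a file instead of the command line.
script=$(mktemp)
printf '{ print $3 }\n' > "$script"
out_f=$(printf 'alice:x:1000\n' | awk -F: -f "$script")
rm -f "$script"
echo "$out_f"
```

The first command prints one "user=" line per input record; the second prints the third colon-separated field.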

AWK variables and operators

  • $0 : The entire current record (line)
  • $1 : The first field of the current record; $2 is the second, and so on
  • NF : Number of fields in the current record
  • NR : Number of records read so far; it keeps counting across multiple input files
  • FNR : Like NR, but it restarts from 1 for each input file
  • FS : Input field separator; by default fields are split on runs of spaces and tabs
  • RS : Input record separator; default is a newline
  • OFS : Output field separator; default is a single space, can be changed to a tab, etc.
  • ORS : Output record separator; default is a newline, so each print statement ends its output with a line break
  • ~  : Matches a regular expression (a pattern match, not an exact comparison like ==)
  • !~ : Does not match a regular expression
  • == : Equal; an exact comparison of the whole value
  • != : Not equal; an exact comparison of the whole value
  • && : Logical AND
  • || : Logical OR
  • -F'[:#|]' : Command-line option that defines multiple separators (here :, # and |)
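
A minimal sketch of several of these variables and operators working together; the input lines and the temporary file names are invented for the example:

```shell
# Print record number, field count and last field for lines whose
# first field starts with "a" or "f", joined with OFS.
out=$(printf 'a b c\nd e\nf g h i\n' | awk 'BEGIN { OFS = "|" } $1 ~ /^[af]/ { print NR, NF, $NF }')
echo "$out"

# NR keeps counting across files, FNR restarts at 1 for each file.
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/one.txt"
printf 'z\n' > "$dir/two.txt"
out2=$(awk '{ print NR, FNR, $0 }' "$dir/one.txt" "$dir/two.txt")
rm -rf "$dir"
echo "$out2"
```

In the second command the last line shows NR as 3 and FNR as 1, because z is the third record overall but the first record of its file.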

AWK Workflow – How does AWK work?

An awk script usually consists of three parts: BEGIN statement block, general statement block that can use pattern matching, and END statement block. These three parts are optional.

  • Step 1: Execute the statements in the BEGIN { commands } block.
  • Step 2: Read one line from the file or from standard input (stdin) and run each pattern { commands } block whose pattern matches it. Repeat for every line, from the first to the last, until all input files have been read.
  • Step 3: After the end of the input stream is reached, execute the END { commands } block.

The BEGIN block runs before awk starts reading lines from the input stream. It is optional and is typically used for setup, such as initializing variables or printing the header of an output table.

The END block runs after awk has read every line of the input stream. It is also optional and is typically used to print summary results computed over all lines.

The pattern { commands } blocks in between do the main work. Both halves are optional: if the pattern is omitted, the commands run for every line; if the { commands } part is omitted, awk executes { print } by default, that is, it prints every line that matches the pattern.
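
The three phases can be sketched as follows, with made-up numeric input and a hypothetical variable named total:

```shell
out=$(printf '3\n10\n7\n' | awk '
    BEGIN  { print "start"; total = 0 }   # runs once, before any input is read
    $1 > 5 { total += $1 }                # runs for each line whose first field exceeds 5
    END    { print "total =", total }     # runs once, after the last line
')
echo "$out"

# A pattern with no action falls back to the default action, print.
second=$(printf 'one\ntwo\nthree\n' | awk 'NR == 2')
echo "$second"
```

Only 10 and 7 pass the pattern, so the END block reports a total of 17; the second command prints just the second input line.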

AWK Examples – How to use AWK?

awk BEGIN/END examples

➜ awk '
    BEGIN{print "=======begin======="}
    {print $0;}
    END{print "========end========"}
' test.txt

awk print

  • awk print $0
$ echo "test" | awk '{print}'
$ echo "test" | awk '{print $0}'
  • use -F to specify a delimiter
$ echo "a:B:C:CS:DDDD" | awk -F":" '{print $2}'
B

  • concatenated output of $2 and $5 (no separator)
$ echo "a:B:C:CS:DDDD" | awk -F":" '{print $2 $5}'
BDDDD

  • custom output of $2 and $5
$ echo "a:B:C:CS:DDDD" | awk -F":" '{print $2 ":" $5}'
B:DDDD

$ echo "a:B:C:CS:DDDD" | awk -F":" '{print $2 "#" $5}'
B#DDDD
  • print specific columns
$ awk '{print $2}' test.txt
  • use NR to print specific rows
$ awk 'NR==2{print}' test.txt 
2	3	4

  • print line number per line
$ awk '{print NR, $0}' test.txt

  • use OFS to set the output field separator
$ echo "a:B:C:CS:DDDD" | awk -F":" -v OFS="|" '{print $2,$5}'
B|DDDD

  • awk print last field
➜ awk '{print $NF}' test.txt                            

awk multiple delimiter

  • Use the awk -F option (quote the bracket expression so the shell does not expand it)
$ cat test_1.txt | awk -F'[:=]' '{print $1, $2, $3, $4, $5}'
  • Use the awk split function
$ cat test_1.txt | awk '{split($0, a, "[:=]"); print a[1], a[2], a[3], a[4], a[5]}'
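
The contents of test_1.txt are not shown, so here is a self-contained sketch with invented input lines that mix ":" and "=" as separators:

```shell
# Both ":" and "=" split fields, so each line yields three fields.
out=$(printf 'host:web01=up\nhost:db01=down\n' | awk -F'[:=]' '{ print $2, $3 }')
echo "$out"
```

Each line is split into host, the machine name, and the state, and the command prints the last two.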

awk user defined functions

syntax

The syntax of the user-defined function is:

function function_name(argument1, argument2, ...)
{
    function body
}

analysis

  • function_name is the name of the user-defined function. A function name must start with a letter or an underscore, followed by any combination of letters, digits, and underscores. Keywords reserved by AWK cannot be used as function names.
  • A function can accept multiple input parameters, separated by commas. Parameters are optional, so a function may also take no parameters at all.
  • The function body contains the AWK code that runs when the function is called.

example 1

➜ vim functions.awk

# calculated area
function area(a, b)
{
    return a * b
}

# main function
function main(a, b)
{
    result = area(a, b)
    print "area = ", result
}

# start statement
BEGIN {
    main(2, 4)
}

➜ awk -f functions.awk
area =  8

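example 2

➜ cat test.txt | awk '
	# print the sum of the first three elements of array a
	function sum(a) {
		print "= " (a[1] + a[2] + a[3])
	}

	BEGIN {
		print "================="
		FS = " "
	}

	{
		split($0, a)    # split the current line into array a using FS
		sum(a)
	}

	END {
		print "=================="
	}'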
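
A function can also return a value instead of printing it. The sketch below uses a hypothetical function named row_sum and relies on the common awk convention of declaring extra parameters to serve as local variables:

```shell
out=$(printf '1 2 3\n4 5 6\n' | awk '
    # Extra parameters (i, total) act as local variables; callers omit them.
    function row_sum(n,    i, total) {
        for (i = 1; i <= n; i++) total += $i
        return total
    }
    { print $0, "=>", row_sum(NF) }
')
echo "$out"
```

Because i and total are parameters, they start empty on every call and do not leak into the rest of the program.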

awk functions examples

Substr func examples

substr returns a substring of the given length, starting at the given position; positions are counted from 1. If the length is not specified, it returns the substring from the start position to the end of the string.

syntax

# Returns length characters of string, starting at position
substr(string, position, length)

# Returns the rest of string, from position to the end
substr(string, position)

examples


➜ echo "ABCDEFG" | awk '{print substr($0, 3)}'
CDEFG
➜ echo "ABCDEFG" | awk '{print substr($0, 3, 2)}'
CD
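
As a further sketch, substr can slice a fixed-width value into parts. The date string below follows the 20190622 format used in the sample data later in this tutorial:

```shell
# Characters 1-4 are the year, 5-6 the month, and 7 onward the day.
out=$(echo "20190622" | awk '{ print substr($0, 1, 4) "-" substr($0, 5, 2) "-" substr($0, 7) }')
echo "$out"
```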

Split func examples

The split function breaks a string into pieces at each delimiter, stores the pieces in an array, and returns the number of pieces.

syntax

split(SOURCE, DESTINATION, DELIMITER)

split(SOURCE, DESTINATION)  -- If the third parameter is not provided, awk defaults to the current FS value.
  • SOURCE : Text to be split
  • DESTINATION : Array that receives the pieces, indexed from 1
  • DELIMITER : Separator between the pieces

examples

  • Use Default Delimiter – Space
➜ cat test.txt 
1	2	3
2	3	4
3	4	5
➜ cat test.txt | awk '{split($0, a, " ");print a[1], a[2], a[3]}'
1 2 3
2 3 4
3 4 5
➜ cat test.txt 
1	2	3
2	3	4
3	4	5
➜ cat test.txt | awk -F" " '{split($0, a);print a[1], a[2], a[3]}'
1 2 3
2 3 4
3 4 5
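
A short sketch with a non-default delimiter, also showing the return value of split; the array name parts and the input string are chosen only for illustration:

```shell
# split returns the number of pieces; the pieces are stored from index 1.
out=$(echo "2019-06-22" | awk '{ n = split($0, parts, "-"); print n, parts[1], parts[3] }')
echo "$out"
```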

Length func examples

The length function returns the number of characters in the given string. When called without an argument, it returns the length of the entire current record ($0).

syntax

length(string)

example

➜ echo "ABCDEFG" | awk '{print length($0)}' 
7
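
A small sketch comparing the length of one field with the length of the whole record, using invented input:

```shell
# length($1) measures the first field; length() with no argument measures $0.
out=$(printf 'short\na much longer line\n' | awk '{ print length($1), length() }')
echo "$out"
```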

System func examples

The system function runs a shell command and returns its exit status.

syntax

system(command)

example

  • use awk to create files in batch

➜ awk 'BEGIN {do {++i; system("touch file_num_" i "_test") } while (i<9) }'
  • use awk to rename files in batch (substr positions start at 1, so substr($0, 1, 10) keeps the first 10 characters of each name)
➜ ls | grep test | awk '{system("mv " $0 " " substr($0, 1, 10))}'
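
Before running a batch rename, it can help to print the commands instead of executing them. The sketch below does such a dry run on two made-up file names, using a start position of 1 for substr. It then shows that system returns the exit status of the command, with true and false standing in for real commands:

```shell
# Dry run: print each mv command rather than passing it to system().
out=$(printf 'file_num_1_test\nfile_num_2_test\n' | awk '{ print "mv " $0 " " substr($0, 1, 10) }')
echo "$out"

# system() returns the exit status: 0 for success, non-zero for failure.
status=$(awk 'BEGIN { ok = system("true"); bad = system("false"); print ok, bad }')
echo "$status"
```

Once the printed commands look right, replacing print with system(...) performs the actual rename.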

awk conditional and loop examples

awk if else if else

The following example shows awk if / else if / else statements, which test several conditions in order.

if (condition1) {
    ...
} else if (condition2) {
    ...
} else {
    ...
}

➜ cat test-2.txt 
123#192.168.1.33#google#20190622#/url/test

➜ awk -F"#" '{
    if($1==12) 
        print $1; 
    else if($3 == "google")
        print $3; 
    else 
        print $0
}' test-2.txt
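
To see every branch taken, here is a sketch that runs the same kind of test over three invented input lines, one for each branch:

```shell
out=$(printf '12#a#bing\n123#b#google\n7#c#other\n' | awk -F'#' '{
    if ($1 == 12)
        print "id:", $1
    else if ($3 == "google")
        print "site:", $3
    else
        print "line:", $0
}')
echo "$out"
```

The first line matches the if branch, the second the else if branch, and the third falls through to else.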

awk for loop

The following example uses an awk for loop to walk through every field of a log line.

➜ cat test-2.txt  
192.168.1.1#10.10.83.92#/url/test#google#1234322344

➜ awk -F"#" '{
        for(i=1;i<=NF;i++) {
                if($i == "google") {
                        print "NF=" i ", this is a refer";
                } else {
                        print "NF=" i "," $i
                }
        }
}' test-2.txt
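
The same traversal can be written with a while loop. This sketch counts the fields that look like IP addresses in a shortened copy of the log line above; the variable name count and the regular expression are illustrative choices:

```shell
out=$(echo "192.168.1.1#10.10.83.92#/url/test#google" | awk -F'#' '{
    i = 1
    while (i <= NF) {
        if ($i ~ /^[0-9.]+$/) count++   # field contains only digits and dots
        i++
    }
    print "ip-like fields:", count
}')
echo "$out"
```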
