AWK command tutorial and usage examples in Linux/Unix

AWK is a powerful text-processing command on Linux/Unix, typically used to process data and generate reports.

Like other languages, awk supports built-in variables, built-in functions, conditional statements, loops, and so on. It scans input line by line, matches each line against the given patterns, and performs the associated actions.

AWK syntax

Basic format

awk [options] 'pattern{ commands }' file

Full format

awk [-F|-f|-v] 'BEGIN{ commands }  pattern{ commands }  END{ commands }' file

example:

$ ls -l /usr/bin | awk '
    BEGIN {
        print "Directory Report"
        print "================"
    }

    NF > 9 {
        print $9, "is a symbolic link to", $NF
    }

    END {
        print "============="
        print "End Of Report"
    }
'

Options

  • -F fs : Specifies the field delimiter
  • -f file : Reads the awk program from a script file
  • -v var=value : Defines a variable before the program starts
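
As a minimal sketch of how -F and -v combine, the following one-liner splits on a colon and prepends a variable to the first field. The sample input and the variable name prefix are made up for illustration.

```shell
# -F sets the field separator; -v assigns an awk variable before the program runs
printf 'alice:30\nbob:25\n' | awk -F':' -v prefix="user=" '{ print prefix $1 }'
# user=alice
# user=bob
```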

AWK built-in variables and operators

Built-in variables:

  • $0 : The entire current line (record)
  • $1 : The first field of the current line; $2 is the second field, and so on
  • NF : Number of fields in the current line
  • NR : Number of the current record, counted across all input files
  • FNR : Like NR, but the count restarts at 1 for each input file
  • FS : Input field separator, default is whitespace
  • RS : Input record separator, default is a newline
  • OFS : Output field separator, default is a space; can be changed to a tab, etc.
  • ORS : Output record separator, default is a newline, so each result is printed on its own line

Comparison and logical operators:

  • ~  : Matches a regular expression (a pattern match, not an exact comparison like ==)
  • !~ : Does not match a regular expression
  • == : Equal, exact comparison
  • != : Not equal, exact comparison
  • && : Logical AND
  • || : Logical OR

Multiple separators:

  • -F'[:#|]' : Treat any of the characters :, #, or | as a field separator
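
The variables and operators above can be tried on a few lines of input. This is only a sketch; the three-column sample data (name, department, salary) is hypothetical.

```shell
# NR is the line number, NF the field count, $NF the last field
printf 'ann sales 100\nbob dev 200\ncat dev 300\n' |
awk '{ print NR, NF, $NF }'
# 1 3 100
# 2 3 200
# 3 3 300

# == is an exact comparison, ~ is a regular expression match, && combines both
printf 'ann sales 100\nbob dev 200\ncat dev 300\n' |
awk '$2 == "dev" && $1 ~ /^c/ { print $1 }'
# cat
```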

AWK workflow

An awk script usually consists of three parts: BEGIN statement block, general statement block that can use pattern matching, and END statement block. These three parts are optional.

  • Step 1: Execute the statements in the BEGIN {commands} block;
  • Step 2: Read a line from a file or standard input (stdin) and execute the pattern {commands} blocks against it, repeating this for every line from the first to the last until all input is read;
  • Step 3: On reaching the end of the input stream, execute the END {commands} block.

BEGIN blocks are executed before awk starts reading lines from the input stream. This block is optional. Statements such as variable initialization and printing a table header usually go in the BEGIN block.

END blocks are executed after awk has read all lines from the input stream. Summary work, such as printing the results computed over all lines, is done here. This block is also optional.

The pattern blocks in between are the most important part, and they are optional too. If a pattern is given without an action, {print} is executed by default, so every matching line is printed; if an action is given without a pattern, it is executed for every line that awk reads.
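
The three phases can be seen in a small sketch. The input lines and the variable name count are arbitrary choices for illustration.

```shell
# BEGIN runs once before input, the middle rule runs per line, END runs once after input
printf 'a\nb\nc\n' | awk '
    BEGIN { print "start" }
    { count++ }                    # no pattern: runs for every line
    END { print "lines:", count }
'
# start
# lines: 3

# A pattern with no action prints the matching lines
printf 'a\nb\nc\n' | awk 'NR >= 2'
# b
# c
```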

AWK examples

awk BEGIN/END examples

BEGIN and END are two special patterns in awk. BEGIN sets up the program's initial state before any input is read, and END performs summary or cleanup work after all input has been processed.

In the following example, we will demonstrate the awk BEGIN and END expressions.

➜ awk '
    BEGIN{print "=======begin======="}
    {print $0;}
    END{print "========end========"}
' test.txt
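
A common use of this structure is to initialize an accumulator in BEGIN and report it in END. The sketch below assumes three-column numeric input like the test.txt shown later in this article; the variable name total is our own.

```shell
# Sum the first column: initialize in BEGIN, accumulate per line, report in END
printf '1 2 3\n2 3 4\n3 4 5\n' | awk '
    BEGIN { total = 0 }
    { total += $1 }
    END { print "sum of column 1:", total }
'
# sum of column 1: 6
```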

awk print

In the following examples, we use the awk print statement, one of awk's most used statements, to print whole lines or selected columns.

  • awk print $0: print the entire line

$ echo "test" | awk '{print}'
$ echo "test" | awk '{print $0}'
  • use -F to specify a delimiter

awk -F "delimiter" splits each line on the delimiter; $1 is the first column, $2 the second, and so on.

$ echo "a:B:C:CS:DDDD" | awk -F":" '{print $2}'

  • concatenated output of $2 and $5 (no separator)
$ echo "a:B:C:CS:DDDD" | awk -F":" '{print $2 $5}'

  • custom output of $2 and $5
$ echo "a:B:C:CS:DDDD" | awk -F":" '{print $2 ":" $5}'
B:DDDD

$ echo "a:B:C:CS:DDDD" | awk -F":" '{print $2 "#" $5}'
B#DDDD
  • print specific columns
$ awk '{print $2}' test.txt
  • use NR to print specific rows
$ awk 'NR==2{print}' test.txt 
2	3	4

  • print line number per line
$ awk '{print NR, $0}' test.txt

  • use OFS to set the output field separator
$ echo "a:B:C:CS:DDDD" | awk -F":" -v OFS="|" '{print $2,$5}'

  • awk print last field
➜ awk '{print $NF}' test.txt                            
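
One detail worth illustrating: OFS is applied between the comma-separated arguments of print, but $0 itself keeps its original separators until a field is assigned. A minimal sketch, with made-up input:

```shell
# OFS is inserted between the comma-separated arguments of print
printf '1 2 3\n' | awk 'BEGIN { OFS = "-" } { print $1, $2 }'
# 1-2

# Assigning to any field makes awk rebuild $0 using OFS
printf '1 2 3\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }'
# 1-2-3
```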

awk multiple delimiter

When we do data analysis on a daily basis, we often need to separate strings based on multiple delimiters because of inconsistent data rules.

In the following examples, we will use the awk -F option, the awk FS variable, or the split function to separate strings on multiple delimiters.

  • Use awk -F option examples
$ cat test_1.txt | awk -F'[:=]' '{print $1, $2, $3, $4, $5}'
  • Use awk FS variable examples
➜  cat test_1.txt | awk 'BEGIN{FS="[:=]";}{print $1, $2, $3, $4, $5}'

  • Use awk split examples
$ cat test_1.txt | awk '{split($0, a, "[:=]");print a[1], a[2], a[3], a[4], a[5]}'
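
Since the contents of test_1.txt are not shown, here is a self-contained sketch with a hypothetical input line. All three approaches split on either : or = and give the same result.

```shell
line='host=web1:port=8080'   # hypothetical sample line

# The fields are: host, web1, port, 8080
printf '%s\n' "$line" | awk -F'[:=]' '{ print $2, $4 }'
printf '%s\n' "$line" | awk 'BEGIN { FS = "[:=]" } { print $2, $4 }'
printf '%s\n' "$line" | awk '{ split($0, a, "[:=]"); print a[2], a[4] }'
# each command prints: web1 8080
```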

awk user defined functions

syntax

The syntax of the user-defined function is:

function function_name(argument1, argument2, ...)
{
    function body
}

analysis

  • function_name is the name of the user-defined function. Function names should start with a letter, followed by any combination of digits, letters, or underscores. Reserved AWK keywords cannot be used as function names.
  • Custom functions can accept multiple input parameters separated by commas. Parameters are not required; a function can also be defined without any input parameters.
  • The function body contains the AWK program code that the function runs.

example 1

In the following example, we will use the awk custom function to calculate the area value.

➜ vim functions.awk

# calculated area
function area(a, b)
{
    return a * b
}

# main function
function main(a, b)
{
    result = area(a, b)
    print "area = ", result
}

# start statement
BEGIN {
    main(2, 4)
}

➜ awk -f functions.awk
area =  8
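
Functions can also be defined inline in a one-liner. Variables in awk are global by default, so a common convention is to declare extra parameters that serve as local variables. The function name sum_to below is our own, used only as a sketch.

```shell
awk '
    # i and total are extra parameters used as local variables
    function sum_to(n,    i, total) {
        for (i = 1; i <= n; i++)
            total += i
        return total
    }
    BEGIN { print sum_to(4) }
'
# 10
```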

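example 2

In the following example, a custom function receives an array and prints the sum of its first three elements. Each line of test.txt is printed, followed by the sum of its fields.

$ cat test.txt | awk '
	# print the sum of the first three elements of array a
	function sum(a) {
		print "= " (a[1] + a[2] + a[3])
	}

	BEGIN {
		print "================="
	}

	{
		print $0
		split($0, a)
		sum(a)
	}

	END {
		print "=================="
	}'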

awk built-in functions examples

Substr func examples

substr returns a substring of the given length starting at the given position; if the length is omitted, it returns everything from the start position to the end of the string. Positions are counted from 1.

syntax

#Returns a string of length starting at position
substr(string, position, length)

#Returns a string from position to the end
substr(string, position)

examples


➜ echo "ABCDEFG" | awk '{print substr($0, 3)}'
➜ echo "ABCDEFG" | awk '{print substr($0, 3, 2)}'
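
As a further sketch, substr can cut a fixed-width value into pieces. The date string below is the one that appears in the sample log line later in this article; the output format is our own choice.

```shell
# Positions start at 1: characters 1-4 are the year, 5-6 the month, 7 onward the day
echo "20190622" | awk '{ print substr($0, 1, 4) "-" substr($0, 5, 2) "-" substr($0, 7) }'
# 2019-06-22
```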

Split func examples

Split function allows a string to be separated into words and stored in an array.

syntax

split(SOURCE,DESTINATION,DELIMITER)

split(SOURCE, DESTINATION)  -- If the third parameter is not provided, awk defaults to the current FS value.
  • SOURCE : Text to be analyzed
  • DESTINATION : Store analytical results
  • DELIMITER : Text separator

examples

  • Use Default Delimiter – Space
➜ cat test.txt 
1	2	3
2	3	4
3	4	5
➜ cat test.txt | awk '{split($0, a, " ");print a[1], a[2], a[3]}'
1 2 3
2 3 4
3 4 5
  • Omit the third argument so that split uses the current FS
➜ cat test.txt | awk -F" " '{split($0, a);print a[1], a[2], a[3]}'
1 2 3
2 3 4
3 4 5
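
split also returns the number of elements it stored, which is handy for looping over the result. A small sketch with made-up input; the array name parts is arbitrary.

```shell
# n is the number of pieces; array indexes start at 1
echo "a,b,c" | awk '{
    n = split($0, parts, ",")
    for (i = 1; i <= n; i++)
        print i, parts[i]
}'
# 1 a
# 2 b
# 3 c
```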

Length func examples

The length function returns the number of characters in a string; when called without an argument, it returns the length of the entire record ($0).

syntax

length(string)

example

➜ echo "ABCDEFG" | awk '{print length($0)}' 
7
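
length also works on individual fields and can be used as a pattern. In this sketch, with made-up input, only lines longer than five characters are processed.

```shell
# Print the length of the first field and of the whole line, for lines longer than 5 characters
printf 'hello world\nhi\n' | awk 'length($0) > 5 { print length($1), length($0) }'
# 5 11
```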

System func examples

System function executes system commands.

syntax

system("Linux command")

example

  • use awk to create files in batch

➜ awk 'BEGIN {do {++i; system("touch file_num_" i "_test") } while (i<9) }'
  • use awk to rename files in batch (substr positions start at 1, so substr($0, 1, 10) keeps the first 10 characters)
➜ ls | grep test | awk '{system("mv " $0 " " substr($0, 1, 10))}'
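
Because system runs commands immediately, it can be safer to print the generated commands first and review them. The sketch below does such a dry run on two hypothetical file names, then shows that system returns the exit status of the command it ran.

```shell
# Dry run: print the mv commands instead of executing them
printf 'file_num_1_test\nfile_num_2_test\n' |
awk '{ print "mv " $0 " " substr($0, 1, 10) }'
# mv file_num_1_test file_num_1
# mv file_num_2_test file_num_2

# system() returns the exit status of the command it ran
awk 'BEGIN { status = system("true"); print "exit status:", status }'
# exit status: 0
```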

Awk conditional statement if else

awk if, else if, else

Whatever the programming language, the if else conditional statement is one of the most used features.

In the following examples, we introduce the awk if, else if, and else statements and show how to chain multiple conditions.

if (condition) {
    ...
} else if (condition) {
    ...
} else {
    ...
}
➜ cat test-2.txt 
123#192.168.1.33#google#20190622#/url/test

➜ awk -F"#" '{
    if($1==12) 
        print $1; 
    else if($3 == "google")
        print $3; 
    else 
        print $0
}' test-2.txt
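
The same chain works with numeric comparisons. As a sketch, the following assigns a grade to each line of made-up name/score data; the thresholds and the variable name grade are arbitrary.

```shell
# The first condition that is true wins; else handles everything that is left
printf 'ann 91\nbob 72\ncat 40\n' | awk '{
    if ($2 >= 90)
        grade = "A"
    else if ($2 >= 60)
        grade = "B"
    else
        grade = "C"
    print $1, grade
}'
# ann A
# bob B
# cat C
```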

awk ternary judgment

Ternary operations are common in development. Awk also supports the ternary operator (condition ? value1 : value2). The comparison must be placed in the main block rather than in BEGIN, because fields such as $1 are still empty before any input is read.

➜ echo "100:23" | awk -F ":" '{res = ($1 > 32) ? "true" : "false"; print res}'
true
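
A ternary expression can also be used directly inside print. It is wrapped in parentheses here because an unparenthesized > in a print statement is treated as output redirection. The labels big and small and the second input line are made up for this sketch.

```shell
# Evaluated once per line, after $1 has been set from the input
printf '100:23\n12:5\n' | awk -F':' '{ print $1, ($1 > 32 ? "big" : "small") }'
# 100 big
# 12 small
```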

awk loop examples

awk for loop

In the following example, we use an awk for loop to process a log line field by field.

➜ cat test-2.txt  
192.168.1.1#10.10.83.92#/url/test#google#1234322344

➜ awk -F"#" '{
        for(i=1;i<=NF;i++) {
                if($i == "google") {
                        print "NF=" i ", this is a refer";
                } else {
                        print "NF=" i "," $i
                }
        }
}' test-2.txt
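
awk also has a while loop. As a minimal sketch, the following walks over the fields of a made-up line and adds them up; the variable names i and total are arbitrary.

```shell
# Visit every field from 1 to NF and accumulate the values
echo "1 2 3 4" | awk '{
    i = 1
    total = 0
    while (i <= NF) {
        total += $i
        i++
    }
    print "total =", total
}'
# total = 10
```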

The awk command is widely used on Linux/Unix systems, and I will cover more of it in subsequent articles.

