Unix Tips and Tricks: awk utility

Basic Structure of awk:

pattern{action}

The pattern specifies when the action should be performed. awk reads the input file one line at a time; if a line matches the pattern, the action is applied to that line.
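
As a small illustration of the pattern{action} form, here is a sketch that feeds three made-up lines to awk and prints only the lines matching the pattern /an/. The sample data and the variable name out are assumptions for this example only.

```shell
# Sample input is invented for illustration.
# /an/ is the pattern, { print } is the action.
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' | awk '/an/ { print }')
echo "$out"
```

Only the "banana 12" line contains the text "an", so it is the only line the action is applied to.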

Two other important patterns are specified by the keywords "BEGIN" and "END".

BEGIN { print "START"}
{ print }
END { print "END" }

The BEGIN action runs once, before any input line is read. The END action runs once, after the last input line has been processed.
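
To see the order in which the three rules run, the following sketch pipes two invented lines through the same program. The input text and the variable name out are assumptions for the example.

```shell
# Two sample lines go in; START is printed first, END last.
out=$(printf 'one\ntwo\n' | awk 'BEGIN { print "START" } { print } END { print "END" }')
echo "$out"
```

The output should be START, then the two input lines, then END.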

Awk script inside a shell script:

#!/bin/sh
# Linux users have to change $8 to $9
awk '
BEGIN { print "File\tOwner" }
{ print $8, "\t", $3}
END { print " - DONE -" }
'

Native Awk script:

#!/bin/awk -f

BEGIN { print "File\tOwner" }

{ print $8, "\t", $3}

END { print " - DONE -" }

Syntax:

1. Print all the lines of a file.
awk '{print}' <filename>

2. Print only the first column of a file.
awk '{print $1}' <filename>

3. Print more than one field of a file.
awk '{print $1$2}' <filename>
//No space is displayed between the two fields in the output.
awk '{print $1,$2}' <filename>
//The two fields are separated by a space.
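
The difference between the two forms can be checked on a one-line sample. This is a sketch with invented input; the third command additionally assumes you want a different separator and sets awk's built-in OFS (output field separator) variable, which is what the comma inserts (a single space by default).

```shell
# $1$2 concatenates the fields; $1,$2 joins them with OFS.
joined=$(echo 'alpha beta' | awk '{ print $1$2 }')
spaced=$(echo 'alpha beta' | awk '{ print $1, $2 }')
dashed=$(echo 'alpha beta' | awk 'BEGIN { OFS = "-" } { print $1, $2 }')
echo "$joined"
echo "$spaced"
echo "$dashed"
```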

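4. Print only the lines that contain a specified text.
awk '/test/ {print}' <filename>
//Prints every line that contains "test". Use {print $1} instead to print only the first field of those lines.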

5. Print the number of blank lines in a file:
awk 'BEGIN{x=0;} /^$/ {x=x+1;} END{print "There are "x" blank lines in the file";}' <filename>

Note: in the command above, /^$/ is the pattern and {x=x+1;} is the action that is applied to the lines which match the pattern.

6. Print the number of lines in the file:
awk 'BEGIN{x=0} {x++;} END{print "There are "x" lines in the file";}' <filename>
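
Both counters can be tried on a small sample without creating a file. The sketch below uses invented input with three blank lines out of six. It leaves out the BEGIN{x=0} step, because awk variables start out empty and count as zero in arithmetic, and for the total it uses the built-in NR variable (the number of records read so far) instead of a hand-made counter.

```shell
# x + 0 forces a numeric result, so 0 is printed when no line is blank.
blank=$(printf 'first\n\nsecond\n\n\nthird\n' | awk '/^$/ { x++ } END { print x + 0 }')
# In the END rule, NR holds the total number of lines read.
total=$(printf 'first\n\nsecond\n\n\nthird\n' | awk 'END { print NR }')
echo "blank lines: $blank, total lines: $total"
```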

Built-in special variables:

1. FS (Field Separator)
The field separator can be either a single character or a regular expression. It controls the way awk splits an input record into fields. awk scans the input record for character sequences that match the separator; the text between those matches becomes the fields.

awk 'BEGIN {FS=":"} {print}' <filename>
//Here, the input line is split with ":" as the separator and processed accordingly.

awk 'BEGIN {FS="\t+"} {print}' <filename>
//Tab separated file. '+' matches one or more of the previous character.

awk 'BEGIN {FS = "[[:space:]]+"} {print}' <filename>
//One or more whitespace characters, such as spaces or tabs.
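
Since {print} on its own prints the whole line regardless of FS, the effect of the separator is easier to see when a single field is printed. The sketch below uses invented sample data: colon-separated records in the first command, and a line with two consecutive tabs in the second.

```shell
# With FS=":" the first field is everything before the first colon.
users=$(printf 'root:x:0\ndaemon:x:1\n' | awk 'BEGIN { FS = ":" } { print $1 }')
# With FS="\t+" a run of tabs counts as one separator, so $2 is "b".
second=$(printf 'a\t\tb\n' | awk 'BEGIN { FS = "\t+" } { print $2 }')
echo "$users"
echo "$second"
```

With FS set to a single tab instead of "\t+", the second field of the tab example would be empty, because the two tabs would delimit an empty field between them.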

2. NF (Number of Fields)
This special variable holds the number of fields in the current record.

awk 'NF == 3 {print "This record has 3 fields"}' <filename>
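
A quick way to get a feel for NF is to print it for every record and then use it as a pattern. The sample lines and variable names below are made up for this sketch.

```shell
# Print the field count of each line (default whitespace splitting).
counts=$(printf 'a b c\nd e\nf g h\n' | awk '{ print NF }')
# Keep only the lines that have exactly three fields.
three=$(printf 'a b c\nd e\nf g h\n' | awk 'NF == 3 { print }')
echo "$counts"
echo "$three"
```

The middle line has two fields, so it is skipped by the NF == 3 pattern.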

Sunday, May 3, 2015