Unix Tips and Tricks

Sunday, May 3, 2015

awk utility

Basic Structure of awk:

pattern{action}

The pattern specifies when the action should be performed. awk reads the input one line at a time; if the pattern matches the line, the action is applied to that line.
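As a small illustration of the pattern{action} form, the sketch below prints the first field only for lines whose second field is greater than 10. The input is made up for the example and does not come from any real file.

```shell
# Invented sample input: a name and a number on each line.
# Pattern: $2 > 10    Action: { print $1 }
out=$(printf 'apple 5\nbanana 12\ncherry 30\n' | awk '$2 > 10 { print $1 }')
echo "$out"
```

Only the lines for banana and cherry satisfy the pattern, so only their first fields are printed.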

Two other important patterns are specified by the keywords "BEGIN" and "END".

BEGIN { print "START"}
{ print }
END { print "END" }

The BEGIN action runs once, before any input is read. The END action runs once, after all input has been processed.
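To see the order of execution, the three rules can be run against a short, invented two-line input. This is only a sketch; the input text is an assumption made for the example.

```shell
# BEGIN prints before the first line is read, the middle rule prints
# every input line, and END prints after the last line.
out=$(printf 'first\nsecond\n' | awk 'BEGIN { print "START" } { print } END { print "END" }')
echo "$out"
```

The output is START, then the two input lines, then END.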

Awk script inside a shell script:

#!/bin/sh
# Linux users have to change $8 to $9
awk '
BEGIN { print "File\tOwner" }
{ print $8, "\t", $3}
END { print " - DONE -" }
'

Native Awk script:

#!/bin/awk -f

BEGIN { print "File\tOwner" }

{ print $8, "\t", $3}

END { print " - DONE -" }
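One way to try a standalone awk program without relying on the location of awk in the #! line is to pass the file to awk with the -f option. The sketch below assumes a temporary file created by mktemp and an invented two-column input (owner, then file name), so the field numbers differ from the ls -l based script above.

```shell
# Write a small awk program to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
BEGIN { print "File\tOwner" }
{ print $2, "\t", $1 }
END { print " - DONE -" }
EOF

# Invented input: owner in field 1, file name in field 2.
out=$(printf 'alice notes.txt\nbob todo.txt\n' | awk -f "$script")
echo "$out"
rm -f "$script"
```

Because the fields are separated by commas in the print statement, awk puts a space on each side of the tab.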

Syntax:

1. Print all the lines of a file.
awk '{print}' <filename>

2. Print only the first column of a file.
awk '{print $1}' <filename>

3. Print more than one field of a file.
awk '{print $1$2}' <filename> //no space is displayed between the two fields in the output
awk '{print $1,$2}' <filename> //the two fields are separated by a space.

4. Print only lines with a specified text.
awk '/test/ {print $1}' <filename> //prints the first field of each line that contains "test"

5. Print the number of blank lines in a file:
awk 'BEGIN{x=0;} /^$/ {x=x+1;} END{print "There are "x" blank lines in the file";}' <filename>

Note: /^$/ is the pattern, and {x=x+1;} is the action that is applied to each line that matches the pattern.

6. Print the number of lines in the file:
awk 'BEGIN{x=0} {x++;} END{print "There are "x" lines in the file";}' <filename>
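The commands above can be tried on a few invented lines instead of a real file. This is a minimal sketch; the sample text is an assumption made for the example.

```shell
input='one two
three four'

# $1$2 concatenates the two fields with nothing in between.
joined=$(printf '%s\n' "$input" | awk '{print $1$2}')
# $1,$2 joins the two fields with a space (the default output separator).
spaced=$(printf '%s\n' "$input" | awk '{print $1,$2}')
# Count the blank lines in a three-line input that has one blank line.
blank=$(printf 'a\n\nb\n' | awk 'BEGIN{x=0} /^$/ {x=x+1} END{print x}')

echo "$joined"
echo "$spaced"
echo "$blank"
```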

Built in special variables:

1. FS (Field Separator)
The field separator can be either a single character or a regular expression. It controls how awk splits an input record into fields: awk scans the record for character sequences that match the separator, and the text between those matches becomes the fields.

awk 'BEGIN {FS=":"} {print}' <filename>
//Here, the input line is split with ":" as the separator and processed accordingly.

awk 'BEGIN {FS="\t+"} {print}' <filename>
//Tab separated file. '+' matches one or more of the previous character.

awk 'BEGIN {FS = "[[:space:]]+"} {print}' <filename>
//one or more whitespace characters, such as spaces or tabs.

2. NF (Number of Fields)
This special variable holds the number of fields in the current record.

awk 'NF == 3 {print "This record has 3 fields"}' <filename>
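FS and NF can be checked together on a single invented record. The sketch below assumes a colon-separated line shaped like a passwd entry; the record itself is made up.

```shell
# Invented colon-separated record.
rec='root:x:0:0:root:/root:/bin/sh'

# With FS=":" the first field is the text before the first colon.
first=$(printf '%s\n' "$rec" | awk 'BEGIN {FS=":"} {print $1}')
# NF reports how many fields the record was split into.
count=$(printf '%s\n' "$rec" | awk 'BEGIN {FS=":"} {print NF}')
# NF as a pattern: keep only records with exactly 3 fields.
three=$(printf 'a b c\nd e\nf g h\n' | awk 'NF == 3 {print $1}')

echo "$first"
echo "$count"
echo "$three"
```

The record has six colons, so NF is 7, and only the two three-field lines pass the NF == 3 pattern.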

Thursday, September 18, 2014

sed command

sed (Stream editor) command in Unix

sed stands for stream editor. It is a Unix utility that is most commonly used to find and replace a pattern in text files.

Syntax:

sed 's/pattern/replacement/'

Examples:

ubuntu:user% cat line

Test1|test2|test3|2

one|two|three|3

Dev|stag|prod|4

solaris|linux|unix|1

oracle|sybase|mysql|5

1. Find and replace a pattern in a line: //replaces just the first occurrence of 'test' string in a line.

ubuntu:user% sed 's/test/redhat/' line

Test1|redhat2|test3|2

one|two|three|3

Dev|stag|prod|4

solaris|linux|unix|1

oracle|sybase|mysql|5

2. Find and replace multiple occurrences of a pattern in a line using 'g' option:

ubuntu:user% sed 's/test/redhat/g' line

Test1|redhat2|redhat3|2

one|two|three|3

Dev|stag|prod|4

solaris|linux|unix|1

oracle|sybase|mysql|5

3. Find and replace only the specified occurrence of a pattern in a line: //replaces just the 2nd occurrence of 'test' in the line.

ubuntu:user% sed 's/test/redhat/2' line

Test1|test2|redhat3|2

one|two|three|3

Dev|stag|prod|4

solaris|linux|unix|1

oracle|sybase|mysql|5

4. Case-insensitive matching: (This doesn't work on Solaris, because the 'i' flag is a non-standard GNU extension)

sed 's/test/redhat/gi' line

5. Substitute the string and also display the searched pattern:

In the case below, '&' in the replacement stands for the matched text, so the search pattern 'test' is displayed along with the replacement.

ubuntu:user% sed 's/test/& redhat/g' line

Test1|test redhat2|test redhat3|2

one|two|three|3

Dev|stag|prod|4

solaris|linux|unix|1

oracle|sybase|mysql|5

ubuntu:user% sed 's/test/redhat &/g' line

Test1|redhat test2|redhat test3|2

one|two|three|3

Dev|stag|prod|4

solaris|linux|unix|1

oracle|sybase|mysql|5

6. Using regular expressions to find and replace.

//The escaped parentheses \( and \) form a group. A group captures the matched text in a buffer, which can be retrieved with \<number>: \1 denotes the text matched by the 1st group, \2 the text matched by the 2nd group, and so on. The example below uses only one group.

ubuntu:user% echo "Height: 1000 centimeters" | sed 's/\([0-9]*\) centimeters/\1 cm/g'

Height: 1000 cm
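Since the example uses only \1, here is a small sketch with two groups, where \1 and \2 are used to swap the captured parts. The input string is invented for the illustration.

```shell
# Group 1 captures the letters before "=", group 2 captures the digits after it.
out=$(echo "width=20" | sed 's/\([a-z]*\)=\([0-9]*\)/\2 is the \1/')
echo "$out"
```

This prints "20 is the width": \2 holds "20" and \1 holds "width".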

8. Use shell variables inside a sed command
//Close the single-quoted sed script just before the variable, put the variable in double quotes so that the shell expands it, then reopen the single quotes for the rest of the script.

ubuntu:user% VAR="test"
ubuntu:user% sed 's/'"$VAR"'/replace/g' line
Test1|replace2|replace3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1

oracle|sybase|mysql|5
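As a quick check of the quoting, the sketch below runs the same substitution on one sample line in two ways: with the quoting described above, and with the whole script in double quotes, which also lets the shell expand the variable. The second form is an alternative shown for comparison, not something the text above prescribes.

```shell
VAR="test"

# Close the single quotes, add the double-quoted variable, reopen the single quotes.
a=$(echo "Test1|test2|test3|2" | sed 's/'"$VAR"'/replace/g')
# Put the whole script in double quotes so the shell expands $VAR inside it.
b=$(echo "Test1|test2|test3|2" | sed "s/$VAR/replace/g")

echo "$a"
echo "$b"
```

Both commands print Test1|replace2|replace3|2.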

11. Pattern matching using sed:

//In the example below, 'test' is replaced with 'replace' only on lines that contain the text 'oracle'.

ubuntu:user% sed '/oracle/ s/test/replace/g' line

Test1|test2|test3|2

one|two|three|3

Dev|stag|prod|4

solaris|linux|unix|1

Test4|test5|test6|2

oracle|sybase|mysql|5

oracle|replaceing|replaceer

//Substitute with line ranges

Syntax: sed 'line-number,/search-end-pattern/ s/find/replace/g' filename

ubuntu:user% sed '2,/test6/ s/test/replace/g' line

Test1|test2|test3|2

one|two|three|3

Dev|stag|prod|4

solaris|linux|unix|1

Test4|replace5|replace6|2

oracle|sybase|mysql|5

oracle|testing|tester

13. Delete lines using sed

//Delete lines using line number ranges

Syntax: sed 'linenumber-start,linenumber-end d' filename. Note that the delete command does not modify the file itself; sed writes the result to standard output.

ubuntu:user% sed '3,4d' line

Test1|test2|test3|2

one|two|three|3

Test4|test5|test6|2

oracle|sybase|mysql|5

oracle|testing|tester

//Inverse of the above, i.e., retain the specified lines and delete the others, using the '!' symbol

ubuntu:user% sed '3,4!d' line
Dev|stag|prod|4
solaris|linux|unix|1
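The sketch below checks both forms of the delete command and confirms that the input file is left unchanged. It assumes a temporary file created by mktemp with five invented lines.

```shell
f=$(mktemp)
printf 'l1\nl2\nl3\nl4\nl5\n' > "$f"

kept=$(sed '3,4d' "$f")    # deletes lines 3 and 4 from the output
only=$(sed '3,4!d' "$f")   # deletes everything except lines 3 and 4
after=$(cat "$f")          # the file itself still has all five lines

echo "$kept"
echo "$only"
rm -f "$f"
```

To keep the result, the output can be redirected to a new file, for example sed '3,4d' line > newfile.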

14. Regular expression
By default, sed uses POSIX basic regular expressions, so it does not support every regex feature found in other tools, but it does support character classes and similar constructs.

//The following example demonstrates the use of character classes.

ubuntu:user% sed 's/[0-9]/*/g' line
Test*|test*|test*|*
one|two|three|*
Dev|stag|prod|*
solaris|linux|unix|*
Test*|test*|test*|*
oracle|sybase|mysql|*
oracle|testing|tester

Other examples of character classes: [0-9], [a-zA-Z], [^0-9] (a ^ at the start of the class negates it, so the class matches any character that is not listed).
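The three classes can be compared on a single line taken from the sample file. This is a small sketch; each command replaces every matching character with '*'.

```shell
line='Dev|stag|prod|4'

digits=$(echo "$line" | sed 's/[0-9]/*/g')      # replaces digits only
letters=$(echo "$line" | sed 's/[a-zA-Z]/*/g')  # replaces letters only
nondigits=$(echo "$line" | sed 's/[^0-9]/*/g')  # replaces everything except digits

echo "$digits"
echo "$letters"
echo "$nondigits"
```

With [^0-9], the letters and the '|' characters are all replaced, and only the trailing 4 survives.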

sort command examples

Sort command in Unix:

The sort command sorts lines of all the named files together and writes the result on the standard output. Comparisons are based on one or more sort keys extracted from each line of input. By default, there is one sort key, the entire input line. Lines are ordered according to the collating sequence of the current locale.
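The difference between the default whole-line key and an explicit key can be seen on a tiny invented file. The sketch below assumes a temporary file created by mktemp with '|' as the field delimiter.

```shell
f=$(mktemp)
printf 'b|3\na|10\nc|2\n' > "$f"

whole=$(sort "$f")                  # default key: the entire line
by_text=$(sort -t'|' -k2 "$f")      # key: field 2, compared as text
by_num=$(sort -t'|' -k2 -n "$f")    # key: field 2, compared as a number

echo "$whole"
echo "$by_text"
echo "$by_num"
rm -f "$f"
```

Compared as text, "10" sorts before "2"; with -n the values are ordered as 2, 3, 10.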

Examples:

//Sorted by 1st column

//Sorted by 2nd column

ubuntu:user% sort -t'|' -k2 line

solaris|linux|unix

dev|stag|prod

oracle|sybase|mysql

test1|test2|test3

one|two|three

//Reverses the order

ubuntu:user% sort -rt'|' -k1 line

test1|test2|test3

solaris|linux|unix

oracle|sybase|mysql

one|two|three

dev|stag|prod

//Sorts by number

ubuntu:user% sort -nt'|' -k4 line

solaris|linux|unix|1

test1|test2|test3|2

one|two|three|3

dev|stag|prod|4

oracle|sybase|mysql|5

//Using sort in combination with other commands using pipe

one|two|three|3

dev|stag|prod|4

oracle|sybase|mysql|5
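The command for this example is not shown above. As a hedged illustration only, one pipeline that produces output of this shape sorts numerically on the 4th field and passes the result to tail to keep the last three lines. The choice of tail and the temporary file are assumptions made for the sketch.

```shell
f=$(mktemp)
printf 'test1|test2|test3|2\none|two|three|3\ndev|stag|prod|4\nsolaris|linux|unix|1\noracle|sybase|mysql|5\n' > "$f"

# Sort by the numeric 4th field, then keep only the last three lines.
out=$(sort -nt'|' -k4 "$f" | tail -n 3)
echo "$out"
rm -f "$f"
```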

//If no delimiter or key is specified, the sort command compares entire lines as strings, so here the result is the same as sorting by the first column

ubuntu:user% sort line

dev|stag|prod|4

one|two|three|3

oracle|sybase|mysql|5

solaris|linux|unix|1

test1|test2|test3|2