Sunday, May 3, 2015

awk utility

Basic Structure of awk:

pattern{action}

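The pattern specifies when the action should be performed. awk reads the input one line (record) at a time and tests each line against the pattern; if the pattern matches, the action is applied to that line. If the pattern is omitted, the action is applied to every line; if the action is omitted, the matching line is printed.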
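The three forms of a rule (pattern only, action only, pattern with action) can be compared on a small input. This is a minimal sketch; the sample data (a name and a count on each line) is made up for illustration.

```shell
# Hypothetical sample input: a name and a count on each line.
data='apple 3
banana 7
cherry 12'

# Pattern only: lines whose 2nd field is greater than 5 are printed unchanged
# (the default action is to print the whole line).
printf '%s\n' "$data" | awk '$2 > 5'

# Action only: the action runs on every line.
printf '%s\n' "$data" | awk '{ print $1 }'

# Pattern and action: print the name only when the count is greater than 5.
printf '%s\n' "$data" | awk '$2 > 5 { print $1 }'
```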

Two other important patterns are specified by the keywords "BEGIN" and "END".

BEGIN { print "START" }
      { print }
END   { print "END" }

The BEGIN action runs once, before the first input line is read. The END action runs once, after the last input line has been processed.
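The order of execution can be checked by feeding the three-rule program above two made-up input lines: START appears before the first line and END after the last.

```shell
# Two sample input lines; BEGIN runs before the first, END after the last.
out=$(printf 'line one\nline two\n' | awk 'BEGIN { print "START" } { print } END { print "END" }')
printf '%s\n' "$out"
```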

Awk script inside a shell script:


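#!/bin/sh
# Expects the output of `ls -l` on standard input, e.g.: ls -l | sh <script-file>
# Linux users have to change $8 to $9 (GNU ls -l puts the file name in field 9)
awk '
BEGIN { print "File\tOwner" }
{ print $8 "\t" $3 }
END { print " - DONE -" }
'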

Native Awk script:

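#!/bin/awk -f
# The path to awk varies between systems (often /usr/bin/awk).
# Make the file executable (chmod +x) and run it as: ls -l | ./<script-file>
# Linux users have to change $8 to $9
BEGIN { print "File\tOwner" }
{ print $8 "\t" $3 }
END { print " - DONE -" }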
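An awk program stored in a file can also be run with `awk -f <file>`, which is what the `#!/bin/awk -f` line does behind the scenes. The sketch below writes the program to a temporary file and feeds it one made-up line in the format of GNU `ls -l` output, so it uses $9 for the file name (an assumption about the ls variant).

```shell
# Write the awk program to a temporary file (the name does not matter).
prog=$(mktemp)
cat > "$prog" <<'EOF'
BEGIN { print "File\tOwner" }
{ print $9 "\t" $3 }
END { print " - DONE -" }
EOF

# A made-up line in GNU `ls -l` format: owner in field 3, file name in field 9.
sample='-rw-r--r-- 1 alice staff 120 May  3 10:00 notes.txt'

# Run the program file with awk -f, reading the sample from standard input.
result=$(printf '%s\n' "$sample" | awk -f "$prog")
printf '%s\n' "$result"
rm -f "$prog"
```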




Syntax:


1. Print all the lines of a file.
awk '{print}' <filename>

2. Print only the first column of a file.
awk '{print $1}' <filename>

3. Print more than one field of a file.
awk '{print $1$2}' <filename> //no space is placed between the two fields in the output
awk '{print $1,$2}' <filename> //the comma inserts the output field separator (a space by default) between the two fields
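The separator inserted by the comma is the built-in variable OFS (output field separator), whose default is a single space. A small sketch on a made-up input line shows all three cases:

```shell
# Adjacent fields are concatenated with nothing in between.
no_sep=$(echo "alpha beta gamma" | awk '{ print $1 $2 }')
# The comma inserts OFS, a single space by default.
default_sep=$(echo "alpha beta gamma" | awk '{ print $1, $2 }')
# Setting OFS changes what the comma inserts.
dash_sep=$(echo "alpha beta gamma" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')
printf '%s\n' "$no_sep" "$default_sep" "$dash_sep"
```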

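4. Print only the lines that contain a specified text.
awk '/test/ {print $1}' <filename> //prints the first field of every line that contains "test"; use {print} to print the whole line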

5. Print the number of blank lines in a file:
awk 'BEGIN{x=0;} /^$/ {x=x+1;} END{print "There are "x" blank lines in the file";}' <filename>

Note: in the command above, /^$/ is the pattern (it matches empty lines), and {x=x+1;} is the action that is applied to each line that matches the pattern.

6. Print the number of lines in the file:
awk 'BEGIN{x=0} {x++;} END{print "There are "x" lines in the file";}' <filename>
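awk also keeps its own line counter: the built-in variable NR holds the number of records read so far, so in the END block it equals the total number of lines. The sketch below uses a made-up four-line file with two blank lines.

```shell
# Made-up sample: 4 lines, the 2nd and 4th are empty.
sample=$(mktemp)
printf 'first\n\nthird\n\n' > "$sample"

# NR in END is the total line count; no manual counter is needed.
lines=$(awk 'END { print NR }' "$sample")
# Count blank lines; x+0 prints 0 (not an empty string) when there are none.
blanks=$(awk '/^$/ { x++ } END { print x+0 }' "$sample")
echo "lines=$lines blanks=$blanks"
rm -f "$sample"
```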


Built-in special variables:

1. FS (Field Separator)
The field separator can be either a single character or a regular expression. It controls how awk splits an input record into fields: awk scans the record for character sequences that match the separator, and the text between those matches becomes the fields $1, $2, and so on. By default, FS is a single space, which means fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored.

awk 'BEGIN {FS=":"} {print $1}' <filename>
//Here, each input line is split with ":" as the separator and the first field is printed.

awk 'BEGIN {FS="\t+"} {print $1}' <filename>
//Tab-separated file. '+' matches one or more of the preceding character, so consecutive tabs count as a single separator.

awk 'BEGIN {FS = "[[:space:]]+"} {print $1}' <filename>
//one or more whitespace characters (spaces or tabs).
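The effect of FS is easiest to see on a line with a known layout. This sketch uses a made-up line in /etc/passwd format and also shows the `-F` command-line option, which sets FS without a BEGIN block. For the regular-expression case it uses `[ \t]+`, a bracket expression that older awk versions also accept.

```shell
# A made-up line in /etc/passwd format (7 fields separated by ':').
entry='root:x:0:0:root:/root:/bin/bash'

# Setting FS in BEGIN and passing -F are two ways to choose the separator.
fs_begin=$(printf '%s\n' "$entry" | awk 'BEGIN { FS = ":" } { print $1, $7 }')
fs_option=$(printf '%s\n' "$entry" | awk -F':' '{ print $1, $7 }')

# Regular-expression FS: any run of spaces and tabs separates fields.
fs_regex=$(printf 'a \t  b\tc\n' | awk 'BEGIN { FS = "[ \t]+" } { print $2 }')

printf '%s\n' "$fs_begin" "$fs_option" "$fs_regex"
```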


2. NF (Number of Fields)
This special variable holds the number of fields in the current record.

awk 'NF == 3 {print "This record has 3 fields"}' <filename>
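Because NF is a number, it can be used in a pattern and also as a field index: $NF is the last field of the record. A short sketch on made-up input lines:

```shell
# Print only the records that have exactly 3 fields.
three=$(printf 'a b c\nd e\nf g h\n' | awk 'NF == 3 { print "3 fields:", $0 }')
# $NF is the last field, whatever the number of fields.
last=$(printf 'a b c\nd e\n' | awk '{ print NF, $NF }')
printf '%s\n' "$three" "$last"
```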

Thursday, September 18, 2014

sed command

sed (Stream editor) command in Unix

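sed stands for stream editor. It is a Unix utility that reads text line by line, applies editing commands to each line, and writes the result to standard output. It is most commonly used to find and replace a pattern in text files.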

Syntax:
sed 's/pattern/replacement/' <filename>


Examples:
ubuntu:user% cat line
Test1|test2|test3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
oracle|sybase|mysql|5

1. Find and replace a pattern in a line: //replaces only the first occurrence of the string 'test' in each line.

ubuntu:user% sed 's/test/redhat/' line
Test1|redhat2|test3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
oracle|sybase|mysql|5

2. Find and replace all occurrences of a pattern in a line using the 'g' (global) flag:

ubuntu:user% sed 's/test/redhat/g' line
Test1|redhat2|redhat3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
oracle|sybase|mysql|5

3. Find and replace only the specified occurrence of a pattern in a line: //replaces only the 2nd occurrence of 'test' in each line.

ubuntu:user% sed 's/test/redhat/2' line
Test1|test2|redhat3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
oracle|sybase|mysql|5


4. Case-insensitive matching (this does not work with the Solaris sed, because the 'i' flag is a non-standard GNU extension):
sed 's/test/redhat/gi' line
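On a sed without the case-insensitive flag, the usual portable workaround is to spell out both cases of each letter with bracket expressions. The sketch below runs both forms on the first line of the sample file; the first command assumes GNU sed.

```shell
# GNU sed: the I (or i) flag makes the match case-insensitive.
gnu=$(echo 'Test1|test2|test3|2' | sed 's/test/redhat/gI')
# Portable alternative: list both cases of every letter.
portable=$(echo 'Test1|test2|test3|2' | sed 's/[Tt][Ee][Ss][Tt]/redhat/g')
printf '%s\n' "$gnu" "$portable"
```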


5. Substitute the string while keeping the matched text:
In the replacement, '&' stands for the entire matched text, so the matched string 'test' is kept and 'redhat' is added after it (first command) or before it (second command).

ubuntu:user% sed 's/test/& redhat/g' line
Test1|test redhat2|test redhat3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
oracle|sybase|mysql|5

ubuntu:user% sed 's/test/redhat &/g' line
Test1|redhat test2|redhat test3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
oracle|sybase|mysql|5


6. Using regular expressions to find and replace.
//The escaped parentheses \( and \) form a group. The text matched by a group is captured and can be referred to in the replacement as \<number>: \1 is the text matched by the 1st group, \2 by the 2nd group, and so on. The example below uses a single group.

ubuntu: user% echo "Height: 1000 centimeters" | sed 's/\([0-9]*\) centimeters/\1 cm/g'
Height: 1000 cm
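With two groups, each capture is referenced by its position, which makes it possible to reorder parts of a line. A minimal sketch on a made-up name:

```shell
# \1 captures the first word, \2 the second; the replacement swaps them.
swapped=$(echo "John Smith" | sed 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2, \1/')
printf '%s\n' "$swapped"
```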


7. Search and replace more than one pattern.
//Each -e option adds one sed command; all of them are applied, in order, to every line.

ubuntu: user% sed -e 's/test/TEST/g' -e 's/solaris/redhat/' line
Test1|TEST2|TEST3|2
one|two|three|3
Dev|stag|prod|4
redhat|linux|unix|1
oracle|sybase|mysql|5


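8. Use shell variables inside a sed command
//The shell does not expand variables inside single quotes, so the single-quoted sed script is closed, the variable is inserted inside double quotes, and the single-quoted script is then reopened: 's/'"$VAR"'/replace/g'.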

ubuntu: user% VAR="test"
ubuntu: user% sed 's/'"$VAR"'/replace/g' line
Test1|replace2|replace3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
oracle|sybase|mysql|5
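A simpler alternative is to put the whole sed script in double quotes, so the shell expands the variable directly. If the value itself contains '/', any other character can serve as the delimiter of the s command. The variable names and values below are made up.

```shell
VAR="test"
# Double quotes around the whole script: $VAR is expanded by the shell.
r1=$(echo 'Test1|test2|test3|2' | sed "s/$VAR/replace/g")

# The value contains '/', so '#' is used as the s command delimiter instead.
DIR="/usr/local"
r2=$(echo "path=/usr/local/bin" | sed "s#$DIR#/opt#")
printf '%s\n' "$r1" "$r2"
```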


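9. Search and replace only on a specific line
//The number before the 's' is the line number; the substitution is performed only on that line. Note that from this example on, the sample file 'line' has been edited to contain extra lines that are not in the listing above, as the outputs show.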
ubuntu:user% sed '1 s/test/replace/' line
Test1|replace2|test3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
oracle|sybase|mysql|5
test|testing|tester

10. Search and replace on a specific range of lines
//In the example below, sed performs the substitution from line 5 to the last line ($).
ubuntu:user% sed '5,$ s/test/replace/' line
Test1|test2|test3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
Test4|replace5|test6|2
oracle|sybase|mysql|5
replace|testing|tester

//Search and replace the string 'test' with 'replace' from line 1 to line 5.
ubuntu:user% sed '1,5 s/test/replace/' line
Test1|replace2|test3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
Test4|replace5|test6|2
oracle|sybase|mysql|5
test|testing|tester


11. Pattern matching using sed:
//In the example below, 'test' is replaced with 'replace' only on lines that contain the text 'oracle'.
ubuntu:user% sed '/oracle/ s/test/replace/g' line
Test1|test2|test3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
Test4|test5|test6|2
oracle|sybase|mysql|5
oracle|replaceing|replaceer


12. Substitute a string based on a start and an end pattern.
//In the following example, 'test' is replaced by 'replace' only on the lines from the first line matching the start pattern through the next line matching the end pattern.

Syntax: sed '/start-pattern/,/end-pattern/ s/find/replace/g' filename


ubuntu:user% sed '/Test1/,/test6/ s/test/replace/g' line
Test1|replace2|replace3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
Test4|replace5|replace6|2
oracle|sybase|mysql|5
oracle|testing|tester


//Substitute using a line number as the start of the range

Syntax: sed 'line-number,/end-pattern/ s/find/replace/g' filename

ubuntu:user% sed '2,/test6/ s/test/replace/g' line
Test1|test2|test3|2
one|two|three|3
Dev|stag|prod|4
solaris|linux|unix|1
Test4|replace5|replace6|2
oracle|sybase|mysql|5
oracle|testing|tester


13. Delete lines using sed
//Delete lines using line number ranges

Syntax: sed 'start-line,end-line d' filename. Note that sed writes the result to standard output; the file itself is not modified.

ubuntu:user% sed '3,4d' line
Test1|test2|test3|2
one|two|three|3
Test4|test5|test6|2
oracle|sybase|mysql|5
oracle|testing|tester


//Inverse of the above, i.e. keep the specified lines and delete all the others, using the '!' (negation) symbol

ubuntu:user% sed '3,4!d' line
Dev|stag|prod|4
solaris|linux|unix|1
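To actually change the file, GNU and BSD sed both accept the -i option; written as -i.bak, it edits the file in place and keeps the original under the same name with ".bak" appended. A sketch on a made-up five-line temporary file:

```shell
f=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$f"

# Without -i: the result is only written to standard output.
preview=$(sed '3,4d' "$f")
before=$(wc -l < "$f" | tr -d ' ')   # the file still has 5 lines

# With -i.bak: the file is rewritten and the original is saved as "$f.bak".
sed -i.bak '3,4d' "$f"
after=$(cat "$f")
backup=$(wc -l < "$f.bak" | tr -d ' ')
rm -f "$f" "$f.bak"
printf '%s\n' "$after"
```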


14. Regular expression
By default, sed uses POSIX basic regular expressions (BRE). It does not support Perl-style shortcuts such as \d, but it does support bracket expressions (character classes) such as [0-9].

//The following example demonstrates the use of character classes.


ubuntu:user% sed 's/[0-9]/*/g' line
Test*|test*|test*|*
one|two|three|*
Dev|stag|prod|*
solaris|linux|unix|*
Test*|test*|test*|*
oracle|sybase|mysql|*
oracle|testing|tester

Other common classes: [0-9] matches a digit, [a-zA-Z] matches a letter, and [^0-9] matches any character that is not a digit (a ^ right after the opening bracket negates the class).
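In a basic regular expression, "one or more digits" is written as [0-9][0-9]*. GNU sed's -E option switches to extended regular expressions, where + can be used directly. The sketch below, on a made-up line, also shows a negated class removing every non-digit.

```shell
# BRE: one or more digits written as [0-9][0-9]*.
bre=$(echo 'Test1|test22|test333|2' | sed 's/[0-9][0-9]*/N/g')
# ERE (-E): the same match written with +.
ere=$(echo 'Test1|test22|test333|2' | sed -E 's/[0-9]+/N/g')
# Negated class: delete every character that is not a digit.
digits=$(echo 'a1b22c' | sed 's/[^0-9]//g')
printf '%s\n' "$bre" "$ere" "$digits"
```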

sort command examples

Sort command in Unix:

The sort command sorts the lines of all the named files together and writes the result to standard output. Comparisons are based on one or more sort keys extracted from each line of input. By default, there is one sort key: the entire input line. Lines are ordered according to the collating sequence of the current locale.

Examples:

ubuntu:user% cat line
test1|test2|test3
one|two|three
dev|stag|prod
solaris|linux|unix
oracle|sybase|mysql

//Sorted by the 1st column (-t'|' sets '|' as the field delimiter, -k1 selects the sort key)
ubuntu:user% sort -t'|' -k1 line
dev|stag|prod
one|two|three
oracle|sybase|mysql
solaris|linux|unix
test1|test2|test3

//Sorted by the 2nd column (the key -k2 runs from field 2 to the end of the line)
ubuntu:user% sort -t'|' -k2 line
solaris|linux|unix
dev|stag|prod
oracle|sybase|mysql
test1|test2|test3
one|two|three
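Because -k2 uses everything from field 2 to the end of the line as the key, it can give a different order from -k2,2, which compares field 2 alone (when keys tie, sort falls back to comparing whole lines). A sketch with two made-up rows whose 2nd field is identical; LC_ALL=C fixes the collation order:

```shell
rows='a|x|2
b|x|1'
# -k2: keys are "x|2" and "x|1", so b|x|1 sorts first.
to_end=$(printf '%s\n' "$rows" | LC_ALL=C sort -t'|' -k2)
# -k2,2: both keys are "x"; the tie is broken by the whole line, so a|x|2 sorts first.
field_only=$(printf '%s\n' "$rows" | LC_ALL=C sort -t'|' -k2,2)
printf '%s\n' "$to_end" "$field_only"
```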

//Reverse the order (-r)
ubuntu:user% sort -rt'|' -k1 line
test1|test2|test3
solaris|linux|unix
oracle|sybase|mysql
one|two|three
dev|stag|prod

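//Sort numerically (-n) on the 4th column. For this and the following examples, the file has a 4th, numeric column (visible in the outputs), which the listing above does not show.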
ubuntu:user% sort -nt'|' -k4 line
solaris|linux|unix|1
test1|test2|test3|2
one|two|three|3
dev|stag|prod|4
oracle|sybase|mysql|5
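The -n flag matters once numbers have different lengths: without it, sort compares the values character by character, so "10" sorts before "2". A sketch with made-up values:

```shell
# Without -n: character-by-character comparison ("1" < "2" < "9").
lexical=$(printf '10\n9\n2\n' | LC_ALL=C sort)
# With -n: the values are compared as numbers.
numeric=$(printf '10\n9\n2\n' | LC_ALL=C sort -n)
printf '%s\n' "$lexical" "$numeric"
```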

//Using sort in combination with other commands through a pipe
ubuntu:user% grep -E "dev|oracle|one" line | sort -nt '|' -k4
one|two|three|3
dev|stag|prod|4
oracle|sybase|mysql|5

//If no sort key is specified, sort compares entire lines as strings, so here the lines end up ordered by their first column simply because it comes first in each line
ubuntu:user% sort line
dev|stag|prod|4
one|two|three|3
oracle|sybase|mysql|5
solaris|linux|unix|1
test1|test2|test3|2