AWK

AWK is a scripting language for manipulating text data and generating reports. It is a domain-specific language (DSL) that processes input one line (record) at a time.

Demo Using AWK

wget https://raw.githubusercontent.com/gchandra10/awk_scripts_data_science/master/sales_100.csv

Display file contents

awk '{print }' sales_100.csv

By default, AWK splits each line into fields on whitespace (spaces and tabs). Since our file is comma-separated, specify the delimiter with -F

awk -F ',' '{print }' sales_100.csv

To get the number of columns in each row, use NF, a predefined variable that holds the number of fields in the current line

awk -F ',' '{print NF}' sales_100.csv
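NF is often paired with NR, the predefined variable that holds the current line number, to spot malformed rows. The sketch below is a minimal illustration that assumes a small, hypothetical inline sample rather than sales_100.csv; the file contents and the variable names counts, bad, and expected are made up for the example.

```shell
# Hypothetical 3-line sample; the last row is deliberately one field short.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Region,Country,Item Type,Sales Channel,Order Priority
Asia,India,Snacks,Online,L
Europe,France,Cereal,Offline
EOF

# Print the line number (NR) and the field count (NF) of every line.
counts=$(awk -F ',' '{print NR, NF}' "$sample")
echo "$counts"

# Remember the header's field count, then report lines that differ from it.
bad=$(awk -F ',' 'NR == 1 {expected = NF} NF != expected {print NR ": " NF " fields"}' "$sample")
echo "$bad"

rm -f "$sample"
```

Run against the real file, the second command should print nothing if every row has the same number of columns as the header.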

Instead of seeing all 14 columns each time, AWK lets you choose specific columns.

awk -F ',' '{print $1,$2,$4}' sales_100.csv
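Columns can also be addressed relative to the end of the line, since NF can be used inside a field reference. The following is a minimal sketch on hypothetical rows (made up for illustration, not taken from sales_100.csv).

```shell
# Hypothetical sample rows; the values are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Asia,India,Snacks,Online,L
Europe,France,Cereal,Offline,H
EOF

# $NF is the last field and $(NF-1) is the one before it.
picked=$(awk -F ',' '{print $1, $(NF-1), $NF}' "$sample")
echo "$picked"

rm -f "$sample"
```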

Row Filter

AND = &&

OR = ||

Not = !

awk -F ',' '{if($4 == "Online") {print $1,$2,$4}}' sales_100.csv

awk -F ',' '{if($4 == "Online" && $5 == "L") {print $1,$2,$4,$5}}' sales_100.csv
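The same filters can be written without an explicit if, by placing the condition in front of the action block. The sketch below shows the OR and NOT operators in that form; it assumes a hypothetical inline sample whose column layout is only illustrative, and the names either and not_online are made up for the example.

```shell
# Hypothetical sample with a header row; the values are made up.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Region,Country,Item Type,Sales Channel,Order Priority
Asia,India,Snacks,Online,L
Europe,France,Cereal,Offline,H
Asia,Japan,Fruits,Online,H
Europe,Spain,Meat,Offline,L
EOF

# OR: rows sold Online or with priority L.
either=$(awk -F ',' '$4 == "Online" || $5 == "L" {print $2}' "$sample")
echo "$either"

# NOT: data rows (NR > 1 skips the header) that are not Online.
not_online=$(awk -F ',' 'NR > 1 && !($4 == "Online") {print $2}' "$sample")
echo "$not_online"

rm -f "$sample"
```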

Variables

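Variables need no declaration. Here sp and cp are user-defined variables computed from columns 9, 10, and 11 for each row.

awk -F ',' '{sp = $9 * $10; cp = $9 * $11; printf "%f,%f,%s,%s\n", sp, cp, $1, $2}' sales_100.csv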
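Variables keep their values from one row to the next, so they can also accumulate a running total that is printed in an END block. This is a minimal sketch on a hypothetical sample; the header names and values are assumptions made for illustration, with columns 9, 10, and 11 taken to hold units, unit price, and unit cost.

```shell
# Hypothetical sample; column names and values are assumptions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Region,Country,Item Type,Sales Channel,Order Priority,Order Date,Order ID,Ship Date,Units Sold,Unit Price,Unit Cost
Asia,India,Snacks,Online,L,1/1/2020,1001,1/5/2020,10,2.50,1.00
Europe,France,Cereal,Offline,H,1/2/2020,1002,1/6/2020,4,5.00,3.00
EOF

# NR > 1 skips the header; total accumulates across rows; END runs once at the end.
report=$(awk -F ',' '
NR > 1 {
    sp = $9 * $10
    cp = $9 * $11
    total += sp - cp
    printf "%s,%.2f,%.2f\n", $2, sp, cp
}
END { printf "total profit: %.2f\n", total }' "$sample")
echo "$report"

rm -f "$sample"
```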

RegEx: Return all rows where Column 1 starts with A

awk -F ',' '$1 ~ /^A/ {print}' sales_100.csv

Return all rows which have a space in Column 1. (\s is a GNU extension that not every awk supports; a literal space works everywhere.)

awk -F ',' '$1 ~ / / {print}' sales_100.csv
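The match operator ~ has a negated counterpart, !~, which selects rows that do not match the pattern. A minimal sketch on hypothetical rows (made up for illustration) follows; the names with_space and no_space are made up for the example.

```shell
# Hypothetical sample; the values are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Region,Country
Asia,India
North America,Canada
Sub-Saharan Africa,Kenya
EOF

# Rows whose first column contains a literal space.
with_space=$(awk -F ',' '$1 ~ / / {print $1}' "$sample")
echo "$with_space"

# Data rows (header skipped) whose first column does NOT contain a space.
no_space=$(awk -F ',' 'NR > 1 && $1 !~ / / {print $1}' "$sample")
echo "$no_space"

rm -f "$sample"
```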

AWK can also change the delimiters it writes between columns (fields) and rows (records) in its output

OFS: Output Field Separator

ORS: Output Record Separator

awk -F ',' 'BEGIN{OFS="|";ORS="\n\n"} $1 ~ /^A/ {print substr($1,1,4),$2,$3,$4,$5}' sales_100.csv
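One detail worth knowing: OFS is only inserted between comma-separated arguments of print, or when AWK rebuilds the line after a field is assigned. A plain print of the whole line keeps the original commas. The sketch below illustrates this on hypothetical rows; the names unchanged and rebuilt are made up for the example.

```shell
# Hypothetical sample; the values are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Asia,India,Snacks
Europe,France,Cereal
EOF

# print with no arguments emits the line untouched, so OFS has no effect.
unchanged=$(awk -F ',' 'BEGIN{OFS="|"} {print}' "$sample")
echo "$unchanged"

# Assigning a field to itself forces the line to be rebuilt using OFS.
rebuilt=$(awk -F ',' 'BEGIN{OFS="|"} {$1 = $1; print}' "$sample")
echo "$rebuilt"

rm -f "$sample"
```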

Built-in Functions

awk -F ',' 'BEGIN{OFS="|";ORS="\n"} $1 ~ /^A/ {print tolower(substr($1,1,4)),tolower($2),$3,$4,$5}' sales_100.csv
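Other standard string functions combine the same way as tolower and substr. The following minimal sketch uses toupper and length on hypothetical rows (made up for illustration); the name summary is made up for the example.

```shell
# Hypothetical sample with a header row; the values are made up.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Region,Country
Asia,India
Europe,France
EOF

# toupper upper-cases a string, length counts its characters,
# and substr(s, 1, 3) takes the first three characters.
summary=$(awk -F ',' 'NR > 1 {print toupper($2), length($1), substr($1, 1, 3)}' "$sample")
echo "$summary"

rm -f "$sample"
```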