AWK Essentials
AWK is a really nice programming language which is installed by default on practically every *nix. Few people use AWK the way it was originally meant to be used, since we have in many ways moved away from doing accounting in plain text files. It does, however, provide you with an untyped language with a very low memory footprint! I will keep this as simple as possible, but I assume you are running a *nix with an AWK which supports environment variables (the ENVIRON array), as nawk, gawk and mawk all do.
The base of AWK is:
PATTERN ACTION
Or, a bit more descriptively: each pattern is tested against every line of the input, and whenever a pattern matches, its action is executed.
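A minimal sketch of this, using made-up input (the names and numbers are invented for illustration). $1 and $2 refer to the first and second whitespace-separated field of the current line, and NR is the number of the line being read. The three patterns are a regexp, a comparison and a line-number test, so this assumes an awk which accepts arbitrary expressions as patterns, as POSIX awks do.

```shell
# Invented sample input: a name and a number on each line.
input='alice 12
bob 7
carol 30'

# Each rule is "pattern { action }"; every rule is tried against every line,
# top to bottom, before awk moves on to the next line.
patterns_out=$(printf '%s\n' "$input" | awk '
/ali/    { print "regexp matched:", $1 }
$2 > 10  { print "expression matched:", $1 }
NR == 2  { print "second line:", $0 }
')
printf '%s\n' "$patterns_out"
```

The first line triggers both the first and the second rule, the second line only the third rule, and the last line only the second rule.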
PATTERN is usually a regexp, although any expression will do, and ACTION is a block of code enclosed in braces. For each input line the patterns are tested from top to bottom. AWK is untyped, which means that values are just strings and numbers, converted as needed, and you never specify what type a variable holds. To make this concrete, let's write a program which prints your user name. Fire up a terminal and run:
user@lcl:~> awk 'BEGIN {print ENVIRON["USER"]}'
So what is going on? First, we specify an inline AWK script and run it directly with awk. In our script we use BEGIN, which is the first of two built-in patterns, the other one being END. BEGIN and END are each run exactly once: BEGIN before the interpreter starts reading input and END after it has read the last line. We didn't specify any file to run awk on, and since the script consists of nothing but a BEGIN clause, awk ran it and then terminated without waiting for input; had there been an END clause or an ordinary pattern, it would have read standard input first. print is a built-in statement which just prints text to stdout. awk also comes with built-in functions such as (not exhaustive) system, split, length and substr.
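To see BEGIN and END working together with real input, here is a small sketch. The three input lines are invented, and total is just an illustrative variable name. The middle rule has no pattern at all, which means its action runs for every line.

```shell
# BEGIN runs before the first line is read, END after the last one.
begin_end_out=$(printf 'one\ntwo\nthree\n' | awk '
BEGIN { print "start" }
      { total += length($0) }   # no pattern: runs for every line
END   { print NR " lines, " total " characters" }
')
printf '%s\n' "$begin_end_out"
```

This should print "start" followed by "3 lines, 11 characters". Note that total is never initialized; an unset variable simply counts as 0 (or the empty string) the first time it is used.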
In AWK you can make heavy use of associative arrays; in this case ENVIRON is an array containing all of the current environment variables. This array is filled in by your AWK runtime at startup rather than by your script. You may of course create your own arrays at will in any action. Let's look at another example!
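#!/usr/bin/awk -f
# Print each directory in PATH on a line of its own.
BEGIN {
    main(ENVIRON["PATH"])
}

# Split the given string on ':' and print every piece.
function main(string)
{
    split(string, paths, ":")
    for (i in paths)
    {
        print paths[i]
    }
}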
This is an AWK script which needs to be made executable (chmod +x) before it can be run; suppose its name is myPaths, then it is executed as ‘./myPaths’. In it we define a main function which takes a string argument, splits the string on ‘:’ and prints each of the tokens. ‘split’ takes an input string, a target array and a separator (a single character or a regexp); it tokenizes the input into the array and returns the number of tokens.
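As a small sketch of split on its own, the following uses a made-up PATH-like string (so the result doesn't depend on your environment) and walks the array with a counter running from 1 up to the value split returns. The names list, parts and n are purely illustrative.

```shell
# A made-up PATH-like string, passed into awk with -v.
sample='/usr/local/bin:/usr/bin:/bin'

ordered_out=$(awk -v list="$sample" 'BEGIN {
    n = split(list, parts, ":")     # split returns the number of pieces
    for (i = 1; i <= n; i++)        # the pieces are stored at indices 1..n
        print i, parts[i]
}')
printf '%s\n' "$ordered_out"
```

With the sample above this should list the three directories in their original order, numbered 1 to 3.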
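Things to notice are the “for-each” like syntax in the loop, the array ‘paths’ which is not declared before we use it, and that the opening brace of a pattern block must sit on the same line as its pattern, so it cannot be laid out like the function block. Two things which are not stated: the array filled by split is indexed from 1 and not the usual 0, and the “for-each” loop visits the indices in no guaranteed order, so count from 1 up to the value returned by split if the order matters.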
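The same “for-each” loop works on arrays you fill yourself. Here is a sketch which counts how often each word occurs in some invented input; the array name count is my own choice. Since the loop order is not guaranteed, the result is piped through sort to get a stable listing.

```shell
# Invented input: one shell name per line, some repeated.
shell_counts=$(printf 'bash\nzsh\nbash\nfish\nbash\n' | awk '
{ count[$1]++ }                 # the array springs into existence on first use
END {
    for (name in count)         # visits every index, in unspecified order
        print name, count[name]
}' | sort)
printf '%s\n' "$shell_counts"
```

The indices here are strings rather than numbers, which is exactly what makes the arrays associative.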
Finally, let's look at a one-liner which prints the last active interface it finds! This assumes a program ‘ifconfig’ which gives output like the following:
eth0      Link encap:Ethernet HWaddr 00:1E:8C:E5:E5:C2
          inet addr:xxx.xxx.x.xx Bcast:xxx.xxx.x.xxx Mask:255.255.255.0
          inet6 addr: xxxx::xxx:xxxx:xxxx:xxxx/xx Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:579259192 errors:0 dropped:0 overruns:0 frame:0
          TX packets:338999063 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:300177137147 (286271.2 Mb) TX bytes:905103595491 (863174.0 Mb)
          Interrupt:18
eth1      Link encap:Ethernet HWaddr 00:E0:4D:4A:A2:D2
          inet addr:xx.xx.xxx.xxx Bcast:xx.xx.xxx.xxx Mask:255.255.255.0
          inet6 addr: xxxx::xxx:xxxx:xxxx:xxxx/xx Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:886942313 errors:1 dropped:6 overruns:1 frame:0
          TX packets:1060224372 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:599343450907 (571578.4 Mb) TX bytes:936322147776 (892946.3 Mb)
          Interrupt:21 Base address:0xac00
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:74323801 errors:0 dropped:0 overruns:0 frame:0
          TX packets:74323801 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:16951245193 (16165.9 Mb) TX bytes:16951245193 (16165.9 Mb)
The script, which is admittedly not kosher, is used in the following manner:
user@lcl:~> ifconfig | awk \
'/encap:Ethernet/{iface = $1}
/UP.*RUNNING.*MULTICAST/{activeIface=iface}
END{print activeIface}'
The AWK script has two patterns (plus END) which are, as mentioned, tested in order. The first stores the first field of any line containing ‘encap:Ethernet’ in iface; we can see that for these lines the first field is the interface name. The second catches any line which states that the interface is up, and when it does, stashes iface in activeIface for later printing in END. Since activeIface is overwritten on every match, the last active interface wins.
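One weakness of the one-liner is that iface is only updated on Ethernet lines, so a stale name can stick around while other kinds of interfaces go by. The following is a slightly more defensive sketch and not the one-liner itself. It assumes the old ifconfig layout where interface names start in the first column and all other lines are indented. The shortened sample input and the variable names ethernet and active are made up for illustration.

```shell
# Shortened, invented sample in the old ifconfig layout; eth0 is down here.
sample_ifconfig='eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          BROADCAST MULTICAST  MTU:1500  Metric:1
eth1      Link encap:Ethernet  HWaddr 00:00:00:00:00:02
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          UP LOOPBACK RUNNING  MTU:16436  Metric:1'

# Rule 1: a line not starting with a space opens a new interface block;
#         remember its name and whether it is an Ethernet interface.
# Rule 2: inside an Ethernet block, a line with UP ... RUNNING marks it active.
active_iface=$(printf '%s\n' "$sample_ifconfig" | awk '
/^[^ ]/                    { iface = $1; ethernet = /encap:Ethernet/ }
ethernet && /UP.*RUNNING/  { active = iface }
END                        { print active }
')
printf '%s\n' "$active_iface"
```

For this sample the sketch should print eth1: eth0 has no UP line and lo is not an Ethernet interface. A bare regexp used as a value, as in the first rule, is matched against the current line and yields 1 or 0.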
Happy AWKing!