The order I learned things:
perl, shell, awk.
The order I wish I'd learned things:
shell, awk, perl.
They all have their uses... If you find yourself writing system() over and over, it probably ought to be a shell script. If your shell scripts are full of 'while read', they can probably be simplified with awk. And if you're facing some egregiously complex regex work, perl might be good for that.
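To make the 'while read' point concrete, here's a rough sketch that sums a column both ways. The sample data and the variable names (data, total, awk_total) are made up for illustration:

```shell
# Hypothetical sample data: a name column and a size column.
data='alpha 10
beta 20
gamma 12'

# The 'while read' way: sum the second column in plain shell.
total=0
while read -r name size; do
    total=$((total + size))
done <<EOF
$data
EOF
echo "shell total: $total"

# The awk way: the same sum as a one-liner.
awk_total=$(printf '%s\n' "$data" | awk '{ sum += $2 } END { print sum }')
echo "awk total: $awk_total"
```

Both give the same answer; the awk version just has less plumbing around it.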
But awk is powerful and simple, simple enough to use off-the-cuff. Imagine a tool which reads lines like grep/sed, splits columns like cut, has easy expressions like C's if(X>Y) { ... }, and associative arrays like perl, but easier than shell or perl. It's simple to learn, high enough performance to use with data in the hundreds of megabytes, and excellent for writing data translators in.
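A quick taste of the column splitting plus C-style comparison, as a sketch on invented data (name, quota, usage):

```shell
# Hypothetical data: name, quota, usage.
# Print the name and the overage for anyone whose usage ($3) exceeds quota ($2).
out=$(printf '%s\n' 'alice 100 80' 'bob 100 130' 'carol 50 75' |
    awk '{ if ($3 > $2) print $1, $3 - $2 }')
printf '%s\n' "$out"
```

This should print one line each for bob and carol, and nothing for alice.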
At its simplest, you give it a simple expression to decide whether it prints the current line or not. Any expression will do. Expressions are C-like, with proper variables, brackets, and math.
Here's a 1-character program to emulate cat. Since '1' is always true (zero and the empty string are false, everything else is true), all lines are printed:
awk '1' file1 file2 file3
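The same truthiness rule works with a column as the whole expression. A small sketch on made-up input, where only lines with a non-zero, non-empty second column get through:

```shell
# '$2' alone is the expression: a line is printed only when its second
# column is neither zero nor empty.
out=$(printf '%s\n' 'a 1' 'b 0' 'c' 'd x' | awk '$2')
printf '%s\n' "$out"
```

Here 'b 0' is dropped because 0 is false, and 'c' is dropped because it has no second column at all.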
Put a regex into there instead and it becomes a 'grep':
awk '/regex/' file1 file2 file3
What if you want to know which file each line came from, to print lines like 'file: asdf'? awk has special variables for various things, and FILENAME is one of them.
Unusually, awk also lets you alter most of its special variables, letting you do things which would be lines of complicated regex in sed or split and loops in perl. $0 means the entire line; here we (always) prepend the filename to it, then print only whenever /regex/ is true. You could also do $5="asdf" to alter the value of the fifth column.
Since we put a code block after it, we've overridden the default action of printing the line, and have to put a 'print' inside the code block ourselves.
awk '/regex/ { $0=FILENAME ": " $0; print }' file1 file2 file3
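For the $5="asdf" case mentioned above, a minimal sketch on a made-up six-column line. Assigning to a column makes awk rebuild the line, joining the columns with the output separator OFS (a single space unless you change it):

```shell
# Replace the fifth column; the line is rebuilt with single spaces.
plain=$(printf '%s\n' 'a b c d e f' | awk '{ $5 = "asdf"; print }')
printf '%s\n' "$plain"

# Same edit, but with OFS set, the rebuilt line uses | between columns.
piped=$(printf '%s\n' 'a b c d e f' | awk -v OFS='|' '{ $5 = "asdf"; print }')
printf '%s\n' "$piped"
```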
What if you needed it to match two different regexes? Just add another to the expression.
awk '/regex/ || /another/ { $0=FILENAME ": " $0; print }' file1 file2 file3
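The other C operators work the same way. As a sketch on invented log lines, here is "matches one regex but not another" using && and !:

```shell
# Print lines that match /error/ but do not match /ignored/.
out=$(printf '%s\n' 'error: disk full' 'error: ignored' 'warning: disk hot' |
    awk '/error/ && !/ignored/')
printf '%s\n' "$out"
```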
Now, how about something plain POSIX grep can't do (GNU grep has -A for this, but you can't count on it everywhere), like "if a line matches /regex/, print it and two more lines"? You can call 'getline' by itself to read the next line of input. (You can also use it to read into other variables and/or from other files -- this is just its most basic use.)
awk '/regex/ { print ; getline ; print ; getline ; print }' file1 file2 file3
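One wrinkle with the bare getline version: when the match is within two lines of the end of input, getline fails and leaves $0 alone, so the last line gets printed again. A slightly more careful sketch checks getline's return value (greater than 0 means it actually read a line). The input here is made up:

```shell
# Print the matching line plus up to two following lines,
# stopping quietly if the input runs out first.
out=$(printf '%s\n' one two three four five |
    awk '/four/ { print; for (i = 0; i < 2; i++) if ((getline) > 0) print }')
printf '%s\n' "$out"
```

Only one line follows 'four' in this input, so only two lines come out instead of a repeated 'five'.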
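awk also has associative arrays. What if you wanted to sum up all data with the same first column? 'END' here looks like just another expression, but it's a special one, whose code block runs only once, after all data is read. $1 is the first column, $2 is the second column. (The order in which for(X in A) walks the array is unspecified, so your lines may come out in a different order than shown.)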
$ awk '{ A[$1]+=$2 } END { for(X in A) print X, A[X] }' <<EOF
a 1
a 2
a 3
b 1
b 2
b 3
EOF
a 6
b 6
$
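Since for(X in A) can visit the keys in any order, the usual trick when the order matters is to pipe the result through sort. A sketch with the same kind of data, deliberately shuffled:

```shell
# Sum column 2 per key in column 1, then sort for a stable order.
out=$(printf '%s\n' 'b 1' 'a 1' 'b 2' 'a 2' 'a 3' 'b 3' |
    awk '{ A[$1] += $2 } END { for (X in A) print X, A[X] }' | sort)
printf '%s\n' "$out"
```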
...except oops, our data is separated with |, not space. What shall we do? The -F option changes the special variable FS to handle this:
$ awk -F'|' '{ A[$1]+=$2 } END { for(X in A) print X, A[X] }' <<EOF
a|1
a|2
a|3
b|1
b|2
b|3
EOF
a 6
b 6
$
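If you'd rather keep the separator inside the program than on the command line, setting FS in a BEGIN block (the counterpart of END, run before any input is read) should do the same job. A sketch on made-up data, sorted so the output order is predictable:

```shell
# Same per-key sum, with the input separator set inside the program.
out=$(printf '%s\n' 'a|1' 'a|2' 'b|5' |
    awk 'BEGIN { FS = "|" } { A[$1] += $2 } END { for (X in A) print X, A[X] }' | sort)
printf '%s\n' "$out"
```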
...and what if we wanted our output split by | too? That's just another special variable, OFS. It doesn't have its own option but -v can set any variable. I'm throwing VAR='asdf' in there to show how to easily import shell strings and variables into awk...
$ awk -F'|' -v OFS="|" -v VAR="asdf" '{ A[$1]+=$2 } END { for(X in A) print X, A[X], VAR }' <<EOF
a|1
a|2
a|3
b|1
b|2
b|3
EOF
a|6|asdf
b|6|asdf
$
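The same -v trick works with an actual shell variable, not just a literal. In this sketch, min and MIN are names I made up; the shell variable is handed to awk as MIN and used as a threshold. (One thing to watch: -v interprets backslash escapes in the value.)

```shell
min=3   # hypothetical threshold held in a shell variable

# Keep only the rows whose second column is at least MIN.
out=$(printf '%s\n' 'a|1' 'a|3' 'b|2' 'b|9' |
    awk -F'|' -v MIN="$min" '$2 >= MIN')
printf '%s\n' "$out"
```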
Okay, but what if you wanted the 'a' total in a file named 'a'? awk even has redirection:
$ awk -F'|' -v OFS="|" -v VAR="asdf" '{ A[$1]+=$2 } END { for(X in A) print X, A[X], VAR >X }' <<EOF
a|1
a|2
a|3
b|1
b|2
b|3
EOF
$ cat a
a|6|asdf
$ cat b
b|6|asdf
$
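Redirection works per line too, which makes splitting one file into many almost free. A rough sketch, using a scratch directory from mktemp and a made-up .txt naming scheme. Unlike the shell, awk's > truncates a file only the first time it opens it; later prints to the same name append:

```shell
dir=$(mktemp -d)   # scratch directory so the sketch doesn't litter the cwd

# Send each input line to a file named after its first column.
printf '%s\n' 'a|1' 'b|1' 'a|2' 'b|2' |
    awk -F'|' -v DIR="$dir" '{ out = DIR "/" $1 ".txt"; print > out }'

a_lines=$(cat "$dir/a.txt")
b_lines=$(cat "$dir/b.txt")
printf '%s\n' "$a_lines" "$b_lines"
rm -r "$dir"
```

With lots of distinct keys you may run into the open-file limit, in which case calling close() on each file when you're done with it is the usual way out.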
Perl can do all this too, but it's much more complicated and makes you do everything explicitly.
Perl's one real indispensable use, I think? Date math. Not much else you can depend on to convert arbitrary dates into epoch seconds the same way on Linux, Solaris, and AIX...