Should I focus efforts on learning Perl or develop skills in awk, sed, etc

brianjb · May 8, 2012, 1:32pm

Good afternoon,

I am not trying to start a debate. Please don't take it that way. I'm not trying to make this a Perl versus Bash scripts thing.

I have been writing shell scripts for several years. I am not 100%, but I seem to get the job done. I would like to start spending some spare time tuning my skill set and becoming a better scripter in the process. So do I need to focus on shell scripting (awk, sed, etc.) or will it be useful if I do Perl? I am thinking that I need to pick one or the other... it's kind of better to be 100% at one than 50% at each. Is that right?

Here is what I do currently (I may be leaving some things off):

Run MYSQL queries to get data out of a large database. We will be switching to an Oracle database over the next year as we change vendors on one of the products we use.
Process DNS zone files. Sometimes there is a need to generate DNS zone files based on reading in data from various forms (csv files, etc).
Write scripts that monitor system performance. They log and also send mail if there are issues.
Write scripts that query the MYSQL database and either mail that information or create csv file.
As you know, we are moving from MYSQL database to Oracle database since we are changing products. I also write scripts that query the database and put the output into a csv that is in the right format for importing into the new product via their CLI's.

As you can see, pretty much the gist of what I do is to take data (sometimes large amounts) and then put it into a certain format. Perl people say that Perl is the best... and old-time UNIX guys say that it can all be done with shell scripts. Both of them are partially right.

Do you think I need to focus my efforts on fine-tuning my shell scripting knowledge, or should I also learn Perl?

Keep in mind, that I don't know what the future holds. I would like to make myself marketable in the future.

Thanks!

bartus11 · May 8, 2012, 1:39pm

I'd say go with learning Perl if you want to stay competitive in the job market. Perl scripting is a highly desirable skill for SysAdmins.

joeyg · May 8, 2012, 2:13pm

I would recommend that you begin learning perl. While at it, probably become better at sed and awk. There is no best, merely different tools to be used at different times.

Can you bang a nail in with a large wrench? Sure, but better to use the proper tool for this instance -- a hammer. Likewise, each situation will present different challenges, and lead you to one approach or another.

Corona688 · May 8, 2012, 2:40pm

The order I learned things:

perl, shell, awk.

The order I wish I'd learned things:

shell, awk, perl.

They all have their uses... If you find yourself writing system() over and over, it probably ought to be a shell script. If your shell scripts are full of 'while read', they can probably be simplified with awk. And if you're facing some egregiously complex regex work, perl might be good for that.
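A minimal sketch of the 'while read' point above, assuming a made-up two-column input (a name, then a number). The function names sample_data, total_with_shell and total_with_awk are hypothetical, chosen only for this illustration:

```shell
# Hypothetical sample data: name in column 1, number in column 2.
sample_data() {
    printf '%s\n' 'alice 10' 'bob 20' 'carol 30'
}

# Shell version: a 'while read' loop that totals column 2.
total_with_shell() {
    total=0
    while read -r name value; do
        total=$((total + value))
    done
    echo "$total"
}

# awk version: the same total as a one-line program.
total_with_awk() {
    awk '{ total += $2 } END { print total }'
}

sample_data | total_with_shell   # prints 60
sample_data | total_with_awk     # prints 60
```

Both give the same answer on this input; the awk version just has less plumbing to get wrong, and the gap tends to widen as the per-line logic grows.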

But awk is powerful and simple, simple enough to use off-the-cuff. Imagine a tool which reads lines like grep/sed, splits columns like cut, has easy expressions like C's if(X>Y) { ... } , and associative arrays like perl, but easier than shell or perl. It's simple to learn, high enough performance to use with data in the hundreds of megabytes, and excellent for writing data translators in.

At its simplest, you give it a simple expression to decide whether it prints the current line or not. Any expression will do. Expressions are C-like, with proper variables, brackets, and math.

Here's a 1-character program to emulate cat. Since '1' is always true (zero and blank strings are false, all else is true), all lines are printed:

awk '1' file1 file2 file3

Put a regex into there instead and it becomes a 'grep':

awk '/regex/' file1 file2 file3
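As a small extension of the same idea, the expression can test a single column instead of the whole line by using the ~ match operator. A sketch, assuming hypothetical log-like input with the level in the second column (the function name match_level is made up):

```shell
# Print only lines whose SECOND column matches /ERROR/.
match_level() {
    awk '$2 ~ /ERROR/' "$@"
}

# The second sample line contains ERROR, but not in column 2, so it is skipped.
printf '%s\n' '2012-05-08 ERROR disk full' '2012-05-08 INFO no ERROR here' |
    match_level
```

Only the first sample line is printed.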

What if you want to know which file a line came from, to print lines like 'file: asdf'? awk has special variables for various things, and FILENAME is one of them.

Unusually, awk also lets you alter most of its special variables, letting you do things which would take lines of complicated regex in sed, or split and loops in perl. $0 means the entire line; here, whenever /regex/ matches, we prepend the filename to it and then print the result. You could also do $5="asdf" to alter the value of the fifth column.

Since we put a code block after the expression, we've overridden the default 'print' action, and have to put a 'print' inside the code itself.

awk '/regex/ { $0=FILENAME ": " $0; print }' file1 file2 file3
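The $5="asdf" case mentioned above can be sketched the same way. The input line and the function name replace_fifth are made up for illustration:

```shell
# Overwrite the fifth column of every line, then print the rebuilt line.
replace_fifth() {
    awk '{ $5 = "asdf"; print }' "$@"
}

printf '%s\n' 'a b c d e f' | replace_fifth   # prints: a b c d asdf f
```

One thing to keep in mind: assigning to a field makes awk rebuild $0 using the output separator (a single space by default), so runs of spaces or tabs in the original line are not preserved.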

What if you needed it to match two different regexes? Just add another to the expression.

awk '/regex/ || /another/ { $0=FILENAME ": " $0; print }' file1 file2 file3

Now, how about something grep can't do, like "if a line matches /regex/, print it and two more lines?" You can call 'getline' by itself to read the next line of input. (You can also use it to read into other variables and/or from other files -- this is just its most basic use)

awk '/regex/ { print ; getline ; print ; getline ; print }' file1 file2 file3
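One caveat with the getline version: if the match falls on the last or second-to-last line, getline fails at end of input, leaves $0 unchanged, and the same line is printed again. A different technique, using a countdown variable instead of getline, avoids that. This is only a sketch; the names print_match_and_next_two and left are made up:

```shell
# On a match, arm a counter for 3 lines (the match plus two more).
# Every line is printed while the counter is still positive.
print_match_and_next_two() {
    awk '/regex/ { left = 3 } left > 0 { print; left-- }' "$@"
}

printf '%s\n' 'one' 'regex here' 'two' 'three' 'four' | print_match_and_next_two
```

This prints 'regex here', 'two' and 'three'. The behavior is not identical to the getline version: here the two following lines are also tested against /regex/, so a second match inside that window restarts the count.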

awk also has associative arrays. What if you wanted to sum up all data with the same first column? 'END' here is just another expression, but a special one, which is only true after all data is read. $1 is the first column, $2 is the second column.

$ awk '{ A[$1]+=$2 } END { for(X in A) print X, A[X] }' <<EOF
a 1
a 2
a 3
b 1
b 2
b 3
EOF

a 6
b 6

$

...except oops, our data is separated with |, not space. What shall we do? The -F option changes the special variable FS to handle this:

$ awk -F'|' '{ A[$1]+=$2 } END { for(X in A) print X, A[X] }' <<EOF
a|1
a|2
a|3
b|1
b|2
b|3
EOF

a 6
b 6

$

...and what if we wanted our output split by | too? That's just another special variable, OFS. It doesn't have its own option but -v can set any variable. I'm throwing VAR='asdf' in there to show how to easily import shell strings and variables into awk...

$ awk -F'|' -v OFS="|" -v VAR="asdf" '{ A[$1]+=$2 } END { for(X in A) print X, A[X], VAR }' <<EOF
a|1
a|2
a|3
b|1
b|2
b|3
EOF

a|6|asdf
b|6|asdf

$
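The same FS/OFS pair also works as a plain delimiter converter, which is close to the "put data into another format" jobs described at the top of the thread. A sketch, assuming simple fields with no embedded commas or quotes (the function name pipes_to_commas is made up):

```shell
# Read |-separated fields, write comma-separated fields.
# Assigning $1 to itself forces awk to rebuild the line using OFS.
pipes_to_commas() {
    awk -F'|' -v OFS=',' '{ $1 = $1; print }' "$@"
}

printf '%s\n' 'a|1|x' 'b|2|y' | pipes_to_commas
```

This prints 'a,1,x' and 'b,2,y'. It does no quoting, so it is not a full CSV writer; fields that can contain commas or quotes would need extra handling.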

Okay, but what if you wanted the 'a' total in a file named 'a'? awk even has redirection:

$ awk -F'|' -v OFS="|" -v VAR="asdf" '{ A[$1]+=$2 } END { for(X in A) print X, A[X], VAR >X }' <<EOF
a|1
a|2
a|3
b|1
b|2
b|3
EOF

$ cat a

a|6|asdf

$ cat b

b|6|asdf

$
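One detail the examples above gloss over: the order in which for(X in A) visits the keys is not defined by awk, so the totals may not come out alphabetically on every implementation. Besides redirecting to files, print can also pipe into a command, which gives an easy way to get a stable order. A sketch (the function name sum_sorted is made up):

```shell
# Sum column 2 per key, and send the END output through sort(1).
sum_sorted() {
    awk -F'|' '{ A[$1] += $2 } END { for (X in A) print X, A[X] | "sort" }' "$@"
}

printf '%s\n' 'b|1' 'a|2' 'b|3' 'a|4' | sum_sorted
```

This prints 'a 6' and then 'b 4', whatever order the array loop happens to use.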

Perl can do all this too, but it's much more complicated and makes you do everything explicitly.

Perl's one real indispensable use, I think? Date math. Not much else you can depend on to convert arbitrary dates into epoch seconds the same way on linux, solaris, and aix...

brianjb · May 8, 2012, 2:45pm

Thanks bartus and joey. That is kind of what I was thinking in the back of my mind. I appreciate your feedback.

I have the 'Learning Perl' and the 'Programming Perl' books from O'Reilly. I have started to go through the 'Learning Perl' book and do the exercises.

Is there any book that you can recommend on fine tuning sed and awk skills? Is there a particular book? I saw one on amazon.com by O'Reilly. Or should I continue to just read these forums to see how different people solve different problems or issues?

---------- Post updated at 01:45 PM ---------- Previous update was at 01:41 PM ----------

Thanks Corona. I definitely see the power of awk. I need to learn that more.

Thanks for the input.

Corona688 · May 8, 2012, 2:59pm

I've been editing that repeatedly over the last while, sorry.

shamrock · May 8, 2012, 3:01pm

This is the only book on awk I would get.

Though Perl is the new kid on the block, I would recommend getting proficient in awk, as it comes with every linux/unix flavor while perl is still an add-on.

bartus11 · May 8, 2012, 3:23pm

I would argue with the statement that Perl is an add-on. It comes by default even with older UNIX systems like Solaris 8 or AIX 5 (don't know about older as I don't have access to those now). While AWK indeed can be found on any UNIX/Linux machine, the problem in my opinion is in its older versions' lack of features.

Anyway it is good to know both AWK and Perl ;).

brianjb · May 8, 2012, 3:28pm

That is my thought. I am teaching myself Perl. I am also learning more awk while at it. I can do simple things with both. The part I need to work harder on is taking huge gobs of data and twisting it into the desired output.

Thanks to everyone for the feedback.

I sent an email to our admin asking to order me these two books:

Effective awk Programming (3rd Edition), Arnold Robbins (ISBN 9780596000707)

The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger (ISBN 9780201079814)

I am excited to get them, and I hope that with those, my Perl books, and reading this forum, I will be on my way.

Like I said, I like to read this forum to see how the experts solve problems. There is more than one way to do things, right? I like to see all of those ways.

bartus11 · May 8, 2012, 3:35pm

One last piece of advice from me is to not only read the Forums, but also try to help posters with their problems. This is how I learned AWK and to some extent Perl ;). Good luck.

brianjb · May 8, 2012, 3:37pm

Good point. They are real-life examples of the kind of problems you use the tools for. I have been lurking around for a long time... it's time I got more involved.

fpmurphy · May 8, 2012, 9:13pm

The advice I now give people who ask me whether they should learn Perl is to first learn Python. Perl is the old kid on the sysadm block. Python is what more and more sysadm work is done with - at least on Linux systems.