HELP on Perl array / sorting - trying to convert Korn Shell Script to Perl

newbie_01 · October 21, 2011, 10:09pm

Hi all,

Not sure if this should be in the programming forum, but I believe it will get more response under the Shell Programming and Scripting FORUM.

I am trying to write a customized df script in Perl and need some help with using arrays and file handles.

At the moment I am using

 
system("df -k > /tmp/df_tmp.00");

to redirect the df output to a temporary file. I am using df -k because some of the Solaris and HP servers do not have df -h; with df -k, I am sure it will work on all of them.

Sample output is as below:

 
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d1       3099287 2482045  555257    82%    /
/proc                      0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
fd                         0       0       0     0%    /dev/fd
/dev/md/dsk/d3       3099287 1595167 1442135    53%    /var
swap                 8663192     368 8662824     1%    /var/run
swap                 8717624   54800 8662824     1%    /tmp
/dev/md/dsk/d4       5003466 4330989  622443    88%    /opt
dev0ns951:/vol/vol_admin/common 321912832 266888556 55024276    83%    /nas_mnt/common
dev0ns951:/vol/vol_admin/admin/cpadocs 39741440 32961924 6779516    83%    /opt/info
dev0ns951:/vol/vol_admin/admin 39741440 32961924 6779516    83%    /nas_mnt/admin
dev0ns951:/vol/vol_admin/docs 8468480 7245924 1222556    86%    /nas_mnt/docs
dev0ns951:/vol/vol_admin/prodhome 33658880 26586948 7071932    79%    /nas_mnt/prodhome
dev0ns951:/vol/vol_admin/saphome   92160   37960   54200    42%    /nas_mnt/saphome
dev0ns951:/vol/vol_admin/prodhome 33658880 26586948 7071932    79%    /home/users

Then I get rid of the header as below. This leans even more on Unix commands; sorry Perl gurus, I don't know the Perl equivalent of the commands below.

 
$num_lines=`wc -l /tmp/df_tmp.00 | awk '{ print \$1 }'`;
$num_lines=$num_lines-1;
system("tail -${num_lines} /tmp/df_tmp.00 > /tmp/df_tmp.01");

Then I run the lines below, which read each line into an array:

 
open(DFTMP, "/tmp/df_tmp.01");
while ( <DFTMP> )
{
   chomp;
   @df_lines= split(' ',$_);
}
close DFTMP;

My question is, first of all, how do I do an array operation on fields/columns 2, 3 and 4 so that I can divide them by 1024 (or by 1024 twice) and convert their KB values to MB or GB? Or do I have to loop over each array member with foreach and do the division line by line? It would be nice if I could use the df header names as hash keys.

I also need to get the max(length) of each column so I can use it to format the output, and I can't find a Perl max or min function :(

BTW, if anyone is interested in what I am trying to do, I've attached a version of the script in Korn shell.

I want to convert it to Perl because I have a server with 30+ lines of df output and the Korn shell version takes ages to run. I am hoping that it will run faster in Perl. Plus, it is a good exercise for learning Perl arrays and sorting.

Can anyone suggest the "best" Perl forum that I can post this question to? Any response/advice will be much appreciated.

Thanks in advance.

Sample output from a run of the Korn shell script is below, using df-m:

 
Filesystem                                        MBytes      Used     Avail Capacity Mount
---------------------------------------------  --------- --------- --------- -------- -----------------------------------
/dev/md/dsk/d1                                   3027-MB   2424-MB    542-MB      82% /
/dev/md/dsk/d3                                   3027-MB   1560-MB   1406-MB      53% /var
/dev/md/dsk/d4                                   4886-MB   4229-MB    608-MB      88% /opt
/proc                                               0-MB      0-MB      0-MB       0% /proc
dev0ns951:/vol/vol_admin/docs                    8271-MB   7076-MB   1194-MB      86% /nas_mnt/docs
dev0ns951:/vol/vol_admin/prodhome               32870-MB  25964-MB   6906-MB      79% /home/users
dev0ns951:/vol/vol_admin/prodhome               32870-MB  25964-MB   6906-MB      79% /nas_mnt/prodhome
dev0ns951:/vol/vol_admin/saphome                   91-MB     37-MB     53-MB      42% /nas_mnt/saphome
fd                                                  0-MB      0-MB      0-MB       0% /dev/fd
mnttab                                              0-MB      0-MB      0-MB       0% /etc/mnttab
swap                                             8460-MB      0-MB   8459-MB       1% /var/run
swap                                             8513-MB     54-MB   8459-MB       1% /tmp

durden_tyler · October 22, 2011, 1:02am

A sample Perl script for your problem is posted below. I'll answer your questions first.

To get rid of the header, skip the line when the line number (held in the special variable "$.") is 1. The statement with the "next" keyword (in the script below) does that.

You do not have to use a "foreach" loop, since the array indexes are fixed. The "-a" switch (autosplit, used together with "-n") splits each input line on whitespace into an array called "@F". We know that the elements at indexes 1, 2 and 3 are to be divided by 1024, so we simply hard-code those. How do we know that those indexes are fixed? Because the output of "df -k" always has those columns in the same positions.
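For reference, the -n, -a and -l switches correspond roughly to the explicit loop below. This is only a sketch of the equivalence, not the exact code Perl generates. The two sample lines are copied from the df output above, and the names @input, @rows and $line_no are made up for illustration.

```perl
use strict;
use warnings;

# Two sample lines from the df -k output above, header first.
my @input = (
    "Filesystem            kbytes    used   avail capacity  Mounted on\n",
    "/dev/md/dsk/d1       3099287 2482045  555257    82%    /\n",
);

my @rows;                        # hypothetical name: one array ref per data line
my $line_no = 0;                 # plays the role of $. in the one-liner
foreach my $line (@input) {      # -n: loop over every input line
    $line_no++;
    chomp $line;                 # -l: strip the trailing newline
    next if $line_no == 1;       # skip the header
    my @F = split ' ', $line;    # -a: split on whitespace into @F
    $F[$_] = int($F[$_] / 1024) . "-MB" for 1 .. 3;   # KB -> MB for columns 1..3
    push @rows, [@F];            # keep a copy of the converted row
}

print join(" ", @{ $rows[0] }), "\n";
```

With real input, the foreach over @input would be replaced by while (<>) and $line_no by $. itself.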

Perl has the min and max functions in the List::Util module, which has been a part of the standard distribution since Perl 5.8 (it first shipped with the 5.7.3 development release).

$
$
$ perl -le 'BEGIN{use List::Util qw(min max)} @x = qw (10 20 -1 99 78); print "Min = ",min @x; print "Max = ",max @x'
Min = -1
Max = 99
$
$
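On an installation where List::Util is not available, min and max are short enough to write by hand. The sketch below assumes only core Perl 5 syntax; the names my_max and my_min are hypothetical, chosen so they cannot clash with the List::Util functions.

```perl
use strict;   # "use warnings" is left out on purpose: that pragma needs Perl 5.6

# Return the largest number in the argument list (undef for an empty list).
sub my_max {
    my $max = shift;
    foreach my $n (@_) {
        $max = $n if $n > $max;
    }
    return $max;
}

# Return the smallest number in the argument list (undef for an empty list).
sub my_min {
    my $min = shift;
    foreach my $n (@_) {
        $min = $n if $n < $min;
    }
    return $min;
}

my @x = qw(10 20 -1 99 78);
print "Min = ", my_min(@x), "\n";
print "Max = ", my_max(@x), "\n";
```

Both subs compare numerically, so they are meant for numbers such as string lengths or KB counts, not for arbitrary text.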

That said, List::Util is not really required for your case. Since we are looping through all the lines anyway, we just need to keep a max length variable and update it whenever the first token of a line is longer.
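The same running-maximum idea extends to every column, not just the first. The sketch below is one possible way to do it; the two sample lines come from the df output above, and the array name @width is an assumption made for this example.

```perl
use strict;
use warnings;

# Two data lines from the df -k output above (header already removed).
my @lines = (
    "/dev/md/dsk/d1       3099287 2482045  555257    82%    /",
    "dev0ns951:/vol/vol_admin/saphome   92160   37960   54200    42%    /nas_mnt/saphome",
);

my @width;                       # widest value seen so far, one entry per column
foreach my $line (@lines) {
    my @F = split ' ', $line;
    for my $i (0 .. $#F) {
        my $len = length $F[$i];
        # First value for this column, or a longer one than before: remember it.
        $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
    }
}

print "@width\n";
```

Each entry of @width could then be interpolated into the printf format string in the same way the script below does with $maxlen.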

The Perl script below works on the data in the file "f17".

$
$
$ cat f17
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d1       3099287 2482045  555257    82%    /
/proc                      0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
fd                         0       0       0     0%    /dev/fd
/dev/md/dsk/d3       3099287 1595167 1442135    53%    /var
swap                 8663192     368 8662824     1%    /var/run
swap                 8717624   54800 8662824     1%    /tmp
/dev/md/dsk/d4       5003466 4330989  622443    88%    /opt
dev0ns951:/vol/vol_admin/common 321912832 266888556 55024276    83%    /nas_mnt/common
dev0ns951:/vol/vol_admin/admin/cpadocs 39741440 32961924 6779516    83%    /opt/info
dev0ns951:/vol/vol_admin/admin 39741440 32961924 6779516    83%    /nas_mnt/admin
dev0ns951:/vol/vol_admin/docs 8468480 7245924 1222556    86%    /nas_mnt/docs
dev0ns951:/vol/vol_admin/prodhome 33658880 26586948 7071932    79%    /nas_mnt/prodhome
dev0ns951:/vol/vol_admin/saphome   92160   37960   54200    42%    /nas_mnt/saphome
dev0ns951:/vol/vol_admin/prodhome 33658880 26586948 7071932    79%    /home/users
$
$
$
$ perl -lane 'next if $.==1;
              $maxlen = length($F[0]) if length($F[0]) > $maxlen;
              $F[1] = int($F[1]/1024)."-MB";
              $F[2] = int($F[2]/1024)."-MB";
              $F[3] = int($F[3]/1024)."-MB";
              @{$x[$i++]} = @F;
              END {
                $fmt = "%-${maxlen}s  %10s  %10s  %10s  %10s  %-20s\n"; print;
                printf ($fmt, "Filesystem", "MBytes", "Used", "Avail", "Capacity", "Mount");
                printf ($fmt, "-"x${maxlen}, "-"x10, "-"x10, "-"x10, "-"x10, "-"x20);
                @y = map $_->[0], sort { $a->[1] cmp $b->[1] } map [ $_, $_->[0] ], @x;
                foreach $item (@y) { printf ($fmt, @$item) } print;
              }
             ' f17

Filesystem                                  MBytes        Used       Avail    Capacity  Mount
--------------------------------------  ----------  ----------  ----------  ----------  --------------------
/dev/md/dsk/d1                             3026-MB     2423-MB      542-MB         82%  /
/dev/md/dsk/d3                             3026-MB     1557-MB     1408-MB         53%  /var
/dev/md/dsk/d4                             4886-MB     4229-MB      607-MB         88%  /opt
/proc                                         0-MB        0-MB        0-MB          0%  /proc
dev0ns951:/vol/vol_admin/admin            38810-MB    32189-MB     6620-MB         83%  /nas_mnt/admin
dev0ns951:/vol/vol_admin/admin/cpadocs    38810-MB    32189-MB     6620-MB         83%  /opt/info
dev0ns951:/vol/vol_admin/common          314368-MB   260633-MB    53734-MB         83%  /nas_mnt/common
dev0ns951:/vol/vol_admin/docs              8270-MB     7076-MB     1193-MB         86%  /nas_mnt/docs
dev0ns951:/vol/vol_admin/prodhome         32870-MB    25963-MB     6906-MB         79%  /nas_mnt/prodhome
dev0ns951:/vol/vol_admin/prodhome         32870-MB    25963-MB     6906-MB         79%  /home/users
dev0ns951:/vol/vol_admin/saphome             90-MB       37-MB       52-MB         42%  /nas_mnt/saphome
fd                                            0-MB        0-MB        0-MB          0%  /dev/fd
mnttab                                        0-MB        0-MB        0-MB          0%  /etc/mnttab
swap                                       8460-MB        0-MB     8459-MB          1%  /var/run
swap                                       8513-MB       53-MB     8459-MB          1%  /tmp

$
$
$

Finally, you do not have to create a temporary file that stores the output of the "df" command. You could simply pipe it to the Perl script:

df -k | perl -lane '... <script> ...'

tyler_durden

newbie_01 · October 22, 2011, 8:58am

Hi tyler_durden.

Thanks for a very quick response. It is very much appreciated and helpful, and I finally have something to move forward with. I ran the script that you provided on a server that has almost 200 filesystems defined: my shell script ran for a minute, while the Perl version took only 5 seconds.

I didn't know I could check line numbers like this. Can I do the same check when reading a file line by line?

The "a" qualifier that you are referring to, is that the command-line switch/option in perl -lane?

durden_tyler:

Perl has the min and max functions in the List::Util module, which is a part of the standard distribution ...

Unfortunately, I still have some servers where Perl is older than 5.6, because of an old application that can't be upgraded, and management doesn't want to touch whatever is installed on those servers.

And some of the 5.6 installations that I have do not include that module :(. I wish I were the SA so I could install those modules.

durden_tyler:

Finally, you do not have to create a temporary file that stores the output of the "df" command. You could simply pipe it to the Perl script:

df -k | perl -lane '... <script> ...'

I did as below. Not sure if that is what you meant to say.

I created three (3) files: dfk.pl, dfm.pl and dfg.pl. All of them contain similar code except for the $F[1] = int($F[1]/1024)."-MB"; lines: dfg.pl has $F[1] = int($F[1]/1024/1024)."-GB"; and dfk.pl has simply $F[1] = int($F[1])."-KB";

The content of dfm.pl is below:

cat dfm.pl
/bin/perl -lane 'next if $.==1;
              $maxlen = length($F[0]) if length($F[0]) > $maxlen;
              $F[1] = int($F[1]/1024)."-MB";
              $F[2] = int($F[2]/1024)."-MB";
              $F[3] = int($F[3]/1024)."-MB";
              @{$x[$i++]} = @F;
              END {
                $fmt = "%-${maxlen}s  %10s  %10s  %10s  %10s  %-20s\n"; print;
                printf ($fmt, "Filesystem", "MBytes", "Used", "Avail", "Capacity", "Mount");
                printf ($fmt, "-"x${maxlen}, "-"x10, "-"x10, "-"x10, "-"x10, "-"x20);
                @y = map $_->[0], sort { $a->[1] cmp $b->[1] } map [ $_, $_->[0] ], @x;
                foreach $item (@y) { printf ($fmt, @$item) } print;
              }
             '

Then I run df -k | dfm.pl. The output is as I want it to be. So thanks a lot. That looks good for the meantime.

If you don't mind, I've got some more questions.

If I really need to store the output of the df command and "manipulate" the data in some way, can I redirect the df output to an array, for example @df=`df -k`? On some servers, the filesystems are owned by different business groups, so I need to check which filesystem belongs to which business group and then send an email to the group that owns it. I also need to check the capacity value and then compute how much space needs to be requested.

For the print format, RE: max length, you are checking the max length only for the first column, and the others are set to a constant width of either 10 or 20, is that correct?

In your code, you are storing the computed values of the @F array in the array named @x, then sorting @x and assigning the result to @y to get printed, is that correct?

Thanks again for your help. This is the best answer I've received for this post. I've been banging my head against the wall for a couple of days already.

durden_tyler · November 6, 2011, 1:26am

Yes, you can. The "$." check works the same way when you read a file line by line through a filehandle.
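A minimal sketch of the line-number check with an explicit filehandle follows. It assumes Perl 5.6 or later for the three-argument open and lexical filehandles, and the scratch file path is made up so that the example is self-contained.

```perl
use strict;
use warnings;

# Write a tiny sample file so the example does not depend on df.
my $file = "/tmp/df_sample.$$";          # hypothetical scratch path
open(my $out, '>', $file) or die "cannot write $file: $!";
print $out "Filesystem kbytes used avail capacity Mounted on\n";
print $out "/dev/md/dsk/d1 3099287 2482045 555257 82% /\n";
close $out;

my @data;
open(my $in, '<', $file) or die "cannot read $file: $!";
while (my $line = <$in>) {
    next if $. == 1;                     # $. is the line number of the handle last read
    chomp $line;
    push @data, $line;
}
close $in;
unlink $file;                            # clean up the scratch file

print scalar(@data), " data line(s)\n";
```

On a Perl older than 5.6, the bareword form used earlier in this thread, open(DFTMP, "/tmp/df_tmp.01"), can be combined with the same $. test.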

Yes, it is: "a" is one of the command-line switches in perl -lane. Type

perl --help

for a description of all switches.

Any approach is fine if it solves your problem and is easily maintainable.
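If keeping three near-identical files in step becomes a chore, one option is a single script that looks up the divisor by unit. The sketch below shows only the conversion step; the %divisor table and the convert_kb name are assumptions made for this example, not part of the script above.

```perl
use strict;
use warnings;

# Divisor for each unit, starting from the KB values that df -k reports.
my %divisor = (KB => 1, MB => 1024, GB => 1024 * 1024);

# Convert one KB value to the given unit and append the unit label.
sub convert_kb {
    my ($kb, $unit) = @_;
    die "unknown unit '$unit'\n" unless exists $divisor{$unit};
    return int($kb / $divisor{$unit}) . "-$unit";
}

# The kbytes value of /nas_mnt/common from the sample data, in all three units.
print convert_kb(321912832, $_), "\n" for qw(KB MB GB);
```

The unit could then come from the command line, for example from $ARGV[0], so that one file covers what dfk.pl, dfm.pl and dfg.pl do now.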

Yes, you can capture the output of df -k in a Perl array, but I don't see why you would need that. The processing steps you've mentioned could be performed quite easily by a Perl script that reads the output of df -k through a pipe.
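For completeness, here is a sketch of the array-based approach with a capacity check. Sample lines stand in for the backtick call so that it runs anywhere. The %owner mapping, the group names and the 85% threshold are all hypothetical; the thread does not say how filesystems map to business groups.

```perl
use strict;
use warnings;

# In a real script this would be:  my @df = `df -k`;
my @df = (
    "Filesystem            kbytes    used   avail capacity  Mounted on\n",
    "/dev/md/dsk/d1       3099287 2482045  555257    82%    /\n",
    "/dev/md/dsk/d4       5003466 4330989  622443    88%    /opt\n",
    "/dev/md/dsk/d3       3099287 1595167 1442135    53%    /var\n",
);

my %owner     = ('/opt' => 'apps-team', '/var' => 'sysadmin');   # hypothetical mapping
my $threshold = 85;                                              # hypothetical, in percent

my @alerts;
shift @df;                                    # drop the header line
foreach my $line (@df) {
    my ($fs, $kb, $used, $avail, $cap, $mount) = split ' ', $line;
    (my $pct = $cap) =~ s/%//;                # "88%" -> 88
    next if $pct < $threshold;
    my $group = exists $owner{$mount} ? $owner{$mount} : 'unassigned';
    push @alerts, "$mount is at $pct% (owner: $group)";
}

print "$_\n" for @alerts;
```

The same loop body would work unchanged inside while (<>) when df -k is piped in, which is why the array is optional.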

That is correct.
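As a side note on the sorting step: the one-liner uses a map/sort/map chain, but when the sort key is already a plain element of each row, comparing that element directly should give the same order. The sketch below uses made-up three-column rows and the hypothetical names @by_name and @by_size.

```perl
use strict;
use warnings;

# Each element of @x is a reference to one row: [filesystem, size, mount].
my @x = (
    [ 'swap',           '8460-MB', '/var/run' ],
    [ '/dev/md/dsk/d1', '3026-MB', '/'        ],
    [ 'fd',             '0-MB',    '/dev/fd'  ],
);

# Sort by filesystem name (column 0), comparing as strings.
my @by_name = sort { $a->[0] cmp $b->[0] } @x;

# Sort by size (column 1), largest first, comparing the leading digits as numbers.
my @by_size = sort {
    ($b->[1] =~ /^(\d+)/)[0] <=> ($a->[1] =~ /^(\d+)/)[0]
} @x;

printf "%-16s %10s  %s\n", @$_ for @by_name;
```

Sorting on a numeric column needs <=> on the digits, because cmp would compare "8460-MB" and "3026-MB" character by character.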

tyler_durden