Sorting awk array output?

Hi all,

I have a script which produces a nice table but I want to sort it on column 3.

This is the output section of the script:

# Output
        { FS = ":";
        format = "%11s %6s %-16s\n";
        printf "\n"
        printf ( format, "Size","Count","Who" ) }
        for (i in u_count) {
                if (i != "") {
                { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (u_size[i]>=x) {
        usersize = sprintf ( "%.2f %s", u_size[i]/x,hum[x] )
        printf ( format,usersize, u_count[i], i);break } } }
                          }
                }
        for (i in all_count) {
                if (i != "") {
                { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (all_size[i]>=x) {
        allsize = sprintf ( "%.2f %s", all_size[i]/x,hum[x] )
        printf ( format,allsize, all_count[i], "Total");break } } }
                        }
        }
}'

And this is a possible output

       Size  Count Who
    1.96 Gb   2007 Uitvoerend      
   63.54 Mb    167 Juridische      
  354.68 Mb    465 Beleidsvoorbereiding
   36.69 Gb  20439 Marketing       
  966.32 Kb      5 DWHN            
..
...

...
  175.62 Gb 133569 Total

Is it possible to sort this on column 3?

Regards,

Ronald

Try piping this output to:

| sort -rk3

It would have to be -k4, since the size takes up two fields ("1.96 Gb"), and with -r the header would end up as the footer. :)
Try adding | "sort -k4" to every print/printf statement in awk except the ones printing the header/footer.
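A minimal sketch of that idea, with a few made-up sample rows standing in for the real table. The header is printed directly, while the data rows go through a pipe to sort. The fflush() call is an addition of mine, so that the header leaves awk's buffer before sort writes anything.

```shell
# Hypothetical sample rows: size, unit, count, department.
report=$(printf '%s\n' \
    '1.96 Gb 2007 Uitvoerend' \
    '63.54 Mb 167 Juridische' \
    '36.69 Gb 20439 Marketing' |
awk '
BEGIN {
    format = "%11s %6s %-16s\n"
    printf(format, "Size", "Count", "Who")
    fflush()                       # push the header out before sort prints
}
{
    # The size takes two fields ("1.96 Gb"), so the name is field 4 for sort.
    printf(format, $1 " " $2, $3, $4) | "sort -k4"
}')
printf '%s\n' "$report"
```

The rows should come out ordered by department name, below the header.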

Where should I put this?

I'm not sure how you are calling the awk code, but try this:

# Output
        { FS = ":";
        format = "%11s %6s %-16s\n";
        printf "\n"
        printf ( format, "Size","Count","Who" ) }
        for (i in u_count) {
                if (i != "") {
                { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (u_size[i]>=x) {
        usersize = sprintf ( "%.2f %s", u_size[i]/x,hum[x] )
        printf ( format,usersize, u_count[i], i);break } } }
                          }
                }
        for (i in all_count) {
                if (i != "") {
                { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (all_size[i]>=x) {
        allsize = sprintf ( "%.2f %s", all_size[i]/x,hum[x] )
        printf ( format,allsize, all_count[i], "Total");break } } }
                        }
        }
}' | sort -rk3

Almost...
I will try putting it in different places, but you gave me a start.

  Size  Count Who             
Total Size = 6.16 Gb
  63.02 Mb     56 Automatisering  
   3.64 Mb      1 Karaoke         
  45.79 Mb      2 Muziekfeesten   
   4.95 Gb    175 Opgelicht       
  72.67 Mb     12 Radar           
 493.33 Mb     24 RegelRecht      
   6.16 Gb    308 Total           
 410.69 Mb     10 Vermist         
 146.40 Mb     28 Zappsport       

OK, this is tricky.
If I put the sort command elsewhere I get an error:
awk: syntax error at source line 41
context is
} >>> | <<< sort -k4
awk: illegal statement at source line 42

My guess is I have to separate my header/footer from the data.
(I will have to figure that out :P)

This is the complete array output (FYI):

{
        for (I=9 ; I<=NF ; I++) { x++ } { size=size+$5 }
}
END { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (size>=x) { { printf "Total Size = ",NR }
        printf "%.2f %s\n\n",size/x,hum[x];break }
        }
# Output
        { FS = ":";
        format = "%11s %6s %-16s\n";
        printf "\n"
        printf ( format, "Size","Count","Who" ) }
        for (i in u_count) {
                if (i != "") {
                { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (u_size[i]>=x) {
        usersize = sprintf ( "%.2f %s", u_size[i]/x,hum[x] )
        printf ( format,usersize, u_count[i], i);break } } } | sort -k4
                          }
                }           
        for (i in all_count) {
                if (i != "") {
                { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (all_size[i]>=x) {
        allsize = sprintf ( "%.2f %s", all_size[i]/x,hum[x] )
        printf ( format,allsize, all_count[i], "Total");break } } }
                        }
        }
}' 

Add double quotes:

printf ( format,usersize, u_count[i], i);break } } } | "sort -k4"

Try this example with and without quotes:

echo 'a
d
b' | awk '{print | "sort"}'
a
b
d

If you're using gawk, there is an asort function. If you're piping to a command in awk, you have to quote it, like:

print "stuff" | "sort -k4"

but if it's used repeatedly, it's usually done like so:

cmd = "sort -k4"
print "stuff" | cmd
....
close(cmd)

But the pipes are unidirectional. I wouldn't expect it to print to screen. You'd have to use file redirection.

cmd = "sort -k4 > output.txt"

Remember to close() it, or sort won't receive the end-of-input signal and produce results until awk exits. Then maybe something like:

print "header"
while ((getline < "output.txt") > 0) {
    print;
}
print "footer"
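A runnable sketch of that file-redirection idea, assuming mktemp is available. The temporary file, the variable names and the three-line sample input are mine, not from the script above.

```shell
tmp=$(mktemp)
result=$(printf '%s\n' 'd' 'a' 'c' |
awk -v out="$tmp" '
{ print | ("sort > " out) }           # sort writes into the temporary file
END {
    close("sort > " out)              # same string as used for the pipe
    print "header"
    while ((getline line < out) > 0)  # read the sorted lines back in
        print line
    close(out)
    print "footer"
}')
rm -f "$tmp"
printf '%s\n' "$result"
```

Because the pipe is closed before the file is read, sort has finished writing by the time getline starts.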

If it's already a large stand-alone script invoked with a '#!/usr/bin/awk -f' I'd personally add a sort function, or try gawk's if you're OK with being dependent on it.

Edit: I guess it will print to screen seeing yazu's example. I was just reading the forums on lunch and didn't have a terminal to play with. :\
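For the "add a sort function" option, here is a small sketch that should work in any awk, not just gawk. It is a plain insertion sort on the array keys. The function name isort and the sample data are made up for illustration.

```shell
sorted=$(printf '%s\n' 'Radar 12' 'Karaoke 1' 'Vermist 10' 'Opgelicht 175' |
awk '
# Insertion sort of arr[1..n] in place, ascending string order.
function isort(arr, n,    i, j, tmp) {
    for (i = 2; i <= n; i++) {
        tmp = arr[i]
        for (j = i - 1; j >= 1 && (arr[j] "") > (tmp ""); j--)
            arr[j + 1] = arr[j]
        arr[j + 1] = tmp
    }
}
{ count[$1] = $2 }
END {
    n = 0
    for (name in count) names[++n] = name   # collect the keys
    isort(names, n)
    for (k = 1; k <= n; k++)
        printf("%6d %-16s\n", count[names[k]], names[k])
}')
printf '%s\n' "$sorted"
```

Sorting inside awk keeps the header and footer in plain print statements, so there is no pipe to close and no ordering surprise.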

That did not work. It generates an error on the pipe.

Can you show us how you are calling the script?

Oh... yes. Sorry:

printf ( format,usersize, u_count[i], i)  | "sort -k4";break } } }
#!/usr/bin/env bash
#
# 
# Bash Script written by R. Blaas
#
# This script finds files of a specific type and displays the full path and size.
# It also displays the number of files found and their size per department, plus an overall total.
# 
# if nothing is passed to the script, show usage and exit
[[ -n "$1" ]] || { echo "Usage: usage.sh [Variable]"; exit 0 ; }

# Make sure only root can run our script
if [ "$(id -u)" != "0" ]; then
    echo "This script must be run as root" 1>&2
    exit 1
fi

# Variables
POSTMASTER="My email"

DATE=`date +"%d%m%Y"` 
DATIME=`date +"%Y%m%d%H%M"`
DAGNAAM=`date +"%A"`

# Set current directory to variable $CURRENT
CURRENT=/opt/local/COMPANY/usage

# Set the log directory
DIR_LOG=$CURRENT"/log"

# Catch search variable
SEARCH=`echo $1 | sed 's/*//' | sed 's/.//'`

# Check if directory exists
if test ! -d "$DIR_LOG"
 then
    mkdir "$DIR_LOG"
fi

# Create temporary file for log messages
POSTMLOG=$DIR_LOG"/"$SEARCH"-usage.log"

# BEGIN
logger "Script start (usage.sh)"
echo `date` start of script usage.sh > $POSTMLOG
echo "" >> $POSTMLOG
echo Search variable = $SEARCH >> $POSTMLOG
echo "" >> $POSTMLOG

# Search Command where $1 is the type to find. Search will start at current location.
find . -iname $1 -exec ls -l {} \; | awk 'BEGIN { 

# Initialize all Arrays
    size = "0"; 
    u_count[""]=0;
    all_count[""]=0;
} 
{
# Assign field names
    sizes=$5
    split($9,a,"/")
    dept=a[2]

# Count of number of files
    u_count[dept]++;
    all_count["* *"]++;

# Count disc space used
    u_size[dept]+=sizes;
    all_size["* *"]+=sizes;
}
{
    for (I=9 ; I<=NF ; I++) { x++ } { size=size+$5 } 
}
END { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb"; 
    for (x=1024**4; x>=1024; x/=1024) { if (size>=x) { { printf "Total Size = ",NR } 
    printf "%.2f %s\n\n",size/x,hum[x];break } 
    }
}

# Output
{
        for (I=9 ; I<=NF ; I++) { x++ } { size=size+$5 }
}
END    { { FS = ":";
    format = "%11s %6s %-16s\n";
    printf "\n"
    printf ( format, "Size","Count","Who" ) }

    for (i in u_count) {
        if (i != "") {
        { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (u_size[i]>=x) {
        usersize = sprintf ( "%.2f %s", u_size[i]/x,hum[x] )
        printf ( format,usersize, u_count[i], i);break } } }
              } 
        }
}  
{
        for (I=9 ; I<=NF ; I++) { x++ } { size=size+$5 }
}
END     { for (i in all_count) {
        if (i != "") {
        { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (all_size[i]>=x) {
        allsize = sprintf ( "%.2f %s", all_size[i]/x,hum[x] )
        printf ( format,allsize, all_count[i], "Total");break } } }
            } 
    } 
}' >> $POSTMLOG

echo "" >> $POSTMLOG
echo `date` end of search >> $POSTMLOG

echo `date` send report to POSTMASTER >> $POSTMLOG

# Mail report to POSTMASTER
mail -s " usage.sh REPORT `date`" $POSTMASTER < $POSTMLOG
 if [ "$?" == "0" ]; then
  echo Mail sent successfully! >> $POSTMLOG
 else echo Mail sent unsuccessfully! >> $POSTMLOG
 fi

echo "" >> $POSTMLOG
echo `date` end of script usage.sh >> $POSTMLOG
echo "" >> $POSTMLOG

# Preserving logfile
mv "$POSTMLOG" "$DIR_LOG/$DATIME-$SEARCH-usage.log"

logger "Script end (usage.sh)"

This is the complete script

So try doing this modification:

END     { for (i in all_count) {
        if (i != "") {
        { hum[1024**4]="Tb"; hum[1024**3]="Gb"; hum[1024**2]="Mb"; hum[1024]="Kb";
        for (x=1024**4; x>=1024; x/=1024) { if (all_size[i]>=x) {
        allsize = sprintf ( "%.2f %s", all_size[i]/x,hum[x] )
        printf ( format,allsize, all_count[i], "Total");break } } }
            } 
    } 
}' | sort -rk3 >> $POSTMLOG

Ok, one step further...

The sort works, but now the last line (Total) is added above the sorted rows instead of below them:

    Size  Count Who             
   6.16 Gb    308 Total           
  63.02 Mb     56 Automatisering  
   3.64 Mb      1 Karaoke         
  45.79 Mb      2 Muziekfeesten   
   4.95 Gb    175 Opgelicht       
  72.67 Mb     12 Radar           
 493.33 Mb     24 RegelRecht      
 410.69 Mb     10 Vermist         
 146.40 Mb     28 Zappsport       

Edit:
I was wrong again. I really need a rest. Sorry once more.

No Problem at all...

Please rest now :) I will be back in the office on Monday :)

Thanks anyway!

Yes, this does not work:

cat INPUTFILE
a
d
c

awk '{ print | "sort" } END { print "END" }' INPUTFILE
END
a
c
d

But this does (at least for gawk):

awk '{ print | "sort" } END { close("sort"); print "END" }' INPUTFILE
a
c
d
END

For close you need to use exactly the same string as for the pipe, "sort -k4" in your case.

So how should I put this in my script? Does it also work with printf?

Put this as the first statement in the second "END" block:

END { 
  close("sort -k4")
  ...
}

It doesn't matter which one you use after that, print or printf.
I've tested it with gawk, but this feature is not a GNU extension, so it should work in other awks too.
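Put together, the pattern could look roughly like the sketch below. The sample input and the two-column layout are simplified stand-ins for the real table, and the fflush() after the header is my own addition to keep the header ahead of sort's output when stdout is a file or a pipe.

```shell
out=$(printf '%s\n' 'Radar 12' 'Karaoke 1' 'Vermist 10' |
awk '
{ count[$1] = $2; total += $2 }
END {
    cmd = "sort -k2"                      # one string for both pipe and close
    format = "%6s %-16s\n"
    printf(format, "Count", "Who")
    fflush()                              # header first
    for (name in count)
        printf(format, count[name], name) | cmd
    close(cmd)                            # wait until sort has printed its rows
    printf(format, total, "Total")
}')
printf '%s\n' "$out"
```

Keeping the command in a variable avoids the classic mistake of closing a slightly different string than the one used for the pipe.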

Hmm, the sort still doesn't work the way I'd like it to, but maybe that is just impossible in the current code. I will work on it. I think I have enough info.

A different question:
The split command doesn't seem to account for spaces.
For some of the fields separated by the / the result should be something like "my department", but instead I only get "my". I have searched for this, and in every case I see that spaces are allowed and stored/printed. So what am I doing wrong here?

regards
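A likely cause, assuming the input is the "ls -l" output from the script above: split itself keeps spaces, but awk has already cut the line into fields on whitespace, so $9 only holds the path up to its first space. One way around it is to rebuild the path from the whole line before splitting. The sample line below is made up.

```shell
dept=$(printf '%s\n' '-rw-r--r-- 1 root root 1024 Jan  1 12:00 ./my department/a file.txt' |
awk '{
    path = $0
    # Drop the first eight whitespace-separated fields; the rest is the path.
    for (n = 1; n <= 8; n++)
        sub(/^[ \t]*[^ \t]+[ \t]+/, "", path)
    split(path, a, "/")
    print a[2]
}')
printf '%s\n' "$dept"    # prints: my department
```

This relies on the file name starting at the ninth field, which is what the use of $5 and $9 in the script already assumes.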