Arrays in awk

catwoman · July 27, 2008, 11:23pm

Hi, I've written the following code to manipulate the first 40 lines of a data file into my desired order:

#!/bin/awk -f
{
    if (NR <= 4)
        a[NR] = a[NR] $0 " "
    else if (NR >= 5 && NR <= 13)
        b[NR%3] = b[NR%3] $0 " "
    else if (NR >= 14 && NR <= 25)
        c[NR%3] = c[NR%3] $0 " "
    else if (NR >= 26 && NR <= 40)
        d[NR%3] = d[NR%3] $0 " "
}
END {
    for (i in a) print a[i]
    for (i in b) print b[i]
    for (i in c) print c[i]
    for (i in d) print d[i]
}

However, I want to apply this reordering to every set of 40 lines in the data file, and the data files are 10000+ lines long. I tried enclosing the entire code in a for loop, but this didn't work because the a, b, c and d arrays would have to change for each block of 40 lines as well.

Can arrays be given numerical names so they could be incremented with each execution of the for loop?
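POSIX awk has no way to build an array name such as a1 or a2 at run time, but one array with a two-part subscript, grp[block, key], does the same job: awk joins the parts with SUBSEP, and `(block, key) in grp` tests whether a pair exists. The following is a minimal sketch; the function name group_by_block, its block-size argument, and the simplified grouping (every line keyed only by NR%3) are illustrative assumptions, not the layout from the post.

```shell
# Hypothetical helper: group stdin lines by (block number, NR % 3) using a
# single array with two-part subscripts instead of separately named arrays.
# $1 is the block size (default 40).
group_by_block() {
    awk -v size="${1:-40}" '
    {
        blk = int((NR - 1) / size)            # 0 for the first block, 1 for the next, ...
        grp[blk, NR % 3] = grp[blk, NR % 3] $0 " "
        last = blk
    }
    END {
        for (b = 0; b <= last; b++)           # blocks in input order
            for (r = 0; r <= 2; r++)          # remainders in a fixed order
                if ((b, r) in grp)            # membership test on a two-part subscript
                    print "block " b ", NR%3=" r ": " grp[b, r]
    }'
}
```

Usage would be something like `group_by_block 40 < data.txt` (data.txt being a placeholder name). Because the END block loops over block numbers explicitly, the output keeps the input's block order instead of depending on `for (k in grp)`.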

Many thanks

thana · July 27, 2008, 11:27pm

Why not pass 40 lines at a time to the awk script?

catwoman · July 28, 2008, 12:35am

Do you mean by echoing/printing the selected line range, feeding that into the above program and then appending the output to another file?
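One way to follow thana's suggestion without extracting line ranges by hand: POSIX `split -l 40` cuts the file into 40-line pieces, and the script from the first post can then be run on each piece in turn, since NR starts again at 1 for each awk run. A minimal sketch, assuming the script has been saved to a file; the function name run_per_chunk and the file names in the usage line are hypothetical, and `mktemp -d` is a widely available extension rather than POSIX.

```shell
# Hypothetical helper: run an awk script separately on each 40-line chunk.
# $1 = awk script file, $2 = input file; the combined output goes to stdout.
run_per_chunk() {
    script=$1 input=$2
    tmp=$(mktemp -d) || return 1
    # -l 40: 40 lines per piece; -a 4: four-letter suffixes so long files
    # do not run out of chunk names (chunk_aaaa, chunk_aaab, ...)
    split -a 4 -l 40 "$input" "$tmp/chunk_"
    for f in "$tmp"/chunk_*; do     # the glob sorts the names, so chunks stay in order
        awk -f "$script" "$f"
    done
    rm -rf "$tmp"
}
```

Usage: `run_per_chunk reorder.awk data.txt > reordered.txt`. This starts one awk process per block, roughly 250 for a 10000-line file, which is simple but slower than handling every block within a single awk run.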

Cheers

Annihilannic · July 28, 2008, 1:29am

Try this solution:

#!/bin/awk -f
function printem() {
        # print each group, deleting it so the arrays are empty for the next block
        for (i in a) { print a[i] ; delete a[i] }
        for (i in b) { print b[i] ; delete b[i] }
        for (i in c) { print c[i] ; delete c[i] }
        for (i in d) { print d[i] ; delete d[i] }
}
{
        # use nr as the record number within this set of 40
        nr=NR%40

        if      (nr>=1  && nr<=4           ) a[nr  ]=a[nr  ]$0" "
        else if (nr>=5  && nr<=13          ) b[nr%3]=b[nr%3]$0" "
        else if (nr>=14 && nr<=25          ) c[nr%3]=c[nr%3]$0" "
        else if (nr==0  || nr>=26 && nr<=40) d[nr%3]=d[nr%3]$0" "

        if (nr==0) { printem() }
}
END { printem() }

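I've changed the code layout a little to hopefully make it a bit more readable. It gets around your problem by deleting the contents of the arrays as it prints them, so that they are empty for the next set of 40 records. printem() is called after every 40th record (when nr is 0) and once more in the END block, which flushes a final block of fewer than 40 lines.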
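One portability point about the script above: POSIX leaves the order of `for (i in a)` unspecified, so different awk implementations may print the groups in different orders. If the order matters, counted loops make it fixed. Below is a sketch of the same logic wrapped in a shell function (the name reorder40 is hypothetical). It prints a[1] to a[4] and then b, c and d in subscript order 0, 1, 2; that subscript order is an assumption, so adjust the loop bounds if another layout is wanted.

```shell
# Hypothetical wrapper: the reorder-every-40-lines script with a fixed print order.
reorder40() {
    awk '
    # i is listed as a parameter only to make it local to printem
    function printem(    i) {
        for (i = 1; i <= 4; i++) if (i in a) { print a[i]; delete a[i] }
        for (i = 0; i <= 2; i++) if (i in b) { print b[i]; delete b[i] }
        for (i = 0; i <= 2; i++) if (i in c) { print c[i]; delete c[i] }
        for (i = 0; i <= 2; i++) if (i in d) { print d[i]; delete d[i] }
    }
    {
        nr = NR % 40                    # position in the block; 0 is the 40th line
        if      (nr >= 1  && nr <= 4)   a[nr]   = a[nr] $0 " "
        else if (nr >= 5  && nr <= 13)  b[nr%3] = b[nr%3] $0 " "
        else if (nr >= 14 && nr <= 25)  c[nr%3] = c[nr%3] $0 " "
        else                            d[nr%3] = d[nr%3] $0 " "   # lines 26-40
        if (nr == 0) printem()          # end of a full block
    }
    END { printem() }                   # flush a final partial block
    ' "$@"
}
```

For example, `seq 1 40 | reorder40` prints 13 lines, starting with 1, 2, 3 and 4 on their own lines. The `i in a` tests skip groups that never received a line, so a short final block does not produce empty output lines.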

catwoman · July 28, 2008, 1:46am

Thanks for that, it's worked a treat!! Just one question... does:

nr=NR%40

just dictate that the first set of 40 lines should be processed first, and subsequent sets after?

Annihilannic · July 28, 2008, 1:51am

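No, all it does is assign the remainder of NR divided by 40 to a new variable, nr, so that the rest of the code can use nr instead of repeating NR%40. Because the remainder runs 1, 2, ..., 39, 0 and then starts again, nr is the line's position within its block of 40, with 0 meaning the 40th line.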
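A quick way to see this is to print NR next to NR % 40 for the lines around each block boundary. This is only an illustration; the function name show_nr is hypothetical, and `seq` is used just to generate numbered test lines.

```shell
# Hypothetical demo: show how nr = NR % 40 wraps around at each block of 40 lines.
show_nr() {
    awk '{ nr = NR % 40 }
         nr >= 39 || nr <= 1 { print NR, nr }'   # only lines near a block edge
}
seq 1 81 | show_nr
# prints:
# 1 1
# 39 39
# 40 0
# 41 1
# 79 39
# 80 0
# 81 1
```

Lines 40 and 80 give nr == 0, which is why the script uses `nr==0` to detect the last line of each block.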