Get values from 2 files - Complex "for loop and if" awk problem

Hi everyone,

I've been working on the code below all day, trying one change after another. Maybe an awk expert could help me fix the for loop;
I think I'm very close to the correct output.

file1 is:

<boxes content="Grapes and Apples">
    <box No.="Box MT. 53">
      <quantity f="4">Grapes</quantity>
      <quantity f="8">Apples</quantity>
    </box>
    <box No.="Box MJ 62">
      <quantity f="7">Grapes</quantity>
      <quantity f="12">Apples</quantity>
    </box>
  </boxes>

file2 is:

<some text...>
<some text...>        
        <f><v>Begin</v></f>
        <f><v>Prod No</v></f>
        <f><v>Serial</v></f>
        <f><v>Grapes and Apples</v></f>
        <f><v>Begin 1</v></f>
        <f><v>Box MT. 53</v></f>
        <f><v>XMT. 5563</v></f>
        <f><v>Begin 2</v></f>
        <f><v>Box MJ 62</v></f>
        <f><v>JJKD. 772</v></f>
        <f><v>Apples</v></f>
        <f><v>Grapes</v></f>
</abc>

My code so far is:

# Arr1: stores info for the first block; not relevant to this question.
# Arr3: first line of blocks 2 and 3 (stores the unique fruit names from file1,
#       Apples and Grapes). Apples and Grapes appear in alphabetical order in file2.
# Arr5: values for each block, taken from the f="..." attributes in file1.

awk 'BEGIN{  B = 66 }
    FNR==NR{
        if ($0 ~ "box No.=")
            {Arr1[FNR]=gensub(/^[^"]+"|".+$/,"","g");asorti(Arr1,Arr2)}
        else if ( $0 ~ "quantity f=" )
            {Arr3[gensub(/.+">|<.+$/,"","g")];asorti(Arr3,Arr4) 
             Arr5[FNR]=gensub(/^[^"]+"|".+$/,"","g");asorti(Arr5,Arr6) 
             }
        next
    }
{
###############  for loop to generate blocks #####################
    for ( j=2;j<=length(Arr3)+1;j++ ) {  #Loop to generate block 2 and 3, because of that j begins in 2.
        if($0 ~ ">"Arr4[j-1]"<") {
            {printf("<begin \"%d\" >\n\t<b ln=\"A%d\" t=\"s\"><v>%d</v></b>\n", j,j,FNR);} #print 1rst line of each block
            for ( k=(j-1);k<=(j-2)+length(Arr5);k=k+length(Arr1) ) { #Loop to print rest of the values related to each fruit
                if ( k < length(Arr5)/length(Arr1) ) {
                    printf("\t<b ln=\"%c%d\"><v>%d</v></b>\n", B, j, Arr6[k]); #Printing the value
                    B++
                }    
                else {                    
                    printf("\t<b ln=\"%c%d\"><v>%d</v></b>\n</begin>", B, j, Arr6[k]); #Printing last line of each block
                    B=66  # B=66 because is the ASCII in decimal of letter B.
                }            
            }
        }
    }
}' file1 file2

The for loop is meant to generate blocks 2, 3...N of the output (in the sample, only blocks 2 and 3). Each of these blocks holds one of the
unique fruits in file1 together with its values. Block 2 is for Apples and contains its values from file1 (8 and 12); block 3 is for
Grapes and contains its values from file1 (4 and 7).

  • In alphabetical order, Apples comes before Grapes, so block 2 is for Apples and block 3 is for Grapes.
  • Within each fruit block, the values must appear in the same order as in file1, e.g. 8 and 12 for Apples, not 12 and 8.
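
The two ordering rules above can be sketched in portable awk. This is only one possible approach, not a fix to the loop in the post. It assumes every quantity line has the form `<quantity f="N">Name</quantity>`; the names `file1_sample`, `grouped`, `vals` and `names` are illustrative and do not come from the original code. Each value is appended to its fruit as the lines are read, which keeps file1 order, and only the fruit names are sorted afterwards.

```shell
# Sample data copied from file1 in the post.
file1_sample='<boxes content="Grapes and Apples">
    <box No.="Box MT. 53">
      <quantity f="4">Grapes</quantity>
      <quantity f="8">Apples</quantity>
    </box>
    <box No.="Box MJ 62">
      <quantity f="7">Grapes</quantity>
      <quantity f="12">Apples</quantity>
    </box>
  </boxes>'

grouped=$(printf '%s\n' "$file1_sample" | awk '
/<quantity f="/ {
    split($0, q, "\"")            # q[2] = f value, q[3] = ">Name</quantity>"
    name = q[3]
    sub(/^>/, "", name)           # drop the leading ">"
    sub(/<.*$/, "", name)         # drop the closing tag
    if (!(name in vals)) names[++n] = name
    vals[name] = vals[name] " " q[2]   # append, so file1 order is kept
}
END {
    # Insertion sort of the fruit names (portable stand-in for gawk asorti).
    for (i = 2; i <= n; i++) {
        key = names[i]
        for (j = i - 1; j >= 1 && names[j] > key; j--) names[j + 1] = names[j]
        names[j + 1] = key
    }
    for (i = 1; i <= n; i++) print names[i] ":" vals[names[i]]
}')

echo "$grouped"
```

For the sample data this prints one line per fruit, with the names in alphabetical order and each fruit's values in file1 order.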

I'm getting this output:

<begin "2" >
        <b ln="A2" t="s"><v>13</v></b>
        <b ln="B2"><v>3</v></b>
        <b ln="C2"><v>7</v></b>
</begin><begin "3" >
        <b ln="A3" t="s"><v>14</v></b>
        <b ln="B3"><v>4</v></b>
</begin>        <b ln="B3"><v>8</v></b>
</begin>

and the correct output should be:

<begin ln="2" >
    <c ln="A2" t="s"><v>13</v></b>
    <c ln="B2"><v>8</v></b>
    <c ln="C2"><v>12</v></b>
</begin>
<begin ln="3" >
    <c ln="A3" t="s"><v>14</v></b>
    <c ln="B3"><v>4</v></b>
    <c ln="C3"><v>7</v></b>
</begin>

The first line of each block holds the fruit's line number in file2, e.g. Apples appears on line 13 of file2 and Grapes on line 14.

Maybe someone could fix my for loop. I'm stuck on printing the values for each fruit block in the correct order.

PS: I have another for loop that generates the first block (not shown), so it would be great if the solution could be added to the first loop.

Many thanks in advance.

Can you please explain once more where file2 fits in here?

--ahamed

Hi ahamed, thanks for the reply.

file2 is needed to find the line numbers of Apples and Grapes within file2, which go in the first line of each block.

To see this, check the line number of Apples in file2: it is 13,
and for Grapes it is 14. In the output, 13 appears in the first line of block 2 and 14 in the first line of block 3.
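
The lookup described above can be sketched on its own in portable awk, as an illustration under stated assumptions rather than the code used in the thread. It assumes each value sits on its own line inside `<v>...</v>`; the `wanted` variable and the names `file2_sample` and `line_numbers` are hypothetical.

```shell
# Sample data copied from file2 in the post.
file2_sample='<some text...>
<some text...>
        <f><v>Begin</v></f>
        <f><v>Prod No</v></f>
        <f><v>Serial</v></f>
        <f><v>Grapes and Apples</v></f>
        <f><v>Begin 1</v></f>
        <f><v>Box MT. 53</v></f>
        <f><v>XMT. 5563</v></f>
        <f><v>Begin 2</v></f>
        <f><v>Box MJ 62</v></f>
        <f><v>JJKD. 772</v></f>
        <f><v>Apples</v></f>
        <f><v>Grapes</v></f>
</abc>'

line_numbers=$(printf '%s\n' "$file2_sample" | awk -v wanted="Apples,Grapes" '
BEGIN {
    # Turn the comma-separated list into a lookup table.
    count = split(wanted, w, ",")
    for (i = 1; i <= count; i++) want[w[i]] = 1
}
{
    text = $0
    sub(/^.*<v>/, "", text)       # strip everything up to and including <v>
    sub(/<\/v>.*$/, "", text)     # strip </v> and everything after it
    if (text in want) print text, NR   # exact match only
}')

echo "$line_numbers"
```

Because the comparison is an exact match on the text between the tags, the line "Grapes and Apples" is not mistaken for either fruit.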

Thanks for any help.

See if this works for you...

awk 'NR==FNR {
  gsub(/"|>|</," ");
  if($3 ~ /^[0-9]/){ a[$4]=a[$4]" "$3 }
  next
}
{
  gsub(/"|>|</," ");
  x++;  if(NF==5 && $3 in a){ b[$3,1]=x; }
}
END{ j=2;
  for(i in a) {
    al=65; split(a[i],arr," ")
    print "<begin ln=\""j"\" >"
    printf("\t<c ln=\"%c%d\" t=\"s\"><v>"b[i,1]"</v></b>\n",al++,j)
    for(v in arr) {
      printf("\t<c ln=\"%c%d\"><v>"arr[v]"</v></b>\n",al++,j)
    } print "</begin>";j++
  }
}' file1 file2

Sorry, I didn't have the patience to go through your code...

--ahamed

Hi ahamed,

Thanks for your help. It's much appreciated.

The code almost works; the only issue is that it prints the blocks in a different order. The solution would be to sort array "a"
alphabetically in the first part of the awk code (when NR==FNR). I've been trying to sort it with asorti(), but
it doesn't work (the line number in the first line of each block is not printed when I include asorti in the code).
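
One possible cause of the missing line numbers, assuming asorti() was called with a single argument: in gawk that form overwrites the array with its own sorted keys, so lookups keyed by fruit name no longer find anything. The two-argument form leaves the source array intact. A minimal gawk-only sketch (the sample values and the `result` variable are illustrative):

```shell
result=$(gawk 'BEGIN {
    a["Grapes A"] = "4 7"
    a["Apples B"] = "8 12"

    # Two-argument form: a is left untouched, d[1..n] holds the sorted keys.
    n = asorti(a, d)
    for (i = 1; i <= n; i++) printf "%s=%s\n", d[i], a[d[i]]

    # One-argument form: a itself is replaced by a[1..n] = sorted keys,
    # so the fruit names are no longer indices of a.
    asorti(a)
    print (("Apples B" in a) ? "key kept" : "key lost")
}')

echo "$result"
```

Sorting into a second array and then indexing the original with `a[d[i]]` keeps both the values and any lookups by name working.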

To test this, I've modified file1 and file2 slightly, as shown below.
Note: if you test with the new file1 and file2, you'll see that 14 appears in the first block and 13 in the second; they should be in ascending order.

file1:

<boxes content="Grapes and Apples">
    <box No.="Box MT. 53">
      <quantity f="4">Grapes A</quantity>
      <quantity f="8">Apples B</quantity>
    </box>
    <box No.="Box MJ 62">
      <quantity f="7">Grapes A</quantity>
      <quantity f="12">Apples B</quantity>
    </box>
  </boxes>

file2:

<some text...>
<some text...>        
        <f><v>Begin</v></f>
        <f><v>Prod No</v></f>
        <f><v>Serial</v></f>
        <f><v>Grapes and Apples</v></f>
        <f><v>Begin 1</v></f>
        <f><v>Box MT. 53</v></f>
        <f><v>XMT. 5563</v></f>
        <f><v>Begin 2</v></f>
        <f><v>Box MJ 62</v></f>
        <f><v>JJKD. 772</v></f>
        <f><v>Apples B</v></f>
        <f><v>Grapes A</v></f>
</abc>

The code I have so far is:

# I've added to or slightly modified your code so that it can handle strings with spaces.
# (E.g. instead of "Apples" and "Grapes", the string could be "Apples XXX YYY" or "Grapes abc", etc.)

awk 'NR==FNR {
  $0=gensub(/(.+=")([0-9]+)(">)(.+)(<\/.+)/, "\\2 \\4", "g");
  if($1 ~ /^[0-9]/){ t=$1; gsub(/^[0-9]+[ ]+/,""); a[$0]=a[$0]" "t }
  next
}
{
  gsub(/.+<.>|<\/.+$/,"")
  x++;  if($0 in a){ b[$0,1]=x; }
}
END{ j=2;
  for(i in a) {
    al=65; split(a[i],arr," ")
    print "<begin ln=\""j"\" >"
    printf("\t<c ln=\"%c%d\" t=\"s\"><v>"b[i,1]"</v></b>\n",al++,j)
    for(v in arr) {
      printf("\t<c ln=\"%c%d\"><v>"arr[v]"</v></b>\n",al++,j)
    } print "</begin>";j++
  }
}' file1 file2

Many thanks for your help so far.

Greetings

Try this...

awk 'NR==FNR {
  $0=gensub(/(.+=")([0-9]+)(">)(.+)(<\/.+)/, "\\2 \\4", "g");
  if($1 ~ /^[0-9]/){ t=$1; gsub(/^[0-9]+[ ]+/,""); a[$0]=a[$0]" "t; }
  next
}
{
  gsub(/.+<.>|<\/.+$/,"")
  x++;  if($0 in a){ b[$0,1]=x; }
}
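END{ j=2;
  n=asorti(a,d)                      # d[1..n] = keys of a in alphabetical order
  for(i=1;i<=n;i++) {                # walk the sorted keys in order
    al=65; m=split(a[d[i]],arr," ")  # 65 is ASCII "A"; arr[1..m] keeps file1 order
    print "<begin ln=\""j"\" >"
    printf("\t<c ln=\"%c%d\" t=\"s\"><v>"b[d[i],1]"</v></b>\n",al++,j)
    for(v=1;v<=m;v++) {
      printf("\t<c ln=\"%c%d\"><v>"arr[v]"</v></b>\n",al++,j)
    } print "</begin>";j++
  }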
}' file1 file2

--ahamed

Great, ahamed! It works.

I had thought of sorting it before, but for some reason it wasn't working.

Now that part works just fine.

I really appreciate all your help.

Best regards.

Off topic, but this is the best thread I've ever seen on this site. The question was phrased perfectly: in detail, with what had been attempted, the input, and the desired output. Then came the help offered and the code being modified through each attempt. That was great. Just great.

Hi DeCoTwc,

Thanks for your kind comment. I try to explain as best I can because English is not my native language, and because
I know it is the best way to get appropriate help and for the same problem to help somebody else in the future.

Sometimes when I ask a question I wonder whether to add some details, because in my experience a question with a long
explanation is less likely to be read completely and therefore answered, even when it has a simple solution.

It's nice to know that our questions are not only a way to get help for ourselves, but also a way to enrich the forum.

PS: I like to use color tools :)

Best regards