Please explain what this Awk code is doing

Hi Guys,

Please help me. I am new to programming and I don't understand what some parts of this code are doing. I have commented the parts I think I know; please correct me if my understanding is wrong, and help with the parts marked with questions.

  awk '
      {
          gsub( ">", "" );        #replace the > with space
          gsub( " ", "~" );          #replace the  space with match 
          gsub( "<", " " );          #replace the < with space
   
   
          for( i = 1; i <= NF; i++ )          #for loop: i starts at 1, runs while i is less than or equal to the number of fields, and increases by 1 each time the loop runs
   
          {
              if( split( $(i), a, "=" ) == 2 )  #split the string i into array a on the regular expression. 
              # Question: What is ==2 doing?
              {
                  gsub(  "\"", "", a[2] ); #replace \ with space 
                  # Question: What is a[2] doing? Is this array index 2, and if so, why index 2?
                  gsub(  "~", " ", a[2] );
                  values[a[1]] = a[2];
                  # Question: What is values[a[1]] = a[2]; doing?
              }
          }
   
          gcount[values["Gender"]]++;         # collect counts
          acount[values["Age"]]++;
   
          printf( "%s %s %s %s\n", values["NAME"], values["Age"], values["D.O.B"], values["Gender"] );
      }
   
      END {
          printf( "\nAge Count:\n" );
          for( x in acount ) #for loop make the x as index for array acount 
              printf( "%s %d\n", x, acount[x] );
              # Question: What is this printf doing? It is printing the values in x, but how?
          printf( "\nGender Count:\n" );
          for( x in gcount )
              printf( "%s %d\n", x, gcount[x] );
      }
  ' input_file

Example of the xml message is.

  [date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="John Smith"><Age="23"><D.O.B="11-10-1988"> <Gender="Male">"

Sorry guys, I know I am asking a lot, but any help would be greatly appreciated.

Thank you all

What is ==2 doing
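split() returns the number of pieces it created, and == 2 compares that number with 2. He's doing two things on one line (as he should): splitting the field and testing the result. The test is true only when the field splits into exactly two pieces around the "=", a name in a[1] and a value in a[2].

Think of it like this: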

split (blablabla)
if split result equals 2
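
To see the return value for yourself, here is a small sketch. The two input strings are made up for illustration (they are not from your file), and the `out` variable is only there to hold the result:

```shell
# split() returns how many pieces it produced.
# Age="23" contains one "=", so it gives 2 pieces; changeMsg contains none, so it gives 1.
out=$(printf '%s\n' 'Age="23"' 'changeMsg' |
    awk '{ n = split($1, a, "="); print $1, "->", n }')
echo "$out"
```

Only the first line would pass the == 2 test, which is how the script skips fields that are not name=value pairs.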

What is a[2] doing , is this array index 2 if so why index 2
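Yes, a[2] is element 2 of the array that split() filled: the part after the "=". Passing it as the third argument makes gsub work on a[2] instead of the whole line.

He's removing the " from the value. The pattern "\"" is an escaped double quote, not a backslash, and it is replaced with nothing rather than with a space.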
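
A minimal sketch of that step on its own, assuming a single made-up field as input:

```shell
out=$(echo 'Age="23"' | awk '{
    split($1, a, "=")       # a[1] is Age, a[2] is "23" with its quotes
    gsub("\"", "", a[2])    # third argument: change only a[2], not the whole line
    print a[1], a[2]
}')
echo "$out"
```

The quotes disappear from a[2] only; the original line in $0 is left as it was.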

values[a[1]] = a[2]
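Building the values array so that a[1] (the name, e.g. Age) is the key and a[2] (the data, e.g. 23) is the value stored under that key. Awk arrays are associative, so the index can be a string.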
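
Roughly, the lookup works like this. The input line here is a simplified, made-up one with the quotes and brackets already stripped:

```shell
out=$(echo 'NAME=John Age=23' | awk '{
    for (i = 1; i <= NF; i++)
        if (split($i, a, "=") == 2)
            values[a[1]] = a[2]     # key is the name, value is the data
    print values["Age"], values["NAME"]
}')
echo "$out"
```

Once the array is filled you can ask for any field by name, in any order, which is what the printf in the main block does.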

What is this print doing?
The for loop sets x to each index of acount in turn. The printf then prints x and acount[x], the first as a string (%s) and the second as a decimal integer (%d).
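
Here is a small sketch of the counting and the loop together, using three made-up input lines. Awk does not guarantee the order of for (x in array), so the output is piped through sort to keep it predictable:

```shell
out=$(printf '%s\n' Male Female Male | awk '
    { gcount[$1]++ }                      # count how often each value appears
    END {
        for (x in gcount)                 # x takes each index: Male, Female
            printf("%s %d\n", x, gcount[x])
    }
' | sort)
echo "$out"
```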

Hope it helps.
