use awk to read randomly located columns in an excel file

mdap · August 9, 2008, 9:30am

Hi,

I have an excel file that have a random count of columns/fields and what im trying to do is to only retrieve all the rows under 2 specific field headers.

I can use the usually command for awk which is awk 'print{ $1 $2}' > output.txt, but the location of the 2 specific field headers is not constant so it will not be useful to use the command.

example:(headers needed are name & age)
for today the generated column count and location of needed headers are:

name age sex location company

the next day the column count and location of the needed headers might be

company age location id name address

Is there a command that can be used just to read the headers that i just need and then once the headers are found retrieve all the rows under the header and place them to output.txt?

Thanks,
mdap

aigles · August 9, 2008, 11:55am

The following script, display selected columns of the input file.
The names of selected columns are specified in the Columns variable.

Columns="file,size"
awk -v Columns="$columns" '
BEGIN {
   cols_count = split(Columns, cols, ",");
   for (i=1; i<=cols_count; i++)
      col_fields[cols] = 0;
}
NR==1 {
   for (i=1; i<=NF; i++)
      col_fields[$i] = i;
}
{
   out = "";
   for (i=1; i<=cols_count; i++) {
      col_hdr = cols;
      out = (i>1 ? out OFS : "") (col_fields[col_hdr] ? $(col_fields[col_hdr]) : "?" );
   }
   print out;
}
' mdat.txt

Sample input file (mdat.txt)

perms   link owner       group size d1  d2    d3 file
-rw-r--r-- 1 Jean-Pierre Aucun   85 Jul  1  2007 p.sh
-rw-r--r-- 1 Jean-Pierre Aucun  104 Jul  1  2007 p2.sh
-rw-r--r-- 1 Jean-Pierre Aucun  121 Aug  3 19:27 alalush.sh
-rw-r--r-- 1 Jean-Pierre Aucun  180 Oct 10  2007 raghav.sh
-rw-r--r-- 1 Jean-Pierre Aucun  417 Jul  6  2007 struct.sh
-rw-r--r-- 1 Jean-Pierre Aucun  682 Nov  6  2007 single.sh
-rw-r--r-- 1 Jean-Pierre Aucun  703 Aug  9 17:42 mdat.sh
-rw-r--r-- 1 Jean-Pierre Aucun  789 Aug  1 17:43 sm.sh
-rw-r--r-- 1 Jean-Pierre Aucun  888 Aug  3 16:03 mogarb2.sh
-rw-r--r-- 1 Jean-Pierre Aucun  946 Aug  8 18:59 asirohi.sh

Output

file size
p.sh 85
p2.sh 104
alalush.sh 121
raghav.sh 180
struct.sh 417
single.sh 682
mdat.sh 703
sm.sh 789
mogarb2.sh 888
asirohi.sh 946

Jean-Pierre.

mdap · August 9, 2008, 10:22pm

HI aigles,

i tried to copy paste the exact code to mdat.sh and also the input file to mdat.txt and when i executed mdat.sh it only showed blank rows and no data in it...

mdap

Perderabo · August 9, 2008, 10:52pm

Post your OS. Use "uname -a" and post the results. If you are using Solaris/SunOS, switch "awk" to "nawk".

mdap · August 9, 2008, 11:09pm

its GNU/Linux... that is why im using awk..

aigles · August 10, 2008, 4:07am

Sorry, typo error

Columns="file,size"
awk -v Columns="$Columns" '
BEGIN {

Jean-Pierre.

mdap · August 10, 2008, 5:52am

Thanks much aigles!!! its now working fine..

Would you mind to explain the meaning of the block of code below, if it is ok with? (",)..

NR==1 {
for (i=1; i<=NF; i++)
col_fields[$i] = i; }

{
out = "";
for (i=1; i<=cols_count; i++) {
col_hdr = cols[i];
out = (i>1 ? out OFS : "") (col_fields[col_hdr] ? $(col_fields[col_hdr]) : "?" );
}
print out;
}

THANK YOU Master!...

aigles · August 10, 2008, 7:47am

NR==1 {                            # Select first line = header 
   for (i=1; i<=NF; i++)           #    For every field i (NF is last field# ) 
      col_fields[$i] = i;          #       Memorize field# for column name ($i)
}	  

{                                  # Select every line             
   out = "";                       #    Initialize out valriable (used for line diplay) 
   for (i=1; i<=cols_count; i++) { #    For every selected columns
      col_hdr = cols;           #       get column name
      out = (i>1 ? out OFS : "") (col_fields[col_hdr] ? $(col_fields[col_hdr]) : "?" );
	                            #       Concatenate column value $(col_fields[col_hdr]
                                   #       to out variable with Output Field Separator 
   }                               #
   print out;                      #    Print columns
}

mdap · August 10, 2008, 8:15am

Thanks Again!!!

last question though.. what does "" and "?" in the code mean?

as far as i can understand it the line of code:
out = (i>1 ? out OFS : "") (col_fields[col_hdr] ? $(col_fields[col_hdr]) : "?" );

means

out = if i>1 is true then out OFS else "" and if col_fields[col_hdr] is true then $(col_fields[col_hdr]) else "?"

am i correct? :-?

aigles · August 10, 2008, 11:18am

Exactly