Need awk help to print specific columns with as string in a header

arv_cds · May 26, 2011, 3:26pm

awk experts,

I have a big file of 4000 columns with header. Would like to print the columns with string value of "Commands" in header. File has "," separator. This file is on ESX host with Bash.

Thanks,
Arv

Shell_Life · May 26, 2011, 3:32pm

Could you please display a sample input (a few columns is enough) and desired output?

arv_cds · May 26, 2011, 3:47pm

Sure.
Input with 4000 columns.
abc,xxkkk,eeee,eee,kkknn,sss,xxx,kkkvvv,yyy, ttt,mmm,fff..........
1,1,3,4,6,7,8,9,9,0,0,5
3,4,1,5,1,5,6,7,8,9,9,7,

Would like to print the colums with header which has substring of KKK
xxkkk, kkknn,kkkvvv
1,6,9
4,1,7

Thanks,

Shell_Life · May 26, 2011, 3:51pm

See if this works for you:

cut -d',' -f2,5,8 input_file

arv_cds · May 26, 2011, 3:58pm

Thanks. This works for the test file but not for the real file which has 4000 columns. and the substring "kkk" could be in any column and may be in 1000 different columns. Need an if statement in awk which can scan each header and find the header with string 'KKK' and then enter in print to select the column. Ask questions or put in your words to make sure that I am explaining it well.

Thanks, Again.

vgersh99 · May 26, 2011, 4:13pm

depending on your version of awk, there might be a limit on the maximum number of fields per record. If so, try using gawk.

nawk '
   FNR==1{
        for(i=1;i<=NF;i++) 
            if ($i ~ str) {
              h=(h)?h FS $i:$i
              f=(f)?f FS i:i
            }
        print h
        nf=split(f,fA,FS);next
   }
   {
       for(i=1;i<=nf;i++) 
           printf("%s%c",$fA, (i==nf)?ORS:FS)
   }' str='kkk' FS=, myFile

arv_cds · May 26, 2011, 4:22pm

I got following error: What did I missed?

[root@esx09 ~]# ./esx09.scr
./esx09.scr: line 14: nawk: command not found
[root@esx09 ~]# more esx09.scr
nawk '
   FNR==1{
        for(i=1;i<=NF;i++)
            if ($i ~ str) {
              h=(h)?h FS $i:$i
              f=(f)?f FS i:i
            }
        print h
        nf=split(f,fA,FS);next
   }
   {
       for(i=1;i<=nf;i++)
           printf("%s%c",$fA, (i==nf)?ORS:FS)
   }' str='Commands' FS=, esx09.tst
[root@esx09 ~]# ./esx09.scr
./esx09.scr: line 14: nawk: command not found

vgersh99 · May 26, 2011, 4:26pm

Firstly, start using code tags when posting code/data samples, thanks!

Secondly, your OS doesn't have 'nawk' - try using either 'awk' or 'gawk' instead!

arv_cds · May 26, 2011, 4:38pm

vgersh99, It worked. Thanks a lot. Let me run through real data. Good JOb.

---------- Post updated at 03:38 PM ---------- Previous update was at 03:33 PM ----------

One more questions on the output. The output has now 800 columns. Excel can load only 256 columns. Any idea how to load all the columns?

Shell_Life · May 26, 2011, 4:41pm

See if this works:

#!/usr/bin/ksh
typeset -i mFld=0
IFS=','
mFields=""
for mLine in $(head -1 input_file)
do
  mFld=${mFld}+1
  mFound=$(echo ${mLine} | egrep 'kkk')
  if [[ "${mFound}" != "" ]]; then
    mFields=${mFields}${mFld}','
  fi
done
mFields=${mFields%,}
cut -d',' -f"${mFields}" input_file

vgersh99 · May 26, 2011, 4:44pm

By not using Excel? Maybe others would know - I'm not Excel savvy.
I can modify the code to create multiple files 256 (or less) columns each.
Let me know.

arv_cds · May 26, 2011, 5:06pm

What will this do? I have many columns with zero output. those can be deleted, if all the raw for that columns are zero.

---------- Post updated at 03:46 PM ---------- Previous update was at 03:45 PM ----------

That will help. To split file in 250 columns in each file.

---------- Post updated at 03:50 PM ---------- Previous update was at 03:46 PM ----------

Keep the first colunm of the original file in each file. Thanks

---------- Post updated at 04:06 PM ---------- Previous update was at 03:50 PM ----------

Vgersh,

To reduce the columns, Is it possible to add all the fields and make two columns? First and total off all the extracted columns.

In my original file:
kkk kkk kkk Total
1 2 5 8
3 1 1 5

That is what I am doing in excel adding up all the raws to one column.

Thanks

vgersh99 · May 26, 2011, 5:48pm

Don't quite follow your 'First and total off all the extracted columns', but try this:

awk '
   FNR==1{
        for(i=1;i<=NF;i++)
            if ($i ~ str) {
              if (!h) h=$i
              f=(f)?f FS i:i
            }
        print h FS "TOTAL"
        nf=split(f,fA,FS);next
   }
   {
       t=0
       printf("%s", $1 FS)
       for(i=1;i<=nf;i++)
           t+=$fA
       printf("%d\n", t)
   }' str='Commands' FS=, esx09.tst

arv_cds · May 27, 2011, 8:48am

Vgersh,

Thanks a lot, it worked.
To clarify the data: The input file has first columns as timestamp, rest of the fields are number under different headre. Your awk should keep first field, extract columns with string "Commands" from the rest of the columns. Add (sum) of all the rows of extracted and put in a new columns called Total. The output would have two columns, first timestamp same as input file and 2nd column is Total field.

In my original file:

Time kkk kkk kkk Total
9:15AM 1 2 5 8 
9:30AM 3 1 1 5

Keep the bold field from above with "," for output.

Pl confirm that is what you did.

Thanks,
Arv

vgersh99 · May 27, 2011, 9:31am

awk '
   FNR==1{
        h=$1
        for(i=1;i<=NF;i++)
            if ($i ~ str) {
              h=h OFS $i
              f=(f)?f FS i:i
            }
        print h OFS "TOTAL"
        nf=split(f,fA,FS);next
   }
   {
       t=0
       printf("%s", $1)
       for(i=1;i<=nf;i++) {
           printf("%c%d", OFS, $fA)
           t+=$fA
       }
       printf("%c%d\n", OFS,t)
   }' str='Commands' OFS=, esx09.tst

arv_cds · May 27, 2011, 9:46am

Sorry, I may have missed something. The final output should have only two columns. First timestamp and 2nd Total. Your last night script had good output. The new one has multiple columns.

Output example:

Timestamp,Total
   9:15AM,8
   9:30AM,7

My explaination in prev post was to confirm your logic in last night's script.

Thanks,
Arv

vgersh99 · May 27, 2011, 9:58am

awk '
   FNR==1{
        print $1 OFS "Total"
        for(i=1;i<=NF;i++)
            if ($i ~ str)
              f=(f)?f FS i:i
        nf=split(f,fA,FS)
        next
   }
   {
       t=0
       for(i=1;i<=nf;i++)
           t+=$fA
       printf("%s%c%d\n", $1, OFS,t)
   }'  str='Commands' OFS=, esx09.tst

arv_cds · May 27, 2011, 10:07am

Vgresh, The total columns has all zeros. Last night script has good output. Just confirm it totals the "Commands" field and we are done with this post.

Thanks,
Arv

vgersh99 · May 27, 2011, 10:20am

Most likely this's because you keep changing your original file format: yesterday it had ',' as separators - today it's separated by spaces.
Which is it?
If the original file is space separated:

str='Commands' OFS=, esx09.tst

If the original file is ',' separated:

str='Commands' FS=, OFS=, esx09.tst

arv_cds · May 27, 2011, 10:37am

Vgresh,

Thanks again. It worked as last night's script. The input is "," separated file.

Regards,
-Arv