Need awk help to print specific columns with as string in a header

awk experts,

I have a big file of 4000 columns with header. Would like to print the columns with string value of "Commands" in header. File has "," separator. This file is on ESX host with Bash.

Thanks,
Arv

Could you please display a sample input (a few columns is enough) and desired output?

Sure.
Input with 4000 columns.
abc,xxkkk,eeee,eee,kkknn,sss,xxx,kkkvvv,yyy, ttt,mmm,fff..........
1,1,3,4,6,7,8,9,9,0,0,5
3,4,1,5,1,5,6,7,8,9,9,7,

Would like to print the colums with header which has substring of KKK
xxkkk, kkknn,kkkvvv
1,6,9
4,1,7

Thanks,

See if this works for you:

cut -d',' -f2,5,8 input_file

Thanks. This works for the test file but not for the real file which has 4000 columns. and the substring "kkk" could be in any column and may be in 1000 different columns. Need an if statement in awk which can scan each header and find the header with string 'KKK' and then enter in print to select the column. Ask questions or put in your words to make sure that I am explaining it well.

Thanks, Again.

depending on your version of awk, there might be a limit on the maximum number of fields per record. If so, try using gawk.

nawk '
   FNR==1{
        for(i=1;i<=NF;i++) 
            if ($i ~ str) {
              h=(h)?h FS $i:$i
              f=(f)?f FS i:i
            }
        print h
        nf=split(f,fA,FS);next
   }
   {
       for(i=1;i<=nf;i++) 
           printf("%s%c",$fA, (i==nf)?ORS:FS)
   }' str='kkk' FS=, myFile

I got following error: What did I missed?

[root@esx09 ~]# ./esx09.scr
./esx09.scr: line 14: nawk: command not found
[root@esx09 ~]# more esx09.scr
nawk '
   FNR==1{
        for(i=1;i<=NF;i++)
            if ($i ~ str) {
              h=(h)?h FS $i:$i
              f=(f)?f FS i:i
            }
        print h
        nf=split(f,fA,FS);next
   }
   {
       for(i=1;i<=nf;i++)
           printf("%s%c",$fA, (i==nf)?ORS:FS)
   }' str='Commands' FS=, esx09.tst
[root@esx09 ~]# ./esx09.scr
./esx09.scr: line 14: nawk: command not found

Firstly, start using code tags when posting code/data samples, thanks!

Secondly, your OS doesn't have 'nawk' - try using either 'awk' or 'gawk' instead!

:b:vgersh99, It worked. Thanks a lot. Let me run through real data. Good JOb.

---------- Post updated at 03:38 PM ---------- Previous update was at 03:33 PM ----------

One more questions on the output. The output has now 800 columns. Excel can load only 256 columns. Any idea how to load all the columns?

See if this works:

#!/usr/bin/ksh
typeset -i mFld=0
IFS=','
mFields=""
for mLine in $(head -1 input_file)
do
  mFld=${mFld}+1
  mFound=$(echo ${mLine} | egrep 'kkk')
  if [[ "${mFound}" != "" ]]; then
    mFields=${mFields}${mFld}','
  fi
done
mFields=${mFields%,}
cut -d',' -f"${mFields}" input_file

By not using Excel? :wink: Maybe others would know - I'm not Excel savvy.
I can modify the code to create multiple files 256 (or less) columns each.
Let me know.

What will this do? I have many columns with zero output. those can be deleted, if all the raw for that columns are zero.

---------- Post updated at 03:46 PM ---------- Previous update was at 03:45 PM ----------

That will help. To split file in 250 columns in each file.

---------- Post updated at 03:50 PM ---------- Previous update was at 03:46 PM ----------

Keep the first colunm of the original file in each file. Thanks

---------- Post updated at 04:06 PM ---------- Previous update was at 03:50 PM ----------

Vgersh,

To reduce the columns, Is it possible to add all the fields and make two columns? First and total off all the extracted columns.

In my original file:
kkk kkk kkk Total
1 2 5 8
3 1 1 5

That is what I am doing in excel adding up all the raws to one column.

Thanks

Don't quite follow your 'First and total off all the extracted columns', but try this:

awk '
   FNR==1{
        for(i=1;i<=NF;i++)
            if ($i ~ str) {
              if (!h) h=$i
              f=(f)?f FS i:i
            }
        print h FS "TOTAL"
        nf=split(f,fA,FS);next
   }
   {
       t=0
       printf("%s", $1 FS)
       for(i=1;i<=nf;i++)
           t+=$fA
       printf("%d\n", t)
   }' str='Commands' FS=, esx09.tst

Vgersh,

Thanks a lot, it worked.
To clarify the data: The input file has first columns as timestamp, rest of the fields are number under different headre. Your awk should keep first field, extract columns with string "Commands" from the rest of the columns. Add (sum) of all the rows of extracted and put in a new columns called Total. The output would have two columns, first timestamp same as input file and 2nd column is Total field.

In my original file:

Time kkk kkk kkk Total
9:15AM 1 2 5 8 
9:30AM 3 1 1 5

Keep the bold field from above with "," for output.

Pl confirm that is what you did.

Thanks,
Arv

awk '
   FNR==1{
        h=$1
        for(i=1;i<=NF;i++)
            if ($i ~ str) {
              h=h OFS $i
              f=(f)?f FS i:i
            }
        print h OFS "TOTAL"
        nf=split(f,fA,FS);next
   }
   {
       t=0
       printf("%s", $1)
       for(i=1;i<=nf;i++) {
           printf("%c%d", OFS, $fA)
           t+=$fA
       }
       printf("%c%d\n", OFS,t)
   }' str='Commands' OFS=, esx09.tst

Sorry, I may have missed something. The final output should have only two columns. First timestamp and 2nd Total. Your last night script had good output. The new one has multiple columns.

Output example:

Timestamp,Total
   9:15AM,8
   9:30AM,7
 

My explaination in prev post was to confirm your logic in last night's script.

Thanks,
Arv

awk '
   FNR==1{
        print $1 OFS "Total"
        for(i=1;i<=NF;i++)
            if ($i ~ str)
              f=(f)?f FS i:i
        nf=split(f,fA,FS)
        next
   }
   {
       t=0
       for(i=1;i<=nf;i++)
           t+=$fA
       printf("%s%c%d\n", $1, OFS,t)
   }'  str='Commands' OFS=, esx09.tst

Vgresh, The total columns has all zeros. Last night script has good output. Just confirm it totals the "Commands" field and we are done with this post.

Thanks,
Arv

Most likely this's because you keep changing your original file format: yesterday it had ',' as separators - today it's separated by spaces.
Which is it?
If the original file is space separated:

str='Commands' OFS=, esx09.tst

If the original file is ',' separated:

str='Commands' FS=, OFS=, esx09.tst

Vgresh,

Thanks again. It worked as last night's script. The input is "," separated file.

Regards,
-Arv