Help with the awk syntax

juzz4fun · June 6, 2013, 1:31pm

Hello Experts:

While writing a script to help with one of the posts here, I ended up writing one that doesn't work, and I am eager to know how it can be corrected.
The aim was to not print specified columns: let's say that out of 100 fields, I need to print all but the 5th, 10th, and 15th columns.
Someone already answered it with $5=$10=$15="". That should work too.
I was thinking of another way to do it: store the numbers of the fields that need to be excluded in a separate file, nofields.

Below is the script that I wrote:

awk 'NR==FNR{a[$1];next}{for (i = 1; i <= NF; i++)if(i not in a)print $i}' nofields input

I am sure there are mistakes in it, and I am interested in knowing what they are.
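
For comparison, here is a minimal sketch of the field-blanking approach mentioned above ($5=$10=$15=""), run on a made-up 16-field line. The sample data and the variable names are assumptions, not taken from the original post. It suggests that blanked fields still occupy their positions, so two separators appear in a row wherever a field was emptied:

```shell
# Hypothetical 16-field sample line; the real input is not shown in the thread.
line='1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16'

# Assigning to a field rebuilds $0 using OFS; the trailing 1 is an
# always-true pattern whose default action prints the rebuilt record.
blanked=$(printf '%s\n' "$line" | awk '{ $5 = $10 = $15 = "" } 1')
printf '%s\n' "$blanked"
```

The printed line keeps 16 fields, three of them empty, which is one reason to prefer a loop that skips the unwanted fields when the doubled separators matter.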

Yoda · June 6, 2013, 1:48pm

Replace if(i not in a ) with if ( ! ( i in a ) )
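
awk has no "not" operator, so a negated membership test is written with ! around the parenthesized "in" expression. A minimal sketch, using a hypothetical array with a single key 2 (the array contents and variable names are illustrative only):

```shell
# List the indices 1..3 that are NOT keys of the array a.
kept=$(awk 'BEGIN {
    a[2]                      # referencing an element creates the key 2
    for (i = 1; i <= 3; i++)
        if (!(i in a))        # true only when i is not a key of a
            print i
}')
printf '%s\n' "$kept"
```

Note that "i in a" only tests for the key; unlike a reference such as a[i], it does not create the element.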

juzz4fun · June 6, 2013, 1:57pm

That certainly worked.
But the output has the 1st field on the 1st line, the 2nd field on the 2nd line, the 3rd on the 3rd line, and so on.
How can I avoid this?

Yoda · June 6, 2013, 1:59pm

Replace print $i with printf (i==NF?$i RS:$i)

elixir_sinari · June 6, 2013, 2:02pm

What if the last field is the one to be excluded from the output?

shamrock · June 6, 2013, 2:02pm

Setting $5 $10 and $15 to null is the way to go but if you want to do it using an exclude file then here's how...

awk '{if (NR==FNR) a[$1]; else {for (i=1; i<=NF; i++) if (!(i in a)) printf("%s", i<NF ? $i FS : $i "\n")}}' nofields input

juzz4fun · June 6, 2013, 2:10pm

Thanks all... I modified my script as below and it worked as expected.

nawk 'NR==FNR{a[$1];next}{for (i = 1; i <= NF; i++)if(!(i in a))printf("%s", i<NF ? $i FS : $i "\n")}' nofields input

Yoda · June 6, 2013, 2:13pm

I said printf not print

juzz4fun · June 6, 2013, 2:18pm

Oops ! My apologies, Yoda.

Yoda · June 6, 2013, 2:18pm

elixir sinari is right about the problem that he pointed out.

So use this code instead:

awk 'NR==FNR{A[$1];next}{for(i=1;i<=NF;i++) { if( !( i in A )) printf $i OFS } printf "\n" }' nofields input

Don_Cragun · June 6, 2013, 4:08pm

I'm surprised this works for you. If nofields contains:

1
3
5

and input contains:

1 2 3 4 5
1 2 3 4 5 6 7 
1 2 3 4
1
1 2
1 2 3

the output I get is:

2 4 2 4 6 7
2 4
2
2

where the first two lines are joined, the last line is incomplete (ending with a <space> instead of a <newline>), and the input line that contained just one field was deleted. I expected you wanted an empty line for that input line, as follows:

2 4
2 4 6 7
2 4

2
2

where each line is terminated by a <newline> and there is never a <space> immediately before any line's terminating <newline>. If this is what you want, you could try something like:

nawk '
NR == FNR {
        a[$1]
        next
}
{       first=0
        for(i = 1; i <= NF; i++)
                if(!(i in a))
                        printf("%s%s", first++ ? FS : "", $i)
        print ""
}' nofields input

or, if you insist on a much less readable single line version:

nawk 'NR==FNR{a[$1];next}{f=0;for(i=1;i<=NF;i++)if(!(i in a))printf("%s%s",f++?FS:"",$i);print ""}' nofields input

I am not a fan of one-liners!

Note that Yoda's script will take care of most of the issues discussed here, but will still sometimes print a field separator at the end of line (before the terminating <newline>).
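
The test case above can be reproduced with a short shell sketch. It recreates the nofields and input files from this post in a temporary directory and runs the multi-line script against them. Using awk rather than nawk, the mktemp call, and the variable names dir and result are assumptions made here for portability and illustration:

```shell
# Work in a throwaway directory so no existing files are touched.
dir=$(mktemp -d)
printf '%s\n' 1 3 5 > "$dir/nofields"
printf '%s\n' '1 2 3 4 5' '1 2 3 4 5 6 7' '1 2 3 4' '1' '1 2' '1 2 3' > "$dir/input"

# Same logic as the multi-line script, run with awk instead of nawk.
result=$(awk '
NR == FNR { a[$1]; next }      # first file: remember field numbers to skip
{
    first = 0
    for (i = 1; i <= NF; i++)
        if (!(i in a))         # separator goes BEFORE every kept field but the first
            printf("%s%s", first++ ? FS : "", $i)
    print ""                   # always terminate the record, even an empty one
}' "$dir/nofields" "$dir/input")
printf '%s\n' "$result"
rm -r "$dir"
```

Because the separator is emitted before each kept field except the first, no line can end with a trailing separator, and a record whose fields are all excluded still produces an empty line.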

juzz4fun · June 6, 2013, 4:37pm

I didn't face this problem, probably because all the rows in my input file have the same number of fields and the last field is not in the nofields file.
But I appreciate you making me rethink this one-liner.

Jotne · June 7, 2013, 2:37am

@Yoda
I would have replaced printf "\n" with print ""
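
One likely reason for this preference, offered here as an assumption since the post does not state it: print ends its output with the output record separator ORS, while printf "\n" always writes a literal newline. A small sketch with a deliberately unusual ORS shows the difference (the strings a, b, c are placeholders):

```shell
# With ORS set to ";", print appends ";" while printf writes exactly its format.
ends=$(awk 'BEGIN { ORS = ";"; print "a"; printf "b\n"; print "c" }')
printf '%s\n' "$ends"
```

With the default ORS of a newline the two forms behave the same, so the choice only matters when a script changes ORS.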

alister · June 7, 2013, 12:30pm

yoda:

elixir sinari is right about the problem that he pointed out.

So use this code instead:
awk 'NR==FNR{A[$1];next}{for(i=1;i<=NF;i++) { if( !( i in A )) printf $i OFS } printf "\n" }' nofields input

This code uses the field data itself as the printf format string, so it silently converts %% into % and will implode if a % is present and not followed by another %.

Regards,
Alister
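
A minimal sketch of the format-string issue described above, using a made-up field value a%%b. The fixed-format variant is one common way to avoid the problem, not necessarily what any poster in this thread used, and the variable names are illustrative:

```shell
# Data used as the format string: %% is treated as an escape and collapses to %.
unsafe=$(printf '%s\n' 'a%%b' | awk '{ printf $1 OFS; print "" }')

# Data passed as an argument to a fixed format is printed verbatim.
safe=$(printf '%s\n' 'a%%b' | awk '{ printf "%s%s", $1, OFS; print "" }')

printf 'unsafe: [%s]\nsafe:   [%s]\n' "$unsafe" "$safe"
```

Keeping the format string constant and passing the field as an argument means a % in the data is never interpreted as a conversion.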