How to print even columns using awk?

quincyjones · June 4, 2017, 6:34am

I have a big file (with 200k rows and 10k columns).
How can I print all even columns of this file and also keep the first column?

input

a    1    a    10    a    11    a    12
b    2    b    30    b    22    b    33

output

a       1       10      11      12
b       2       30      22      33

RudiC · June 4, 2017, 7:07am

Any attempts / ideas / thoughts from your side?

quincyjones · June 4, 2017, 7:15am

I was using this, but it prints everything in one column:

awk '{ for (i=1;i<=NF;i+=2) print $i }'

RudiC · June 4, 2017, 7:22am

That's because print always ends its output with the output record separator (ORS), usually <LF> (\n, 0x0A). Did you consider using the "print formatted" command, printf? And, for your desired result, you need to output field 1 and then every other field starting from field 2.
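To illustrate the difference, here is a small sketch that runs the same loop as the attempt above on a made-up one-line sample, once with print and once with printf. The sample line and the variable names are only for illustration.

```shell
# Made-up, tab-delimited sample line.
line=$(printf 'a\t1\ta\t10\n')

# print appends ORS (a newline by default) after every call,
# so each selected field lands on its own line.
with_print=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i += 2) print $i }')

# printf emits only what the format string asks for, so the fields
# stay on one line until a newline is printed explicitly.
with_printf=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i += 2) printf "%s ", $i; print "" }')

printf '%s\n' "$with_print"
printf '%s\n' "$with_printf"
```

The loop still selects the odd fields here; only the line-breaking behaviour changes.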

RavinderSingh13 · June 4, 2017, 8:29am

Hello quincyjones,

Could you please try the following and let me know if it helps?

awk '{printf("%s ",$1);for(i=2;i<=NF;i+=2){printf("%s ",$i)};print ""}'   Input_file

Thanks,
R. Singh

RudiC · June 4, 2017, 8:54am

Please be aware that your input file has DOS line terminators (<CR>, \r, ^M, 0x0D), so the result you see will not be exactly what you want. Either create a correct *nix text file with your application / editor, or remove the <CR> when processing the file.
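As a rough sketch of the second option, assuming the stray characters are <CR> at the end of each line, something like the following counts the affected lines and then strips them. The sample is a made-up temporary file, not the poster's data.

```shell
# Build a two-line sample with DOS (CRLF) line endings.
sample=$(mktemp)
printf 'a\t1\ta\t10\r\nb\t2\tb\t30\r\n' > "$sample"

# Count lines that end in a carriage return (0 would mean a clean *nix file).
crlf_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$sample")
echo "lines ending in CR: $crlf_lines"

# Strip the trailing CR from each line before any further processing.
cleaned=$(awk '{ sub(/\r$/, ""); print }' "$sample")
printf '%s\n' "$cleaned"

rm -f "$sample"
```

The same sub() call can be placed at the start of any other awk program so the file does not have to be converted first.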

quincyjones · June 4, 2017, 1:06pm

Updated. The file should be tab-delimited.

RudiC · June 4, 2017, 1:08pm

Still the DOS terminators persist...

RavinderSingh13 · June 4, 2017, 1:32pm

Hello quincyjones,

Could you please try the following and let me know if it helps? It should remove any carriage return characters that are present and produce TAB-delimited output.

awk '{gsub(/\r/,"");printf("%s\t",$1);for(i=2;i<=NF;i+=2){printf("%s%s",$i,i==NF?"":"\t")};print ""}'   Input_file
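A note on the command above: because the loop visits even indices only, the i==NF test suppresses the final TAB only when the line has an even number of fields. With an odd field count the last even index is NF-1, so a TAB still trails. A minimal sketch of one way around that is to print the separator before each field rather than after it. The 5-field sample line is made up for illustration.

```shell
# Hypothetical 5-field line (odd field count) with a DOS line ending.
out=$(printf 'a\t1\ta\t10\tz\r\n' |
  awk '{
    gsub(/\r/, "")                 # drop carriage returns; awk re-splits the fields
    printf "%s", $1                # first column, no separator yet
    for (i = 2; i <= NF; i += 2)   # then every even-numbered column
      printf "\t%s", $i            # separator goes in front, so none trails
    print ""                       # finish the line
  }')
printf '%s\n' "$out"
```

This form also needs no per-field comparison against NF.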

Thanks,
R. Singh

quincyjones · June 4, 2017, 1:56pm

I have 12k columns and 200k rows. I am running this script with 60 GB of memory, but it was failing. Is it possible to make this more memory-efficient? Thanks, @RavinderSingh13

Scrutinizer · June 4, 2017, 2:35pm

The examples given should not be using much memory at all.

How was your script failing exactly?

quincyjones · June 4, 2017, 2:40pm

Sorry, my bad. It is working fine.

RavinderSingh13 · June 4, 2017, 2:42pm

Hello quincyjones,

It is hard to believe that it uses 60 GB of memory; that is really a lot.

Since I don't have an Input_file of that size, this solution is only an assumption for you to try. In my previous solution I put in one condition that checks whether the value of variable i is equal to NF (the number of fields), which was of course there to avoid a TAB at the end of the line. Since we are talking about performance here, you could try removing that condition and let me know. If you are OK with a TAB at the end of each line, use the following code as it is.

awk '{gsub(/\r/,"");printf("%s\t",$1);for(i=2;i<=NF;i+=2){printf("%s\t",$i)};print ""}'   Input_file

If you want to remove the TAB at the end of each line, you could add | awk '{sub(/[[:space:]]+$/,"");print}' after Input_file in the above code. I think that should be faster than checking a condition on every field. Kindly try it out and let me know how it goes.
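Put together, the two-stage pipeline described above might look like the sketch below. It is fed one sample line from the original question through printf instead of reading Input_file, purely so that it can be run as is. It assumes an awk that understands the [[:space:]] character class, which gawk does.

```shell
# One tab-delimited sample line from the question, piped in place of Input_file.
out=$(printf 'a\t1\ta\t10\ta\t11\ta\t12\n' |
  awk '{gsub(/\r/,"");printf("%s\t",$1);for(i=2;i<=NF;i+=2){printf("%s\t",$i)};print ""}' |
  awk '{sub(/[[:space:]]+$/,"");print}')
printf '%s\n' "$out"
```

Note that the second awk strips all trailing whitespace, so a line whose last selected column is empty would lose that empty field as well.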

Thanks,
R. Singh