Print every 5 lines with special condition

jacobs.smith · March 5, 2013, 11:21am

Hi Friends,

I have an input file like this:

chr1 100 200
chr1 200 300
chr1 300 400
chr1 400 500
chr1 500 600
chr1 600 700
chr1 700 800
chr1 800 900
chr1 900 920
chr1 940 960

For every group of 5 lines, I would like to print the second column of the first line and the third column of the fifth line together on one single line.

So the output would be:

chr1 100 600
chr1 600 960

Thanks

Yoda · March 5, 2013, 11:36am

awk ' NR == 1 {
                printf "%s %d ", $1, $2
                next
} NR%5 == 0 {
                printf "%d\n", $3
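                # start the next group only if another line was actually read;
                # otherwise the last line's fields would be printed again at EOF
                if ((getline) > 0)
                                printf "%s %d ", $1, $2
} ' file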

MadeInGermany · March 5, 2013, 12:51pm

A slightly simpler version is:

awk '{ x=NR%5 }
x==1 { printf "%s %d",$1,$2 }
x==0 { printf " %d\n",$3 }
' file

rdrtx1 · March 5, 2013, 4:15pm

also:

awk 'NR%5==1 {a=$2} ! NR%5 {$2=a; print}' infile

Scrutinizer · March 5, 2013, 5:25pm

sed:

sed 'N;N;N;N;s/[^ ]*\n.* //' file
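
A commented sketch of how this one-liner appears to work, assuming exactly three space-separated fields per line and a line count that is a multiple of 5. The wrapper name collapse5 is hypothetical and only there so the command can be tried on sample input.

```shell
# Hypothetical wrapper: collapse each 5-line group from stdin or from a file.
collapse5() {
    # N;N;N;N        append the next four lines, so the pattern space holds one
    #                5-line group joined by embedded newlines
    # s/[^ ]*\n.* // delete from the last field of the group's first line
    #                through the last space of the group, which leaves
    #                columns 1 and 2 of line 1 followed by column 3 of line 5
    sed 'N;N;N;N;s/[^ ]*\n.* //' "$@"
}

# Try it on the sample input from the first post.
printf '%s\n' \
    'chr1 100 200' 'chr1 200 300' 'chr1 300 400' 'chr1 400 500' 'chr1 500 600' \
    'chr1 600 700' 'chr1 700 800' 'chr1 800 900' 'chr1 900 920' 'chr1 940 960' |
    collapse5
```

Because the substitution keeps the start of the group's first line, the first column also comes from line 1 of each group.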

elixir_sinari · March 5, 2013, 10:19pm

A Perlish way:

perl -lane 'if(my $pos = ($.%5 ... $.%5==0)) {
 $start = "$F[0] $F[1]" if $pos==1;
 print "$start $F[2]" if $pos =~ /E0$/; 
}' file

Jotne · March 6, 2013, 1:40am

This did not work for me; it gives no output:

awk 'NR%5==1 {a=$2} ! NR%5 {$2=a; print}' infile

This does work

awk 'NR%5==1 {a=$2} NR%5==0 {$2=a; print}' infile
chr1 100 600
chr1 600 960

GNU Awk 3.1.8

Scrutinizer · March 6, 2013, 2:25am

Indeed, it is better to use !(NR%5) rather than ! NR%5, because the unary ! binds more tightly than %.
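
A small sketch of the difference, assuming an awk that follows the POSIX precedence table, where ! NR%3 is parsed as (!NR)%3. The function names are hypothetical, and a modulus of 3 is used only to keep the input short.

```shell
# Unparenthesized: (!NR)%3 is 0%3 = 0 on every line, so the pattern never matches.
unparenthesized() {
    printf 'a\nb\nc\n' | awk '! NR%3 { print NR ": " $0 }'
}

# Parenthesized: the pattern is true only when NR is a multiple of 3.
parenthesized() {
    printf 'a\nb\nc\n' | awk '!(NR%3) { print NR ": " $0 }'
}

echo "unparenthesized: [$(unparenthesized)]"
echo "parenthesized:   [$(parenthesized)]"
```

Writing the test as NR%5==0 avoids the question of precedence altogether.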

Jotne · March 6, 2013, 2:37am

!(NR%5)
This works for me too.

hanson44 · March 6, 2013, 2:37am

I would suggest a better test file, to make sure the operation is correct. If all the lines start with chr1, it seems hard to tell if a script is really working.

chr01 100 199
chr02 200 299
chr03 300 399
chr04 400 499
chr05 500 599
chr06 600 699
chr07 700 799
chr08 800 899
chr09 900 939
chr10 940 960

Correct output is:

chr01 100 599
chr06 600 960

For example, when I ran this improved (I think) test file against the following proposed solution, it did not work correctly:

awk 'NR%5==1 {a=$2} NR%5==0 {$2=a; print}' infile
chr05 100 599
chr10 600 960
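
One way to check a candidate against this test file is sketched below. It assumes the first column should come from the first line of each group, which the OP did not state. The names candidate, testfile, name and start are hypothetical, and the awk program is only an illustrative variant of the commands in this thread.

```shell
# Write the test input, with a distinct first column on every line.
testfile=$(mktemp)
cat > "$testfile" <<'EOF'
chr01 100 199
chr02 200 299
chr03 300 399
chr04 400 499
chr05 500 599
chr06 600 699
chr07 700 799
chr08 800 899
chr09 900 939
chr10 940 960
EOF

# Candidate under test: remember columns 1 and 2 of the group's first line,
# then print them with column 3 of the group's fifth line.
candidate() {
    awk 'NR%5==1 { name=$1; start=$2 } NR%5==0 { print name, start, $3 }' "$@"
}

expected='chr01 100 599
chr06 600 960'

# Compare the candidate's output with the expected result.
if [ "$(candidate "$testfile")" = "$expected" ]; then
    echo "PASS"
else
    echo "FAIL"
fi
```

With all-identical first columns, a script that takes column 1 from the wrong line would still pass, which is why the distinct labels matter here.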

Jotne · March 6, 2013, 2:49am

But this is the original input, which you have now changed, so it needs new code.

chr1 100 200
chr1 200 300
chr1 300 400
chr1 400 500
chr1 500 600
chr1 600 700
chr1 700 800
chr1 800 900
chr1 900 920
chr1 940 960

---------- Post updated at 08:49 ---------- Previous update was at 08:39 ----------

This should then work

awk 'NR%5==1 {printf "%s %s ",$1,$2;getline;getline;getline;getline;print $3}' infile
chr01 100 599
chr06 600 960

hanson44 · March 6, 2013, 3:10am

The OP said "first line's second column and the fifth line's 3rd column as one single line. This should happen for every 5 lines."

I just changed the test input to make sure that condition is being met.

Of course, if the first column doesn't matter, then the original input is OK. The OP didn't say.

Jotne · March 6, 2013, 3:51am

All of the solutions above do exactly that: they take the second column from the first line and the third column from the fifth line. Since nothing was said about the data in column 1, you get what you get.

That's why it's always important to post real data, or something as close to it as possible.

hanson44 · March 6, 2013, 4:06am

Can't argue with that. You're right. The OP didn't say anything special about the first column. I was trying to "read his mind". It would seem kind of silly if the first column ALWAYS had the same data. But silly data files happen all the time.