Print every 5 lines with special condition

Hi Friends,

I have an input file like this

chr1 100 200
chr1 200 300
chr1 300 400
chr1 400 500
chr1 500 600
chr1 600 700
chr1 700 800
chr1 800 900
chr1 900 920
chr1 940 960

I would like to get the first line's second column and the fifth line's 3rd column as one single line. This should happen for every 5 lines.

So, the output will be

chr1 100 600
chr1 600 960

Thanks

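awk ' NR == 1 {
                printf "%s %d ", $1, $2
                next
} NR%5 == 0 {
                printf "%d\n", $3
                # start the next group only if another line exists;
                # otherwise the last line would be printed a second time
                if ((getline) > 0)
                                printf "%s %d ", $1, $2
} END {
                # terminate the line only if a group was left unfinished
                if (NR%5) printf "\n"
} ' file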

A little simpler is

awk '{ x=NR%5 }
x==1 { printf "%s %d",$1,$2 }
x==0 { printf " %d\n",$3 }
' file

also:

awk 'NR%5==1 {a=$2} ! NR%5 {$2=a; print}' infile

sed:

sed 'N;N;N;N;s/[^ ]*\n.* //' file

A Perlish way:

perl -lane 'if(my $pos = ($.%5 ... $.%5==0)) {
 $start = "$F[0] $F[1]" if $pos==1;
 print "$start $F[2]" if $pos =~ /E0$/; 
}' file

This did not work for me; it gives no output:

awk 'NR%5==1 {a=$2} ! NR%5 {$2=a; print}' infile

This does work

awk 'NR%5==1 {a=$2} NR%5==0 {$2=a; print}' infile
chr1 100 600
chr1 600 960

GNU Awk 3.1.8

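Indeed, it is better to use !(NR%5) rather than ! NR%5. The unary ! binds tighter than %, so ! NR%5 is parsed as (!NR)%5, which is always 0 and therefore never true.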

!(NR%5)
This works for me too.
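
To see the difference between the two spellings directly, here is a small sketch that prints both expressions for ten dummy input lines. The function name precedence_demo is only for illustration and is not from the thread.

```shell
# Print NR next to both spellings of the test for ten input lines.
# Column 2 is (! NR%5), which awk reads as (!NR)%5.
# Column 3 is (!(NR%5)), which is the intended "NR is a multiple of 5" test.
precedence_demo() {
    printf '%s\n' a b c d e f g h i j |
        awk '{ print NR, (! NR%5), (!(NR%5)) }'
}

precedence_demo
```

Column 2 stays 0 on every line, while column 3 becomes 1 on lines 5 and 10, which is why the first version never prints anything.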

I would suggest a better test file, to make sure the operation is correct. If all the lines start with chr1, it seems hard to tell if a script is really working.

chr01 100 199
chr02 200 299
chr03 300 399
chr04 400 499
chr05 500 599
chr06 600 699
chr07 700 799
chr08 800 899
chr09 900 939
chr10 940 960

Correct output is:

chr01 100 599
chr06 600 960

For example, when I ran this improved (I think) test file against the following proposed solution, it did not work correctly:

awk 'NR%5==1 {a=$2} NR%5==0 {$2=a; print}' infile
chr05 100 599
chr10 600 960
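
A check like this can be automated. The following is a minimal sketch, assuming the ten-line test file above and the "a little simpler" script from earlier in the thread as the candidate; the variable names (testfile, expected, actual) are made up for the example.

```shell
# Write the ten-line test file suggested above to a temporary file.
testfile=$(mktemp)
cat > "$testfile" <<'EOF'
chr01 100 199
chr02 200 299
chr03 300 399
chr04 400 499
chr05 500 599
chr06 600 699
chr07 700 799
chr08 800 899
chr09 900 939
chr10 940 960
EOF

# Output we expect if column 1 is taken from the first line of each group.
expected='chr01 100 599
chr06 600 960'

# Candidate under test: the "a little simpler" script from earlier in the thread.
actual=$(awk '{ x=NR%5 }
x==1 { printf "%s %d",$1,$2 }
x==0 { printf " %d\n",$3 }
' "$testfile")

# Report whether the candidate reproduces the expected output.
if [ "$actual" = "$expected" ]; then echo PASS; else echo FAIL; fi
rm -f "$testfile"
```

Swapping a different one-liner into the actual=... assignment lets you compare candidates against the same expected output.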

But you have changed the original input, so it now needs new code. This was the original input:

chr1 100 200
chr1 200 300
chr1 300 400
chr1 400 500
chr1 500 600
chr1 600 700
chr1 700 800
chr1 800 900
chr1 900 920
chr1 940 960


This should then work

awk 'NR%5==1 {printf "%s %s ",$1,$2;getline;getline;getline;getline;print $3}' infile
chr01 100 599
chr06 600 960

The OP said "first line's second column and the fifth line's 3rd column as one single line. This should happen for every 5 lines."

I just changed the test input to make sure that condition is being met.

Of course, if the first column doesn't matter, then the original input is OK. The OP didn't say.

All of the solutions above reflect this: they take the data from the first line's second column and the fifth line's third column. Since nothing was said about column #1 being important, you get what you get.

That's why it's always important to post real data, or something as close to it as possible.

Can't argue with that. You're right. The OP didn't say anything special about the first column. I was trying to "read his mind". It would seem kind of silly if the first column ALWAYS had the same data. But silly data files happen all the time.
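
For anyone landing here later, a sketch that sidesteps both open questions: it takes columns 1 and 2 from the first line of each group and column 3 from the last, and the group size is a parameter. The helper name group_ranges is hypothetical, and printing a short final group is my assumption, since the OP never said what should happen when the line count is not a multiple of 5.

```shell
# group_ranges N [file...]
# Hypothetical helper: for every N input lines, print column 1 and column 2
# of the first line together with column 3 of the last line.
group_ranges() {
    n=$1
    shift
    awk -v n="$n" '
        (NR - 1) % n == 0 { chrom = $1; start = $2 }   # first line of a group
        { stop = $3 }                                   # latest end coordinate seen
        NR % n == 0 { print chrom, start, stop }        # group is complete
        # Assumption: a short final group is still printed.
        END { if (NR % n) print chrom, start, stop }
    ' "$@"
}

# Usage with seven lines: one full group of five and a short group of two.
printf '%s\n' \
    'chr01 100 199' 'chr02 200 299' 'chr03 300 399' 'chr04 400 499' \
    'chr05 500 599' 'chr06 600 699' 'chr07 700 799' |
    group_ranges 5
```

No getline is needed here, so the script cannot reprint a stale line when the input runs out.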