Hi Friends,
I have an input file like this
chr1 100 200
chr1 200 300
chr1 300 400
chr1 400 500
chr1 500 600
chr1 600 700
chr1 700 800
chr1 800 900
chr1 900 920
chr1 940 960
I would like to get the first line's second column and the fifth line's 3rd column as one single line. This should happen for every 5 lines.
So, the output will be
chr1 100 600
chr1 600 960
Thanks
Yoda
March 5, 2013, 11:36am
2
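awk ' NR == 1 {
printf "%s %d ", $1, $2
next
} NR%5 == 0 {
printf "%d\n", $3
# start the next group only if another line exists
if ((getline) > 0)
printf "%s %d ", $1, $2
} END {
# terminate a trailing partial group, if any
if (NR%5) printf "\n"
} ' file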
A little simpler is
awk '{ x=NR%5 }
x==1 { printf "%s %d",$1,$2 }
x==0 { printf " %d\n",$3 }
' file
rdrtx1
March 5, 2013, 4:15pm
4
also:
awk 'NR%5==1 {a=$2} ! NR%5 {$2=a; print}' infile
sed:
sed 'N;N;N;N;s/[^ ]*\n.* //' file
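For anyone puzzling over the sed one-liner, here is a rough annotated run. It assumes GNU sed and an input whose line count is a multiple of 5; the `sample` function is only a hypothetical stand-in for `file`, filled with the original ten lines.

```shell
# Hypothetical helper: prints the ten sample lines from the first post.
sample() {
  printf '%s\n' \
    'chr1 100 200' 'chr1 200 300' 'chr1 300 400' 'chr1 400 500' 'chr1 500 600' \
    'chr1 600 700' 'chr1 700 800' 'chr1 800 900' 'chr1 900 920' 'chr1 940 960'
}

# N;N;N;N  appends the next four lines to the pattern space, joined by newlines.
# s/[^ ]*\n.* //  deletes from the third field of the first line (the run of
# non-spaces right before the first newline) through the last space of the
# fifth line, leaving columns 1-2 of line one and column 3 of line five.
sample | sed 'N;N;N;N;s/[^ ]*\n.* //'
```

On the sample data this should give the two lines the OP asked for. If the file is not a multiple of 5 lines long, the behaviour of the last `N` depends on the sed implementation, so treat this as a sketch rather than a robust solution.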
A Perlish way:
perl -lane 'if(my $pos = ($.%5 ... $.%5==0)) {
$start = "$F[0] $F[1]" if $pos==1;
print "$start $F[2]" if $pos =~ /E0$/;
}' file
Jotne
March 6, 2013, 1:40am
7
This did not work for me; it gives no output:
awk 'NR%5==1 {a=$2} ! NR%5 {$2=a; print}' infile
This does work
awk 'NR%5==1 {a=$2} NR%5==0 {$2=a; print}' infile
chr1 100 600
chr1 600 960
GNU Awk 3.1.8
Indeed it is better to use !(NR%5)
rather than ! NR%5
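The reason, as far as I can tell from the awk precedence rules, is that unary `!` binds more tightly than `%`, so `! NR%5` is read as `(!NR) % 5`. A small sketch to see the difference (the `five_lines` helper and the message text are made up for the demonstration):

```shell
# Hypothetical helper: five dummy records are enough to show the difference.
five_lines() { printf '%s\n' a b c d e; }

# Read as (!NR) % 5. NR is at least 1 on every record, so !NR is 0
# and 0 % 5 is 0 (false): the action never runs and nothing is printed.
five_lines | awk '! NR%5 { print "matched record", NR }'

# With parentheses the modulo is taken first, so the pattern is true
# exactly when NR is a multiple of 5.
five_lines | awk '!(NR%5) { print "matched record", NR }'
```

Only the second command should print anything, namely `matched record 5`, which matches the "no output" result reported with GNU Awk 3.1.8.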
Jotne
March 6, 2013, 2:37am
9
!(NR%5)
This works for me too.
I would suggest a better test file, to make sure the operation is correct. If all the lines start with chr1, it seems hard to tell if a script is really working.
chr01 100 199
chr02 200 299
chr03 300 399
chr04 400 499
chr05 500 599
chr06 600 699
chr07 700 799
chr08 800 899
chr09 900 939
chr10 940 960
Correct output is:
chr01 100 599
chr06 600 960
For example, when I ran this improved (I think) test file against the following proposed solution, it did not work correctly:
awk 'NR%5==1 {a=$2} NR%5==0 {$2=a; print}' infile
chr05 100 599
chr10 600 960
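If the first column is supposed to come from the first line of each group as well, one possible variation is to remember both columns and print them explicitly on the fifth line. This is only a sketch under that assumption; `test_input`, `name` and `start` are names I made up, and the helper just replays the test file above.

```shell
# Hypothetical helper: prints the ten-line test file proposed above.
test_input() {
  printf '%s\n' \
    'chr01 100 199' 'chr02 200 299' 'chr03 300 399' 'chr04 400 499' 'chr05 500 599' \
    'chr06 600 699' 'chr07 700 799' 'chr08 800 899' 'chr09 900 939' 'chr10 940 960'
}

test_input | awk '
  # first line of each group: remember columns 1 and 2
  NR%5 == 1 { name = $1; start = $2 }
  # fifth line of each group: print the saved columns with column 3 of this line
  NR%5 == 0 { print name, start, $3 }
'
```

With the test file this should print `chr01 100 599` and `chr06 600 960`. A trailing group of fewer than five lines is silently dropped, which may or may not be what is wanted.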
Jotne
March 6, 2013, 2:49am
11
But this is the original input, which you have now changed, so it needs new code.
chr1 100 200
chr1 200 300
chr1 300 400
chr1 400 500
chr1 500 600
chr1 600 700
chr1 700 800
chr1 800 900
chr1 900 920
chr1 940 960
---------- Post updated at 08:49 ---------- Previous update was at 08:39 ----------
This should then work
awk 'NR%5==1 {printf "%s %s ",$1,$2;getline;getline;getline;getline;print $3}' infile
chr01 100 599
chr06 600 960
The OP said "first line's second column and the fifth line's 3rd column as one single line. This should happen for every 5 lines."
I just changed the test input to make sure that condition is being met.
Of course, if the first column doesn't matter, then the original input is OK. The OP didn't say.
Jotne
March 6, 2013, 3:51am
13
All the solutions above reflect this: data from the first line's second column and the fifth line's 3rd column. Since the data in column #1 was not important, you get what you get.
That's why it's always important to post real data, or something as close to it as possible.
Can't argue with that. You're right. The OP didn't say anything special about the first column. I was trying to "read his mind". It would seem kind of silly if the first column ALWAYS had the same data. But silly data files happen all the time.