emiley
July 26, 2016, 5:02pm
Hi there,
I would like to use awk to reformat a tab-delimited file containing three columns as follows:
Data file:
sample 1 173
sample 269 530
sample 687 733
sample 1699 1779
Desired output file:
sample 174..265, 531..686, 734..1698
I need each row's third-column value plus 1 to be paired with the next row's second-column value minus 1, and so on. The output should be a single line containing the name from the first column, followed by a space and then the comma-separated coordinate pairs.
Thank you very much for your help!
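Put another way, each gap starts one past the end of a row and stops one before the start of the following row, so awk has to remember the previous row's third column while it reads the next line. Below is a minimal sketch of just that pairing step. It assumes a tab-delimited file already sorted by the second column. The function name `gap_pairs` is hypothetical, and the sketch prints one gap per line rather than the final joined format:

```shell
# gap_pairs (hypothetical name): read tab-delimited rows on stdin and print
# one "start..end" gap per line between consecutive rows.
gap_pairs() {
    awk -F'\t' '
        NR > 1 { print prev ".." ($2 - 1) }   # previous end + 1 .. this start - 1
        { prev = $3 + 1 }                     # where the next gap starts
    '
}

printf 'sample\t1\t173\nsample\t269\t530\nsample\t687\t733\nsample\t1699\t1779\n' | gap_pairs
# 174..268
# 531..686
# 734..1698
```

The order of the two rules matters. The `NR > 1` rule must run before `prev` is overwritten with the current row's end.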
What have you tried so far? Are you getting errors? Please post the awk commands and any output.
emiley
July 26, 2016, 5:54pm
I have tried this (among several other variations), but I am clearly far from where I want to be. I'm very much a beginner.
awk '{print $1" "$3+1".."$2-1;}' input
With the following output:
sample 174..0
sample 531..268
sample 734..686
sample 1780..1698
Thank you for your help, I really appreciate it!
rdrtx1
July 26, 2016, 7:09pm
awk -F"\t" '
NR==1 {printf "%s ", $1; l=$NF + 1}
NR>2  {printf ", "}
NR>1  {printf "%s..%d", l, $2 - 1; l=$NF + 1}
END   {print ""}
' infile
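rdrtx1's script pairs rows in the order they are read, so it assumes the file is already sorted by the second column. If that might not hold, one option is to sort numerically on that column first. Below is a sketch that does this. It assumes a single name in column 1 and a `sort` that accepts `-t` and `-k2,2n` (POSIX `sort` does). The name `join_gaps` is hypothetical:

```shell
# join_gaps (hypothetical name): sort tab-delimited rows numerically on
# column 2, then print "name a..b, c..d, ..." on a single line.
join_gaps() {
    sort -t "$(printf '\t')" -k2,2n "$@" | awk -F'\t' '
        NR == 1 { printf "%s ", $1 }              # name from the first row
        NR > 2  { printf ", " }                   # separator between gaps
        NR > 1  { printf "%s..%d", l, $2 - 1 }    # previous end + 1 .. start - 1
                { l = $3 + 1 }                    # start of the next gap
        END     { print "" }
    '
}

printf 'sample\t687\t733\nsample\t1\t173\nsample\t1699\t1779\nsample\t269\t530\n' | join_gaps
# sample 174..268, 531..686, 734..1698
```

With a file argument (`join_gaps datafile`), `sort` reads the file directly. With no arguments, it reads standard input.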
In the 2nd column of your 2nd line you have the number 269, but your desired output shows 265. Shouldn't this be 268?
Below is a pure shell (bash/ksh) solution:
n=0; while read -r one two three; do
  if ((n == 0)); then printf '%s ' "$one"        # name from the first row
  else ((n > 1)) && printf ', '; printf '%s..%s' "$start" "$((two - 1))"; fi
  start=$((three + 1)); n=$((n + 1))               # next gap starts after this row
done < datafile; printf '\n'
Yoda
July 27, 2016, 12:36pm
Another awk approach, for files where the first column contains more than one distinct name:
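awk -F'\t' '
{
    ++R[$1]                              # rows seen so far for this name
    A2[$1 FS R[$1]] = $2                 # start of that row
    A3[$1 FS R[$1]] = $3                 # end of that row
}
END {
    for ( k in R )                       # name order is unspecified
    {
        printf "%s ", k
        for ( i = 1; i < R[k]; ++i )
            printf "%s%d..%d", (i > 1 ? ", " : ""), A3[k FS i]+1, A2[k FS (i+1)]-1
        printf "\n"
    }
}
' file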
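In the approach above, `for ( k in R )` visits the names in an unspecified order, so with several names the output lines may come out in any order. Below is a sketch that keeps the order in which names first appear. It assumes each name's rows are already sorted by the second column. The name `gaps_by_name` is hypothetical:

```shell
# gaps_by_name (hypothetical name): one output line per name, listing the gaps
# between that name's consecutive rows, with names in first-seen order.
gaps_by_name() {
    awk -F'\t' '
        {
            if ($1 in last)                        # not the first row for this name
                out[$1] = out[$1] (out[$1] == "" ? "" : ", ") last[$1] ".." ($2 - 1)
            else
                order[++n] = $1                    # remember first-seen order
            last[$1] = $3 + 1                      # where the next gap for this name starts
        }
        END { for (i = 1; i <= n; i++) print order[i], out[order[i]] }
    ' "$@"
}

printf 'b\t1\t10\na\t5\t9\nb\t20\t30\na\t15\t19\nb\t40\t50\n' | gaps_by_name
# b 11..19, 31..39
# a 10..14
```

Building each name's line as it goes avoids storing every row. The trade-off is that rows for a name must arrive in ascending order of column 2.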