awk to parse current and next row in tab-delimited file

emiley · July 26, 2016, 5:02pm

Hi there,

I would like to use awk to reformat a tab-delimited file containing three columns as follows:
Data file:

sample    1    173
sample    269    530
sample    687    733
sample    1699    1779

Desired output file:

sample 174..265, 531..686, 734..1698

I need the value in the third column plus 1 to be paired with the second-column value of the next row minus 1, and so on. The output should be a single line containing the value from the first column, followed by a space and then the comma-separated coordinate pairs.
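To make the rule concrete, here is the arithmetic for the sample rows. The labels $s_i$ and $e_i$ are introduced only for this illustration and stand for the second and third columns of row $i$. Each pair is $(e_i + 1)..(s_{i+1} - 1)$ for $i = 1, \dots, n-1$, so the last row's third column (1779) is never used:

$$
\begin{aligned}
i=1:&\quad 173+1 = 174,\quad 269-1 = 268 \;\Rightarrow\; 174..268\\
i=2:&\quad 530+1 = 531,\quad 687-1 = 686 \;\Rightarrow\; 531..686\\
i=3:&\quad 733+1 = 734,\quad 1699-1 = 1698 \;\Rightarrow\; 734..1698
\end{aligned}
$$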

Thank you very much for your help!

frank_rizzo · July 26, 2016, 5:22pm

What have you tried so far? Are you getting errors? Please post the awk commands and any output.

emiley · July 26, 2016, 5:54pm

I have tried this (among several other variations), but it is clearly far from where I want to be. I am very much a beginner.

awk '{print $1" "$3+1".."$2-1;}' input

With the following output:

sample 174..0
sample 531..268
sample 734..686
sample 1780..1698

Thank you for your help, I really appreciate it!
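Every range in the output above runs backwards because $3+1 and $2-1 are both read from the same row. A minimal sketch of the missing step is to carry the previous row's third column forward in a variable (prev here, a name chosen for illustration) and to print only from the second row onward. It prints one pair per line rather than the final single-line format, which makes the arithmetic easy to check; the helper name pair_rows is also hypothetical.

```shell
#!/bin/sh
# pair_rows (hypothetical helper): reads tab-separated rows on stdin and prints
# "name (previous row's col 3 + 1)..(this row's col 2 - 1)", one pair per line.
pair_rows() {
  # Row 1 only sets prev; every later row prints a pair, then updates prev.
  awk -F'\t' 'NR > 1 { print $1 " " prev + 1 ".." $2 - 1 } { prev = $3 }'
}

# The sample data from the question, with real tab characters.
printf 'sample\t1\t173\nsample\t269\t530\nsample\t687\t733\nsample\t1699\t1779\n' | pair_rows
```

With the sample data this prints sample 174..268, sample 531..686 and sample 734..1698 on three separate lines.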

rdrtx1 · July 26, 2016, 7:09pm

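awk -F"\t" '
NR==1 {printf "%s ", $1; l=$NF + 1}            # print the name once; remember first end + 1
NR>2  {printf ", "}                            # separator before every pair except the first
NR>1  {printf "%d..%d", l, $2 - 1; l=$NF + 1}  # previous end + 1 .. this start - 1
END   {print ""}                               # finish the line with a newline
' infile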

pilnet101 · July 27, 2016, 11:53am

In the 2nd column of your 2nd line you have the number 269, but your desired output has 265. Shouldn't this be 268?

Below is a pure shell solution (it needs bash or ksh for the ((...)) arithmetic):

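f=0; sep=''
while read -r one two three; do
  if ((f)); then printf '%s%d..%d' "$sep" "$((prev + 1))" "$((two - 1))"; sep=', '  # prev end + 1 .. this start - 1
  else printf '%s ' "$one"; f=1; fi   # first row: print the name only
  prev=$three                          # remember this row's end for the next pair
done < datafile; echo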

Yoda · July 27, 2016, 12:36pm

Another awk approach, in case your file has more than one distinct value in the first column:

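awk -F'\t' '
        {
                # count rows per first-column value; store columns 2 and 3 by (key, row index)
                ++R[$1]
                A2[$1 FS R[$1]] = $2
                A3[$1 FS R[$1]] = $3
        }
        END {
                # "for (k in R)" visits the keys in an unspecified order
                for ( k in R )
                {
                        i = 1
                        printf "%s ", k
                        while ( i < R[k] )
                        {
                                # ", " goes between pairs, not before the first one
                                printf "%s%d..%d", (i > 1 ? ", " : ""), A3[k FS i]+1, A2[k FS (i+1)]-1
                                ++i
                        }
                        printf "\n"
                }
        }
' file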
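Because `for (k in R)` visits keys in an unspecified order, the array-based version may print the groups in any order. If the file is already grouped by the first column (as in the sample), a streaming sketch can print each group as soon as it ends and keep the input order without storing every row. This assumes rows sharing a first-column value are adjacent; the helper name group_gaps and the second group chr2 in the demo are made up for illustration.

```shell
#!/bin/sh
# group_gaps (hypothetical helper): reads tab-separated rows on stdin.
# Assumes rows sharing a first-column value are adjacent (e.g. sorted).
group_gaps() {
  awk -F'\t' '
    $1 != key {                   # a new group starts
      if (NR > 1) printf "\n"     # finish the previous group line
      key = $1; sep = ""
      printf "%s ", key
      prev = $3                   # the first row of a group only sets prev
      next
    }
    {
      printf "%s%d..%d", sep, prev + 1, $2 - 1
      sep = ", "
      prev = $3
    }
    END { if (NR > 0) printf "\n" }
  '
}

printf 'sample\t1\t173\nsample\t269\t530\nsample\t687\t733\nsample\t1699\t1779\nchr2\t10\t20\nchr2\t50\t60\n' | group_gaps
```

For this input it prints sample 174..268, 531..686, 734..1698 on the first line and chr2 21..49 on the second. A group with only one row prints just its name, since it has no gap to report.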