Print every 5 values of the 4th column as a separate row with a different first column

jacobs.smith · February 20, 2013, 2:15pm

Hi,

I have the following file,

chr1 100 200 20
chr1 201 300 22
chr1 220 345 23
chr1 230 456 33.5
chr1 243 567 90
chr1 345 600 20
chr1 430 619 21.78
chr1 870 910 112.3
chr1 914 920 12
chr1 930 999 13

My desired output is:

peak1 20 22 23 33.5 90
peak2 20 21.78 112.3 12 13

The name "peak" is not in the input file, so it should come from the code.

Don_Cragun · February 20, 2013, 2:47pm

This seems to do what you want:

awk 'NR % 5 == 1 {
        # 1st line in a set of 5 lines.
        # Set the saved output string to the 4th field in this line.
        o = $4
        next
}       
NR % 5 {# 2nd through 4th line in a set of 5 lines.
        # Add the 4th field in this line to the saved output string.
        o = o " " $4
        next
}       
{       # 5th line in a set of 5 lines.
        # Print the result of processing this set of 5 lines.
        printf("peak%d %s %s\n", ++oc, o, $4)
}       
END {   # If the input file had fewer than 5 lines in the last set, print the
        # partial set.
        if(NR % 5) printf("peak%d %s\n", ++oc, o)
}' file

If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.

user8 · February 20, 2013, 2:51pm

Another option which also seems to work:

awk '{str = str FS $4}!(NR%5){print "peak" ++c str; str=""}END{if (str) print "peak" ++c str}' file
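
If the group size, column, or label ever needs to change, the same idea can be wrapped in a small shell function. This is only a sketch: the function name group_col and the awk variables n, col, prefix, row and count are made up for illustration, and it assumes an awk that accepts -v (on Solaris, /usr/xpg4/bin/awk or nawk).

```shell
# group_col is a hypothetical helper name; it is not from the posts above.
# Usage: group_col [N] [COL] [PREFIX] < file
# Defaults: N=5 values per row, COL=4, PREFIX=peak.
group_col() {
    awk -v n="${1:-5}" -v col="${2:-4}" -v prefix="${3:-peak}" '
    {
        # Append the chosen field to the current row, space-separated.
        row = row " " $col
    }
    NR % n == 0 {
        # n values collected: print the numbered row and start a new one.
        printf("%s%d%s\n", prefix, ++count, row)
        row = ""
    }
    END {
        # Print a trailing partial row, if there is one.
        if (row != "") printf("%s%d%s\n", prefix, ++count, row)
    }'
}
```

With the defaults, group_col < file should give the peak1/peak2 rows requested in the first post; something like group_col 3 2 region < file would instead group the 2nd column in threes under a "region" label.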

DGPickett · February 20, 2013, 4:20pm

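ct=0 suf=0 kr=
# x is reused as a throwaway for fields 1-3; k gets the 4th field
# (and anything after it).
while read x x x k
do
  kr="$kr $k"
  if (( ++ct > 4 ))   # bash/ksh arithmetic: true once 5 values are collected
  then
    echo "peak$(( ++suf ))$kr"
    ct=0 kr=
  fi
done < input
# Print any leftover partial set.
[ -n "$kr" ] && echo "peak$(( ++suf ))$kr"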