The awk below executes as is and produces the current output. It is very close, but what I cannot seem to do is add the _exon... suffix. The ... portion comes from $1, and the _exon part is static and will never change. If there is a + sign in $4, then the ... is in ascending (sequential) order. If there is a - in $4, then the order is descending (reversed). I think I need an if statement, but I am not sure how to increment or decrement the value correctly. Thank you :).
example of ordering based on $4
+ = exon 1,2,3
- = exon 3,2,1
file tab-delimited
208 NR_120664.1 chr5 + 141704857 141843619 141843619 141843619 4 141704857,141724980,141732790,141843534, 141704935,141725050,141733148,141843619, 0 SPRY4-AS1 unk unk -1,-1,-1,-1,
1161 NM_021615.4 chr16 - 75507021 75528926 75512538 75513726 3 75507021,75515714,75528837, 75513742,75515789,75528926, 0 CHST6 cmpl cmpl 0,-1,-1,
1799 NM_002036.3 chr1 + 159173802 159176290 159174749 159176240 2 159173802,159175250, 159174770,159176290, 0 ACKR1 cmpl cmpl 0,0,
current output tab-delimited
4 + SPRY4-AS1 NR_120664.1 chr5:141704857-141704935 chr5:141724980-141725050 chr5:141732790-141733148 chr5:141843534-141843619
3 - CHST6 NM_021615.4 chr16:75507021-75513742 chr16:75515714-75515789 chr16:75528837-75528926
2 + ACKR1 NM_002036.3 chr1:159173802-159174770 chr1:159175250-159176290
desired output tab-delimited
4 + SPRY4-AS1 NR_120664.1 chr5:141704857-141704935_exon1,chr5:141724980-141725050_exon2,chr5:141732790-141733148_exon3,chr5:141843534-141843619_exon4
3 - CHST6 NM_021615.4 chr16:75507021-75513742_exon3,chr16:75515714-75515789_exon2,chr16:75528837-75528926_exon1
2 + ACKR1 NM_002036.3 chr1:159173802-159174770_exon1 chr1:159175250-159176290_exon2
awk
awk -F '\t' '{sf="";len1=split($10,s1,",");split($11,s2,","); for (i=1;i<len1;i++){sf=sf $3":"s1[i]"-"s2[i]" "}print $9,$4,$13,$2,sf}' OFS='\t' file > out
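One possible way to express the ordering rule described above, as a minimal sketch rather than a definitive answer: compute the exon number from the loop index and the strand in $4, then append it after the static _exon text. The wrapper name add_exon_labels is hypothetical and only there to make the snippet reusable. The sketch assumes tab-delimited input, that $10 and $11 each end with a trailing comma as in the sample rows, and that entries are joined with commas as in the first two desired rows.

```shell
# Hypothetical helper: reads the tab-delimited table from files or stdin.
add_exon_labels() {
    awk -F '\t' -v OFS='\t' '{
        # The trailing comma in $10 yields one empty extra element, hence the -1.
        n = split($10, s1, ",") - 1    # exon starts
        split($11, s2, ",")            # exon ends
        sf = ""
        for (i = 1; i <= n; i++) {
            # Plus strand counts up (1..n); minus strand counts down (n..1).
            num = ($4 == "-") ? n - i + 1 : i
            sf = sf (i > 1 ? "," : "") $3 ":" s1[i] "-" s2[i] "_exon" num
        }
        print $9, $4, $13, $2, sf
    }' "$@"
}
```

It could then be called as add_exon_labels file > out. Using a conditional expression instead of a separate if/else block keeps the numbering in one place; an explicit if on $4 would work equally well.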