Shell Scripting

Sir, I have a big data file in the format given below. I have to extract the lines that start with "Chr" followed by a number, like "Chr5", "Chr25", etc., and build a range for each line by subtracting 10 from the number for the left end and adding 10 for the right end. (Put another way: remove the lines starting with miR, let, CH, a bare number, or Chr without a number; these are shown in blue.) For example, Chr5-26236044 has to be displayed as Chr5:26236034-26236054. If the same "Chr5" entry with the same range appears n times in the output file, n-1 of them are to be removed; otherwise all Chr5 entries must be present. In the given output, Chr5 with the same range appears twice, so one has to be removed (shown in red). Similarly, Chr2 appears 3 times with different ranges, so all three should be there. I have given the required output. The output format must be Chrno:number1-number2 (no spaces). A shell script for this would be highly appreciated. Thanks in advance.

 
Chr
Chr5-26236044
Chr25-2622227
Chr10-23813153
ChrX-62081599
miR-1-1-3p;Chr13:55237544-55237619
Chr18-31139230
miR-2331-3p;Chr19:15308148-15308218
CH240-242E2-CH240-416P12-96217
Chr2-66268692
miR-2379-5p;Chr23:30788153-30788230
Chr13-3857984
Chr23-29971922
let-7a-2-5p;Chr15:33347557-33347652
Chr4-120427453
Chr2-119023403
miR-2347-3p;Chr19:51593973-51594031
Chr25-21194342
miR-449b-5p;Chr20:23967269-23967366
Chr25-9506360
Chr2-66270795 
Chr5-26236044
miR-2484-5p;ChrX:20461131-20461206
93748382

Output required:

 
Chr5:26236034-26236054
Chr25:2622217-2622237
Chr10:23813143-23813163
ChrX:62081589-62081609
Chr18:31139220-31139240
Chr2:66268682-66268702
Chr13:3857974-3857994
Chr23:29971912-29971932
Chr4:120427443-120427463
Chr2:119023393-119023413
Chr25:21194332-21194352
Chr25:9506350-9506370
Chr2:66270785-66270805
Chr5:26236034-26236054
awk -F"-" '/^Chr[0-9X]/{print $1":"$2-10"-"$2+10}' file
grep ^Chr. input | while read a ; do echo "${a%-*}:$((${a#*-}-10))-$((${a#*-}+10))" ; done
awk -F- '/^Chr[0-9]/&&!A[$0]++{print $1":"$2-10 FS $2+10}' file

I didn't read the part about removing duplicates, so my command is not totally correct. @op, you should use Scrutinizer's answer.
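
For what it's worth, the grep/while loop can be made to drop repeated entries by piping its output through a small awk filter. This is only a sketch: range_loop is a made-up function name, the stricter grep pattern is my own assumption about which lines are valid, and it treats ChrX as valid too.

```shell
# range_loop: hypothetical helper; "$1" is the input data file.
range_loop() {
    # Keep only lines of the form Chr<digits or X>-<digits>, allowing trailing blanks.
    grep -E '^Chr[0-9X]+-[0-9]+[[:space:]]*$' "$1" |
    while read -r a; do
        # ${a%-*} is the part before the dash, ${a#*-} the number after it.
        echo "${a%-*}:$(( ${a#*-} - 10 ))-$(( ${a#*-} + 10 ))"
    done |
    # Print each output line only the first time it is seen (order is preserved).
    awk '!seen[$0]++'
}
```

Call it as `range_loop file > output`. Because the duplicates are removed after the ranges are built, two input lines that differ only in trailing whitespace still collapse into one.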

@scrutinizer:
I believe you're missing an X in /^Chr[0-9X]/ ?

The OP's post is contradictory, because on the first line it says it should be followed by a number, but if ChrX is valid too then indeed it should be /^Chr[0-9X]/
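
Putting the two points from this thread together (accept ChrX, drop repeated entries), a single awk call could look like the sketch below. The function name fix_ranges is made up for illustration, and whether ChrX should count as valid is an assumption based on the sample output.

```shell
# fix_ranges: hypothetical helper; "$1" is the input data file.
fix_ranges() {
    awk -F- '
        # Match Chr followed by digits or X, a dash, then a number.
        # Key on name + numeric value so trailing blanks do not defeat the duplicate check.
        /^Chr[0-9X]+-[0-9]+/ && !seen[$1, $2 + 0]++ {
            print $1 ":" ($2 - 10) "-" ($2 + 10)
        }
    ' "$1"
}
```

Run it as `fix_ranges file > output`. Lines such as a bare "Chr", the miR/let/CH entries and plain numbers never match the pattern, so they are skipped without any extra filtering.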
