Dear folks
Hello
I have a data set in which one of the columns is a string, and I want to extract the numbers that sit between two ":" characters. I know the substr function can do this, but my problem is that the numbers between the colons have different numbers of digits, which makes extraction by fixed position difficult. I will post part of my data set to make this clearer.
Thank you, kenshinhimura, for your suggestion. The column in question is the tenth column of my data set. Could you please tell me how to specify the column number in this command?
I want to extract the red numbers from the string column. When I run the command you suggested, it removes the red numbers between the two colons, but those are the numbers I want to keep.
---------- Post updated at 04:33 PM ---------- Previous update was at 04:29 PM ----------
Dear Don Cragun
I have 300 files. I used the command below to extract columns 1, 2, 4, and 5, plus the first numbers of column 10 of my data set (as written, it picks up only the two single-digit values at the start of column 10).
for file in *.work; do awk '{print $1,$2,$4,$5,substr($10,1,1),substr($10,3,1)}' "$file" > "$(basename "$file" .work).info"; done
My problem is that the numbers in the middle of the tenth (string) column have varying numbers of digits.
This is one row of my raw data set, with 10 columns:
gi|358485511|ref|NC_006088.3| 699545 . A G 122.03 PASS AC=2;AF=1.00;AN=2;DP=6;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=56.04;MQ0=0;QD=20.34 GT:AD:PP:GQ:PL 1/1:0,6:8:12:150,12,0
My desired output is:
gi|358485511|ref|NC_006088.3| 699545 A G 1 1 8
As a reminder, I have 300 files with the same data structure.
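
One possible approach, offered as a sketch rather than a definitive answer: instead of fixed substr positions, split column 10 on ":" so the width of each number no longer matters. This assumes the fields are whitespace-separated, that the wanted value is the third colon-separated piece of column 10 (the 8 in the sample row), and that the first piece always has the form a/b. The function name extract_info is made up for illustration.

```shell
# Sketch: split column 10 on ":" rather than relying on character positions.
# extract_info is a hypothetical helper name, not an existing command.
extract_info() {
    awk '{
        split($10, f, ":")      # for the sample row: f[1]="1/1", f[2]="0,6", f[3]="8"
        split(f[1], gt, "/")    # assumes the first piece looks like a/b
        print $1, $2, $4, $5, gt[1], gt[2], f[3]
    }' "$1"
}

# Apply it to every .work file, writing a matching .info file.
for file in *.work; do
    [ -e "$file" ] || continue          # skip if no .work files are present
    extract_info "$file" > "${file%.work}.info"
done
```

Because split() cuts at the delimiter, f[3] holds the whole number between the second and third colon, whether it has one digit or several. To pick a different piece, change the index (for example f[4] for the value after the third colon).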