Kind of cumbersome, I know, but the first step is to clean up the file with sed and then print the fields with awk.
The file awk.test contains the following:
1 3000012 . A G 126 . DP=51;VDB=0.0000;AF1=1;AC1=2;DP4=3,0,47,1;MQ=31;FQ=-99;PV4=1,1,0.31,1
sed -e 's: * :;:g' awk.test \
| sed 's:[A-Z][A-Z][A-Z]=::g' \
| sed -e 's:[A-Z][A-Z][A-Z][0-9]=::g' \
| sed -e 's:[A-Z][A-Z][0-9]=::g' \
| sed -e 's:[A-Z][A-Z]=::g' \
| awk -F\; '{print$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9" "$10" "$11" "$12" "$13" "$14" "$15}'
The first sed command replaces each run of spaces with a semicolon, so every field ends up separated the same way. The next four sed commands strip the identifiers (DP=, VDB=, AF1= and so on) and leave only the values. Finally, the awk command splits each line on semicolons and prints the fields separated by spaces.
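If it helps, the chain of sed calls can probably be collapsed into one. This is only a sketch, assuming the input is space-separated like the sample line and that every identifier is an uppercase letter followed by uppercase letters or digits and then an equals sign. The variable names line and result are made up for the example.

```shell
# Sample line copied from awk.test.
line='1 3000012 . A G 126 . DP=51;VDB=0.0000;AF1=1;AC1=2;DP4=3,0,47,1;MQ=31;FQ=-99;PV4=1,1,0.31,1'

# First expression: drop every KEY= prefix (DP=, VDB=, AF1=, ...).
# Second expression: turn the remaining semicolons into spaces.
result=$(printf '%s\n' "$line" | sed -e 's/[A-Z][A-Z0-9]*=//g' -e 's/;/ /g')

printf '%s\n' "$result"
```

For a real file, the same two sed expressions can be run directly on awk.test instead of on the variable.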
Let me know if this helps or if you need more.
That produces this:
1 3000012 . A G 126 . 51 0.0000 1 2 3,0,47,1 31 -99 1,1,0.31,1
I am sorry that I was not clear in my question.
The file is a tab-delimited text file with 8 columns, and the 8th column (INFO) contains text such as DP=51;VDB=0.0000;AF1=1;AC1=2;DP4=3,0,47,1;MQ=31;FQ=-99;PV4=1,1,0.31,1
I just need to split the text under INFO into individual columns, like this:
CHROM POS ID REF ALT QUAL FILTER DP VDB AF1 AC1..................PV4
1 3000012 . A G 126 . 51 0.000 1 2 1,1,0.31,1.
awktest.txt is the original file, and test1.txt is the file generated by the commands you posted, in which only the identifiers are removed and the numbers are left in place. I need the numbers in separate tab-separated columns with their respective headers.
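One possible way to get tab-separated columns with headers is to do the whole thing in awk. This is a rough sketch, not a tested solution for the real file: it assumes the input is tab-delimited with exactly 8 columns and no header lines, that every INFO item has the form KEY=value, and that all rows carry the same keys in the same order (the header is built from the first row only). The file name sample.tsv and the variable out are made up for the example.

```shell
# Hypothetical one-line sample in the tab-delimited layout described above.
printf '1\t3000012\t.\tA\tG\t126\t.\tDP=51;VDB=0.0000;AF1=1;AC1=2;DP4=3,0,47,1;MQ=31;FQ=-99;PV4=1,1,0.31,1\n' > sample.tsv

out=$(awk -F'\t' -v OFS='\t' '
{
    n = split($8, pair, ";")                 # break INFO into KEY=value items
    if (NR == 1) {                           # build the header from the first row
        hdr = "CHROM" OFS "POS" OFS "ID" OFS "REF" OFS "ALT" OFS "QUAL" OFS "FILTER"
        for (i = 1; i <= n; i++) {
            key = pair[i]
            sub(/=.*/, "", key)              # keep only the part before "="
            hdr = hdr OFS key
        }
        print hdr
    }
    row = $1
    for (i = 2; i <= 7; i++) row = row OFS $i    # copy the first 7 columns as-is
    for (i = 1; i <= n; i++) {
        val = pair[i]
        sub(/^[^=]*=/, "", val)              # keep only the part after "="
        row = row OFS val
    }
    print row
}' sample.tsv)

printf '%s\n' "$out"
```

If some rows have different or missing INFO keys, the columns will no longer line up with the header, and the script would need to look values up by key instead of by position.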
The code removes the letters [A-Z] in the 8th column and replaces them with nothing. Would it be possible to keep those letters as headers for the respective numbers, so that the output looks like,