Find and replace with wildcard

daashti · June 14, 2017, 11:21am

Hi there,

I am trying to do a find and replace with a wildcard, to turn this data

chr1	69511	69511	A	G	1/1:0,34:791,78,0:78:34	0/1:55,60:1130,0,1513:99:116	1/1:0,28:630,63,0:63:28	0/1:0,34:626,57,0:57:34

To this

chr1	69511	69511	A	G	homo	hetero	homo	hetero

I want to replace 0/1 followed by anything (wildcard *) with hetero,
and 1/1 followed by anything with homo.

I have been experimenting with

sed 's/0\/1.*/hetero/g' file

but did not achieve the desired result.

Corona688 · June 14, 2017, 11:33am

sed does not understand fields/columns without a lot of effort; awk can loop through them.

The loop starts at field 6 for efficiency; if the 0/1 can appear in any field, change the 6 to 1.

$ awk -F"\t" -v OFS="\t" '{ for(N=6; N<=NF; N++) { if($N ~ /^1\/1:/) $N="homo" ; if($N ~ /^0\/1:/) $N="hetero" } } 1' het.txt

chr1    69511   69511   A       G       homo    hetero  homo    hetero

$
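
For reference, the original sed attempt fails because .* is greedy: it matches from the first 0/1 all the way to the end of the line, so everything after that point collapses into a single hetero. A minimal sketch to reproduce this with the sample line from the first post (the variable names line and greedy are only for illustration):

```shell
# Sample line from the first post, with real tabs between fields.
line=$(printf 'chr1\t69511\t69511\tA\tG\t1/1:0,34:791,78,0:78:34\t0/1:55,60:1130,0,1513:99:116\t1/1:0,28:630,63,0:63:28\t0/1:0,34:626,57,0:57:34')

# .* swallows everything from the first "0/1" to the end of the line.
greedy=$(printf '%s\n' "$line" | sed 's/0\/1.*/hetero/g')

printf '%s\n' "$greedy"
```

This prints the first six fields unchanged, followed by a single hetero; fields 8 and 9 are lost, which is why the match has to stop at the field boundary.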

daashti · June 14, 2017, 2:14pm

Many thanks

Don_Cragun · June 14, 2017, 2:23pm

One could also try:

sed 's,0/1[^[:space:]]*,hetero,g
s,1/1[^[:space:]]*,homo,g' file

If all of your fields are <tab> separated and some fields might contain <space>s, replace each occurrence of the string [:space:] in the above with a literal <tab> character.
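
One way to get a literal tab into the bracket expression without typing it is to let the shell supply it. A minimal sketch, assuming a POSIX shell; the function name genotype_labels and the variable tab are made up for this example and are not part of sed:

```shell
# Hypothetical helper: label genotype fields in tab-separated input.
# Reads the named files, or standard input when no file is given.
genotype_labels() {
    # Hold a literal tab character in a shell variable.
    tab=$(printf '\t')
    # [^<tab>]* stops at the next tab, so spaces inside a field are consumed too.
    sed "s,0/1[^${tab}]*,hetero,g
s,1/1[^${tab}]*,homo,g" "$@"
}
```

It would be called as genotype_labels file. The sed script is in double quotes here so that the shell expands ${tab} before sed sees it.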

If you are using a Solaris/SunOS system and the above command doesn't work, change sed to /usr/xpg4/bin/sed.