awk to change specific string to new value if found in text file

I am trying to use awk to change a specific string in a field to another value, if it is found. In the tab-delimited file, $3 may contain the string 23, which, when present, always comes right before a period (.).

I want to change that string to X while keeping the formatting and the file name the same. The awk below seems to change the formatting and may not be the best way. Thank you :).

file

Input Variant	Errors	Chromosomal Variant	Coding Variant(s)
NM_004992.3:c.274G>T		NC_000023.10:g.153297761C>A	XM_005274683.1:c.-6G>T	XM_005274682.1:c.-6G>T	XM_005274681.1:c.274G>T	LRG_764t2:c.274G>T	NM_004992.3:c.274G>T	LRG_764t1:c.310G>T	NM_001110792.1:c.310G>T

awk

awk -F'\t' '{ $3 = ($3 == "23" ? X : $3) } 1' OFS="\t" file

desired output

Input Variant	Errors	Chromosomal Variant	Coding Variant(s)
NM_004992.3:c.274G>T		NC_0000X.10:g.153297761C>A	XM_005274683.1:c.-6G>T	XM_005274682.1:c.-6G>T	XM_005274681.1:c.274G>T	LRG_764t2:c.274G>T	NM_004992.3:c.274G>T	LRG_764t1:c.310G>T	NM_001110792.1:c.310G>T

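== is an equality comparison; it doesn't search for substrings, so your test is true only when $3 is exactly 23. Also, an unquoted X is an uninitialized awk variable (an empty string), not the letter X.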
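
To illustrate the difference, here is a minimal sketch that runs both tests against the chromosomal variant from the sample line. The variable names (field, eq, re) are just placeholders for this example, and the value is fed in as a single whitespace-free record, so $1 here plays the role of $3 in the real file.

```shell
# Sample value copied from the third field of the posted file.
field='NC_000023.10:g.153297761C>A'

# == compares the whole string, so this yields "no".
eq=$(printf '%s\n' "$field" | awk '{ print ($1 == "23" ? "yes" : "no") }')

# ~ tests a regular expression against the string, so this yields "yes".
re=$(printf '%s\n' "$field" | awk '{ print ($1 ~ /23[.]/ ? "yes" : "no") }')

printf '%s %s\n' "$eq" "$re"
```

The regex form only detects the substring; changing it still needs sub() or gsub().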

Try this (OFS is set so the tabs survive when awk rebuilds the modified line):

awk -F"\t" -v OFS="\t" '{ sub(/23/, "X", $3) } 1' infile > outfile

To keep from matching 23 when it does not appear immediately before a period, you might want to change that to:

awk -F"\t" -v OFS="\t" '{ sub(/23[.]/, "X.", $3) } 1' infile > outfile
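
Since the goal was to keep the same file name, one possible approach is to write to a temporary file and move it over the original. The sketch below assumes mktemp is available and uses a made-up sample.tsv with only the first three columns of the posted data; adjust the path to your real file.

```shell
# Build a small tab-delimited sample in a scratch directory (hypothetical file name).
dir=$(mktemp -d)
file="$dir/sample.tsv"
printf 'Input Variant\tErrors\tChromosomal Variant\n' > "$file"
printf 'NM_004992.3:c.274G>T\t\tNC_000023.10:g.153297761C>A\n' >> "$file"

# Replace "23." with "X." in field 3 only; OFS keeps the tabs when awk rebuilds the record.
tmp=$(mktemp)
awk -F'\t' -v OFS='\t' '{ sub(/23[.]/, "X.", $3) } 1' "$file" > "$tmp" && mv "$tmp" "$file"

# Show the third field of the data row and the number of fields on that row.
result=$(awk -F'\t' 'NR == 2 { print $3 }' "$file")
nf=$(awk -F'\t' 'NR == 2 { print NF }' "$file")
printf '%s (%s fields)\n' "$result" "$nf"
```

The && ensures the original is only replaced if awk exits successfully.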

Thank you both very much :).