[Solved] Replace character in 3rd column and leave 1st and last

Hello to all,

I have the following text, where columns are separated by spaces. I want the 3rd column to consist of
3 strings separated by 2 "_", in the format below:

LeftString_CentralString_RightString

So, in the 3rd column I want to replace every "_" with "-", except the first and last "_".

The input file is shown below (every "_" in the 3rd column other than the first and last should be replaced):

573839 12737 XFFK_UUD-KKDD_JDUDU_PPWI28
24300123 9927827 LUUUEO_OP-MJJF_JJI98_5526_TTTKC
4429999 988601 UDDYDYY_AAAABBNV_RWYYY
7878 92222 HHDRR_IPd.koo_ljjdLPP-IIDUUDD.OPPE_T23JJK

and the desired output (with the new "-" characters in place):

573839 12737 XFFK_UUD-KKDD-JDUDU_PPWI28
24300123 9927827 LUUUEO_OP-MJJF-JJI98-5526_TTTKC
4429999 988601 UDDYDYY_AAAABBNV_RWYYY
7878 92222 HHDRR_IPd.koo-ljjdLPP-IIDUUDD.OPPE_T23JJK

Could somebody help me do this with awk or sed?

Thanks in advance

What have you tried?

Hello Don,

Sorry for my late response. I've tried gensub as below, but it replaces every "_" with "-" in the 3rd column.
I don't want to replace the first "_" and the last "_", only the underscores from the 2nd to the (n-1)th.

Code tried:

awk '{print $1, $2, gensub(/_/,"-","g",$3) }'
573839 12737 XFFK-UUD-KKDD-JDUDU-PPWI28
24300123 9927827 LUUUEO-OP-MJJF-JJI98-5526-TTTKC
4429999 988601 UDDYDYY-AAAABBNV-RWYYY
7878 92222 HHDRR-IPd.koo-ljjdLPP-IIDUUDD.OPPE-T23JJK

In the output I'm getting above, the first and last underscores in the 3rd column should not have been replaced with dashes. The desired output is below:

573839 12737 XFFK_UUD-KKDD-JDUDU_PPWI28
24300123 9927827 LUUUEO_OP-MJJF-JJI98-5526_TTTKC
4429999 988601 UDDYDYY_AAAABBNV_RWYYY
7878 92222 HHDRR_IPd.koo-ljjdLPP-IIDUUDD.OPPE_T23JJK

Maybe someone could help me with this.

Thanks

$ awk -F"_" ' { printf("%s",$1 FS $2);for(i=3;i<NF;i++) printf("%s","-"$i); printf("%s\n",FS NF) } ' file
573839 12737 XFFK_UUD-KKDD-JDUDU_4
24300123 9927827 LUUUEO_OP-MJJF-JJI98-5526_5
4429999 988601 UDDYDYY_AAAABBNV_3
7878 92222 HHDRR_IPd.koo-ljjdLPP-IIDUUDD.OPPE_4

I'm not familiar with gensub(). Assuming that there are at least 2 underscores in field 3, you could try something like:

awk '
{       n = split($3, x, /_/)
        printf("%s %s %s_", $1, $2, x[1])
        for(i = 2; i <= n; i++)
                printf("%s%s", x[i], i == n ? "\n" : i == (n - 1) ? "_" : "-")
}' input

This should work even if there are underscores in field 1 or field 2.
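
For lines where field 3 might have fewer than two underscores, a variant that locates the first and last underscore by position may be safer. The following is only a sketch under that assumption, not something posted in this thread: the function name fix_col3 is made up for illustration, and because it assigns to $3, awk rebuilds the line with single spaces between fields.

```shell
# Hypothetical helper: reads lines on stdin, writes the adjusted lines to stdout.
fix_col3() {
  awk '
  {
      first = index($3, "_")                  # position of the first underscore, 0 if none
      # match() sets RSTART to the position of the last underscore in field 3
      if (first && match($3, /_[^_]*$/) && RSTART > first) {
          last = RSTART
          mid = substr($3, first + 1, last - first - 1)
          gsub(/_/, "-", mid)                 # only the inner underscores are changed
          $3 = substr($3, 1, first) mid substr($3, last)
      }
      print                                   # lines with 0 or 1 underscore pass through
  }'
}
```

It could be called as fix_col3 < input. Lines whose third field has no underscore, or only one, are printed unchanged.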

If there will never be any underscores in field 1 or field 2 and there are at least two underscores in field 3, anbu23's suggestion should also work if you change the last printf from printf("%s\n",FS NF) to printf("%s\n",FS $NF) .
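
With that one change applied, anbu23's command would look roughly like the sketch below. It is wrapped in a shell function (the name fix_with_fs is made up here) only so it can be fed from a pipe or a file, and it still assumes no underscores in fields 1 and 2 and at least two in field 3.

```shell
# Hypothetical wrapper around the corrected one-liner; reads stdin, writes stdout.
fix_with_fs() {
  awk -F"_" '{
      printf("%s", $1 FS $2)          # text up to and including the first underscore piece
      for (i = 3; i < NF; i++)
          printf("%s", "-" $i)        # inner pieces are joined with dashes
      printf("%s\n", FS $NF)          # $NF is the last piece; NF alone is only the field count
  }'
}
```

It could be run as fix_with_fs < file. The original printed the number of fields (NF) where the last field ($NF) was intended, which is why the earlier output ended in _4, _5 and _3.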

If you want to run either of these scripts on a Solaris/SunOS system, use /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk instead of the default /usr/bin/awk .


Thanks so much, anbu23 and Don, for your solutions and for the fix to anbu23's code. It worked nicely!

Regards