[Solved] Replace character in 3rd column and leave first and last

Ophiuchus · February 1, 2014, 12:47am

Hello to all,

I have the following text, where columns are separated by spaces. I want the 3rd column to contain
3 strings separated by 2 "_", in the format below:

LeftString_CentralString_RightString

So, in the 3rd column I want to replace all "_" with "-", except the first and last "_".

The input file is (in red "_" that should be replaced):

573839 12737 XFFK_UUD-KKDD_JDUDU_PPWI28
24300123 9927827 LUUUEO_OP-MJJF_JJI98_5526_TTTKC
4429999 988601 UDDYDYY_AAAABBNV_RWYYY
7878 92222 HHDRR_IPd.koo_ljjdLPP-IIDUUDD.OPPE_T23JJK

and the output desired (in orange new character "-"):

573839 12737 XFFK_UUD-KKDD-JDUDU_PPWI28
24300123 9927827 LUUUEO_OP-MJJF-JJI98-5526_TTTKC
4429999 988601 UDDYDYY_AAAABBNV_RWYYY
7878 92222 HHDRR_IPd.koo-ljjdLPP-IIDUUDD.OPPE_T23JJK

Could somebody help me do this with awk or sed?

Thanks in advance

Don_Cragun · February 1, 2014, 2:03am

What have you tried?

Ophiuchus · February 3, 2014, 10:16am

Hello Don,

Sorry for my late response. I've tried with gensub as below, but it replaces all "_" with "-" in the 3rd column,
and I don't want to replace the first "_" and the last "_", only the 2nd underscore through the (n-1)th underscore.

Code tried:

awk '{print $1, $2, gensub(/_/,"-","g",$3) }'
573839 12737 XFFK-UUD-KKDD-JDUDU-PPWI28
24300123 9927827 LUUUEO-OP-MJJF-JJI98-5526-TTTKC
4429999 988601 UDDYDYY-AAAABBNV-RWYYY
7878 92222 HHDRR-IPd.koo-ljjdLPP-IIDUUDD.OPPE-T23JJK

In the output I'm getting above, the dashes shown in red should have stayed as underscores. The desired output is below:

573839 12737 XFFK_UUD-KKDD-JDUDU_PPWI28
24300123 9927827 LUUUEO_OP-MJJF-JJI98-5526_TTTKC
4429999 988601 UDDYDYY_AAAABBNV_RWYYY
7878 92222 HHDRR_IPd.koo-ljjdLPP-IIDUUDD.OPPE_T23JJK

Maybe someone could help me with this.

Thanks

anbu23 · February 3, 2014, 11:17am

$ awk -F"_" ' { printf("%s",$1 FS $2);for(i=3;i<NF;i++) printf("%s","-"$i); printf("%s\n",FS NF) } ' file
573839 12737 XFFK_UUD-KKDD-JDUDU_4
24300123 9927827 LUUUEO_OP-MJJF-JJI98-5526_5
4429999 988601 UDDYDYY_AAAABBNV_3
7878 92222 HHDRR_IPd.koo-ljjdLPP-IIDUUDD.OPPE_4

Don_Cragun · February 3, 2014, 11:49am

I'm not familiar with gensub(). Assuming that there are at least 2 underscores in field 3, you could try something like:

awk '
{       n = split($3, x, /_/)
        printf("%s %s %s_", $1, $2, x[1])
        for(i = 2; i <= n; i++)
                printf("%s%s", x[i], i == n ? "\n" : i == (n - 1) ? "_" : "-")
}' input

This should work even if there are underscores in field 1 or field 2.
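
As a rough illustration of that claim, here is a minimal sketch of the same split-based idea, tried on a line that has underscores in field 1. The function name fix_field3 is hypothetical (it does not come from this thread), and the sketch assumes exactly three space-separated fields. Unlike the script above, it builds field 3 in a variable first, so a field 3 with fewer than two underscores is simply printed unchanged.

```shell
# fix_field3: hypothetical helper name, used here only for illustration.
# Reads lines from stdin; in field 3, turns every "_" except the first
# and the last into "-". Fields 1 and 2 are printed unchanged.
fix_field3() {
    awk '
    {
        n = split($3, part, "_")      # n pieces means n-1 underscores
        out = part[1]
        for (i = 2; i <= n; i++) {
            # keep "_" before piece 2 and before the last piece, else use "-"
            sep = (i == 2 || i == n) ? "_" : "-"
            out = out sep part[i]
        }
        print $1, $2, out
    }'
}

# Field 1 contains underscores here; only field 3 should change.
printf '%s\n' 'A_B_C 12737 XFFK_UUD-KKDD_JDUDU_PPWI28' | fix_field3
```

Because only $3 is passed to split(), underscores elsewhere on the line are never looked at.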

If there will never be any underscores in field 1 or field 2 and there are at least two underscores in field 3, anbu23's suggestion should also work if you change the last printf from printf("%s\n",FS NF) to printf("%s\n",FS $NF) .
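
With that change applied, anbu23's command could look like the sketch below. It relies on the same assumptions (no underscores in field 1 or field 2, at least two underscores in field 3). The sample line is piped in instead of being read from file, and the result is kept in a shell variable named result (an arbitrary name) only so the snippet is self-contained.

```shell
# Split each line on "_"; rejoin with "_" after the first piece and before
# the last piece, and with "-" everywhere in between.
result=$(printf '%s\n' '24300123 9927827 LUUUEO_OP-MJJF_JJI98_5526_TTTKC' |
awk -F"_" '{
    printf("%s", $1 FS $2)            # first two pieces keep their "_"
    for (i = 3; i < NF; i++)
        printf("%s", "-" $i)          # middle pieces are joined with "-"
    printf("%s\n", FS $NF)            # $NF is the last piece; NF is only the count
}')
printf '%s\n' "$result"
```

The original version printed NF, the number of pieces, which is why the earlier output ended in _4, _5 and _3 instead of the last string.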

If you want to run either of these scripts on a Solaris/SunOS system, use /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk instead of the default /usr/bin/awk .

Ophiuchus · February 3, 2014, 3:11pm

Thanks so much anbu23 and Don for your solutions and for fixing anbu's code. It worked so nicely!

Regards