Remove part of a string from second field

rajv · September 20, 2011, 6:59am

I have a file in the format below (pipe-delimited):

1234__abc|John__abc|xyz
3345__abc|Kate__abc|xyz
55344|Linda__abc|xyz
33434|Murray|xyz

I want to remove any occurrence of "__abc" in the second field of this file.
I did some research and found a way to replace the entire second field with another string:

sed 's/^\([^|]*\)|[^|]*|/\1|9999|/'

But I am not able to remove the "__abc" alone in the second field. Any help with this would be much appreciated.

guruprasadpr · September 20, 2011, 7:06am

$ cat fil
1234__abc|John__abc|xyz
3345__abc|Kate__abc|xyz
55344|Linda__abc|xyz
33434|Murray|xyz

Output:

$ awk -F"|" '{sub("__abc","",$2);}1' OFS="|" fil
1234__abc|John|xyz
3345__abc|Kate|xyz
55344|Linda|xyz
33434|Murray|xyz

Guru.
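
A note on the one-liner above, as a minimal sketch rather than a definitive answer: -F"|" splits each line on the pipe, sub() edits only $2, OFS="|" puts the pipes back when awk rebuilds the line, and the trailing 1 prints it. sub() removes only the first "__abc" in the field; if the second field could contain it more than once, gsub() should remove every occurrence. The variable names data and result are only illustrative, and the sample lines are the ones from the question.

```shell
# Sample lines from the question, kept in a variable instead of a file.
data='1234__abc|John__abc|xyz
3345__abc|Kate__abc|xyz
55344|Linda__abc|xyz
33434|Murray|xyz'

# gsub() removes every "__abc" in field 2 only; field 1 is left untouched.
# Assigning to $2 makes awk rebuild the line with OFS, so OFS must be "|".
result=$(printf '%s\n' "$data" |
  awk -F'|' -v OFS='|' '{ gsub(/__abc/, "", $2) } 1')

printf '%s\n' "$result"
```

On the sample data this should print the same four lines as the sub() version, since each second field contains "__abc" at most once.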

rajv · September 20, 2011, 7:29am

Thanks a lot. That worked.

sk1418 · September 20, 2011, 7:41am

Well, if you have already tried sed, you were very close.

sed 's/__abc//2'  file

will give you what you need. I guess the missing part was the "2", right?

ctsgnb · September 20, 2011, 8:43am

@sk1418 :

No, your command would remove only the second "__abc" found on each line.
So you would miss the "__abc" in the third line, because there it is the first occurrence on the line.

$ cat tst
1234__abc|John__abc|xyz
3345__abc|Kate__abc|xyz
55344|Linda__abc|xyz
33434|Murray|xyz

$ sed 's/__abc//2' tst
1234__abc|John|xyz
3345__abc|Kate|xyz
55344|Linda__abc|xyz
33434|Murray|xyz

But you could use this instead (assuming the whole file has the same format as the given example):

$ cat tst
1234__abc|John__abc|xyz
3345__abc|Kate__abc|xyz
55344|Linda__abc|xyz
33434|Murray|xyz

$ sed 's/|\(.*\)__abc|/|\1|/' tst
1234__abc|John|xyz
3345__abc|Kate|xyz
55344|Linda|xyz
33434|Murray|xyz

sk1418 · September 20, 2011, 9:15am

Thanks for pointing this out. I didn't notice the special third line.

Your sed 's/|\(.*\)__abc|/|\1|/' tst works great for this example, but it is not fully generic either.

e.g.

kent$  cat a
1234__abc|John__abc|xyz
3345__abc|Kate__abc|xyz
55344|Linda__abc|xyz__abc|xx
33434|Murray|xyz

yours:
kent$  sed 's/|\(.*\)__abc|/|\1|/' a
1234__abc|John|xyz
3345__abc|Kate|xyz
55344|Linda__abc|xyz|xx
33434|Murray|xyz

I made one with sed. It works, but I don't know whether it is the best sed solution.

kent$  sed -r 's/\|/\x034/2;s/__abc\x034/|/;s/\x034/|/' a
1234__abc|John|xyz
3345__abc|Kate|xyz
55344|Linda|xyz__abc|xx
33434|Murray|xyz
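
Another possible approach, offered only as a sketch: anchor the match to the start of the line, in the same way as the [^|]* pattern from the first post, so that no placeholder character is needed. It assumes "__abc" sits at the end of the second field and that a third field follows (so the second field is closed by a pipe). The variable names data and result are illustrative.

```shell
# Sample lines from the last example, including the line with "__abc" in field 3.
data='1234__abc|John__abc|xyz
3345__abc|Kate__abc|xyz
55344|Linda__abc|xyz__abc|xx
33434|Murray|xyz'

# ^\([^|]*|[^|]*\)  captures field 1, the first pipe, and the start of field 2
# __abc|            must then end field 2; it is replaced by a bare pipe
result=$(printf '%s\n' "$data" | sed 's/^\([^|]*|[^|]*\)__abc|/\1|/')

printf '%s\n' "$result"
```

Because [^|]* cannot cross a pipe, the match cannot reach into the third field, so the third line should come out as 55344|Linda|xyz__abc|xx. This uses only basic regular expressions, so it does not depend on -r.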