Replace consecutive occurrences of a string in the same line

Hi All,

I have a requirement to replace consecutive occurrences of the same string. Below are the input and the desired output.

Input:
---------

123.5|ABC|.|.|.
234.4|DEF|.|.|.|.|.|

Output:
---------

123.5|ABC|||.
234.4|DEF||||||

So basically "|.|" needs to be replaced with "||".

I tried sed, but it does not replace all occurrences. The command was
sed 's/|\.|/||/g'
and below is its output.

123.5|ABC||.|.
234.4|DEF||.||.||

Could anyone please provide the correct command?

$ sed "s/\.|/|/g" file
123.5|ABC|||.
234.4|DEF||||||

This doesn't really replace consecutive occurrences of a string; it only removes every full stop that is followed by a pipe.

Hi Scott,
Thank you for looking into this. Your solution works for most of my input file, but it fails when the data looks like this.

Input:

123.5|ABC.|.|.|.
234.4|DEF|.|.|.|.|.|

Desired output :

123.5|ABC.|||.
234.4|DEF||||||

But your solution gives the output below.

123.5|ABC|||.
234.4|DEF||||||

So if the string ABC itself contains a '.', the command does not work as expected. I think we need to match the entire string "|.|" instead of ".|".

Thank you.

With the g modifier, matches cannot overlap: the closing | of one match has already been consumed, so it cannot serve as the opening | of the next match.
But a loop can do it:

sed '
:L
s/|\.|/||/
tL
'
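
To make the difference visible, here is a small sketch that runs both the single global pass and the loop on one of the sample lines from this thread. The -e form and the label name "again" are just my choices for illustration; the t command branches back to the label only if the preceding s command changed something.

```shell
line='234.4|DEF|.|.|.|.|.|'

# Single global pass: the closing | of one match is consumed,
# so it cannot open the next match and every other "." survives.
single=$(printf '%s\n' "$line" | sed 's/|\.|/||/g')

# Loop: replace one occurrence, then branch back to the label
# for as long as the previous s command made a substitution.
looped=$(printf '%s\n' "$line" | sed -e ':again' -e 's/|\.|/||/' -e 't again')

printf '%s\n' "$single" "$looped"
```

The first line printed still contains periods; the second one has none left between the pipes.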

Another solution is perl: with a look-ahead, a single RE substitution with the g modifier will do it.

For completeness, here it is:

perl -pe 's/\|\.(?=\|)/|/g'

The (?= ) is the look-ahead; it is hard to remember, so I always have to look it up in a tutorial.
Because the look-ahead is not part of the match, the | it tests must not be restored in the substitution.
Perl (version >= 5) uses extended regular expressions, where | means "or", so a literal | must be escaped as \|.

Another solution is to just run your global substitution twice:

sed -e 's/|\.|/||/g' -e 's/|\.|/||/g'

which, with your latest sample input:

123.5|ABC.|.|.|.
234.4|DEF|.|.|.|.|.|

produces the output:

123.5|ABC.|||.
234.4|DEF||||||

Running sed twice also fixes the issue, but the string may repeat more than twice, and it is hard to tell how many times it repeats in the file.
Hence I used the perl solution, which works in all cases.
Thank you all so much for the solutions.

I didn't say to run sed twice; I said to run sed once using the global substitution twice. Running that global substitute twice will take care of ALL occurrences of the pattern you said you wanted to change. You might remember that the 2nd sample input you provided:

234.4|DEF|.|.|.|.|.|

contained 5 occurrences of the pattern you wanted to modify and that sed command produced exactly the output you said you wanted:

234.4|DEF||||||

removing all 5 occurrences of periods between vertical bars; not just two of them.

Hi Don,

Yes, applying the global substitution twice resolves the issue. Sorry, I overlooked your solution.

Try:

awk '{for(i=2; i<NF; i++) if($i==".") $i=x}1' FS=\| OFS=\| file
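
In case the one-liner is hard to read, here is the same logic spelled out as a sketch. The differences are cosmetic assumptions of mine: the separators are set with -F and -v, the empty string is written literally instead of using the unset variable x, and the input comes from printf rather than a file.

```shell
input='123.5|ABC.|.|.|.
234.4|DEF|.|.|.|.|.|'

result=$(printf '%s\n' "$input" | awk -F'|' -v OFS='|' '
{
    # Skip the first and last fields: only a "." that has a pipe
    # on both sides is blanked.
    for (i = 2; i < NF; i++)
        if ($i == ".")
            $i = ""
    print
}')

printf '%s\n' "$result"
```

Because the comparison is against the whole field, "ABC." is left alone; only fields that consist of a single period are emptied.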

How Don's solution works:

original string....: |.|.|.|.|.|.|.|.|.|.|.|.|.|.|
after first regexp.: ||.||.||.||.||.||.||.|
after second regexp: |||||||||||||||
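
The same trace can be reproduced on a shorter string with the sketch below; the variable names are only for illustration.

```shell
line='|.|.|.|.|.|'

# First pass: matches cannot overlap, so every second "." is left behind.
pass1=$(printf '%s\n' "$line" | sed 's/|\.|/||/g')

# Second pass: each leftover "." now sits between two pipes again.
pass2=$(printf '%s\n' "$pass1" | sed 's/|\.|/||/g')

printf '%s\n' "$line" "$pass1" "$pass2"
```

After the first pass no two remaining periods share a pipe, which is why the second pass can match all of them and a third pass is never needed.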

I hope this helps.

bakunin
