How to replace multiple "&nbsp;" entries within a <td> tag with a single entry using sed?

I have an input file like this.

Input file: 12.txt

1) One or more <tr> tags may appear on the same line.
2) A <tr> tag may contain one or more <td> tags.
3) Some <td> tags look like "<td> &nbsp; </td>"; others contain more than one "&nbsp;" entry.

<tr> some td tags </tr>
<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; &nbsp; &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

Expected Output file:
I want to collapse multiple "&nbsp;" entries within a <td> into a single one, so that each such cell reads <td> &nbsp; </td>, as shown below.

<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

I tried the sed commands below but did not get the expected output. Please help.

sed -e 's/<td> &nbsp; &nbsp; /<td> &nbsp; /g' 12.txt
sed -e 's/<td> &nbsp; /<td> &nbsp; /g' 12.txt

info sed :

So use the escaped \& sequence in the replacement. And to match a repeated pattern, use a grouped regex with the * character:

sed 's/<td> \(&nbsp; \)*/<td> \&nbsp; /g' file
<tr> some td tags </tr>
<tr> some td tags </tr><tr>some td tags</tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td><td>some text</td></tr><tr>some td tags</tr>
<tr> some td tags </tr><tr><td> &nbsp; </td> <td> &nbsp; </td></tr><tr>some td tags</tr>

The -e is redundant here. If your sed supports extended regexps:

$ echo "&nbsp; &nbsp; &nbsp; a &nbsp; &nbsp; &nbsp;" | sed -r 's/(&nbsp; *)+/\&nbsp;/g'
&nbsp;a &nbsp;

$

The ( ) brackets group a whole section, after which is a + for "one or more repeats of this expression".

The & has to be escaped in the output expression as \&; otherwise & has the special meaning "the entire matched expression", which would end up adding MORE &nbsp; entries.
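
To make the escaping point concrete, here is a minimal sketch that runs the same substitution twice, once with \& and once with a bare & in the replacement. The sample cell and the variable names are made up for illustration, and the pattern uses the POSIX interval \{1,\} so it does not depend on extended-regexp support.

```shell
# Hypothetical sample cell with three entities in a row.
input='<td> &nbsp; &nbsp; &nbsp; </td>'

# Escaped: \& in the replacement is a literal ampersand.
escaped=$(printf '%s\n' "$input" | sed 's/\(&nbsp; \)\{1,\}/\&nbsp; /g')

# Unescaped: & in the replacement expands to the whole matched text,
# so the three matched entities are written back, followed by "nbsp; ".
unescaped=$(printf '%s\n' "$input" | sed 's/\(&nbsp; \)\{1,\}/&nbsp; /g')

printf 'escaped:   %s\n' "$escaped"
printf 'unescaped: %s\n' "$unescaped"
```

The escaped run should print `<td> &nbsp; </td>`, while the unescaped one should print `<td> &nbsp; &nbsp; &nbsp; nbsp; </td>`.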

Question though: If specific numbers of non-breaking spaces aren't meant to be there, are any non-breaking spaces meant to be there? Why not replace them entirely with non-breaking spaces?


Corona688 is right - use the + repetition indicator instead of the * (which indicates zero or more repetitions).
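
A small sketch of why the difference matters, assuming a cell that has a space after <td> but no &nbsp; at all (the posted sample has no such cell, so this input is hypothetical). With *, the group can match zero times and an entity is inserted where there was none; with one-or-more, written here as the POSIX interval \{1,\}, the line is left alone.

```shell
# Hypothetical cell that starts with ordinary text, not an entity.
input='<td> some text</td>'

# Zero-or-more: the group may match nothing, so "<td> " alone matches
# and an entity gets inserted.
star=$(printf '%s\n' "$input" | sed 's/<td> \(&nbsp; \)*/<td> \&nbsp; /g')

# One-or-more: no entity is present, so nothing matches.
plus=$(printf '%s\n' "$input" | sed 's/<td> \(&nbsp; \)\{1,\}/<td> \&nbsp; /g')

printf 'star: %s\n' "$star"
printf 'plus: %s\n' "$plus"
```

The `star` result should be `<td> &nbsp; some text</td>`, and the `plus` result should be the input unchanged.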


You could also try the following, which, other than copying the 1st line of your sample input file unchanged to the output, seems to create the output you said you want (and I don't understand from your description why the 1st line should not be copied unchanged):

sed ':x
s/&nbsp; &nbsp;/\&nbsp;/g
tx' 12.txt
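
For anyone wondering why the :x ... tx loop is there: a single pass with the g flag only replaces non-overlapping pairs, so a run of five entities is reduced to three, not one. The sketch below compares one pass against the loop on a made-up cell; the label is passed with separate -e options, which should behave the same as the multi-line form above.

```shell
# Hypothetical cell with five entities in a row.
input='<td> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </td>'

# One pass: pairs 1-2 and 3-4 are merged, the fifth entity is left over.
single=$(printf '%s\n' "$input" | sed 's/&nbsp; &nbsp;/\&nbsp;/g')

# Loop: "t x" jumps back to label x as long as a substitution was made.
looped=$(printf '%s\n' "$input" | sed -e ':x' -e 's/&nbsp; &nbsp;/\&nbsp;/g' -e 'tx')

printf 'single: %s\n' "$single"
printf 'looped: %s\n' "$looped"
```

The single pass should leave `<td> &nbsp; &nbsp; &nbsp; </td>`, and the loop should end with `<td> &nbsp; </td>`.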