Pattern replacing

Hi,

I have a text file with lots of text (strings, numbers, special characters, etc.). I am trying to replace every occurrence of these strings:

90%
91%
92%
....
100%

I want to replace them with:

"90%"
"91%"
"92%"
....
"100%"

I am currently using 10 separate sed commands for the replacement, but I know that's clumsy. There should be a better way. How do I do it with a one-liner?

sed -e 's/[0-9]\{1,\}%/"&"/g' file
perl -pe 's/\d+%/"$&"/g' file

Thanks for the quick reply. I am facing a small issue: it's replacing other numbers too. I want to replace only the percentages from 90% to 100%.

sed -e 's/\(^\| \)\(9[0-9]%\|100%\)/\1"\2"/g' file
perl -pe 's/\b(?:9[0-9]|100)%/"$&"/g' file

Aia,
Standard sed uses BREs, not EREs. Basic regular expressions do not include alternation (i.e., BRE|BRE). Furthermore, we don't know what separates a percentage to be quoted from its surroundings. With the following input:

123% 992% 100% 90% 93%92%,100%(90%+10%) 77.99%

a standard sed (with your suggested command) produces the output:

123% 992% 100% 90% 93%92%,100%(90%+10%) 77.99%

(which I do not believe is what is wanted) and your perl script produces:

123% 992% "100%" "90%" "93%""92%","100%"("90%"+10%) 77."99%"

(which I assume is closer to what is wanted).

I believe that what was requested was:

123% 9"92%" "100%" "90%" "93%""92%","100%"("90%"+10%) 77."99%"

But without confirmation from ctrid that this really is what is wanted, I'm not going to try to write a different sed or awk script that does what I think would be a more reasonable interpretation.

ctrid,
Please give us a clear specification of which characters or strings, if any, appearing adjacent to a percentage should keep it from being quoted. (If a period or comma is to be interpreted as part of a percentage, are these characters locale-specific?) Should something like 91.50% (in the C locale) be quoted, since it is in the range 90% to 100%, inclusive?
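Because alternation belongs to EREs rather than BREs, one way to keep Aia's range-restricted sed command without the GNU-only \| operator is to ask sed for EREs with -E. This is a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed do, and it was only recently added to the POSIX standard) and assuming, as Aia's command does, that a percentage to be quoted starts the line or follows a space. The function name quote_90_100 is hypothetical, and decimals are not handled:

```shell
# Hypothetical helper: quote integer percentages 90%..100% that appear at
# the start of a line or right after a space. Reads stdin, writes stdout.
# Assumes a sed that accepts -E (extended regular expressions).
quote_90_100() {
    # (^| )         start of line or a single space, put back as \1
    # (9[0-9]|100)  the number itself; % must follow it immediately,
    #               so 992%, 1000% and 5% are left alone
    sed -E 's/(^| )(9[0-9]|100)%/\1"\2%"/g'
}
```

Run it as quote_90_100 < file. On Don's sample line it leaves 992% and the percentages glued to other characters (92%, 100% after a comma, 90% after a parenthesis) unquoted, which is the behavior Aia's \| version shows under GNU sed.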

Don,
At times, overthinking is paralyzing, as just happened to you.
Appreciate you.

Hi Don,

Very good catch. I hadn't anticipated the decimals either.
As you said, there could be decimals.

But strings like 992%, 93%92%, or 100%(90%+10%) would not be in my input text file. All percentages are delimited by spaces, and no periods or other characters appear next to them.

Hence the only danger I see is decimals. Don, thanks once again for pointing this out.
Aia, your one-liner is cool, and it works for now. As Don said, I'll only have a problem if my input file starts containing decimals. How do I modify this perl statement to handle that?

With a standards-conforming sed utility, you could try:

sed -e 's/^100\([.]0*\)\{0,1\}%/"&"/g' \
    -e 's/ \(100\([.]0*\)\{0,1\}%\)/ "\1"/g' \
    -e 's/^9[0-9]\([.][0-9]*\)\{0,1\}%/"&"/g' \
    -e 's/ \(9[0-9]\([.][0-9]*\)\{0,1\}%\)/ "\1"/g' file

On a system using GNU sed, you can also add the --posix option, which disables GNU extensions, to get strictly standard behavior:

sed --posix -e 's/^100\([.]0*\)\{0,1\}%/"&"/g' \
    -e 's/ \(100\([.]0*\)\{0,1\}%\)/ "\1"/g' \
    -e 's/^9[0-9]\([.][0-9]*\)\{0,1\}%/"&"/g' \
    -e 's/ \(9[0-9]\([.][0-9]*\)\{0,1\}%\)/ "\1"/g' file

You can flatten these to one-liners by removing the backslashes and <newline>s, but I find them easier to read this way.

If the file named file contains:

100% 100%
100.00000000% 100.00%
100.0000000000000001% 100.1%
101% 123%
10% 10%
10.0% 10.0%
89.9999999% 89.9%
90% 90%
90.123% 90.987%
99.94% 99.94%
90% 90% 8.98% 92% 193% 96.96%
9.98% 9.98%

it produces the output:

"100%" "100%"
"100.00000000%" "100.00%"
100.0000000000000001% 100.1%
101% 123%
10% 10%
10.0% 10.0%
89.9999999% 89.9%
"90%" "90%"
"90.123%" "90.987%"
"99.94%" "99.94%"
"90%" "90%" 8.98% "92%" 193% "96.96%"
9.98% 9.98%

Try this and see if it does what you want.

perl -pe 's/(?:9\d(?<![0-8]\d)\.\d+|(?<!\S)9\d|(?<!\S)100(\.0+)?)%/"$&"/g' file

That doesn't quote either of the following in my sample input file:

100.00000000% 100.00%

which both seem to meet the stated requirements.

Thanks a ton for the suggestions. It's a privilege to interact with experts like you. Thanks, Aia and Don.

This awk should do it.

awk ' { gsub("[0-9]%","&\"",$0); gsub("^[0-9]","\"&",$0); gsub(" [0-9]","\"&",$0); gsub("\" ",FS"\"",$0) }1'

Anything wrong with this?

The request was to quote percentages between 90% and 100% inclusive when the percentage appears at the start of a line or is preceded by a space character. So, with input like:

Get a loan of 1000 dollars at 5% interest.

no change should be made. But your suggestion changes it to:

Get a loan of "1000 dollars at "5% "interest.
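Since ctrid says the percentages are always whole, space-delimited tokens, an awk approach can test each space-separated field as a whole instead of patching the line with gsub. This is a minimal sketch, assuming a POSIX awk and assuming a percentage only counts when the entire field is the percentage (so a token with a trailing comma, such as 95%, would not be quoted, unlike with Don's sed). The function name quote_pct_awk is hypothetical:

```shell
# Hypothetical helper: quote space-delimited fields that are 90%..100%,
# decimals included. Reads stdin, writes stdout.
# FS is a one-space regex, so runs of spaces produce empty fields and a
# changed line is rebuilt (with OFS) using exactly the original spacing.
quote_pct_awk() {
    awk 'BEGIN { FS = "[ ]"; OFS = " " }
    {
        # whole field: 90-99 with an optional fraction, or 100 with zeros only
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(9[0-9](\.[0-9]*)?|100(\.0*)?)%$/)
                $i = "\"" $i "\""
        print
    }'
}
```

Run it as quote_pct_awk < file. With the loan sentence above, no field is an in-range percentage, so awk never modifies a field and prints the line unchanged.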