Match exactly a string

I am formatting my code and for that I am trying to write a script which can quicken some repetitive work.

I need to match "==" exactly in a string and replace it by inserting a (single) blank space before and after it.

Sample Strings:

this.something    =='something'.that
this.something    == 'something'.that
this.something=='something'.that
this.something    ==    'something'.that
this.something== 'something'.that

Expected Output:

this.something == 'something'.that

My expression:

(\w+)\s*(==)\s*(['"]?\w+["']?)

works fine when there's only "==" between two statements, but it fails when it encounters "===". It matches the "==" in "===" and produces output as:

this.something == =something.that

something that I don't want.

What's confusing me is that the above code seems to work fine for me in Notepad++ but doesn't work in grep/sed etc. Somehow I just feel that I must be doing something silly.

Please provide your suggestions.

Thanks.

Hello prohank,

Could you please try following and let me know if this goes good.

awk '{sub(/ +==/," == ");print}'   Input_file

Thanks,
R. Singh

Hi R. Singh,

Thanks for your reply.

Actually, there can be more than one type of input statement. The one I provided is just one of them.

Let me add some more. Apologies on my part.

Try

awk '{gsub (/ *==/, "=="); gsub (/==  */, "=="); gsub (/==+/, " & ")}1' file

It's not nice to change your problem statement after someone has posted code that attempts to solve your original problem. Readers now can only guess at the problem that Ravinder was trying to help you solve.

I also note that your sample input lines contain leading spaces, but your sample output does not. Your problem statement doesn't say anything about removing leading whitespace???

Your problem statement is not clear about what should happen if more than one occurrence of == that is not adjacent to another = is present on a single line nor of what should happen to lines that do not contain any occurrences of == that are not adjacent to another = ???

When posting in the UNIX & Linux forums, you might frequently hear that using Notepad++ to edit UNIX format text files is a silly mistake and that you should learn to use vi . But, since you clearly can't use grep to edit files, it isn't at all clear what you are really trying to do.

If one were trying to change each line in a file that contains the first of one or more occurrences of a string starting with a character that is not an equal sign followed by zero or more space and/or tab characters followed by exactly two equal signs followed by zero or more space and/or tab characters followed by another character that is not an equal sign so that the two equal signs and the preceding and following spaces and tabs are changed to one space, two equal signs and one more space AND any leading space and/or tab characters at the start of that line are removed, and all other lines in the file are printed unchanged, one could try something like:

awk '
match($0, /[^=][[:space:]]*==[[:space:]]*[^=]/) {
	$0 = substr($0, 1, RSTART) " == " substr($0, RSTART + RLENGTH - 1)
	sub(/^[[:space:]]*/, "")
}
1' file

Of course, you also didn't bother telling us what operating system and shell you're using (which could be very important for this problem if one were to suggest code using awk or sed). If you want to try the above code on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
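
For comparison, roughly the same substitution can be sketched in sed with two capture groups in place of match() and substr(). This is only a sketch under some assumptions: the function name pad_eq is made up for illustration, it needs a sed that accepts -E (GNU and BSD sed do), it leaves leading whitespace alone, and it adds ! to the excluded left-hand characters so that JavaScript's !== is not split, which the awk code above does not attempt.

```shell
# Hypothetical helper: pad the first standalone == on each line with single spaces.
# [^=!] on the left and [^=] on the right keep ===, !== and != from matching.
pad_eq() {
    sed -E 's/([^=!])[[:space:]]*==[[:space:]]*([^=])/\1 == \2/' "$@"
}

# Usage: pad_eq file      (or: some_command | pad_eq)
```

Because the pattern consumes one character on each side of the ==, a == at the very start or very end of a line is not touched.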

Thanks RudiC.

I tried your solution but it's failing for this case:
input:

this.something    	                     == 'something'.that

returns:

this.something           == 'something'.that

Hi Don,

My original problem was still the same; it's just that I wrongly expected the reader to identify other scenarios from just one sample input and a statement, and I apologized for it too. (I am not a native English speaker.)

I don't need to remove the leading blank spaces. I will correct it.

It's not one of my requirements.

I mentioned "grep" because I was initially trying to find the pattern, and if I got the desired result I would run "sed" over it. But "grep" also returned lines with "===" while searching (with my code) for "==", hence the mention.
I am not actually editing unix files. I am just using unix shell commands to format my EJS code.
I am on Windows 7 (not my machine) and don't have privileges to install other software, so I can't use vi. So I'm running script commands through Git Bash. Please don't judge me for it.

Whoa!
Thanks for your time. I thought samples would mean more.

Alas, thanks for your solution Mr Admin.

It doesn't consider <TAB> chars as those were not mentioned in post#1 nor in the examples. Try [[:space:]]* in lieu of the two * in my proposal.
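
Spelled out, that change might look like the sketch below. The wrapper name tidy_eq is hypothetical and only there so the command can be reused; the awk program itself is the earlier one-liner with [[:space:]]* swapped in. It assumes an awk that understands POSIX character classes (very old mawk releases do not).

```shell
# Hypothetical wrapper around the earlier one-liner, with [[:space:]]* swapped in.
# Like the original, the last gsub also matches runs of three or more = signs,
# so === is affected too.
tidy_eq() {
    awk '{
        gsub(/[[:space:]]*==/, "==")   # drop blanks and tabs before ==
        gsub(/==[[:space:]]*/, "==")   # drop blanks and tabs after ==
        gsub(/==+/, " & ")             # put one space on each side of the run
    }1' "$@"
}

# Usage: tidy_eq file
```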

I think your regular expression is correct.
It uses Perl extensions, and I see that some Perl extensions have found their way into the Linux RE and ERE:

# grep '\s\s\s' file
this.something    =='something'.that
this.something    == 'something'.that
this.something    ==    'something'.that
# egrep '\s\s\s' file
this.something    =='something'.that
this.something    == 'something'.that
this.something    ==    'something'.that

Is this compliant with standards? Don, your valuable opinion?

I stick to perl for demonstration.
Your problem might be embedded perl/sed/grep code.
Within 'embedded' code you need to escape a ' as '\'' .

perl -lne '/(\w+)\s*(==)\s*(['\''"]?\w+["'\'']?)/ and print' file

Perl even lets you print what each ( ) group matched:

perl -lne '/(\w+)\s*(==)\s*(['\''"]?\w+["'\'']?)/ and print "($1) ($2) ($3)"' file

No, it is not. The standard doesn't use or mention Perl REs in general nor \s in particular.

Utilities in the standard using basic REs (AKA BREs) such as grep (without -F and -E ), ed , ex , and sed don't even have the common C backslash escapes in REs. Except for awk and lex , the same is true for utilities in the standard using extended REs (AKA EREs). The awk and lex utilities do require that the C escapes for the alert ( \a ), backspace ( \b ), form feed ( \f ), new line ( \n ), carriage return ( \r ), tab ( \t ), and vertical tab ( \v ) characters be recognized in EREs. In all other utilities, the standard assumes that the literal characters (rather than backslash escape representations of them) will be used in REs. The standard also allows use of character class expressions (e.g., [[:space:]] ).
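
To tie this back to the original grep search: a pattern that sticks to what the standard defines for EREs can find lines with a standalone == without relying on \s or other Perl extensions. The sketch below is only an illustration; the function name has_exact_eq is made up, and excluding ! on the left is an assumption aimed at JavaScript's !==.

```shell
# Hypothetical helper: print only lines containing an == that is not part of
# a longer operator such as === or !==.
# Uses only bracket expressions, alternation and anchors from standard ERE syntax.
has_exact_eq() {
    grep -E '(^|[^=!])==([^=]|$)' "$@"
}

# Usage: has_exact_eq file
```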

True, I'm sorry.

Adding [[:space:]] worked just fine.

Many thanks.

Hi @MadeInGermany,

Thanks for your inputs.