I know this is strictly a programming forum - but I consider vi a programming enabler and the question relates to regex you'd use with awk/sed anyway....
I have a file which is 50,000+ lines long and need to change many many instances of
word_word_word
to be
word+word+word
where 'word' can be any 'word' and it is the underscore character that needs to be changed to a plus character.
I'm happy with
:1,$ s/some pattern/new pattern/g
type changes, but can't seem to nail this.
I've tried
:1,$ s/[Aa-Zz]_[Aa-Zz]/[Aa-Zz]+[Aa-Zz]/g
but that puts many [Aa-Zz] patterns in the file......
Any help humbly appreciated.
-----------------------------------------------------------
Brett
Brett, buddy...
%s/\([^_][^_]*\)_\([^_][^_]*\)_\([^_][^_]*\)/\1+\2+\3/g
or similarly with sed:
sed 's/\([^_][^_]*\)_\([^_][^_]*\)_\([^_][^_]*\)/\1+\2+\3/g' myFile.txt > myNewFile.txt
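Before running that on the real 50,000-line file, it may be worth trying the same expression on a couple of throwaway lines. This is only a sketch: the sample text and the function name underscores_to_plus are made up for illustration, and the sed expression is the one quoted above.

```shell
# Hypothetical helper wrapping the sed expression from the post above.
underscores_to_plus() {
  sed 's/\([^_][^_]*\)_\([^_][^_]*\)_\([^_][^_]*\)/\1+\2+\3/g'
}

# Made-up sample input: one line with three words, one line with no underscores.
printf '%s\n' 'alpha_beta_gamma' 'no underscores here' | underscores_to_plus
# prints:
#   alpha+beta+gamma
#   no underscores here
```

Note that the pattern asks for exactly three underscore-separated pieces per match, so a line like a_b is left alone and a_b_c_d comes out as a+b+c_d. If the file contains such lines, the expression would need adjusting.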
Vgresh,
Can you explain what the numbers 1, 2 and 3 signify in the script?
Please help me to understand.
quoting 'man sed'
The characters \n, where n is a
digit, will be replaced by the text matched by
the corresponding backreference expression.
For each backslash (\) encountered in scanning
replacement from beginning to end, the follow-
ing character loses its special meaning (if
any). It is unspecified what special meaning
is given to any character other than &, \ or
digits.
let me 'color-code' it:
%s/\([^_][^_]*\)_\([^_][^_]*\)_\([^_][^_]*\)/\1+\2+\3/g
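Since colors don't always survive, here is a small sketch that makes the groups visible another way. Each \( ... \) pair captures a piece of the match, and \1, \2, \3 in the replacement stand for those pieces in order. The function name show_groups and the sample text are made up for illustration.

```shell
# Hypothetical helper: label each captured group instead of joining them with +.
show_groups() {
  sed 's/\([^_][^_]*\)_\([^_][^_]*\)_\([^_][^_]*\)/1=[\1] 2=[\2] 3=[\3]/'
}

printf '%s\n' 'red_green_blue' | show_groups
# prints: 1=[red] 2=[green] 3=[blue]
```

This is also why the earlier attempt wrote [Aa-Zz] into the file: a bracket expression only has a special meaning in the pattern on the left. In the replacement on the right it is plain text, so the matched characters have to be carried over with \1, \2 and so on.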
Thanks Vgresh. One more question please.
What does [^_] mean?
I know ^ signifies the start of a line. But what does _ represent?
I appreciate your reply.
Thanks friend.
quoting from 'man regexp':
1.4 A non-empty string of characters enclosed in square
brackets ([]) is a one-character RE that matches any
one character in that string. If, however, the first
character of the string is a circumflex (^), the one-
character RE matches any character except new-line and
the remaining characters in the string. The ^ has this
special meaning only if it occurs first in the string.
The minus (-) may be used to indicate a range of con-
secutive characters; for example, [0-9] is equivalent
to [0123456789]. The - loses this special meaning if
it occurs first (after an initial ^, if any) or last
in the string. The right square bracket (]) does not
terminate such a string when it is the first character
within it (after an initial ^, if any); for example,
[]a-f] matches either a right square bracket (]) or
one of the ASCII letters a through f inclusive.
given the above, the '[^_]' means: any single character BUT the '_'
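To see the two meanings of ^ side by side, here is a small sketch with made-up input. The function name mask_non_underscores is hypothetical. Inside brackets, a leading ^ negates the set. Outside brackets, it anchors the match to the start of the line.

```shell
# Hypothetical helper: [^_] matches any single character except an underscore,
# so replacing every match with X leaves only the underscores untouched.
mask_non_underscores() {
  sed 's/[^_]/X/g'
}

printf '%s\n' 'ab_c d_e' | mask_non_underscores
# prints: XX_XXX_X

# Outside brackets, ^ is the start-of-line anchor: only the first a is replaced.
printf '%s\n' 'aa_a' | sed 's/^a/X/'
# prints: Xa_a
```

So in [^_][^_]* the underscore is just an ordinary character being excluded: one character that is not an underscore, followed by zero or more of the same.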