Finding and removing blocks of identical strings

cocostaec · May 16, 2011, 10:14am

I have a problem finding blocks of identical strings. I solved the problem of finding consecutive identical words, and now I want to extend the code to find and remove consecutive identical blocks of strings.
For example, the awk code that removes consecutive identical words is:

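#!/usr/bin/awk -f
# Requires GNU awk (gawk): uses a regex RS, RT, and the three-argument match().
BEGIN {
    RS = "[[:space:]]+"   # each whitespace-separated word is one record
    ORS = ""              # print nothing extra after each record
}
# f[1] = leading punctuation, f[2] = the word itself, f[3] = trailing punctuation
match($0, /^([[:punct:]]*)([^[:punct:]]+)([[:punct:]]*)$/, f) {
    if (x != f[2]) {      # word differs from the previous one: keep it
        print y $0        # y is the whitespace that followed the previous word
        z = FNR           # remember the last record that was printed
    }
    x = f[2]
    y = RT
}
END {
    if (z != FNR)         # last word was dropped as a duplicate:
        print f[3] "\n"   # still emit its trailing punctuation
}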

input:

"ana are mere mere
mere si portocale
ion are prune prune."

output:

"ana are mere si portocale
ion are prune."

Now I want to extend the code to do the following:
input:

"ana are ana are mere
ion are prune ion are prune"

output:

"ana are mere
ion are prune"

Thanks.

vgersh99 · May 16, 2011, 10:29am

What's the difference between:

[unct:]

and

[: punct:]

cocostaec · May 16, 2011, 10:30am

Sorry, it is

  [:punct:]

in both cases: the punctuation character class.
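
One possible starting point for the block case, as a minimal sketch under a few assumptions: it works on one line at a time, strips leading and trailing [:punct:] characters before comparing words (as the script above does), and whenever a run of words is immediately followed by an identical run, it drops the second copy. The names dedupe_blocks, core and tail are made up for illustration. Unlike the word-level script, it does not look across line breaks, it joins the kept words with single spaces, and it sticks to POSIX awk features instead of gawk's RT.

```shell
# Hypothetical helper: collapse immediately repeated blocks of words on each line.
dedupe_blocks() {
  awk '
  # Word with leading and trailing punctuation removed (always a string).
  function core(s) {
      sub(/^[[:punct:]]+/, "", s)
      sub(/[[:punct:]]+$/, "", s)
      return s ""
  }
  # Trailing punctuation of a word, or "" if it has none.
  function tail(s) {
      return match(s, /[[:punct:]]+$/) ? substr(s, RSTART) : ""
  }
  {
      n = NF
      for (i = 1; i <= n; i++) word[i] = $i
      i = 1
      while (i < n) {
          removed = 0
          # Try the longest block that can still repeat from position i.
          for (len = int((n - i + 1) / 2); len >= 1; len--) {
              same = 1
              for (k = 0; k < len; k++)
                  if (core(word[i + k]) != core(word[i + len + k])) { same = 0; break }
              if (same) {
                  # Keep the first copy, but inherit the trailing
                  # punctuation of the copy that is dropped.
                  t = tail(word[i + 2 * len - 1])
                  if (t != "") {
                      last = word[i + len - 1]
                      sub(/[[:punct:]]+$/, "", last)
                      word[i + len - 1] = last t
                  }
                  # Shift the remaining words left over the dropped copy.
                  for (k = i + 2 * len; k <= n; k++) word[k - len] = word[k]
                  n -= len
                  removed = 1
                  break
              }
          }
          if (!removed) i++   # nothing repeats here, move on
      }
      out = ""
      for (k = 1; k <= n; k++) out = out (k > 1 ? " " : "") word[k]
      print out
  }' "$@"
}

printf '%s\n' '"ana are ana are mere' 'ion are prune ion are prune"' | dedupe_blocks
# prints:
# "ana are mere
# ion are prune"
```

After a block is dropped, the same position is checked again, so three or more copies in a row ("a b a b a b") collapse to one. The nested comparison can get slow on very long lines, which seemed an acceptable trade-off for a sketch.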