I am trying to write a script that removes runs of two or more repeated letters (for example ZZ) from a field. The removal should only happen at the beginning and end of the field.
For example:
Input file:
a) ZZZIBM Corporation
b) ZZZIBM Corporation ZZZZZ
c) IBM ZZZ Corporation
The output should be as follows:
a) IBM Corporation
b) IBM Corporation
c) IBM ZZZ Corporation
The letter will always be Z, but there can be two or more Z's in a row. Will the sed command work for two or more Z's? Will it also remove the Z's from both the beginning and the end of a field? Z's between words must not be removed. Please advise.
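A minimal sketch of one way to express this requirement in sed, using the same BRE interval syntax (`\{n,\}`) as the attempts further down. It assumes the repeated letter is always an uppercase Z and that spaces next to a removed run should go too; `trim_z_runs` is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# Hypothetical helper: reads lines on stdin, writes cleaned lines to stdout.
# 1st expression: a run of two or more Z's at the start, plus any spaces after it.
# 2nd expression: any spaces before a run of two or more Z's at the end.
trim_z_runs() {
  sed 's/^Z\{2,\}[[:space:]]*//; s/[[:space:]]*Z\{2,\}$//'
}

# The three sample lines from the question.
printf '%s\n' 'ZZZIBM Corporation' 'ZZZIBM Corporation ZZZZZ' 'IBM ZZZ Corporation' | trim_z_runs
```

Because both patterns are anchored with `^` and `$`, a Z run between words is left alone and a single Z is never touched. On the other hand, a value that legitimately starts or ends with two Z's (a name ending in "ZZ", say) would be trimmed as well.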
I tried the following, and it returns the lines unchanged, without removing any Z's:
sed 's/\(^.. \)Z*/\1/;s/Z*$//' zzz_test.dat
ZZZIBM Corporation
ZZZIBM Corporation ZZZZZ
IBM ZZZ Corporation
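A short diagnosis of the first attempt, assuming GNU sed (which treats `^` at the start of `\(...\)` as an anchor). `\(^.. \)` only matches when the line begins with exactly two characters followed by a space, so it can never match "ZZZIBM...", whose third character is a Z. The second expression, `s/Z*$//`, would strip trailing Z's from a plain line, so the fact that ZZZZZ survived suggests some other character follows the Z's in that file (an inference, not something the output proves). `Z*` also matches a single Z or none at all, so it does not enforce "two or more". `show` is a hypothetical helper used only for this demonstration.

```shell
#!/bin/sh
# Apply only the first expression of the attempted command to one value.
show() { printf '%s\n' "$1" | sed 's/\(^.. \)Z*/\1/'; }

show 'ZZZIBM Corporation'   # third character is Z, not a space: line unchanged
show 'AB ZZZX'              # "AB " matches, then ZZZ is removed: prints "AB X"

# The second expression on a plain (LF-terminated) line removes the trailing Z's,
# leaving the space that preceded them.
printf '%s\n' 'IBM Corporation ZZZZZ' | sed 's/Z*$//'
```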
I also tried:
sed "s/^\(.\)\1\{1,\}//;s/\(.\)\1\{1,\}$//" file
and it did not work either.
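One side effect worth checking before this goes into a cleansing routine: the backreference form `\(.\)\1\{1,\}` matches a run of any repeated character, not only Z, so it can also trim legitimate doubled letters at the start or end of a name. The sketch below contrasts it with the same expressions restricted to Z by replacing `.` with `Z`; `any_run` and `z_run` are hypothetical helper names used only for illustration.

```shell
#!/bin/sh
# The command as tried: any character repeated two or more times at either end.
any_run() { printf '%s\n' "$1" | sed 's/^\(.\)\1\{1,\}//;s/\(.\)\1\{1,\}$//'; }

# Same structure, but the repeated character must be Z.
z_run() { printf '%s\n' "$1" | sed 's/^\(Z\)\1\{1,\}//;s/\(Z\)\1\{1,\}$//'; }

any_run 'Dell'                 # the trailing "ll" is a repeated run, so it is removed
z_run 'Dell'                   # no Z run, so the value is unchanged
z_run 'ZZZIBM Corporation'     # leading ZZZ is removed
```

Here `any_run 'Dell'` prints `De`, while `z_run 'Dell'` prints `Dell`.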
Also, I need this to work on a field, not a file: I am already extracting the field from each line while looping through the file. I am doing this on Linux. Please advise.
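A minimal sketch of applying the same kind of expression to a single field held in a shell variable instead of a file. It assumes the field is the whole line; the `field=$line` assignment stands in for whatever extraction (cut, IFS splitting, and so on) the existing loop already does. `clean_field` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of $1 with leading/trailing runs of
# two or more Z's (and the spaces next to them) removed.
clean_field() {
  printf '%s\n' "$1" | sed 's/^Z\{2,\}[[:space:]]*//; s/[[:space:]]*Z\{2,\}$//'
}

# Example loop over lines; replace the field assignment with your own extraction.
printf '%s\n' 'ZZZIBM Corporation ZZZZZ' 'IBM ZZZ Corporation' |
while IFS= read -r line; do
  field=$line
  field=$(clean_field "$field")
  printf '%s\n' "$field"
done
```

Each call starts a separate sed process. For a large file, running sed once over the whole file (or doing the field handling in awk) is likely to be faster, but the per-field version fits an existing loop without restructuring it.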
$ cat file
ZZZIBM Corporation
ZZZIBM Corporation ZZZZZ
IBM ZZZ Corporation
$ sed "s/^\(.\)\1\{1,\}//;s/\(.\)\1\{1,\}$//" file
IBM Corporation
IBM Corporation
IBM ZZZ Corporation
$ cat zzz_test.dat
ZZZIBM Corporation
ZZZIBM Corporation ZZZZZ
IBM ZZZ Corporation
$ sed "s/^\(.\)\1\{1,\}//;s/\(.\)\1\{1,\}$//" zzz_test.dat
IBM Corporation
IBM Corporation ZZZZZ
IBM ZZZ Corporation
It does not remove the trailing ZZZZZ. Please advise.
There is only the end-of-line character at the end of the line, no trailing space. Everything else works except removing the trailing ZZZZZ.
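One explanation consistent with the transcripts, though not confirmed by them, is that zzz_test.dat has Windows-style CRLF line endings while `file` does not. sed then sees an invisible carriage return (`\r`) as the last character of the line, so `$` never comes directly after the Z's. Running `cat -A zzz_test.dat` (GNU coreutils) would show such a character as `^M` just before the `$` that marks each line end. The sketch below deletes carriage returns with `tr` before running sed; `trim_z_crlf` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: drop every carriage return, then trim leading/trailing
# runs of two or more Z's (and the spaces next to them).
trim_z_crlf() {
  tr -d '\r' | sed 's/^Z\{2,\}[[:space:]]*//; s/[[:space:]]*Z\{2,\}$//'
}

# Simulate a Windows-style line: the trailing ZZZZZ is followed by \r before \n.
printf 'ZZZIBM Corporation ZZZZZ\r\n' | trim_z_crlf
```

Note that this also converts the output to Unix line endings, which may or may not be what the rest of the routine expects.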
$ sed "s/^\(.\)\1\{1,\}//;s/\(.\)\1\{1,\} *$//" zzz_test.dat
IBM Corporation
IBM Corporation ZZZZZ
IBM ZZZ Corporation
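The ` *` added before `$` in this attempt matches only literal space characters. Assuming the line actually ends with a carriage return (common for files saved on Windows), the POSIX class `[[:space:]]`, which also covers tab and carriage return, lets the pattern absorb it. The sketch keeps the structure of the command above, swaps ` *` for `[[:space:]]*`, and adds a leading `[[:space:]]*` so the space before the run goes too; `strip_runs` is a hypothetical helper name. Two caveats: it also deletes the carriage return from lines where it matches, and `.` still matches any repeated character, not only Z.

```shell
#!/bin/sh
# Same two expressions as the last attempt, but the end pattern accepts any
# whitespace (including \r) after the run and removes whitespace before it.
strip_runs() {
  sed 's/^\(.\)\1\{1,\}//;s/[[:space:]]*\(.\)\1\{1,\}[[:space:]]*$//'
}

# A CRLF-style line, as assumed above.
printf 'ZZZIBM Corporation ZZZZZ\r\n' | strip_runs
```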
Please advise, and thanks a lot for your help. The reason for this is that we are writing a data-cleansing routine at work. All the other code is done except this last part, which is where I am stuck.