Retaining Spaces within a word
--------------------------------------------------------------------------------
Hi Experts,
I have a 2 GB flat file that contains a Unicode field 4000 characters wide; in some records this field is blank. The existing system uses a sed command to remove the spaces, and because of this field alone the file processing takes almost three days. When I removed sed and used the tr command instead, it finished in less than a minute. The challenging part is that the character fields can have more than one space between words. I am using tr -s ' ' '' to remove the spaces, but it also squeezes the runs of spaces between the characters down to a single space.
My sample record is this:
262774372|58959454 | Rajiv Rajiv | tuerueeu | | erueirei
647585858|784783434 | Ramesha Ramesha| tyuu5u4o| | ruieieiei
Earlier, the following command was used to remove the spaces:
sed 's/[[:space:]]|/|/g; s/[ \t]$//g' < File1 > File2
Output was:
262774372|58959454|Rajiv Rajiv|tuerueeu||erueirei
647585858|784783434|Ramesha Ramesha|tyuu5u4o||ruieieiei
Time taken to process file was 3.5 days
Later I added a tr command before the sed to remove the spaces faster:
tr -s ' ' '' < File1 > File2
sed 's/[[:space:]]|/|/g; s/[ \t]$//g;s/^[ \t]*//g;' < File2 > File3
Output was:
262774372|58959454|Rajiv Rajiv|tuerueeu||erueirei
647585858|784783434| Ramesha Ramesha|tyuu5u4o||ruieieiei
Time taken to process the file was less than a minute, since tr squeezes the long runs of spaces quickly.
However, I am not able to retain the spaces between the characters as they are, since tr -s squeezes every run of spaces to one space.
For example, the value | Rajiv Rajiv | is changed to |Rajiv Rajiv| with only a single space between the two words.
I have to retain the original spacing inside the value, i.e. |Rajiv Rajiv| with its inner spaces unchanged.
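The squeeze can be reproduced on a single made-up field. The three-space gaps below are an assumption for illustration, since the exact spacing of the real data is not visible in the sample records:

```shell
# Hypothetical field with three spaces around and between the words.
field='|   Rajiv   Rajiv   |'

# tr -s ' ' collapses every run of spaces to one space, inner runs included.
squeezed=$(printf '%s\n' "$field" | tr -s ' ')
printf '%s\n' "$squeezed"
```

This prints | Rajiv Rajiv |, so the inner spacing is already lost before sed ever sees the line.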
Please let me know if you have any workaround...
Thanks,
Rajiv
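One possible workaround, as a minimal sketch: skip tr -s altogether and let a single sed pass strip whitespace only where it touches a | delimiter or a line end, so runs of spaces inside a field are never matched. The function name trim_fields and the spacing in the sample line are made up for illustration, and the run time on a 2 GB file is untested.

```shell
# trim_fields (hypothetical helper): reads records on stdin, writes them to stdout.
# Whitespace is removed only around '|' delimiters and at the line ends;
# runs of spaces inside a field are left exactly as they are.
trim_fields() {
    sed -e 's/[[:space:]]*|[[:space:]]*/|/g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//'
}

# Assumed sample record with padded fields and three spaces inside the name.
printf '%s\n' '262774372|58959454   |  Rajiv   Rajiv  | tuerueeu |      | erueirei  ' |
    trim_fields
```

For the sample line this prints 262774372|58959454|Rajiv   Rajiv|tuerueeu||erueirei, with the three inner spaces kept. On the real data it would be run as trim_fields < File1 > File2.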