sed working slowly on big files

Hi experts,

I'm using the following code to remove trailing spaces from the end of every line in a file.

 
sed "s/[ ]*$//g" <filename> > <new_filename>
 
mv <new_filename> <filename>

This works fine for files up to 20-25 GB.

For bigger files it takes more time than the original file generation itself required. :slight_smile:

I need to process files that will be 4-5 times bigger than this.

Please suggest a faster way.

Not sure how much this buys you, but sed 's/ *$//' file > file1 should be a little faster: the bracket expression [ ] is needless for a single character, and the g flag is redundant since a pattern anchored at end-of-line can match at most once.

Thanks! I can see a slight saving in time.

But is there any other way to do it? Any command other than sed?

Some process must write the file. Rewrite that process to omit the trailing spaces, or pipe its output through the sed command as the file is written. As for something faster, maybe perl:

perl -pe 's/ *$//'

For real speed a custom C program is needed, but not writing the spaces in the first place would be optimal.
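
For the pipe idea, a minimal sketch, assuming the data comes from some generating command (generate_report is a hypothetical stand-in for whatever actually produces the file):

# Hypothetical generator piped through sed: trailing spaces never reach the disk,
# and the huge file never needs a second full pass.
generate_report | sed 's/ *$//' > <filename>

Since sed consumes the stream as it is produced, the cleanup overlaps with the generation instead of adding a separate read-and-rewrite cycle.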


Thank you!

You are right. One last question: is cut faster than sed?

To avoid the spaces I have modified the code, but now I am getting 2 junk characters at the beginning of every line. I want to use

cut -c 3- <filename>

to remove those.
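
For example, both cleanups could be folded into one pass (a sketch, assuming the junk really is exactly two characters at the start of every line):

# Strip the first two characters and any trailing spaces in a single sed pass;
# assumes every line begins with exactly two junk characters.
sed 's/^..//; s/ *$//' <filename> > <new_filename>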

What software is producing this file? It might be easier to fix at the source.

Can you post, say, four sample lines with control codes visible? Just wondering if this is not a proper Unix text file.

sed -n l four_sample_lines.txt
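
For instance (a sketch; head and od are standard tools, and <filename> stands for your actual file), the sample can be inspected without creating an intermediate file:

# Show the first four lines with control codes made visible.
head -n 4 <filename> | sed -n l

# Or dump the raw bytes, which makes a BOM or carriage returns easy to spot.
head -n 4 <filename> | od -c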