How to replace characters 7 through 14 of every line in a file

jakSun8 · December 12, 2007, 2:40am

Hi all,

I have a file with multiple lines. I want to replace characters 7 through 14 of every line with 0000000

Input:
12345678901234567890
23456789012345678901

Output:
12345600000004567890
23456700000005678901

Please help.

JaK

rikxik · December 12, 2007, 2:58am

awk '{print substr($0,1,6) "0000000" substr($0,14)}' infile

jakSun8 · December 12, 2007, 3:27am

Thanks rikxik, it works, but when I run the same command on a record that is 6656 bytes long, it says "record too long". Is there some limitation in awk? How do I get around it?

Please help
JaK

ghostdog74 · December 12, 2007, 3:34am

in bash

# while IFS= read -r line; do echo "${line:0:6}0000000${line:13}"; done < file

or, if no line contains whitespace

# for line in $(cat file); do echo "${line:0:6}0000000${line:13}"; done

bakunin · December 12, 2007, 3:50am

sed 's/\(.\{7\}\).\{7\}/\10000000/' oldfile > newfile

A lot faster than calling substr() two times in awk.

If your lines are too long to handle you could cut them before making the exchange and then put them together again:

typeset line=""
typeset start=""
typeset end=""

cat oldfile | while read line ; do
     start="$(print - "$line" | cut -c1-14)"
     end="$(print - "$line" | cut -c15-)"
     start="${start%%???????}0000000"
     print - "${start}${end}" >> newfile
done

bakunin

jakSun8 · December 12, 2007, 4:05am

Thanks guys. To get around the "too long" issue, I just used nawk instead and it worked.
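
A rough sketch of that workaround, for anyone landing here later. It assumes the target is the sample output from the first post (six characters kept, seven zeros, the rest from character 14 on). The names `pick_awk` and `replace_7_13` are made up for illustration, and whether plain awk has a record length limit depends on the system.

```shell
# pick_awk is a hypothetical helper: prefer nawk when it is installed,
# otherwise fall back to whatever "awk" is on the PATH.
pick_awk() {
    if command -v nawk >/dev/null 2>&1; then
        echo nawk
    else
        echo awk
    fi
}

AWK=$(pick_awk)

# replace_7_13 (also a made-up name) keeps characters 1-6, writes seven zeros,
# then appends everything from character 14 to the end of the line.
replace_7_13() {
    "$AWK" '{ print substr($0, 1, 6) "0000000" substr($0, 14) }' "$@"
}

# Build one 6656-character line and print the length of the edited line.
"$AWK" 'BEGIN { for (i = 0; i < 6656; i++) printf "x"; print "" }' |
    replace_7_13 |
    "$AWK" '{ print length($0) }'
```

The edit swaps seven characters for seven zeros, so the printed length should still be 6656 if the chosen awk copes with the long record.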

Great to see people helping others!
JaK

rikxik · December 12, 2007, 5:18am

Consider this:

@bakunin

sed definitely takes the cake (as long as no too long line issues):

But wrongly produces lines like this - nothing which cannot be fixed:

Instead of this:

However, speaking of speed, the bash script you posted is as fast as a Snail:

And it produced incorrect output too.

@ghostdog74
This wasn't very fast either:

Mine wasn't too bad - not as fast as the sed version:

Nothing against anyone - just fair comparison.

HTH

bakunin · December 12, 2007, 6:25am

True. To be honest I had a "comprehension error" and thought OP wanted to change characters 8-15 instead of 7-14 - my fault entirely. As you say, this can easily be corrected. Of course the ksh script was based on this faulty assumption too, which explains its wrong output. It could also be easily corrected.
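
For reference, a sketch of what the corrected sed command could look like, assuming the sample output in the first post is the target: six characters kept, seven zeros written, and the line continuing from character 14. The function name `zero_fill` is invented here for illustration.

```shell
# zero_fill is a hypothetical helper name, not something from the posts above.
# Capture the first 6 characters, swallow the next 7, and write 7 zeros in their place.
zero_fill() {
    sed 's/^\(.\{6\}\).\{7\}/\10000000/' "$@"
}

# Try it on the two sample lines from the first post.
printf '%s\n' 12345678901234567890 23456789012345678901 | zero_fill
```

On the two sample lines this prints 12345600000004567890 and 23456700000005678901. Lines shorter than 13 characters do not match the pattern and pass through unchanged.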

True - it was meant as a fallback solution in case sed wouldn't do the job and written not to be as fast as possible, but to be as easy to understand as possible.

Thank you for taking the time to compare the different solutions, though. It just underlines my mantra: for maximum performance, use sed if you can and awk if you must.

bakunin

ghostdog74 · December 12, 2007, 6:46am

@rikxik:
While loops in the shell are almost always slower than tools like awk/sed. The "looping" construct of awk/sed is already "inside" the language. In the shell, you have to code the loop explicitly.
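
To make the difference concrete, here is a small sketch that runs both styles on the same generated file and checks that they agree. The helper names `with_sed` and `with_loop` are made up, the 200-line sample size is arbitrary, and the character range follows the sample output in the first post.

```shell
# Hypothetical helper names; neither appears in the posts above.

# One sed process handles the whole file; the per-line loop runs inside sed.
with_sed() {
    sed 's/^\(.\{6\}\).\{7\}/\10000000/' "$1"
}

# Explicit shell loop: two cut processes are started for every input line.
with_loop() {
    while IFS= read -r line; do
        start=$(printf '%s\n' "$line" | cut -c1-6)
        end=$(printf '%s\n' "$line" | cut -c14-)
        printf '%s\n' "${start}0000000${end}"
    done < "$1"
}

# Build a small sample file: 200 copies of the first sample line.
sample=$(mktemp)
i=0
while [ "$i" -lt 200 ]; do
    echo 12345678901234567890
    i=$((i + 1))
done > "$sample"

# Both approaches should produce identical output on this input.
with_sed "$sample" > "$sample.sed"
with_loop "$sample" > "$sample.loop"
cmp "$sample.sed" "$sample.loop" && echo "outputs match"

rm -f "$sample" "$sample.sed" "$sample.loop"
```

To compare speed, run each call under `time` in your shell, e.g. `time with_loop somefile`. The loop starts new processes for every line, which is where most of its time is expected to go. The two versions only agree for lines of at least 13 characters, since sed leaves shorter lines alone while the loop still appends the zeros.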

rikxik · December 13, 2007, 1:13am

@ghostdog74

Correct - looping is always slow. One more thing: I believe sed (being the stream editor) operates on streams, with newlines separating the lines, so it is not exactly "looping". I suppose that's why sed aces awk. I'm not very sure about the internal mechanism used by awk.