Deleting characters at specific positions in a line if the line is a certain length

I've got a file that would have lines similar to:
12345678 x.00 xx.00 x.00 xxx.00 xx.00 xx.00 xx.00
23456781 x.00 xx.00 xx.00 xx.00 xx.00 x.00 xxx.00 xx.00 xx.00 xx.00
34567812 x.00 xx.00 x.00 xxx.00 xx.00 xx.00 xx.00
45678123 x.00 xx.00 xx.00 xx.00 xx.00 x.00 xxx.00 xx.00 xx.00 xx.00 xx.00 xx.00

I'm looking to basically do:

if [ current line > 1208 characters ]
delete characters at position 102-106
delete characters at position 1208-1214
copy modified line to new file
else
copy line as is to new file

so the new file would be:
12345678 x.00 xx.00 x.00 xxx.00 xx.00 xx.00 xx.00
23456781 x.00 xx.00 x.00 xxx.00 xx.00 xx.00 xx.00
34567812 x.00 xx.00 x.00 xxx.00 xx.00 xx.00 xx.00
45678123 x.00 xx.00 x.00 xxx.00 xx.00 xx.00 xx.00

All lines will start with an 8-digit key.
We don't know what the characters we're deleting will be (so no pattern searches).

Right now I've been looking at doing a 'while read inputline' loop and checking the character length of each line with wc -m. If it's -gt 1208, perform the corrections and copy the line to the new file.
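
For reference, a rough sketch of that loop might look like the following (bash is assumed, for the ${var:offset:length} substring syntax, and $oldFile/$newFile are placeholder names; the substring boundaries follow the same convention as the awk suggestion further down, so adjust them if positions 102-106 and 1208-1214 should be removed inclusively):

    # read each line, fix the long ones, copy everything to the new file
    while IFS= read -r line; do
        if [ "${#line}" -gt 1208 ]; then
            # keep characters 1-102, 106-1207, and 1214 to end of line
            printf '%s\n' "${line:0:102}${line:105:1102}${line:1213}"
        else
            printf '%s\n' "$line"
        fi
    done < "$oldFile" > "$newFile"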

I wasn't sure if there was an easier way to do this, and I haven't actually been able to figure out the best way to make the corrections to each individual line.

Any ideas?
Thanks!

nawk '
{
   # lines of 1208 characters or fewer are copied through unchanged
   if (length <= 1208)
      print
   else
      # keep characters 1-102, 106-1207, and 1214 to end of line
      print substr($0, 1, 102) substr($0, 106, 1208 - 106) substr($0, 1214)
}
' myFile > myNewFile

Thanks vgersh99!

I had to change nawk to awk (it was telling me nawk was not found), and that seemed to do the trick.

OK, just kidding. I don't know what happened the first time, but now it's no longer working. Could this be a problem with switching from nawk to awk?

Maybe...
Please define 'no longer working' - what's happening?
Do you have gawk available (maybe under /usr/local/bin)?
What OS?

I thought it had worked, but I must have had my line numbers mixed up when I checked it.

Right now I have this:

    awk '{if (length($0) <= 1208)
            print
          else
            print substr($0, 1, 602) substr($0, 610, 1208 - 610) substr($0, 1214)
          }' $oldFile > $newFile

It is printing to the new file fine until it gets to the first line which needs to be changed. Then it stops (no error messages).

I did a 'which gawk' and it says no gawk found. :(

I'm doing some more research myself to see if I can figure it out.

The OS is Windows XP Pro.

Windows XP, eh?

Try putting the 'body' of the code in a separate file, cailet.awk:

{
   if (length($0) <= 1208)
      print
   else
      print substr($0, 1, 602) substr($0, 610, 1208 - 610) substr($0, 1214)
}

and execute it like so:

awk -f cailet.awk $oldFile > $newFile

OK, tried that. Now it tells me it cannot have more than 199 fields (referencing the first line that needs correction).

OK, it's a limitation of your version of awk: it splits each line into whitespace-separated fields as it reads it, and your long lines have more than 199 of them.
Let's try to fool it by setting the field separator to something that would not appear in the data file, so each whole line becomes a single field - no guarantee that it won't complain about something else:

awk -F'^' -f cailet.awk $oldFile > $newFile


Alrighty - so no matter what I tried, it kept giving me the "can't have more than 199 fields" error. (Looks like the file has 202 - argh.)

I think I've figured out how to get rid of the extra characters with a sed command (I still need to verify that the invalid lines will have exactly the same data as the test file).

Now I just need to figure out how to separate the invalid lines into a different file (to use sed on them).
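
For what it's worth, sed can do the length test and the deletions in one pass, so splitting the file first may not be strictly necessary. A sketch under the same column convention as the awk suggestion above (GNU sed and grep are assumed here - the -E flag and repetition counts above 255 are not guaranteed by every implementation, and the output file names are just placeholders):

    # lines longer than 1208 characters: keep chars 1-102, drop 3, keep 106-1207,
    # drop 6, keep the rest; shorter lines fall through unchanged
    sed -E '/^.{1209}/ s/^(.{102}).{3}(.{1102}).{6}/\1\2/' "$oldFile" > "$newFile"

    # or, to pull the lines that need correction into their own file first:
    grep -E  '^.{1209}' "$oldFile" > long_lines.txt
    grep -Ev '^.{1209}' "$oldFile" > short_lines.txt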