Help to edit a large file

jxh461 · May 1, 2003, 5:38pm

I am trying to edit a file that has 33k+ records. In this file I need to edit each record that has a 'Y' in the 107th position and change the 10 characters before the 'Y' to blanks. Not all records have a 'Y' in the 107th position.

ex:

.....................................2222222222Y..................

to

.....................................          Y..................

Thanks in advance

oombera · May 1, 2003, 10:33pm

What kind of file is this?

sed 's/..........Y/          Y/g' someFile > TMP_00
mv TMP_00 someFile

will do it except that it doesn't check for the 107th position.. could there be a Y at other places on each line too, or just in the 107th position?
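One way to tie the substitution to the 107th position is to anchor the pattern at the start of the line and count the characters explicitly. This is only a minimal sketch: it assumes positions are counted from one and that the local sed supports POSIX interval expressions (`\{n\}`). `blank_before_y` is a hypothetical helper name, not something from the thread.

```shell
# blank_before_y: hypothetical helper; reads records on stdin, writes to stdout.
# Keeps columns 1-96, blanks columns 97-106, and only fires when column 107 is a Y.
blank_before_y() {
  blanks=$(printf '%10s' '')   # exactly ten spaces, built rather than typed
  sed "s/^\(.\{96\}\).\{10\}Y/\1${blanks}Y/"
}

# Usage: blank_before_y < someFile > TMP_00 && mv TMP_00 someFile
```

Because the pattern is anchored with `^`, a Y anywhere other than column 107 is left alone, and no `g` flag is needed.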

oombera · May 1, 2003, 11:01pm

I figured out a solution..

while read LINE
do
  theTest=`echo $LINE | awk '{print substr ($0, 107, 1)}'`
  if [[ $theTest == "Y" ]]; then
    echo $LINE | awk '{print substr ($0, 1, 96) "          Y" substr ($0, 108, length ($0) - 107)}' >> newFile
  else
    echo $LINE >> newFile
  fi
done < someFile

mv newFile someFile

criglerj · May 2, 2003, 8:23am

Perl's extensions to the RE vocabulary are helpful here (WARNING: untested code):

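perl -lpe '/^(.{96}).{10}(Y.*)/ && do { $_ = $1 . " " x 10 . $2 }'

This takes advantage of the -p option, which prints $_ after processing each record. The -l option strips the newline from each record on input and puts it back on output; without it the assignment to $_ would drop the newline, because .* stops short of it.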

Or you can combine oombera's separate invocations of awk into one:

awk 'substr($0,107,1) == "Y" { $0 = substr($0,1,96) "          " substr($0,107,length($0) - 106) }
{print}' someFile > newFile
mv newFile someFile

Awk is still record length limited AFAIK, so if your records are very long (> 8K ?), you may not be able to read entire lines (I was bitten by this earlier this week).

criglerj · May 2, 2003, 8:26am

In oombera's solution, each place you see $LINE (lines 3, 5, 7) you should probably change it to "$LINE" since otherwise, consecutive blanks in your records will change the records in an unwanted way.
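With the quotes added, the loop might look like the sketch below. The `IFS=` and `-r` on `read`, the use of `printf` instead of `echo`, and the plain `[ ... ]` test go beyond what the thread suggests; they are assumptions meant to keep leading blanks and backslashes intact and to avoid depending on ksh/bash. `fix_records` is a hypothetical name, and the function writes to stdout instead of appending to newFile.

```shell
# fix_records: hypothetical helper; reads records on stdin, writes to stdout.
fix_records() {
  while IFS= read -r LINE          # IFS= keeps leading blanks, -r keeps backslashes
  do
    theTest=$(printf '%s\n' "$LINE" | awk '{ print substr($0, 107, 1) }')
    if [ "$theTest" = "Y" ]; then
      # columns 1-96, ten blanks, a Y, then column 108 to the end
      printf '%s\n' "$LINE" |
        awk '{ print substr($0, 1, 96) sprintf("%10s", "") "Y" substr($0, 108) }'
    else
      printf '%s\n' "$LINE"
    fi
  done
}

# Usage: fix_records < someFile > newFile && mv newFile someFile
```

This still starts one or two awk processes per record, so on a 33k-record file a single awk or perl pass is likely to be noticeably faster.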

oombera · May 2, 2003, 9:36am

criglerj, that's cool - I can't believe the code can be shortened that much! And I should've remembered the quotes.. they protect spaces, but I suppose potentially other special characters too..

Just a couple typos..

"          " should be "          Y"
substr($0,107,len... should be substr($0,108,len...

criglerj · May 8, 2003, 10:36pm

I think I got it right the first time. When the OP wrote "edit each record that has a 'Y' in the 107th position", I guessed the positions started from one, which is the way awk does it. But I stand by my blank string: You want to leave the Y where it is, i.e., you keep characters 1..96, change 97..106 to blanks, then keep 107..end. Char 107 is the "Y" which is left intact. But if the OP's count starts from zero, then all the positions get adjusted by one.
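The column arithmetic can be made explicit by passing the Y column in as a parameter, so that a zero-based reading of the question only changes one number. This is a sketch under that assumption, not code from the thread; `blank_before_col` and its argument are hypothetical.

```shell
# blank_before_col: hypothetical helper. $1 is the column of the Y as awk
# counts it (starting from one); the ten columns before it are blanked.
blank_before_col() {
  awk -v col="$1" '
    substr($0, col, 1) == "Y" {
      # keep 1..col-11, blank col-10..col-1, keep col..end (the Y included)
      $0 = substr($0, 1, col - 11) sprintf("%10s", "") substr($0, col)
    }
    { print }'
}

# Count starts from one:   blank_before_col 107 < someFile > newFile
# Count starts from zero:  blank_before_col 108 < someFile > newFile
```

For `col` = 107 this keeps characters 1..96, blanks 97..106 and keeps 107..end, which matches the breakdown above.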

oombera · May 9, 2003, 9:55am

I should've tried the code first .. it works both ways - the way you posted it and with my "corrections".. either you detect the Y and replace the 10 characters before with blanks or detect the Y and replace the 10 characters plus the Y with 10 blanks and a Y..
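A quick way to confirm that the two variants agree is to run both over the same input and compare the results. The sketch below is an assumption-laden illustration, not something either poster ran: `keep_y` and `rewrite_y` are hypothetical names, and `substr` is called without a length so that both run to the end of the record.

```shell
# keep_y: blank columns 97-106 and keep column 107 onward untouched.
keep_y() {
  awk 'substr($0,107,1) == "Y" { $0 = substr($0,1,96) sprintf("%10s","") substr($0,107) } { print }'
}

# rewrite_y: replace columns 97-107 with ten blanks plus a literal Y.
rewrite_y() {
  awk 'substr($0,107,1) == "Y" { $0 = substr($0,1,96) sprintf("%10s","") "Y" substr($0,108) } { print }'
}

# Compare on the real file (out.keep and out.rewrite are scratch names):
#   keep_y < someFile > out.keep
#   rewrite_y < someFile > out.rewrite
#   cmp out.keep out.rewrite && echo "identical"
```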

jxh461 · May 19, 2003, 4:38pm

Gentlemen & GentleLadies

Thanks for your responses. You guys are awesome.