Line too long error: replace string with newline character

ducati · May 16, 2011, 10:16am

I receive a file that has all of its content on a single line.
The file contains XML data with 3000 records, all on one line, which makes it difficult to process with Unix tools.

I decided to insert a newline character at every occurrence of a particular string in this file (say, replacing "<record>" with "\n<record>"), so that the file has multiple lines without affecting the XML data in it.

I have tried sed, awk and perl commands, but they apparently cannot process a file with such a long line in it.
I cannot 'fold' the file, because fold breaks it at a fixed width, which disrupts the XML tags and data.

How can I edit this file?

hergp · May 16, 2011, 11:09am

Try GNU's version of sed. It seems to have no line length limit (see the GNU manual, "sed, a stream editor"). You can download GNU sed from sunfreeware.com.
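
For reference, a minimal sketch of the substitution with GNU sed. It assumes GNU sed specifically: GNU sed interprets \n in the replacement text as a newline, but other sed implementations may not. The function name split_records and the sample input are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: put a newline before every "<record>" read from stdin.
# Assumes GNU sed, which turns \n in the replacement into a newline.
split_records() {
    sed 's/<record>/\n<record>/g'
}

# Tiny sample standing in for the real single-line file.
printf '<root><record>a</record><record>b</record></root>' | split_records
```

For a real file this would be called as split_records < infile > outfile, where infile and outfile are placeholder names.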

Skrynesaver · May 17, 2011, 3:15am

Are you sure you tried a Perl script? The following seems to work (tested on Solaris 5.10 sparc with Perl 5.8.4):

skrynesaver@busybox ~/tmp$ perl -w -Mstrict -e 'open my $long , ">" , "longLines.txt"; for (0..30000){print $long "<record><field1>data</field1><field2>other data </field2></record>";}'
skrynesaver@busybox ~/tmp$ wc  longLines.txt
      0   60003 1980066 longLines.txt
skrynesaver@busybox ~/tmp$ perl -w -Mstrict -e 'open(my $long, "<", "longLines.txt") || die "Could not open longLines.txt $!\n";my $data=readline($long);close $long;$data=~s/<record>/\n<record>/g;open my $short, ">", "shortLines";print $short $data;close $short'
skrynesaver@busybox ~/tmp$  wc  shortLines
  30001   90003 2010067 shortLines
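
One way to confirm that such a rewrite added only newlines and left the XML untouched is to strip the newlines again and compare the result with the original. A minimal sketch, assuming the original file contains no newlines of its own (as the first wc output above suggests). The name only_newlines_added is made up, and the throwaway files stand in for longLines.txt and shortLines.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if file "$2" is byte-for-byte identical to
# file "$1" once every newline is removed from "$2".
# Assumes "$1" itself contains no newlines.
only_newlines_added() {
    tr -d '\n' < "$2" | cmp -s - "$1"
}

# Small demonstration with throwaway files.
tmp=$(mktemp -d) || exit 1
printf '<record>a</record><record>b</record>' > "$tmp/long"
printf '\n<record>a</record>\n<record>b</record>' > "$tmp/short"

if only_newlines_added "$tmp/long" "$tmp/short"; then
    echo "only newlines were added"
else
    echo "content differs"
fi
rm -rf "$tmp"
```

With the files from the transcript, the call would be only_newlines_added longLines.txt shortLines.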

ctsgnb · May 17, 2011, 3:47am

1) What is your file size?
2) Is your sed/awk implementation compiled for a 64-bit or a 32-bit platform?
(What does the command file <yourfile> report?)
3) Which sed command did you try?

Did you try something like:

cat infile | sed 's/</#</g' | tr '#' '\n'

(this UUOC is just for testing, to see whether sed handles the data better as a stream than as a file)

(or use a character other than the hash #; choose one that doesn't appear in your original file)

Also give this a try:

tr '<' '\n' <infile | sed '1!s/^/</'
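
The difference between the two pipelines can be seen on a small sample. In the first, sed still receives the whole input as one line. In the second, tr (which works on characters rather than lines) breaks the input up first, so sed only ever sees short lines. Both split before every "<", not only before "<record>". A sketch, with made-up function names via_sed_first and via_tr_first:

```shell
#!/bin/sh
# Hypothetical wrappers around the two pipelines, reading stdin.
# For the first one, '#' must not occur anywhere in the data.
via_sed_first() {
    sed 's/</#</g' | tr '#' '\n'
}

# tr splits the stream first; sed then restores the '<' that tr consumed
# on every line except the (empty) first one.
via_tr_first() {
    tr '<' '\n' | sed '1!s/^/</'
}

sample='<r><record>x</record></r>'
printf '%s' "$sample" | via_sed_first
echo
printf '%s' "$sample" | via_tr_first
echo
```

On this sample both pipelines should give the same lines, starting with an empty line because the input begins with "<".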

ducati · May 17, 2011, 6:54am

The Perl script seems to be doing the trick. Thanks, Skrynesaver.