String Processing

bthomas · February 19, 2008, 9:29am

I had a file with 150k records in it and ran tr on it to remove all newlines/CRs, which created one large record (whoops). Is there a way to add a \n after every 39th character, without using 'dd', to turn it back into the original format?

dd is way too slow.

shell is ksh.

drl · February 19, 2008, 10:13am

Hi.

Here is one way with perl:

#!/usr/bin/perl

# @(#) p1       Demonstrate reading fixed-length records.
#           See Recipe 8.15, Perl Cookbook, 1st ed.

use warnings;
use strict;

my ($debug);
$debug = 0;
$debug = 1;

my ($lines) = 0;
my ( $f, $file );
my ($record);
my ($RECORDSIZE) = 10;    # record length for this demo; 39 for the file in the question

$f = shift || die " Need a file name.\n";
open( $file, "<", $f ) || die " Cannot open $f: $!\n";
until ( eof($file) ) {
  $lines++;
  read( $file, $record, $RECORDSIZE ) == $RECORDSIZE
    or die " Short read on $f at line $lines\n";
  print $record, "\n";
}

print STDERR " ( Lines written: $lines )\n";

exit(0);

which can be called from a ksh script:

#!/bin/ksh -

# @(#) s1       Demonstrate perl reading fixed-length records.

echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1) tr perl

FILE=data1
tr -d '\n' <data0 >$FILE

echo
echo " Input file $FILE:"
cat $FILE

echo
echo
echo " Output from perl script:"
./p1 $FILE

exit 0

Producing:

% ./s1
(Versions displayed with local utility "version")
Linux 2.6.11-x1
pdksh 5.2.14 99/07/13.2
tr (coreutils) 5.2.1
perl 5.8.4

 Input file data1:
a234567890b234567890c234567890d234567890e234567890f234567890g234567890h234567890i234567890j234567890

 Output from perl script:
a234567890
b234567890
c234567890
d234567890
e234567890
f234567890
g234567890
h234567890
i234567890
j234567890
 ( Lines written: 10 )

Change the record length ($RECORDSIZE) to fit your situation, which would be 39 for your file ... cheers, drl
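
For comparison, the same fixed-width split can probably be done from ksh alone with the standard fold utility. This is only a sketch, not part of the perl approach above: the function name resplit is made up for illustration, and it assumes a fold that supports the POSIX -b (count bytes) and -w (width) options.

```shell
# resplit is a hypothetical helper name, not an existing utility.
# It reads one long record on stdin and writes it back out with a
# newline inserted after every WIDTH bytes.
resplit() {
  # -b counts bytes instead of display columns, so tabs or multibyte
  # characters do not move the break points; -w sets the record width.
  fold -b -w "$1"
}

# Small demonstration with 10-byte records, like data1 above.
printf '%s' 'a234567890b234567890c234567890' | resplit 10
echo    # terminate the last record, see the note below
```

For the file in the question, the call would be something like "resplit 39 <bigfile >restored", where both file names are placeholders. Unlike the perl script, fold does not complain about a short final record, and if the input has no trailing newline the last record is written without one, so checking that the size reported by wc -c is a multiple of the record length may still be worthwhile.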