split based on the number of characters

Hello,

If I have a file like this:
010000890306932455804 05306977653873 0520080417010520ISMS SMT ZZZZZZZZZZZZZOC30693599000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011302942311 010000890306946317387 05306977313623 0520080417010520ISMS SMT ZZZZZZZZZZZZZOC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202010300391748 010000890306945153336 05306977918990 0520080417010521ISMS SMT ZZZZZZZZZZZZZOC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011304607230 010000890306948068406 05306977404213 0520080417010523ISMS SMT ZZZZZZZZZZZZZOC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202010000717971 010000890306998573372

How can I perform a split based on the number of characters?
For example, I want array[0] to hold the first 70 characters of the file, array[1] the next 70 characters, and so on.

How can I do this?

A 70-character split can be done with sed (the \n in the replacement is understood by GNU sed):

sed -e 's/.\{70\}/&\n/g' file

//Jadu

Thank you...

What if I want to perform the split in Perl, always using a 'size' as the limit?

I know this isn't exactly what you wanted, but this might come in handy -

split -b 60 filename.txt

This would split the file into multiple files of 60 bytes each (60 characters, for single-byte text).

(It writes files named xaa, xab, xac, xad, etc., each holding the specified number of bytes; the last one may be shorter.)
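To see the naming scheme in action, here is a minimal sketch. The scratch directory, the file name sample.txt and the chunk_ prefix are made up for illustration; only split -b itself comes from the post above.

```shell
# Scratch directory so the generated pieces are easy to find (hypothetical location).
workdir=$(mktemp -d)

# Hypothetical input: 150 bytes (all zeros) with no trailing newline.
printf '%0150d' 0 > "$workdir/sample.txt"

# Cut it into 60-byte pieces; the second argument replaces the default "x" prefix.
split -b 60 "$workdir/sample.txt" "$workdir/chunk_"

# List the pieces with their sizes in bytes.
wc -c "$workdir"/chunk_*
```

With 150 bytes of input this should leave three pieces, chunk_aa and chunk_ab at 60 bytes each and chunk_ac with the remaining 30.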

Here is a small sample to give you an idea.

 
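#!/usr/bin/perl
use strict;
use warnings;

my $teststring = "1234567890abcdefghij0987654321ABCDEFGHIJlmnop";
# The capturing parentheses make split return the 10-character separators too.
my @chunks = split /(.{10})/, $teststring;
foreach (@chunks) {
  printf "%s\n", $_;
}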
 

In this case, I'm using 10 characters as the size of the pieces to extract. The pattern given to split describes the delimiter/separator, so here we are saying that any run of 10 characters is a separator. Because every 10 characters match as a separator, the fields between the separators are empty strings. Normally split does not return the separators, but here they are the values we actually want. The parentheses in the pattern tell perl to return the separators as well.

If you execute this, you get the following:
Hostname:> testscript3.sh

1234567890

abcdefghij

0987654321

ABCDEFGHIJ
lmnop
Hostname:>

This is because empty strings are interspersed with the separators. There is no empty string before the last 5-character substring because fewer than 10 characters were left, so nothing matched as a separator; "lmnop" is simply the final field.

I'll leave it for you as an exercise to remove the null strings or otherwise decide how you will skip/ignore them. How exactly you end up incorporating this into your code will also be dependent on your data file. From your description, I could not tell if records spanned lines or not.
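For anyone who wants a starting point for that exercise, here are two possible approaches, sketched as shell one-liners that call perl (this assumes perl is on the PATH). The shell variable names with_grep and with_match, and the "|" used to join the pieces for display, are only for illustration.

```shell
teststring='1234567890abcdefghij0987654321ABCDEFGHIJlmnop'

# Option 1: keep the split from above, then drop the empty fields with grep.
with_grep=$(perl -e 'print join("|", grep { length } split /(.{10})/, $ARGV[0])' "$teststring")

# Option 2: skip split; a global match in list context returns every run
# of up to 10 characters, including the short final piece.
with_match=$(perl -e 'print join("|", $ARGV[0] =~ /.{1,10}/g)' "$teststring")

printf '%s\n%s\n' "$with_grep" "$with_match"
```

Both variants should print the same five pieces. The second one avoids creating the empty fields in the first place, which may be the simpler choice when the chunk size is held in a variable.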

Hi.

Here is another method:

#!/usr/bin/perl

# @(#) p2       Demonstrate perl unpack to break apart a long line.

use warnings;
use strict;

my ($debug);
$debug = 0;
$debug = 1;

my ( @a, $i, $nc, $nv );
my ($lines) = 0;

while (<>) {
  $lines++;
  chomp;
  @a  = unpack( "(a70)*", $_ );
  $nc = length($_);
  $nv = scalar(@a);
  print " Unpacked $nv strings from line $lines (length $nc characters)\n";
  for ( $i = 0; $i < $nv; $i++ ) {
    print "$i: $a[$i]\n";
  }
}

print STDERR " ( Lines read: $lines )\n";

exit(0);

Producing:

% ./p2 data1
 Unpacked 9 strings from line 1 (length 572 characters)
0: 010000890306932455804 05306977653873 0520080417010520ISMS SMT ZZZZZZZZ
1: ZZZZZOC30693599000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011302942311 010
2: 000890306946317387 05306977313623 0520080417010520ISMS SMT ZZZZZZZZZZZ
3: ZZOC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202010300391748 01000
4: 0890306945153336 05306977918990 0520080417010521ISMS SMT ZZZZZZZZZZZZZ
5: OC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011304607230 0100008
6: 90306948068406 05306977404213 0520080417010523ISMS SMT ZZZZZZZZZZZZZOC
7: 306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202010000717971 010000890
8: 306998573372
 ( Lines read: 1 )

For your data in file data1:

% wc data1
  1  29 573 data1

Excluding the trailing newline, that leaves 572 characters: 8 full chunks of 70 (560) plus a final chunk of 12 ... cheers, drl

Using fold:

fold -b70 input
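If the goal is to handle each piece like an array element from the shell, the output of fold can be read back one line at a time. A rough sketch, assuming a POSIX shell and single-byte text; the sample string, the width of 10 and the "index: chunk" labels are invented for the demo. It spells the option as -b -w 10, the form described by POSIX.

```shell
# Hypothetical sample: 25 characters, folded into pieces of at most 10.
sample='1234567890abcdefghij12345'

# fold emits one piece per line; number the pieces from 0 like array indexes.
chunks=$(
  i=0
  printf '%s\n' "$sample" | fold -b -w 10 | while IFS= read -r chunk; do
    printf '%d: %s\n' "$i" "$chunk"
    i=$((i + 1))
  done
)

printf '%s\n' "$chunks"
```

For the 25-character sample this should print three numbered lines, the last one holding the 5 leftover characters. For the original question, replace the sample with the real file and the width with 70.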