Help fix my garbage - File Split Program in Perl

mkastin · June 26, 2009, 3:29pm

Hi,

I have the following, it doesn't work and I know it's crap code.

The objective is to split a file by a given number of segment codes, such as:

01,02,03,...,99

Then write all records for each separate identifier to a new file.

The files being split have lrecl=500 and recfm=F. I am using two fields in these files: one at byte 377 with a length of 2 (the segment identifier), and a second at byte 393 with a length of 25 (the keycode for the segment). The keycode is generally shorter than 25 characters and needs to be included in the output filename.

Here is my garbage:

 
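#!/usr/bin/perl
use strict;
use warnings;

(my $ProgName = $0) =~ s%.*/%%;
if (@ARGV > 1) { print STDERR " *** One file at a time, please!\n"; exit 1; }
if (@ARGV < 1) { print STDERR " *** Sorry, need a filename!\n"; exit 1; }
my $infile = $ARGV[0];
print STDOUT "Enter the number of segments to split\n";
my $splitnum = <STDIN>;
chomp($splitnum);
print STDOUT "\n$ProgName: Beginning $splitnum way split on $infile\n\n";
open(my $in, '<', $infile) or die "\n$ProgName: *** Can't open data file '$infile': $!\n\n";
my %out;          # one output handle per segment, opened once and reused
my $counter = 0;
while (my $rec = <$in>)
 {
  $counter++;
  if ($counter % 100000 == 0) { print "$counter\n"; }
  next if length($rec) < 417;        # too short to hold both fields
  my $seg = substr($rec, 376, 2);    # segment identifier: byte 377, length 2
  my $key = substr($rec, 392, 25);   # keycode: byte 393, length 25
  $key =~ s/\s+$//;                  # trim trailing padding (chop removed only one character)
  next unless $seg =~ /^\d\d$/ && $seg >= 1 && $seg <= $splitnum;
  if (!$out{$seg})
   {
    # Assumes one keycode per segment: the first record seen names the file.
    # Reopening with ">" for every record would truncate the file each time.
    my $ofile = "$infile.$key.dat";
    open($out{$seg}, '>', $ofile) or die "\n$ProgName: *** Can't open data file '$ofile': $!\n\n";
   }
  print { $out{$seg} } $rec;
 }
close($in);
for my $fh (values %out) { close($fh) or die "\n$ProgName: *** Error closing output file: $!\n\n"; }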

I appreciate the help and am also open to suggestions for using other tools or shell scripts. I was thinking awk could do this easily, but I don't know it that well either.

Thanks!

jim_mcnamara · June 26, 2009, 4:58pm

This is not a perl answer, but you do know that csplit will do this for you.

Try man csplit.

If this is not a practical app you are developing, but programming learning, go for it.

JerryHone · June 26, 2009, 5:09pm

By far the easiest way to find out why Perl code isn't working the way you think it should is to use the Perl debugger.

mkastin · June 26, 2009, 5:29pm

Thanks for your input. I'm not familiar with csplit, and after reading the manual I'm still not certain how to use it to do what I need without running the command multiple times, which I would rather not do, especially when some files can have up to 99 segments and over 100 million lines. I was looking at awk and figured something out, but it is also not working 100% how I'd like:

awk '!/^$/{
key=substr($0,393,25)
print $0 > key".dat"
}' testmail.dat

Here I'm trying to do the same as mentioned above, and the splitting works, but I can't get the output naming convention the way I want. I need the output filename formatted as 'input filename'.$key.'dat', and I need the $key variable trimmed of extraneous white space after the actual value ends, as it is not always 25 characters long. Also, how can I turn this into a self-contained script so I can use it as a single command, such as $ awksplit testmail.dat

Thanks again!!!

drl · June 26, 2009, 11:44pm

Hi.

Welcome to the forum.

Using phrases like it doesn't work gives us very little information. If it worked, you would probably not be here, so to say it is not working just is not helpful.

However, I (and I suspect others) would not be likely to wade through your code. Asking specific questions is the way to get the best help here and in most other forums.

If I needed to work with fixed-length data, I would use perl's read function instead of the <> syntax:

       read FILEHANDLE,SCALAR,LENGTH,OFFSET
       read FILEHANDLE,SCALAR,LENGTH
               Attempts to read LENGTH characters of data into variable SCALAR
               from the specified FILEHANDLE.  Returns the number of
               characters actually read, 0 at end of file, or undef if there
               was an error (in the latter case $! is also set). 

-- excerpt from perldoc -f read

On the other hand, if awk works for you, great.
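
For the awk route, here is a rough sketch of how the key trimming and the 'input filename'.key.dat naming could be handled. It is untested against your real data. The function layout and the variable name in_name are just illustrative choices, and it assumes one output file per keycode with the keycode at byte 393, as in your awk attempt:

```shell
#!/bin/sh
# awksplit: split a fixed-width file into one output file per keycode.
# Sketch only: the function layout and the name in_name are illustrative.
awksplit() {
    [ "$#" -eq 1 ] || { echo "usage: awksplit FILE" >&2; return 1; }
    awk -v in_name="$1" '
        length($0) >= 393 {
            key = substr($0, 393, 25)    # keycode field: byte 393, length 25
            sub(/[ \t]+$/, "", key)      # drop trailing padding
            if (key == "") next          # skip records with a blank keycode
            out = in_name "." key ".dat"
            print $0 > out               # awk keeps each output file open
        }
    ' "$1"
}
```

To get a single command, you could save this to a file named awksplit, add a final line reading awksplit "$@", and make the file executable. With many distinct keycodes, some awk implementations may run into a limit on simultaneously open files, so it is worth trying on a sample first.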

When I am tackling a problem, I usually create a small data set for testing. I would also make sure it was small enough to post in a forum, because posting sample data and expected output is a good way to attract answers. In your case, I would not post the long records, but I would make up a dataset with lrecl=50 or so, with perhaps 10 records or whatever is a representative sample.
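
Something along these lines would generate such a sample. The layout is made up for illustration (segment identifier at byte 11, keycode at byte 21, three segments), so adjust the offsets in whatever you are testing to match:

```shell
# make_sample: write 10 fixed-length test records (lrecl=50) to stdout.
# Hypothetical scaled-down layout: segment id at byte 11 (length 2),
# keycode at byte 21 (length 25, space padded); the rest is filler.
make_sample() {
    i=1
    while [ "$i" -le 10 ]; do
        seg=$(( (i - 1) % 3 + 1 ))       # cycle through segments 1, 2, 3
        printf 'REC%07d%02dFILLER__%-25sTAIL_\n' "$i" "$seg" "KEY$seg"
        i=$((i + 1))
    done
}
```

Running make_sample > sample.dat gives a file small enough to post along with the output you expect from it.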

Best wishes ... cheers, drl