How to remove page breaks from a flat file?

Hi All,

I receive a flat file in which the data of the last field is split onto a new line. I got this program from Vgersh which, when run, concatenates the split data back onto the end of the previous record. However, the program fails when it encounters a page break between the split data and the previous record. If the page breaks are removed, the program works fine.

Program:

#!/usr/bin/nawk -f

BEGIN {
  FS=OFS="|"

  FLD_max=11
  
  stderr="cat 1>&2"   # pipe target that sends diagnostics to standard error
}
(fld + NF-1) > FLD_max {
       if (fld == FLD_max)
          print rec
       else
          printf("Incomplete record: [%d] :: [%s]\n", FNR, rec) | stderr
       rec=$0; fld=NF;next
}
NF < FLD_max {printf("Bad record: [%d] :: [%s]\n", FNR, $0) | stderr; rec=(rec != "") ? rec $0 : $0; fld+=(NF-1);next }
{rec=$0; fld=NF}
END {
  if (rec != "" && split(rec, a, FS) >= FLD_max ) print rec
}

Input:

000000|Apr 14 2007 7:59:58:376AM| |ASDFASFSDA |000000|0|0|0|3111|SDFSDF|�PP:?��?
/there is a page break here (a kind of straight line shown in UltraEdit, but not showing here). This needs to be removed./
���?K
000004|Apr 14 2007 7:59:58:790AM| |ASFASFAS|000000|0|0|0|111|DSFSDF|?e͢��c?
��?�d
000000|Apr 14 2007 7:59:59:970AM| |ASFAFASA |00000|0|0|0|1111|SFDSFSD|?��ק�R���RS?
00000|Apr 14 2007 8:00:01:693AM| |ASFSAFAS |000000|0|0|0|111SDFSDF|�h>`=a�?��N?��H
000000|Apr 14 2007 8:00:02:350AM| |ASFAFA|00000|0|0|0111|SDFSD1|?�
???������?
000000|Apr 14 2007 8:00:02:700AM| |ASFSAFASSA |00000|0|0|0|9964|SDFSD|3`
�"�:`��I�?9V?

Output:

000000|Apr 14 2007 7:59:58:376AM| |ASDFASFSDA |000000|0|0|0|3111|SDFSDF|�PP:?��?���?K
000004|Apr 14 2007 7:59:58:790AM| |ASFASFAS|000000|0|0|0|111|DSFSDF|?e͢��c?
��?�d000000|Apr 14 2007 7:59:59:970AM| |ASFAFASA |00000|0|0|0|1111|SFDSFSD|?��ק�R���RS?
00000|Apr 14 2007 8:00:01:693AM| |ASFSAFAS |000000|0|0|0|111SDFSDF|�h>`=a�?��N?��H
000000|Apr 14 2007 8:00:02:350AM| |ASFAFA|00000|0|0|0111|SDFSD1|?�???������?
000000|Apr 14 2007 8:00:02:700AM| |ASFSAFASSA |00000|0|0|0|9964|SDFSD|3`�"�:`��I�?9V?

Thanks
Kumar

If I understand the requirement correctly, with GNU Awk (on Linux, for example) you could try something like this (if all the records start with 0):

awk '$1=$1' RS="\n0"  inputfile

The records don't start with 0. To mask the actual data, I just put in some dummy values while maintaining the structure of the records. The records start with one of two numeric formats, like 100**** and 99****.

Regards,
Kumar

So, what about (with GNU Awk):

awk '$1=$1{print $0 RT}' ORS= RS="\n(100|99)" inputfile
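
To see what this one-liner does, here is a small sketch on made-up data. It assumes GNU Awk is installed under the name `gawk` (RT and a regular-expression RS are GNU extensions); the function name `join_split` and the sample records are only for illustration.

```shell
#!/bin/sh
# join_split: hypothetical wrapper around the one-liner above.
# RS="\n(100|99)" ends a record only where a newline is followed by 100 or 99,
# so a continuation line stays inside the record it belongs to.
# RT holds the text that matched RS; printing it with ORS empty puts the
# newline and the 100/99 prefix back in front of the next record.
join_split() {
  gawk '$1=$1{print $0 RT}' ORS= RS="\n(100|99)"
}

# Made-up sample: the first record is split across two lines.
joined=$(printf '1000001|a|bX\nYZ\n990002|c|d\n' | join_split)
printf '%s\n' "$joined"
```

With this sample the two halves of the first record should come out on one line, joined by a single space, because `$1=$1` rebuilds the record with the default OFS. If the halves must be joined with nothing in between, that is something to adjust.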

#!/usr/bin/nawk -f

BEGIN {
  FS=OFS="|"

  FLD_max=11

  FF=sprintf("\f")
  
  stderr="cat 1>&2"   # pipe target that sends diagnostics to standard error
}
$0 ~ FF { gsub(FF, ""); $1=$1 }

(fld + NF-1) > FLD_max {
       if (fld == FLD_max)
          print rec
       else
          printf("Incomplete record: [%d] :: [%s]\n", FNR, rec) | stderr
       rec=$0; fld=NF;next
}
NF < FLD_max {printf("Bad record: [%d] :: [%s]\n", FNR, $0) | stderr; rec=(rec != "") ? rec $0 : $0; fld+=(NF-1);next }
{rec=$0; fld=NF}
END {
  if (rec != "" && split(rec, a, FS) >= FLD_max ) print rec
}
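
To try the form-feed rule on its own, here is a minimal sketch. It assumes the page break really is an ASCII form feed (0x0C, `\f`), as the script above does; the function name `strip_ff` and the sample line are made up for illustration.

```shell
#!/bin/sh
# strip_ff: hypothetical helper that deletes every form feed from stdin.
# Same idea as the FF rule in the script above, without the record-joining logic.
strip_ff() {
  awk 'BEGIN { FF = sprintf("\f") }   # build a literal form-feed character
       { gsub(FF, ""); print }'       # remove it wherever it occurs, then print the line
}

# Made-up sample: a form feed sits between the two halves of a split field.
cleaned=$(printf 'rec1|tail-part-1\f\ntail-part-2\n' | strip_ff)
printf '%s\n' "$cleaned"
```

The same clean-up can be done as a pre-filter with `tr -d '\f' < inputfile` before piping the data into the original program.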

vgersh99

You are an absolute genius, I feel. It works really great. Thank you so much.

Regards,
Kumar