Can we convert a '|'-separated file into a fixed-length file?

Hi All,

I have a pipe-separated flat file, but there are often problems with the records. Is it possible to convert the '|'-separated file into a fixed-length file with a script?

The file has 11 columns, which means 10 pipes. Your help is appreciated.

I'm using SunOS version 5.10.

Thank you,
Kumar

You can do something like this:

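# On Solaris, run this with nawk or /usr/xpg4/bin/awk:
# the old /usr/bin/awk has no user-defined functions.
awk '
   BEGIN {
      # one width per output field, in column order
      fields_count = split("5,5,5,5,5,5,5,5,5,5,5", fsize, ",");
      FS  = "|";
      OFS = "";
      status = 0;
   }
   # pad or truncate field number fld to its configured width
   function cnv_field(fld,   width) {
      width = fsize[fld];
      if (length($fld) > width) {
         printf("Line %d, field %d is too long (%d > %d)\n", NR, fld, length($fld), width) | "cat >&2";
         status = 1;
      }
      $fld = sprintf("%-" width "." width "s", $fld);
   }
   {
      if (NF != fields_count) {
         printf("Line %d, fields count is invalid (%d != %d)\n", NR, NF, fields_count) | "cat >&2";
         status = 1;
      }
      # fields beyond fields_count have no configured width and are left as-is
      for (f = 1; f <= NF && f <= fields_count; f++) cnv_field(f);
      print;
   }
   END {
      exit status;
   }
' "$1" > "$2"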

The width of each field is given by the comma-separated list in the fields_count assignment (the string passed to split). In my code all the fields are 5 characters wide.
In the output, the field separator is set to "" but you can change it. For example, if you want a space, modify the OFS assignment:
OFS = " "
Example (assuming the script file is convert.sh):

$ cat input_file
111|22|333|444|555||77|888|9999|000|1
aa|bbbbbb|cc|dd|rr|ff|ggggggg|hh|ii|jjj|hhh
xxx|yyy|zzz
$ convert.sh input_file output_file
Line 2, field 2 is too long (6 > 5)
Line 2, field 7 is too long (7 > 5)
Line 3, fields count is invalid (3 != 11)
$ echo $?
1
$ cat output_file
111  22   333  444  555       77   888  9999 000  1    
aa   bbbbbcc   dd   rr   ff   ggggghh   ii   jjj  hhh  
xxx  yyy  zzz  
$
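
If the columns need different widths, the same padding idea can take the widths from the command line instead of a list edited into the script. A minimal sketch, assuming a hypothetical wrapper function to_fixed (the name and the argument order are only illustrative, and it does no error reporting). On Solaris you would likely need nawk or /usr/xpg4/bin/awk in place of awk:

```shell
# to_fixed WIDTHS [FILE...]: hypothetical helper, not part of the script above.
# WIDTHS is a comma-separated list with one width per column, e.g. "3,8,2".
to_fixed() {
   widths=$1
   shift
   awk -v widths="$widths" '
      BEGIN {
         n  = split(widths, w, ",")
         FS = "|"
      }
      {
         out = ""
         # pad or truncate each expected column to its own width;
         # columns missing from the input line come out as blanks
         for (f = 1; f <= n; f++)
            out = out sprintf("%-" w[f] "." w[f] "s", $f)
         print out
      }
   ' "$@"
}
```

It would be called as: to_fixed 5,10,3 input_file > output_file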

Jean-Pierre.

As the "fields" in your file are separated by a constant character ("|"), use cut to separate them, then print the lines via printf (I assume Korn shell here; use 'echo' instead of 'print' if you are using something else):

while IFS= read -r line ; do
     # split each input line into fields and store them in variables
     field1="$(print - "$line" | cut -d'|' -f1)"
     field2="$(print - "$line" | cut -d'|' -f2)"
     field3="$(print - "$line" | cut -d'|' -f3)"
     .....
     
     # once the line is split, print it out again
     # I assume here that the first column should be 20 chars wide, the next
     # two 15, and so on. see the second example below.
     printf '%20s %15s %15s [...]\n' "$field1" "$field2" "$field3" [...] >> outfile
done < infile
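
If calling cut once per field turns out to be slow on a large file, the shell can do the splitting itself. A rough sketch, assuming exactly three columns and the same 20/15/15 widths as above. The function name fixed_cols is made up for the example, and if a line has more than three fields the last variable receives the rest of the line:

```shell
# fixed_cols: hypothetical helper; reads '|'-separated lines on stdin.
fixed_cols() {
   # IFS='|' applies only to the read itself; -r keeps backslashes literal.
   # Empty fields (two pipes in a row) stay empty instead of being skipped.
   while IFS='|' read -r field1 field2 field3 ; do
      printf '%20s %15s %15s\n' "$field1" "$field2" "$field3"
   done
}
```

It would be called as: fixed_cols < infile > outfile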

This uses a fixed number of fixed-width columns, and you have to know the widths in advance. It is possible to create dynamically sized columns, but you will have to read the infile twice:

maxlength1=0
maxlength2=0
....
while IFS= read -r line ; do
     # in the first run we split and get the max width for each column
     # (${#var} is the string length; wc -c would also count the newline)
     field1="$(print - "$line" | cut -d'|' -f1)"
     length1=${#field1}
     if [ $length1 -gt $maxlength1 ] ; then
          maxlength1=$length1
     fi
     field2="$(print - "$line" | cut -d'|' -f2)"
     length2=${#field2}
     if [ $length2 -gt $maxlength2 ] ; then
          maxlength2=$length2
     fi
     .....
done < infile

# put together the output template for printf
template='%'"$maxlength1"'s   %'"$maxlength2"'s [.....]\n'
   
while IFS= read -r line ; do
     # in the second run we split again and print using the found widths
     field1="$(print - "$line" | cut -d'|' -f1)"
     field2="$(print - "$line" | cut -d'|' -f2)"
     ....
     printf "$template" "$field1" "$field2" "$field3" [...] >> outfile
done < infile

As a further enhancement, I'd suggest using (dynamic) arrays instead of the numbered variables, so the script can deal with a variable number of fields in the input file. The column separator could then be passed as a parameter, making the script as widely usable as possible.
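
A rough sketch of that enhancement, with one substitution: it uses awk arrays rather than shell arrays, because awk copes with a variable field count without extra bookkeeping. The function name align_cols and its argument order are invented for the example. It assumes a single-character separator and a regular file (the file is read twice, so a pipe on stdin will not work), and it joins the columns with a single space:

```shell
# align_cols SEP FILE: hypothetical helper, not a finished tool.
# Pass 1 records the widest value per column, pass 2 prints padded columns.
align_cols() {
   sep=$1
   file=$2
   # the file is named twice so awk reads it two times;
   # NR == FNR is true only while the first copy is being read
   awk -v sep="$sep" '
      BEGIN { FS = sep }
      NR == FNR {
         for (f = 1; f <= NF; f++)
            if (length($f) > max[f]) max[f] = length($f)
         next
      }
      {
         out = ""
         for (f = 1; f <= NF; f++) {
            # right-align each value to the widest entry of its column
            pad = sprintf("%" max[f] "s", $f)
            out = (f == 1) ? pad : out " " pad
         }
         print out
      }
   ' "$file" "$file"
}
```

It would be called as: align_cols '|' infile > outfile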

bakunin