convert fixed length file to CSV

Newbie

Looking for a script to convert my input file to delimited text file. Not familier with AWK or shell programing. Below is sample record in my input file and the expected output format. My OS is HPUX 11.23.

Thanks in advance for your assistance.

tbtbs

input file:
4002000W1ABCDABCD7821 123456789071001080600W1VUF 34216002902291LSN 1230-1630 +000000+000400+00000000+00009164+0
00400+0000916450125+00000000

4002000W1JKLMABCD6022 123456789071001080600W1VUF 34360001302894UOM +000000+000000+00000000+00014514+0
00000+0001451413125+00000000

4002000W1JKLMABCD6022 123456789071001080600W1VUF 34360001302791RFN123456 +000000+008000+00000000+00223280+0
08000+0022328050125+00000000

4002000W1JKLMABCD6022 123456789071001080600W1VUFC34360001303067UJN +000000+000000+00000000+00015213+0
00000+0001521313125+00000000

Output file should look like this:

4002000 W1ABCD ABCD 7821 123456789 071001 080600 W1VUF 34216002902291 LSN 1230-1630 +000000 +000400 +00000000 +00009164 +000400 +0000916450125 +00000000

4002000 W1JKLM ABCD 6022 123456789 071001 080600 W1VUF 34360001302894 UOM +000000 +000000 +00000000 +00014514 +000000 +0001451413125 +00000000

4002000 W1JKLM ABCD 6022 123456789 071001 080600 W1VUF 34360001302791 RFN 123456 +000000 +008000 +00000000 +00223280 +008000 +0022328050125 +00000000

4002000 W1JKLM ABCD 6022 123456789 071001 080600 W1VUFC 34360001303067 UJN +000000 +000000 +00000000 +00015213 +000000 +0001521313125 +00000000

It looks like each of the first three records will have slightly different rules.
(1) has 1230-1630 as field
(2) does not have this field
(3) creates this field by splitting off the previous field

rec (4) appears like rec (2)

Yes, that is correct the rules will be different. How do you set the delimiter(,) if there is nothing in the field(ie field 11 of record 2,3 and 4?

1230-1630 is a field. Basically REC1 is the master record format.

There are three possible record layouts. So, read the record.
if five fields separated by " " space characters, then type A
if four fields and field 3 is 17 characters, then type B
if four fields and field 3 is 23 characters, then type C (similar to type A)

Is this correct logic?

Can you use perl? If so, you can build a mask for each of the three fixed format records, then unpack() them and join() them using your delimiter (looks like a space, supposed to be comma in CSV??)....

$mycsv = join(" ", unpack("mask-goes-here"));

that is correct

Hi Quine

sorry Im not a programmer. No knowledge of Perl.

What follows is (a) input file, (b) working model for script, (c) execution of script. So, does it properly classify the records into one of the three type? If so, then all that is left is to break the lines into pieces to write out in a format.

> cat file1
4002000W1ABCDABCD7821 123456789071001080600W1VUF 34216002902291LSN 1230-1630 +000000+000400+00000000+00009164+000400+0000916450125+00000000
4002000W1JKLMABCD6022 123456789071001080600W1VUF 34360001302894UOM +000000+000000+00000000+00014514+000000+0001451413125+00000000
4002000W1JKLMABCD6022 123456789071001080600W1VUF 34360001302791RFN123456 +000000+008000+00000000+00223280+008000+0022328050125+00000000
4002000W1JKLMABCD6022 123456789071001080600W1VUFC34360001303067UJN +000000+000000+00000000+00015213+000000+0001521313125+00000000

> cat do_file
#! /usr/bin/bash

while read zf
  do
#  echo "$zf"
  fldcnt=$(echo "$zf" | cut -c50- | wc -w)
  fld3cnt=$(echo "$zf" | cut -c50- | cut -d" " -f1 | wc -c)
  flag=$(echo "$fldcnt""$fld3cnt")
  echo "$flag"
# now do the file formats
  if [ "$flag" = "318" ]
     then
     echo "a"
  fi
  if [ "$flag" = "218" ]
     then
     echo "b"
  fi
  if [ "$flag" = "224" ]
     then
     echo "c"
  fi

done <file1

> do_file
318
a
218
b
224
c
218
b

How big is the input file? It will have a little bearing on how to write the rest of the coding.

The following appears to work, although it might be slow for large record counts. Fields f08 and f21 had some little tweaks to make the work properly.

> cat do_file
#! /usr/bin/bash

while read zf
  do
#  echo "$zf"
  fcnt=$(echo "$zf" | cut -c50- | wc -w)
  f3cnt=$(echo "$zf" | cut -c50- | cut -d" " -f1 | wc -c)
  flag=$(echo "$fcnt""$f3cnt")
#  echo "$flag"
# common layouts
  f01=$(echo "$zf" | cut -c1-7)
  f02=$(echo "$zf" | cut -c8-13)
  f03=$(echo "$zf" | cut -c14-17)
  f04=$(echo "$zf" | cut -c18-21)
  f05=$(echo "$zf" | cut -c23-31)
  f06=$(echo "$zf" | cut -c32-37)
  f07=$(echo "$zf" | cut -c38-43)
  f08=$(echo "$zf" | cut -c44-49 | tr -d " ")
  f09=$(echo "$zf" | cut -c50-63)
  f10=$(echo "$zf" | cut -c64-66)

  fpre=$(echo "$f01" "$f02" "$f03" "$f04" "$f05" "$f06" "$f07" "$f08" "$f09" "$f10")

# now do the file formats
  if [ "$flag" = "318" ]
     then
     f21=$(echo "$zf" | cut -c68- | cut -d" " -f1)
     f22=$(echo "$zf" | cut -c68- | cut -d" " -f2)
     f31=$(echo "$f22" | cut -d"+" -f2)
     f32=$(echo "$f22" | cut -d"+" -f3)
     f33=$(echo "$f22" | cut -d"+" -f4)
     f34=$(echo "$f22" | cut -d"+" -f5)
     f35=$(echo "$f22" | cut -d"+" -f6)
     f36=$(echo "$f22" | cut -d"+" -f7)
     f37=$(echo "$f22" | cut -d"+" -f8)

     echo "a" "$fpre" "$f21" "+$f31" "+$f32" "+$f33" "+$f34" "+$f35" "+$f36" "+$f37"
  fi
  if [ "$flag" = "218" ]
     then
     f22=$(echo "$zf" | cut -c68- | cut -d" " -f2)
     f31=$(echo "$f22" | cut -d"+" -f2)
     f32=$(echo "$f22" | cut -d"+" -f3)
     f33=$(echo "$f22" | cut -d"+" -f4)
     f34=$(echo "$f22" | cut -d"+" -f5)
     f35=$(echo "$f22" | cut -d"+" -f6)
     f36=$(echo "$f22" | cut -d"+" -f7)
     f37=$(echo "$f22" | cut -d"+" -f8)
     echo "b" "$fpre" "+$f31" "+$f32" "+$f33" "+$f34" "+$f35" "+$f36" "+$f37"
  fi
  if [ "$flag" = "224" ]
     then
     f21=$(echo "$zf" | cut -c67-72)
     f22=$(echo "$zf" | cut -c68- | cut -d" " -f2)
     f31=$(echo "$f22" | cut -d"+" -f2)
     f32=$(echo "$f22" | cut -d"+" -f3)
     f33=$(echo "$f22" | cut -d"+" -f4)
     f34=$(echo "$f22" | cut -d"+" -f5)
     f35=$(echo "$f22" | cut -d"+" -f6)
     f36=$(echo "$f22" | cut -d"+" -f7)
     f37=$(echo "$f22" | cut -d"+" -f8)
     echo "c" "$fpre" "$f21" "+$f31" "+$f32" "+$f33" "+$f34" "+$f35" "+$f36" "+$f37"
  fi
# output data
done <file1

and the output

> do_file
4002000 W1ABCD ABCD 7821 123456789 071001 080600 W1VUF 34216002902291 LSN 1230-1630 +000000 +000400 +00000000 +00009164 +000400 +0000916450125 +00000000
4002000 W1JKLM ABCD 6022 123456789 071001 080600 W1VUF 34360001302894 UOM +000000 +000000 +00000000 +00014514 +000000 +0001451413125 +00000000
4002000 W1JKLM ABCD 6022 123456789 071001 080600 W1VUF 34360001302791 RFN 123456 +000000 +008000 +00000000 +00223280 +008000 +0022328050125 +00000000
4002000 W1JKLM ABCD 6022 123456789 071001 080600 W1VUFC 34360001303067 UJN +000000 +000000 +00000000 +00015213 +000000 +0001521313125 +00000000

Joeyg

The output is exactly what im looking for. How do I run this?

If you copy/paste that do_file script to unix, do the chmod +x to make executable. It can be run.
The only other change might be in the filename, as near the very end I have a "<file1" -- substitute your filename for file1.

joejy

I ran the script. Im getting an error message as follows: Fork Function failed, not enough memory. Any suggestion would be greatly appreciated.