Parse

Does anybody know how to parse a file (e.g. a SIF file) into a delimited text file in UNIX?

What's a SIF file?

Can you describe the layout of what it is now versus what you want it to look like?

It actually stands for "system integration file". In UNIX, we have SIF files that we receive from other systems.

The format looks like this:

XYZHEADER 20020503

AAAAAAAABBBBBBBBBBBBCCCCCCCCCDDDDDDDDDDDDDEEEEEEEFFFFFFFFFFFFFFFGGGGGGGGGHHHHHHH

AAAAAAAABBBBBBBBBBBBCCCCCCCCCDDDDDDDDDDDDDEEEEEEEFFFFFFFFFFFFFFFGGGGGGGGGHHHHHHH

AAAAAAAABBBBBBBBBBBBCCCCCCCCCDDDDDDDDDDDDDEEEEEEEFFFFFFFFFFFFFFFGGGGGGGGGHHHHHHH

XYZTRAILER 0000003000.

It has a header record with the date, a trailer record with the number of records in the file, and the data records themselves, which have fixed-length fields (as shown above, A is one field, B is another field, and so on). The task is to split each row into its fields with a delimiter (anything will do: a comma, a colon, a tab or spaces) and write the result to a text file.

Like this:

AAAAAAAA, BBBBBBBBBBBB, CCCCCCCCC, DDDDDDDDDDDDD, ...

Then we want to load this text file into tables, and from there develop a front-end screen that shows a label for each field and displays the corresponding value.

Once we have the delimited text file, the rest will be easy.

I know we could use a combination of cut, grep and awk, but I need something simple.

Using awk IS the something simple!

The following awk script parses a file called "data.txt" which contains the data from your example. To modify the number and/or size of the fields just change the numbers in the "fieldCount = split ..." line. To change the field delimiter change the SECOND setting for fieldSep.

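awk '
    BEGIN {
        # Field widths in characters; they must add up to the record length
        fieldCount = split ("8,12,9,13,7,15,9,7", fieldWidth, ",")
        lineLength = 0

        for (i=1; i<=fieldCount; i++)
            lineLength += fieldWidth[i]
    }
    # Only lines of exactly the expected length (80 here) are processed
    (length ($0) == lineLength) {
        fieldSep = ""
        startPos = 1

        for (i=1; i<=fieldCount; i++) {
            printf "%s%s", fieldSep, substr ($0, startPos, fieldWidth[i])
            startPos += fieldWidth[i]
            fieldSep = ", "    # delimiter written before every field after the first
        }

        printf "\n"
    }
' data.txt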

Note the following assumptions:

1) each data line is assumed to be EXACTLY the number of characters required in length

2) the number of records is correct; no checks are done against the header or trailer

3) each line is short enough to be processed by awk!
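The script treats the header and trailer simply as lines of the wrong length, and it never checks the record count. A minimal sketch that skips those two records explicitly and reports what it saw might look like the following. It assumes the header and trailer lines start with the literal text XYZHEADER and XYZTRAILER as in the example, with the value after a single blank; sif_to_delimited is a hypothetical name. Because the sample trailer shows 0000003000 for three records, the encoding of the count is unclear, so the sketch prints the trailer value next to its own count rather than comparing them automatically.

```shell
# Hypothetical helper: sif_to_delimited FILE
# Writes each 80-character data record of FILE to stdout as comma-separated
# fields, skips the XYZHEADER and XYZTRAILER lines, and prints a one-line
# summary on stderr for comparison with the trailer.
sif_to_delimited() {
    awk '
        BEGIN {
            # Same field widths as the script above; they sum to 80
            fieldCount = split("8,12,9,13,7,15,9,7", fieldWidth, ",")
            lineLength = 0
            for (i = 1; i <= fieldCount; i++)
                lineLength += fieldWidth[i]
        }
        # Header: keep the date, assumed to be the second blank-separated word
        /^XYZHEADER/ { headerDate = $2; next }
        # Trailer: keep the count field as text, since its encoding is unclear
        /^XYZTRAILER/ { trailerCount = $2; next }
        # Data record of exactly the expected length: print it delimited
        length($0) == lineLength {
            fieldSep = ""
            startPos = 1
            for (i = 1; i <= fieldCount; i++) {
                printf "%s%s", fieldSep, substr($0, startPos, fieldWidth[i])
                startPos += fieldWidth[i]
                fieldSep = ", "
            }
            printf "\n"
            records++
            next
        }
        # Any other non-blank line is counted but not converted
        NF > 0 { skipped++ }
        END {
            printf "header date=%s records=%d skipped=%d trailer=%s\n", headerDate, records, skipped, trailerCount | "cat 1>&2"
            close("cat 1>&2")
        }
    ' "$1"
}
```

Usage: sif_to_delimited data.txt > data.csv writes the converted records to data.csv and shows the summary line on the terminal.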

Thank you very much for your concern. Really appreciate it.

Hi Kemisola,
I tried your script. It does not give any errors, but I do not see any output either. I also tried redirecting the output to another file, and that did not work. Where and how can I see the result?
I tried the following command to check whether the printf statement works:
awk '{printf "%s%s", ",", substr ($0, 1, 8)}' data.txt
and it works fine.
I do not understand why nothing is printed when the same statement is inside the script...

Is it because of the (length ($0) == lineLength) condition?

Yes, that would be it. The script will ignore all lines that are not the exact length it is expecting, which would be the sum of all the column widths as shown in the split command. In this case, each line would have to be exactly 80 characters. If you have even one trailing space on a line, that would cause the line to be ignored.

Depending on your requirements, there are several options:

If there is a possibility of trailing spaces, the script could chop those off before checking the line length.

Instead of requiring exact line length, it could require a minimum line length, and ignore any excess.

The script could easily output two files: processed lines and non-processed lines/ignored excess.
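A sketch combining these options, under stated assumptions: sif_split is a hypothetical name, and its second argument names the file that receives the lines that were not processed. One design choice differs from the first option as worded: chopping all trailing spaces before testing the length would also reject genuine records whose last field is padded with blanks, so this version removes only a trailing DOS carriage return and lets the minimum-length rule absorb any extra trailing spaces. Non-blank excess is logged to the reject file rather than silently dropped.

```shell
# Hypothetical helper: sif_split FILE REJECT_FILE
# Converts every line at least 80 characters long (excess ignored) to stdout,
# and logs unprocessed lines and non-blank excess, with line numbers,
# to REJECT_FILE.
sif_split() {
    awk -v rejectFile="$2" '
        BEGIN {
            fieldCount = split("8,12,9,13,7,15,9,7", fieldWidth, ",")
            lineLength = 0
            for (i = 1; i <= fieldCount; i++)
                lineLength += fieldWidth[i]
            printf "" > rejectFile        # start with an empty reject file
        }
        {
            line = $0
            sub(/\r$/, "", line)          # drop a DOS carriage return, if any
        }
        # Minimum length instead of exact length; anything past it is excess
        length(line) >= lineLength {
            fieldSep = ""
            startPos = 1
            for (i = 1; i <= fieldCount; i++) {
                printf "%s%s", fieldSep, substr(line, startPos, fieldWidth[i])
                startPos += fieldWidth[i]
                fieldSep = ", "
            }
            printf "\n"
            excess = substr(line, lineLength + 1)
            if (excess ~ /[^ \t]/)        # only log excess that is not just blanks
                print NR ": ignored excess: " excess > rejectFile
            next
        }
        # Too short (this includes the header and trailer): log it
        { print NR ": not processed: " line > rejectFile }
    ' "$1"
}
```

Usage: sif_split data.txt rejects.txt > data.csv. The header and trailer lines end up in rejects.txt marked "not processed".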

To see your line lengths, do:

awk '{print length($0)}' data.txt
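On a long file it can be easier to list only the lines that are not the expected length. If the files were produced on a DOS or Windows system, each line may end with a carriage return, which awk counts as a character, so an 80-character record shows up as 81. The hypothetical function below reports each offending line with its number and flags a trailing carriage return; the expected length defaults to 80, as in this thread.

```shell
# Hypothetical helper: sif_check_lengths FILE [EXPECTED]
# Lists every line whose length differs from EXPECTED (default 80) and
# notes when the line ends in a DOS carriage return.
sif_check_lengths() {
    awk -v want="${2:-80}" '
        length($0) != want {
            cr = ($0 ~ /\r$/) ? " (ends with carriage return)" : ""
            printf "line %d: length %d, expected %d%s\n", NR, length($0), want, cr
        }
    ' "$1"
}
```

Usage: sif_check_lengths data.txt prints nothing when every line is exactly 80 characters.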

Try using Perl.