Convert fixed value fields to comma separated values

Hi All,

Hope you are doing great!

Today I have a problem that is, to be exact, about performance improvement.

I have written Perl code that cuts out specific column ranges as a solution, but it takes a long time to run on files with thousands of lines, so I am hoping for something like a sed command that runs quickly.

input format:

--> TP  ID: TEST TP                        XLATE KEY:   ANSIXX99  AL:   D
      INT ID: TESTREFORMAT            XLATE TABLE: X820XR99
      DOC ID: 820    DIR: I STD: ANSI     COM: X      VERS: NONCTX       STAT: P

--> TP  ID: TEST TP                             XLATE KEY:   ANSIXX41  AL:   D
      INT ID: TESTREFORMAT                        XLATE TABLE: X820XR99
      DOC ID: 820    DIR: I STD: ANSI     COM: X      VERS: 004010       STAT: P

--> TP  ID: TEST TP                             XLATE KEY:   XXXXXXXX   AL:   D
     INT ID: TESTREFORMAT                        XLATE TABLE: XXXXXXXX
     DOC ID: 820    DIR: I STD: ANSI     COM: X      VERS: 004010       STAT: T

output format required:

TEST TP,ANSIXX99,D,TESTREFORMAT,X820XR99,820,I,ANSI,X,NONCTX,P
TEST TP,ANSIXX41,D,TESTREFORMAT,X820XR99,820,I,ANSI,X,004010,P
TEST TP,XXXXXXXX,D,TESTREFORMAT,XXXXXXXX,820,I,ANSI,X,004010,T

I have a file in the input format shown in the example above.

Rules for input format:

  • The "--> TP ID:" line repeats every three lines.
    All the values after ":" vary in length within the width allotted to them, but all the labels before ":" are fixed in length.

So our goal is to turn the values after ":" into comma-separated values, as in the output format.

I am using AIX V6.0. A sed command would be the preferred solution.

Thanks.

---------- Post updated at 11:03 PM ---------- Previous update was at 11:01 PM ----------

I pasted the actual data, but I can see an empty line between every three lines in the post; in the actual input file there will not be any empty lines.

It looks like sed could be useful here. This is how you could start your control file (file name after sed -f).

/-->/ {  N
         N
         ....................
         s/  */,/g
       }

From here you have all three input records in your pattern space, separated by "\n" characters. A set of "s" commands can take out the "boilerplate" constants and the newline characters. The last "s" makes the line ready for output; note that there are two spaces in front of the asterisk.
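
To make the pattern space contents visible, here is a minimal sketch. The shortened three-line record and the "|" marker are illustrative assumptions, not part of the real data; only the N / N structure comes from the command file above.

```shell
# Hypothetical, shortened three-line record in the thread's layout.
sample='--> TP  ID: A  XLATE KEY: K1
      INT ID: B
      DOC ID: 820'

# On a "-->" line, each N appends the next input line to the pattern space,
# so after two N commands all three lines are held together, joined by
# embedded newlines. The s command turns each newline into "|" to show them.
joined=$(printf '%s\n' "$sample" | sed -n '/-->/{
N
N
s/\n/|/g
p
}')
printf '%s\n' "$joined"
# prints: --> TP  ID: A  XLATE KEY: K1|      INT ID: B|      DOC ID: 820
```

Once the three lines are in one buffer, every later "s" command sees the whole record at once.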

Your problem is - at least to me - unsolvable. You have one or multiple spaces as field separators, plus one or more spaces in the field values themselves. So you can't reliably and consistently tell values from labels etc. Anything proposed would be quite hazardous...
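
A small sketch of that ambiguity, using the first line of the sample from post #1. Turning every run of spaces into a comma treats the space inside the value "TEST TP" exactly like the space inside the label "XLATE KEY:".

```shell
# First line of the first sample record from post #1.
line='--> TP  ID: TEST TP                        XLATE KEY:   ANSIXX99  AL:   D'

# Replace every run of one or more spaces with a comma, with no knowledge
# of which spaces belong to values and which belong to labels.
naive=$(printf '%s\n' "$line" | sed 's/  */,/g')
printf '%s\n' "$naive"
# prints: -->,TP,ID:,TEST,TP,XLATE,KEY:,ANSIXX99,AL:,D
```

Whitespace alone cannot tell "TEST TP" from two separate tokens; some extra knowledge, such as the list of labels, is needed.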

This is as far as I can get:

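awk '
        {getline X                      # read the second line of the record
         getline Y                      # read the third line of the record
         $0 = $0 FS X FS Y              # join all three lines into one record
         sub (/^--> TP *ID: /, _)       # drop the leading label (_ is an unset, empty variable)
         FS = "|"
         gsub (/  +/, FS)               # runs of two or more spaces become field separators

         $1 = $1                        # rebuild $0 so the fields are joined by OFS
         for (i=1; i<=NF; i++)  sub (/^.*: */, _, $i)   # strip everything up to the last label in each field
         gsub (/,,/, ",")
         print
        }
' OFS=, file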
TEST TP,ANSIXX99,D,TESTREFORMAT,X820XR99,820,ANSI,X,NONCTX,P
TEST TP,ANSIXX41,D,TESTREFORMAT,X820XR99,820,ANSI,X,004010,P
TEST TP,XXXXXXXX,D,TESTREFORMAT,XXXXXXXX,820,ANSI,X,004010,T

I can't isolate the DIR: I part, and, as said before, I wouldn't rely on this proposal.
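
A minimal sketch of why the DIR value is lost, written with sed rather than awk so it can be compared with the sed attempts in this thread. The variable names are illustrative only.

```shell
# In the sample, DIR and STD are only one space apart, so a split on runs
# of two or more spaces leaves them together in a single field.
chunk='DIR: I STD: ANSI'

# .* is greedy, so the match runs up to the LAST ": " and the value I
# is deleted together with both labels.
greedy=$(printf '%s\n' "$chunk" | sed 's/^.*: *//')
printf '%s\n' "$greedy"
# prints: ANSI

# Naming each label explicitly keeps both values.
named=$(printf '%s\n' "$chunk" | sed -e 's/^DIR: *//' -e 's/  *STD: */,/')
printf '%s\n' "$named"
# prints: I,ANSI
```

The second form only works if the complete list of labels is known in advance.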


RudiC, thanks for the reply, but I strongly believe that this is solvable, since the field labels (the values on the left side of ":") are in fixed positions. What do you think, given that the data follows a pattern?

If we go to post #1 in this thread and look at the XLATE KEY , XLATE TABLE , and COM fields in your sample input, it is immediately obvious that your fields are not fixed width (i.e., they do not appear at the same locations in each record).

Since your fields are not fixed width, since you have not defined what fields are present in general (as opposed to a three-record sample including blank lines that you say are not present in your real data), and since we have no idea what fields will be present or where they will be located in your real data, there is little we can do to help you solve your problem.

Hi Don,
The XLATE KEY, XLATE TABLE, and COM fields were not in the same positions in the 3 sample records, but they repeat in the same positions for every 3 records. The sample I posted here was demo data. I have attached a sample input file. Please let me know your thoughts after looking at it.

Hi RudiC,

Thanks for providing this awk command, but in my project we no longer use awk because we run into issues when we upgrade the Perl version.

Let's try this for a sed command file. You will need to specify -n before the name of your sed command file.

/-->/ {  N
         N
         s/  *//
         s/--//
         s/>//
         s/TP  ID: //
         s/ /~/
         s/  *XLATE KEY:  */ /
         s/\n/ /g
         s/  *AL:  */ /
         s/  *INT ID:  */ /
         s/  *XLATE TABLE:  */ /
         s/  *DOC ID:  */ /
         s/  *DIR:  */ /
         s/  *STD:  */ /
         s/  *COM:  */ /
         s/  *VERS:  */ /
         s/  *STAT:  */ /
         s/  */,/g
         s/~/ /
         s/^/ /
         p
       }
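
For comparison, here is a hedged sketch of a label-anchored variant together with the way a command file is invoked (sed -n -f). It is not the command file above: it turns each known label, plus the blanks around it, directly into a comma, so single spaces inside values survive without the "~" placeholder. The file names are hypothetical, and the sketch assumes that the labels are exactly the ones in the sample, that every value is non-empty, and that no value contains text that looks like a label.

```shell
# Hypothetical scratch files for the demonstration.
cmdfile="${TMPDIR:-/tmp}/tp2csv.$$.sed"
infile="${TMPDIR:-/tmp}/tp2csv.$$.txt"

# Two sample records from post #1, without the blank separator lines.
cat > "$infile" <<'EOF'
--> TP  ID: TEST TP                        XLATE KEY:   ANSIXX99  AL:   D
      INT ID: TESTREFORMAT            XLATE TABLE: X820XR99
      DOC ID: 820    DIR: I STD: ANSI     COM: X      VERS: NONCTX       STAT: P
--> TP  ID: TEST TP                             XLATE KEY:   XXXXXXXX   AL:   D
     INT ID: TESTREFORMAT                        XLATE TABLE: XXXXXXXX
     DOC ID: 820    DIR: I STD: ANSI     COM: X      VERS: 004010       STAT: T
EOF

# Command file: join the three lines, drop the first label, then replace
# each remaining label and its surrounding blanks with a comma.
cat > "$cmdfile" <<'EOF'
/-->/{
N
N
s/\n/ /g
s/^--> *TP  *ID: *//
s/  *XLATE KEY: */,/
s/  *AL: */,/
s/  *INT ID: */,/
s/  *XLATE TABLE: */,/
s/  *DOC ID: */,/
s/  *DIR: */,/
s/  *STD: */,/
s/  *COM: */,/
s/  *VERS: */,/
s/  *STAT: */,/
s/ *$//
p
}
EOF

# -n suppresses automatic printing; only the explicit p command prints.
out=$(sed -n -f "$cmdfile" "$infile")
printf '%s\n' "$out"
rm -f "$cmdfile" "$infile"
```

On these two records the sketch prints the first and third lines of the requested output format. Whether it holds up on the real file depends on the assumptions listed above.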

I will try this solution today and let you know the result.
Thanks.