EDI File Parser

I have one EDI file that needs to be parsed into 7 different files.

I managed to extract the segments required for one of those files (HEADER) into a separate file (sample3.dat), shown below.

$ cat sample3.dat
REF*EI*273543997~
REF*2U*HELLO~
REF*G2*77685|132~
CLM*1000*0.00***12>B>1*N*A*Y*I~
CN1*05~
SBR*P*18*HELLO******16~
AMT*D*0.00~
OI***Y***I~
NM1*IL*1*ABC*DEF*A***MI*1234A~
DTP*573*D8*99991231~
CLM*1001*0.00***12>B>1*N*A*Y*I~
CN1*05~
REF*F8*1000~
SBR*P*18*HELLO******16~
AMT*D*0.00~
OI***Y***I~
NM1*IL*1*ABC*DEF*A***MI*1234A~
DTP*573*D8*99991231~

The expected output is:

1000||||1234A|1234A|0.00|99991231|0.00|||||HELLO||12|1|||||||||||||05|HELLO|
1001||||1234A|1234A|0.00|99991231|0.00|||||HELLO||12|1|||||||||||1000||05|HELLO|F8

The second CLM segment is a child claim of the first CLM, and the REF segment belongs to it. I wrote the script below to parse the file.
The commented-out line in the script throws a syntax error.

awk -F"*" '{OFS="|"}
/^CLM/ {CLM_NBR = $2}
/^NM1/ {SUB_ID = $10}
/^AMT/ {($2=="D")? CP_AMT = $3 : CP_AMT = $100}
/^AMT/ {($2=="A8")? DSALL_AMT = $3 : DSALL_AMT = $100}
/^AMT/ {($2=="F5")? PATPAID_AMT = $3 : PATPAID_AMT = $100}
/^AMT/ {($2=="A8")? NC = $3 : NC = $100}
/^AMT/ {($2=="EAF")? RPL = $3: RPL = $100}
/^SBR/ {GRP_POL = $4; GRP_NM = $5}
/^CLM/ {split($6,x,">") ; FREQ = x[3]}
##/^CLM/ {split($6,x,">");(x[3]=="7"||x[3]=="8")? (/^REF\*F8/ {REP_CLM = $3;REF_QUAL = $2}) : {REP_CLM = $100;REF_QUAL = $100}} 
REP_CLM = $100;REF_QUAL = $100  ## Temporarily assigned NULL value to be printed as above line throws syntax error
/^CN1/ {ICAP_IND = $2}
/^REF\*2U/ {($2=="2U")? HP = $3 : HP = $100}
/^DTP/ {($2 == "573")? PD_DATE = $4 : PD_DATE = $100
print CLM_NBR,$100,$100,$100,SUB_ID,SUB_ID,CP_AMT,
PD_DATE,"0\.00",DSALL_AMT,PATPAID_AMT,NC,RPL,
GRP_POL,GRP_NM,"12",FREQ,
$100,$100,$100,$100,$100,$100,$100,$100,$100,$100,
REP_CLM,$100,ICAP_IND,HP,REF_QUAL}' sample3.dat

The output after running the script is:

||.00|||||HELLO||12|1|||||||||100||||05~
||.00|||||HELLO||12|1|||||||||100||||05~

I don't understand where my script is going wrong. Can someone please help me understand this?
Thank you

When I ran your script, I got:

1000||||1234A~|1234A~|0.00~|99991231~|0.00|||||HELLO||12|1|||||||||||||05~|HELLO~|
1001||||1234A~|1234A~|0.00~|99991231~|0.00|||||HELLO||12|1|||||||||||||05~|HELLO~|

What OS and version and what awk are you using?

The commented line has faulty syntax. Just try replacing it with simple if statements.

I do not understand what the script is doing. For one, I do not get this $100 stuff. Is that another way of assigning an empty string?

gawk on Windows 7 Enterprise edition.

I also get an error when using an if statement.

$100 is used to assign NULL (empty) values. I forgot to add a command to strip the '~', but that is not a priority.
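To illustrate the $100 point: a reference to a field past the end of the record yields an empty string, and so does a variable that was never assigned. A minimal sketch, assuming a one-line sample with only three fields (UNSET is a hypothetical variable name, not something from the script):

```shell
out=$(printf 'AMT*D*0.00\n' | awk -F'*' '{
    a = ($100 == "")    # field 100 does not exist on a 3-field record
    b = (UNSET == "")   # UNSET is a hypothetical name that is never assigned
    print a, b, NF      # reading $100 leaves NF unchanged
}')
echo "$out"    # prints: 1 1 3
```

Both comparisons are true, so an unassigned variable can stand in for $100 wherever an empty column is wanted.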

Simple explanation of my code:

/^CLM/ {CLM_NBR = $2}

If a line begins with CLM, assign second field value to CLM_NBR.

/^CLM/ {split($6,x,">");(x[3]=="7"||x[3]=="8")? (/^REF\*F8/ {REP_CLM = $3;REF_QUAL = $2}) : {REP_CLM = $100;REF_QUAL = $100}} 

For a record starting with CLM, the 6th field is itself delimited by ">". The 6th field is split and the pieces are held in an array x. If x[3] is either 7 or 8, then consider the line starting with REF*F8.
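The split step on its own can be checked like this. It is only a sketch: the CLM line is taken from the sample, with the frequency code changed to 7 for illustration.

```shell
out=$(printf 'CLM*1001*0.00***12>B>7*N*A*Y*I~\n' | awk -F'*' '{
    n = split($6, x, ">")       # $6 is "12>B>7"; split returns the piece count
    print n, x[1], x[2], x[3]
}')
echo "$out"    # prints: 3 12 B 7
```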

Once a DTP line is encountered, print all the values stored in the variables.

Thanks.
Ashok

That may account for the actual output you showed; make sure there are no DOS <CR> line terminators in your data file.
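A trailing <CR> ends up inside the last field of every line, and when such a field is printed the terminal jumps back to column one and overwrites what was already there. One way to guard against it inside awk is to strip it before anything else runs. A minimal sketch (clean_segments is a hypothetical helper name; it also drops the "~" terminator):

```shell
# Hypothetical helper: removes a trailing CR, then the "~" segment terminator.
clean_segments() {
    awk '{ sub(/\r$/, ""); sub(/~$/, ""); print }'
}

out=$(printf 'CN1*05~\r\n' | clean_segments)
printf '%s\n' "$out"    # prints: CN1*05
```

In a real script the two sub() calls would simply go in a pattern-less action at the top, before the other rules.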

You shouldn't get an error if the if statement is used correctly.

And the commented line throws a syntax error rightly, I think, for syntactic as well as logical/semantic reasons:

  • ... ? ... : ... is the conditional operator: it selects one of two values, as in x = cond ? a : b. You can't use it for flow control. As Scrutinizer said, use if ... else ...
  • ( ... ) can't be used to group statements in flow control; use { ... }
  • /^REF\*F8/ can't work as desired here
    a) syntactically: a pattern { action } rule cannot be used inside an action. Use if (/.../) and it will fly
    b) logically: it's working on a line that starts with CLM, so ^REF will never be true. When a line with ^REF is encountered, it will not enter this action.
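The points above can be sketched as follows. This is a cut-down illustration on three made-up input lines, not the full script:

```shell
out=$(printf 'AMT*D*0.00\nAMT*A8*5.00\nREF*F8*1000\n' | awk -F'*' '
/^AMT/ {
    # if ... else for flow control, instead of ?: with assignments in the branches
    if ($2 == "D")       CP_AMT = $3
    else if ($2 == "A8") DSALL_AMT = $3
}
{
    # inside an action, a regex is tested with if (/.../)
    if (/^REF\*F8/) REP_CLM = $3
}
END { print CP_AMT "|" DSALL_AMT "|" REP_CLM }')
echo "$out"    # prints: 0.00|5.00|1000
```

Note that the REF test lives in its own action that runs for every line; placed inside the /^CLM/ action it could never match.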


I tried to prettify your script. The first version delivered the same output from your sample input as your script did. Some modifications to both the script and the data file delivered the desired output (or pretty close, at least):

awk -F"*" -vOFS="|" '   {sub (/~$/, NUL)}
         /^CLM/         {CLM_NBR = $2}
         /^NM1/         {SUB_ID = $10}
         /^AMT\*D\*/    {CP_AMT = $3}
         /^AMT\*A8\*/   {DSALL_AMT = $3
                         NC = $3}
         /^AMT\*F5\*/   {PATPAID_AMT = $3}
         /^AMT\*EAF\*/  {RPL = $3}
         /^SBR/         {GRP_POL = $4
                         GRP_NM = $5}
         /^CLM/         {split($6,x,">")
                         FREQ = x[3]
                         FREQ78 = (x[3]=="7"||x[3]=="8")
                        }
         /^CN1/         {ICAP_IND = $2}
         /^REF\*2U\*/   {HP = $3}
         /^REF\*F8\*/ &&
            FREQ78      {REP_CLM = $3
                         REF_QUAL = $2}
         /^DTP/         {if ($2 == "573") PD_DATE = $4
                         print  CLM_NBR, NUL, NUL, NUL, SUB_ID, SUB_ID, CP_AMT, PD_DATE, "0.00",
                                DSALL_AMT, PATPAID_AMT, NC, RPL, GRP_POL, GRP_NM, "12", FREQ, NUL,
                                NUL, NUL, NUL, NUL, NUL, NUL, NUL, NUL, NUL, REP_CLM, NUL, ICAP_IND, HP, REF_QUAL
                         CLM_NBR = SUB_ID = CP_AMT = PD_DATE = DSALL_AMT = PATPAID_AMT = NC = RPL = GRP_POL = \
                                   GRP_NM = FREQ = REP_CLM = ICAP_IND = HP = REF_QUAL = NUL
                        }
        ' file
1000||||1234A|1234A|0.00|99991231|0.00|||||HELLO||12|1|||||||||||||05|HELLO|
1001||||1234A|1234A|0.00|99991231|0.00|||||HELLO||12|7|||||||||||1000||05||F8

Had to make some assumptions (e.g. CLM record appearing before REF F8 record) and had to set the FREQ to 7 in your sample data, but you get the gist (I hope).


Thank you @RudiC.
Please explain why you used the code below after the print statement. I got the desired output when I removed it.

CLM_NBR = SUB_ID = CP_AMT = PD_DATE = DSALL_AMT = PATPAID_AMT = NC = RPL = GRP_POL = \
GRP_NM = FREQ = REP_CLM = ICAP_IND = HP = REF_QUAL = NUL

The sample file I used contained DOS carriage-return characters (^M), and the script worked once I removed them.

Thanks,
Ashok

It replaces all those PATPAID_AMT = $100 style assignments in the conditional expressions above. It is relevant only if you have multiple records to process, as it resets the variables to NUL (an unassigned variable, equivalent to "empty" and more understandable than $100).
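A small sketch of what the reset changes, on made-up data where only the first claim carries a REF*F8 segment:

```shell
# Hypothetical input: two claims, only the first has a REF*F8 segment.
data='CLM*1000
REF*F8*999
DTP*573*D8*99991231
CLM*1001
DTP*573*D8*99991231'

no_reset=$(printf '%s\n' "$data" | awk -F'*' -v OFS='|' '
/^CLM/     { CLM_NBR = $2 }
/^REF\*F8/ { REP_CLM = $3 }
/^DTP/     { print CLM_NBR, REP_CLM }')

with_reset=$(printf '%s\n' "$data" | awk -F'*' -v OFS='|' '
/^CLM/     { CLM_NBR = $2 }
/^REF\*F8/ { REP_CLM = $3 }
/^DTP/     { print CLM_NBR, REP_CLM; CLM_NBR = REP_CLM = "" }')

printf '%s\n' "$no_reset"      # second line is 1001|999 (stale value carried over)
printf '%s\n' "$with_reset"    # second line is 1001|
```

The flip side is that a value appearing only once near the top of the file (REF*2U in the sample, presumably) is wiped after the first record too, which would explain why removing the reset gave the desired output; such variables can simply be left out of the reset list.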
