Field validation across multiple CSV files

hyperion.krish · January 30, 2012, 10:03am

Hi,

I am a regular reader of this forum. Thanks in advance to everyone.

The sample files are given below.

INDATA (Main data)

 
Fild1�fld2�fld3�..
Fild1�fld2�fld3�..
Fild1�fld2�fld3�..
Fild1�fld2�fld3�..
Fild1�fld2�fld3�..
 
.
.
N records (e.g. 140000)
 
GRPDATA (Reference file)
Fild1�fld2�fld3�..
Fild1�fld2�fld3�..
Fild1�fld2�fld3�..
.
.
100 or 150 records (small file)

I have to prepare one output file from INDATA while looking up fields in the GRPDATA file, which means I have to parse both files simultaneously.
It is like a transformation process.

The script I prepared without awk works perfectly, but it takes a huge amount of time, about 2 or 3 hours, to generate the output file.
When I use awk, I am unable to parse the files simultaneously. It processes them sequentially, i.e. IN_DATA and then GRPDATA.
I have to open both files at the same time, and the first record in IN_DATA should be checked against all the records in GRPDATA (until a match is found).
The second record in IN_DATA should be checked against all the records in GRPDATA (until a match is found).
Likewise for the remaining records.

Please shed some light on this. The solution should not hurt performance.
It is very urgent, as I am almost at the deadline.

My code:

#Include File for Environment variable usage
. /opt/hyperion/Payer_Transformation/Scripts/PayersTrnEnv.env
#Defining Log file for this treatment
awk -F"�" 'BEGIN {
    FS = "�"
    OFS = "|"  }
   {
FILENAME=="$IN_DATA" 
 if ( $7 == "RETAIL" )
   {
    Tar_Loc=$7
    if ( $9 -le 83 )
     Tar_Lob="RETAIL30";
     else Tar_Lob ="RETAIL90";
   }
   
  if ( $7 == "MAIL" )
   { 
    Tar_Loc=$7; 
    Tar_Lob="MAIL";
    if ( $2 ~ /PCS/ )
     { if ( $3 ~ /V/ )
      {Group_ID="V";}
      else if ($3 ~ /2407/ || $3 ~ /2428/) 
       {Group_ID="HME";}
     }
   } 
  Tar_Year=$1;
  Tar_Num_Rxs=$11;
  Tar_Tot_Rev=$12;
  Tar_GP_Wac=$13; }
  FILENAME=="$GRP_DATA"
  while ((getline grp < "$GRP_DATA") > 0){
  split(grp, grpfield, "�")
/##   if [grpfield (2)== IN_DATA .$2 ] && [grpfield (3) == "$Group_ID" ] && [grpfield (4) == IN_DATA.$7 ]; 
    then Tar_Grp_Nam= grpfield (1);
  fi
###/ 
If  Tar_Grp_Nam = "" then if [grpfield (2)== IN_DATA .$2 ] && [grpfield (3) == "$Group_ID" ] && [grpfield (4) == "MAIL" ]; 
Tar_Grp_Nam= grpfield (1);
 
Print Tar_Year,Tar_Loc,Tar_Lob,Tar_Grp_Nam,Tar_Num_Rxs,Tar_Tot_Rev,Tar_GP_Wac 
 } ' $IN_DATA $GRP_DATA

Shell_Life · January 30, 2012, 10:42am

You should have shown the desired output, which would make your requirement easier to understand.

In any event, an easy way to find whether records of file A are present in file B:

grep -f FileB FileA
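
For illustration, here is a minimal sketch of that grep approach on made-up data. The file contents are invented, and the `-F` and `-x` options are additions that are not part of the command above: they make grep treat each line of FileB as a fixed string that must match a whole line of FileA, rather than as a regular expression that may match anywhere in the line.

```shell
#!/bin/sh
# Sketch with invented contents; FileA and FileB stand in for the real files.
FileA=$(mktemp)
FileB=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$FileA"
printf '%s\n' 'beta' 'delta' > "$FileB"

# -f FileB: read the patterns from FileB, one per line.
# -F: treat each pattern as a fixed string, not a regular expression.
# -x: report a line of FileA only if a pattern matches the whole line.
matches=$(grep -F -x -f "$FileB" "$FileA")

printf '%s\n' "$matches"
rm -f "$FileA" "$FileB"
```

Without `-F` and `-x`, any regular-expression metacharacters in FileB would be interpreted as such, and a pattern could also match only part of a record.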

hyperion.krish · January 31, 2012, 6:53am

Thanks, Shell_Life.

I am struggling to use grep inside awk here.

awk '
BEGIN {
FS = "�"
OFS = "|"
}
{

## Need to compare
if (indata.$2=grpdata.col2 && indata.$7= grpdata.col4)
{ groupname = grpdata.col3
}
##### not working ##### grep "^[^,]*,$2,[^,]*,$7,[^,]*,[^,]*," "'$GRP_DATA'"
print groupname
}' $IN_DATA

If I can compare fields from IN_DATA with fields from GRP_DATA (the reference file), the problem will be solved.

Thanks in advance.
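
One common way to compare fields across two files in awk, sketched here under assumptions rather than as a tested solution for the real data: read the small reference file first into an array keyed on the fields to compare, then do a single array lookup for each IN_DATA record. In the sketch, `;` is a stand-in for the real delimiter, the sample rows are invented, and the mapping (columns 2 and 4 of GRP_DATA as the key, column 3 as the group name) follows the comparison written in the post above.

```shell
#!/bin/sh
# Sketch only: the delimiter and the sample rows are made-up stand-ins.
DELIM=';'                 # assumption: replace with the real field delimiter
IN_DATA=$(mktemp)
GRP_DATA=$(mktemp)

# Hypothetical reference rows: col2 and col4 form the key, col3 is the group name.
printf '%s\n' 'x;PCS;GROUP_A;RETAIL' 'x;PCS;GROUP_B;MAIL' > "$GRP_DATA"

# Hypothetical main rows: only fields 1, 2 and 7 are used here.
printf '%s\n' \
  '2011;PCS;V;a;b;c;RETAIL' \
  '2011;PCS;V;a;b;c;MAIL' \
  '2011;XYZ;V;a;b;c;MAIL' > "$IN_DATA"

out=$(awk -v FS="$DELIM" -v OFS='|' '
  # First file on the command line (GRP_DATA): store the group name
  # under the (col2, col4) key, then skip to the next record.
  NR == FNR { grp[$2, $4] = $3; next }

  # Second file (IN_DATA): one array lookup per record, so GRP_DATA
  # is never re-read.
  {
    if (($2, $7) in grp) groupname = grp[$2, $7]
    else                 groupname = ""
    print $1, $7, groupname
  }
' "$GRP_DATA" "$IN_DATA")

printf '%s\n' "$out"
rm -f "$IN_DATA" "$GRP_DATA"
```

The reference file is listed first so that the whole array is loaded before any IN_DATA record is read. `NR == FNR` is true only while awk is reading the first file. Shell variables such as `$GRP_DATA` are not expanded inside a single-quoted awk program, which is why the file names are passed as arguments and the delimiter through `-v`.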