Hello,
I have an input file of about 5 million lines. A few of its lines are as follows:
<CR:0023498789,TPO-14987084;BO=IC&SUB=ALLP
<CF:0023498789,CB=YES;BIL&NC=NO
<CF:0023498789,CW=NO;NS=NO
<GC:0023498789,CG=YES;TPO&NC=YES
<CR:0024659841,TPO-14484621;BO=NO&BA=OC&SUB=ALLH
<CF:0024659841,CB=YES;NC=NO
<CF:0024659841,CW=YES;NC=NO&NS=YES
<GS:0024659841,CU=1234;
<GL:0024659841,PCU=3462;NS=NO
<CR:0026454521,TPO-14525893;BO=IC&SUB=ALLJ
<GL:0026454521,PCU=75321;NC=NO&NS=NO
There are no blank lines in the input file.
0023498789, 0024659841 and 0026454521 are examples of the numbers I am referring to.
There are about 800,000 unique numbers in the file, of which only about 250,000 are useful. I do not need the numbers that have any line containing 'BIL', like the second line:
<CF:0023498789,CB=YES;BIL&NC=NO
Since BIL occurs there, I do not need any line containing 0023498789.
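To illustrate the exclusion rule, here is a rough Python sketch of how the unwanted numbers could be collected in a first pass. It is only an illustration: it assumes every line looks like `<XX:NUMBER,...` and that a plain substring test for 'BIL' after the comma is enough. The names `LINE_RE` and `numbers_with_bil` are made up for this example.

```python
import re

# Assumed line shape: "<XX:NUMBER,rest" (hypothetical pattern for this sketch).
LINE_RE = re.compile(r"^<[A-Z]+:(\d+),(.*)$")

def numbers_with_bil(lines):
    """Return the set of numbers that have at least one line containing 'BIL'."""
    excluded = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Assumption: a simple substring check on the part after the comma.
        if match and "BIL" in match.group(2):
            excluded.add(match.group(1))
    return excluded

sample = [
    "<CR:0023498789,TPO-14987084;BO=IC&SUB=ALLP",
    "<CF:0023498789,CB=YES;BIL&NC=NO",
    "<CR:0024659841,TPO-14484621;BO=NO&BA=OC&SUB=ALLH",
]
print(numbers_with_bil(sample))  # {'0023498789'}
```

For the real file the same function could be fed the open file object instead of a list, so the lines are read one at a time.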
In my output I require:
number#TPO#number_if_BO=IC#number_if_BA=OC#SUB#number_if_CU_exists#number_if_PCU_exists
The output for the above lines should be:
0024659841#14484621##0024659841#ALLH#0024659841#0024659841
0026454521#14525893#14525893##ALLJ##0026454521
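To make the requirement concrete, here is a rough Python sketch of the transformation for the sample lines above. It rests on several assumptions: items after the comma are separated by `;` or `&`; the set of numbers to skip (those with a 'BIL' line) is already known and passed in as `excluded`; every remaining number produces one row, in the order it first appears; and the third field holds the TPO value when BO=IC, because that is what the second sample output line shows, although the format line could also be read as the number itself. The names `LINE_RE` and `build_output` are made up for this example.

```python
import re

# Assumed line shape: "<XX:NUMBER,items" with items separated by ';' or '&'.
LINE_RE = re.compile(r"^<[A-Z]+:(\d+),(.*)$")

def build_output(lines, excluded):
    """Return one '#'-joined row per number not in `excluded`, in first-seen order."""
    records = {}  # number -> collected fields (dicts keep insertion order)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) in excluded:
            continue
        number, rest = match.groups()
        rec = records.setdefault(number, {"tpo": "", "bo_ic": False,
                                          "ba_oc": False, "sub": "",
                                          "cu": False, "pcu": False})
        for item in re.split(r"[;&]", rest):
            if item.startswith("TPO-"):
                rec["tpo"] = item[len("TPO-"):]
            elif item == "BO=IC":
                rec["bo_ic"] = True
            elif item == "BA=OC":
                rec["ba_oc"] = True
            elif item.startswith("SUB="):
                rec["sub"] = item[len("SUB="):]
            elif item.startswith("PCU="):
                rec["pcu"] = True
            elif item.startswith("CU="):
                rec["cu"] = True
    rows = []
    for number, rec in records.items():
        rows.append("#".join([
            number,
            rec["tpo"],
            rec["tpo"] if rec["bo_ic"] else "",  # assumption: TPO value, as in the sample
            number if rec["ba_oc"] else "",
            rec["sub"],
            number if rec["cu"] else "",
            number if rec["pcu"] else "",
        ]))
    return rows

sample_lines = [
    "<CR:0023498789,TPO-14987084;BO=IC&SUB=ALLP",
    "<CF:0023498789,CB=YES;BIL&NC=NO",
    "<CF:0023498789,CW=NO;NS=NO",
    "<GC:0023498789,CG=YES;TPO&NC=YES",
    "<CR:0024659841,TPO-14484621;BO=NO&BA=OC&SUB=ALLH",
    "<CF:0024659841,CB=YES;NC=NO",
    "<CF:0024659841,CW=YES;NC=NO&NS=YES",
    "<GS:0024659841,CU=1234;",
    "<GL:0024659841,PCU=3462;NS=NO",
    "<CR:0026454521,TPO-14525893;BO=IC&SUB=ALLJ",
    "<GL:0026454521,PCU=75321;NC=NO&NS=NO",
]
for row in build_output(sample_lines, excluded={"0023498789"}):
    print(row)
# 0024659841#14484621##0024659841#ALLH#0024659841#0024659841
# 0026454521#14525893#14525893##ALLJ##0026454521
```

Keeping one small dictionary per number means the whole file does not have to be sorted or grouped beforehand, but it does hold an entry for every kept number in memory.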