search patterns

rochitsharma · February 24, 2006, 2:08pm

Hello,

I have an input file of about 50,00,000 lines. A few of its lines are as follows:

<CR:0023498789,TPO-14987084;BO=IC&SUB=ALLP
<CF:0023498789,CB=YES;BIL&NC=NO
<CF:0023498789,CW=NO;NS=NO
<GC:0023498789,CG=YES;TPO&NC=YES

<CR:0024659841,TPO-14484621;BO=NO&BA=OC&SUB=ALLH
<CF:0024659841,CB=YES;NC=NO
<CF:0024659841,CW=YES;NC=NO&NS=YES
<GS:0024659841,CU=1234;
<GL:0024659841,PCU=3462;NS=NO

<CR:0026454521,TPO-14525893;BO=IC&SUB=ALLJ
<GL:0026454521,PCU=75321;NC=NO&NS=NO

There are no blank lines in the input file.
These are some of the numbers involved: 0023498789, 0024659841, 0026454521.
There are about 8,00,000 unique numbers in the file, of which only about 2,50,000 are useful. I do not require numbers whose lines contain 'BIL', like the second line:
<CF:0023498789,CB=YES;BIL&NC=NO
Since BIL occurs there, I do not require any line containing 0023498789.

In my output I require:
number#TPO#number_if_BO=IC#number_if_BA=OC#SUB#number_if_CU_exists#number_if_PCU_exists

The output for the above lines should be:
0024659841#14484621##0024659841#ALLH#0024659841#0024659841
0026454521#14525893#14525893##ALLJ##0026454521

matrixmadhan · February 27, 2006, 3:14am

Just a pointer to get started:

sed -n '/BIL/p' data | sed -e 's/^.*://;s/,.*//' | sort -u > bil_numbers
grep -v -F -f bil_numbers data

I am not clear about the formatting part.
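
For the formatting, here is one possible reading of the requested layout as a rough awk sketch, not a tested solution for the real file. Assumptions on my side: the function name extract_fields is made up; every line looks like <XX:number,... with fields separated by , ; or &; a number is dropped if any of its lines contains BIL anywhere; and the third field holds the number when BO=IC, as the format line says. Note that the second sample output line shows the TPO value in that position instead, so adjust if that is what is really wanted. Everything is kept in memory per number, which should be manageable for about 8,00,000 numbers.

```shell
#!/bin/sh
# extract_fields FILE
# Hypothetical helper (name not from the thread): prints one line per number
# in the form number#TPO#BO#BA#SUB#CU#PCU, skipping numbers flagged by BIL.
extract_fields() {
    awk -F'[:,;&]' '
    NF < 2 { next }                       # ignore blank or malformed lines
    {
        num = $2 ""                       # force string so leading zeros survive
        if (!(num in seen)) {             # remember order of first appearance
            seen[num] = 1
            order[++n] = num
        }
        if (index($0, "BIL"))             # any line with BIL disqualifies the number
            skip[num] = 1
        for (i = 3; i <= NF; i++) {
            if ($i ~ /^TPO-/)       tpo[num]  = substr($i, 5)   # text after "TPO-"
            else if ($i == "BO=IC") bo[num]   = num
            else if ($i == "BA=OC") ba[num]   = num
            else if ($i ~ /^SUB=/)  subv[num] = substr($i, 5)   # text after "SUB="
            else if ($i ~ /^CU=/)   cu[num]   = num
            else if ($i ~ /^PCU=/)  pcu[num]  = num
        }
    }
    END {
        for (k = 1; k <= n; k++) {
            num = order[k]
            if (num in skip) continue
            print num "#" tpo[num] "#" bo[num] "#" ba[num] "#" subv[num] "#" cu[num] "#" pcu[num]
        }
    }' "$1"
}
```

Call it as: extract_fields data > output. On the sample lines from the first post this prints the 0024659841 line exactly as requested, and for 0026454521 it prints 0026454521#14525893#0026454521##ALLJ##0026454521, which differs from the posted sample only in the third field, for the reason given above.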