Compare one file and get the count of multiple file

Hi All,

I need to compare a record in one file and find the matching across two file.

This is the master file. File name : CUST.dat

CUST_ID
9998
10000
10004
10005

DATAFILE1

9998;80000091;4;687582837443;;;;;;;;;
9998;80000091;4;687582841003;;;;;;;;;
9998;80000091;4;797582801705;;;;;;;;;
10000;80000091;4;85033400411;;;;;;;;;
10000;80000091;4;9648830021;;;;;;;;;
10000;80000091;4;9648830022;;;;;;;;;
10005;80000091;4;687582832052;;;;;;;;;
10005;80000091;4;687582842566;;;;;;;;;
10005;80000091;4;687582843915;;;;;;;;;

DATAFILE2

687582832052
687582842566

Expected output

9998;3;0
10000;3;0
10004;0;0
10005;3;2

Basically I need to take the key from CUST.dat and find the match in DATAFILE1 and extract the column 4 and get the match across DATAFILE2 and get the count of matching

I am doing this on a separate command . Getting the count of each grep and doing it for individually. Since I have 10,000 comparison it is taking more time

 grep ^9998 DATAFILE1
9998;80000091;4;687582837443;;;;;;;;;
9998;80000091;4;687582841003;;;;;;;;;
9998;80000091;4;797582801705;;;;;;;;;

grep 687582837443  DATAFILE2
grep 687582841003 DATAFILE2
grep 797582801705 DATAFILE2

How about

awk -F\; '
FNR==1          {FILE++
                }
FILE == 1       {CNT1[$1]++
                 GR[$4] = $1
                 next
                }
FILE == 2       {CNT2[GR[$1]]++
                 next
                }
FNR > 1         {print $1, CNT1[$1]+0, CNT2[$1]+0
                }
' OFS=";" DATAFILE[12] CUST.dat
9998;3;0
10000;3;0
10004;0;0
10005;3;2
1 Like

Does my files needs to be in sorted order ?. It worked for the sample record I am having big file with that it is not working.

As always: without sufficient data / info, nobody can for the life of it give you any meaningful help.
Post the failure mode and error (message) verbatim and the context that lead to it.
If you post a non-representative sample, you will get a not-necessarily working solution...

1 Like

My mistake , my file I was wrong with file. The above awk solution worked perfect