Hi All,
I need to compare a record in one file and find the matching across two file.
This is the master file. File name : CUST.dat
CUST_ID
9998
10000
10004
10005
DATAFILE1
9998;80000091;4;687582837443;;;;;;;;;
9998;80000091;4;687582841003;;;;;;;;;
9998;80000091;4;797582801705;;;;;;;;;
10000;80000091;4;85033400411;;;;;;;;;
10000;80000091;4;9648830021;;;;;;;;;
10000;80000091;4;9648830022;;;;;;;;;
10005;80000091;4;687582832052;;;;;;;;;
10005;80000091;4;687582842566;;;;;;;;;
10005;80000091;4;687582843915;;;;;;;;;
DATAFILE2
687582832052
687582842566
Expected output
9998;3;0
10000;3;0
10004;0;0
10005;3;2
Basically I need to take the key from CUST.dat
and find the match in DATAFILE1
and extract the column 4 and get the match across DATAFILE2
and get the count of matching
I am doing this on a separate command . Getting the count of each grep and doing it for individually. Since I have 10,000 comparison it is taking more time
grep ^9998 DATAFILE1
9998;80000091;4;687582837443;;;;;;;;;
9998;80000091;4;687582841003;;;;;;;;;
9998;80000091;4;797582801705;;;;;;;;;
grep 687582837443 DATAFILE2
grep 687582841003 DATAFILE2
grep 797582801705 DATAFILE2
RudiC
March 31, 2018, 9:17am
2
How about
awk -F\; '
FNR==1 {FILE++
}
FILE == 1 {CNT1[$1]++
GR[$4] = $1
next
}
FILE == 2 {CNT2[GR[$1]]++
next
}
FNR > 1 {print $1, CNT1[$1]+0, CNT2[$1]+0
}
' OFS=";" DATAFILE[12] CUST.dat
9998;3;0
10000;3;0
10004;0;0
10005;3;2
1 Like
rudic:
How about
awk -F\; '
FNR==1 {FILE++
}
FILE == 1 {CNT1[$1]++
GR[$4] = $1
next
}
FILE == 2 {CNT2[GR[$1]]++
next
}
FNR > 1 {print $1, CNT1[$1]+0, CNT2[$1]+0
}
' OFS=";" DATAFILE[12] CUST.dat
9998;3;0
10000;3;0
10004;0;0
10005;3;2
Does my files needs to be in sorted order ?. It worked for the sample record I am having big file with that it is not working.
RudiC
March 31, 2018, 4:26pm
4
As always: without sufficient data / info, nobody can for the life of it give you any meaningful help.
Post the failure mode and error (message) verbatim and the context that lead to it.
If you post a non-representative sample, you will get a not-necessarily working solution...
1 Like
rudic:
As always: without sufficient data / info, nobody can for the life of it give you any meaningful help.
Post the failure mode and error (message) verbatim and the context that lead to it.
If you post a non-representative sample, you will get a not-necessarily working solution...
My mistake , my file I was wrong with file. The above awk
solution worked perfect