Hi,
I have the following command in place
nawk -F, '!a[$1,$2,$3]++' file > file.uniq
It has been working perfectly as required, removing duplicates based on only the first 3 fields. Recently it has started failing with the error below:
bash-3.2$ nawk -F, '!a[$1,$2,$3]++' OTCTempD.dat > OTCTemp.uniq
nawk: symbol table overflow at 4044735840353890OTC
input record number 5.42076e+07, file OTCTempD.dat
source line number 1
More information:
- Number of records in the file:
bash-3.2$ cat OTCTempD.dat | wc -l
179128368
- Size of the file:
-rw-r--r-- 1 magt2 grip 7338355879 Apr 12 14:08 OTCTempD.dat
Sample contents of the file:
a,b,c,2,2,3
a,b,c,1,2,3
a,b,E,1,2,3
a,b,c,1,2,3
Output should be:
a,b,c,2,2,3 // keep only the first record among duplicates
a,b,E,1,2,3
How can this be resolved?
Does awk keep some kind of table in memory, and is it exceeding a limit?
What would be the best approach to achieve the desired result?
Can't we use uniq directly to remove such key-based duplicates?
Kindly suggest.
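For context, the `!a[$1,$2,$3]++` idiom stores one associative-array entry per distinct key, so memory grows with the number of distinct (field1, field2, field3) combinations. A commonly suggested low-memory alternative, sketched here assuming GNU sort and a POSIX awk are available, is to sort on the key fields first so awk only ever has to remember the previous key. Note that, unlike the original command, this emits records in sorted order rather than original file order:

```shell
# Stable-sort on the first three comma-separated fields, then print
# only the first line of each key group. awk holds a single key in
# memory at a time, so memory use stays constant regardless of how
# many records the file contains.
sort -s -t, -k1,3 OTCTempD.dat |
awk -F, '{ k = $1 FS $2 FS $3 } k != prev { print; prev = k }' > OTCTemp.uniq
```

Because `-s` makes the sort stable, the first line of each group is still the first occurrence from the input, matching the "take first record only out of dupes" requirement. Sorting a ~7 GB file will use temporary disk space (see sort's `-T` option) instead of process memory.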