Need optimized awk/perl/shell to get statistics for a large delimited file

I have a file of around 24 GB with 14 columns, delimited by "|".

My requirement: can anyone suggest the fastest and best way to get the results below?

Number of records of the file
First column and second column: unique counts

Thanks for your time
Karti

------ Post updated at 04:03 PM ------

Correction -

Number of records of the file
First column and second column: the distinct column values, not the counts.

Try:

awk -F\| '!A[$1]++{c1++}; !B[$2]++{c2++} END{print c1, c2, NR}' file 
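To see what the three numbers mean, the same idea can be tried on a tiny sample first. This is only a sketch: sample.txt, stats.txt and the four records are made up for illustration, and the `+0` is added so that an empty input prints 0 instead of a blank.

```shell
# Hypothetical 4-record sample, pipe-delimited (not the real 24 GB file)
printf '%s\n' 'a|x|1' 'a|y|2' 'b|x|3' 'c|x|4' > sample.txt

# A[$1]++ evaluates to 0 the first time a value is seen, so !A[$1]++ is
# true exactly once per distinct value in column 1; B does the same for
# column 2. NR in the END block is the total number of records read.
awk -F'|' '!A[$1]++{c1++} !B[$2]++{c2++} END{print c1+0, c2+0, NR}' sample.txt > stats.txt

# Fields printed: distinct values in column 1, distinct values in column 2, records
cat stats.txt
```

On this sample the output is "3 2 4": three distinct values in column 1, two in column 2, four records.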

Thanks. I need to redirect the distinct values of column 1 and column 2 to the files dis_col1.txt and dis_col2.txt. The file is huge (24 GB). I appreciate your quick reply and your time.

something like:

awk -F\| '!a[$1]++ { print $1 > "dis_col1.txt"; } !b[$2]++ { print $2 > "dis_col2.txt"; } END { print NR; }' file
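One caveat with the awk approach: the arrays hold every distinct value in memory, so if either column has very many distinct values, a 24 GB input may run out of RAM. A possible alternative, sketched here on a made-up sample (sample.txt, nrec.txt and the records are assumptions, not the real data), is to let sort do the de-duplication, since it can spill to temporary files on disk:

```shell
# Hypothetical sample standing in for the real file
printf '%s\n' 'a|x|1' 'a|y|2' 'b|x|3' 'c|x|4' > sample.txt
f=sample.txt

# LC_ALL=C makes sort compare raw bytes, which is usually faster
# than locale-aware collation.
cut -d'|' -f1 "$f" | LC_ALL=C sort -u > dis_col1.txt
cut -d'|' -f2 "$f" | LC_ALL=C sort -u > dis_col2.txt

# Record count (counts newline characters, so a final line without a
# trailing newline is not counted, unlike awk's NR)
wc -l < "$f" > nrec.txt
cat nrec.txt
```

The trade-offs: the file is read three times, and the distinct values come out in sorted order rather than in order of first appearance. Whether this is faster than awk depends on the number of distinct values and the available memory, so it is worth timing both on a slice of the real file.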