Need optimized awk/perl/shell to get statistics for a large delimited file

kartikirans · September 13, 2018, 5:03pm

I have a file of around 24 GB with 14 columns, delimited by "|".

My requirement: can anyone provide the fastest and best way to get the results below?

Number of records in the file
First column and second column: unique counts

Thanks for your time
Karti

------ Post updated at 04:03 PM ------

Correction -

Number of records in the file
First column and second column: distinct column values, not the counts.

Scrutinizer · September 13, 2018, 5:12pm

Try:

awk -F\| '!A[$1]++{c1++}; !B[$2]++{c2++} END{print c1, c2, NR}' file
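
A quick way to sanity-check this one-liner before running it on the 24 GB file is to try it on a tiny sample. The sketch below assumes a made-up three-column file called sample.txt (hypothetical, not the real 14-column data). A and B are arrays that remember which values have already been seen, so c1 and c2 only increase the first time a value appears, and NR is the total record count.

```shell
# Hypothetical sample data, for illustration only
tmpdir=$(mktemp -d)
printf '%s\n' 'a|x|1' 'b|x|2' 'a|y|3' 'c|x|4' > "$tmpdir/sample.txt"

# Same logic as the command above: count first-time values in
# columns 1 and 2, then print both counts and the record total
stats=$(awk -F'|' '!A[$1]++{c1++}; !B[$2]++{c2++} END{print c1, c2, NR}' "$tmpdir/sample.txt")
echo "$stats"
```

On this sample the output should be three distinct values in column 1, two in column 2, and four records.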

kartikirans · September 13, 2018, 5:24pm

Thanks. I need to redirect the distinct values of column 1 and column 2 to the files dis_col1.txt and dis_col2.txt. The file is huge (24 GB). I appreciate your quick reply and time.

neutronscott · September 14, 2018, 11:54pm

Something like:

awk -F\| '!a[$1]++ { print $1 > "dis_col1.txt"; } !b[$2]++ { print $2 > "dis_col2.txt"; } END { print NR; }' file
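
Note that the awk approach above keeps every distinct value of both columns in memory. If that turns out to be too much for a file this size, one possible alternative (a different technique, not what was posted above) is cut piped into sort -u, which can typically spill to temporary files on disk. This is only a sketch on a made-up sample.txt. It reads the input once per column plus once for the count, and it writes the values in sorted order rather than first-seen order. Setting LC_ALL=C is an assumption that plain byte-wise comparison is acceptable for the data.

```shell
# Hypothetical sample data, for illustration only
tmpdir=$(mktemp -d)
printf '%s\n' 'a|x|1' 'b|x|2' 'a|y|3' 'c|x|4' > "$tmpdir/sample.txt"

# Distinct values of column 1 and column 2, one value per line
LC_ALL=C cut -d'|' -f1 "$tmpdir/sample.txt" | LC_ALL=C sort -u > "$tmpdir/dis_col1.txt"
LC_ALL=C cut -d'|' -f2 "$tmpdir/sample.txt" | LC_ALL=C sort -u > "$tmpdir/dis_col2.txt"

# Number of records (lines) in the file
records=$(wc -l < "$tmpdir/sample.txt")
echo "$records"
```

Whether this is actually faster than the single awk pass depends on how many distinct values there are and on disk speed, so it is worth timing both on a slice of the real file.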