I have a directory with sub-directories containing some number of .log files, totalling nearly 1 GB.
The files are comma-separated. I need to recursively extract the unique values of the first column only.
I did this in Perl, but I would like to know about more command-line utilities, and how to measure the time taken by the grep and uniq steps.
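One way to do this without Perl is a sketch like the following, assuming GNU grep (for `-r` and `--include`) and coreutils. Prefixing the pipeline with the shell's `time` keyword reports how long it runs:

```shell
# Recursively print every line of every .log file, keep only the first
# comma-separated field, and deduplicate. `time` reports the elapsed time
# for the whole pipeline.
time grep -rh '^' --include='*.log' . | cut -d, -f1 | sort -u
```

`grep -rh '^'` is just a recursive "cat" restricted to .log files; the real work happens in `cut` and `sort -u`.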
An alternative (which may be horribly slow, I don't know) could be:
cut -f1 -d, *.log | sort -u
.... although this will fail for an excessive number of input files, because the command line grows too long. I suppose you could also wrap it in a find.
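The find wrapper alluded to might look like this (a sketch, assuming POSIX find; `-exec ... +` passes the files to `cut` in batches, so the shell never has to expand `*.log` itself):

```shell
# find batches the filenames onto cut's command line itself,
# avoiding "argument list too long", and descends into sub-directories.
find . -type f -name '*.log' -exec cut -f1 -d, {} + | sort -u
```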
Hi Robin, it depends on how the OP's question should be interpreted. I interpreted it as the unique values among all of the files in the directory and its subdirectories. In that case my solution would be most efficient, and it would provide the right answer.
If the idea is to list the unique values per file, then your second option should be used, although I think for that to be of use the filename should be printed as well.
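Printing the filename alongside each file's unique values could be sketched like this (an illustration, not from the thread, using the portable `find -exec sh -c` idiom):

```shell
# For each .log file, print its name, then its unique first-column values.
find . -type f -name '*.log' -exec sh -c '
  for f do
    printf "%s:\n" "$f"          # header line: the filename
    cut -d, -f1 -- "$f" | sort -u # unique first-column values of this file
  done' sh {} +
```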
Your first option cannot be used in either case. It might happen to provide the right answer if the total number of files is such that awk is invoked only once for all of them; if it is invoked multiple times, the answer will be incorrect.
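To spell out the pitfall: a deduplicating awk program such as `!seen[$1]++` keeps its `seen` array only for the lifetime of one invocation, so when xargs splits the file list into several batches, duplicates that span batches survive. A final `sort -u` restores global uniqueness, as in this sketch:

```shell
# Each awk invocation deduplicates only within its own batch of files;
# the trailing sort -u merges the partial results into one unique set.
find . -type f -name '*.log' -print0 |
  xargs -0 awk -F, '!seen[$1]++ {print $1}' |
  sort -u
```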