Problem with parsing a large file

Hi All,

Following is the sample file

and following is the output desired

That is, the last entry for each unique first field is required.

My solution is as follows

However, the original file has around a million entries and around 100,000 unique first fields, so this solution will take a very long time to execute.

Is there a better and faster way of doing it?

Regards,
Gaurav

nawk -F',' '{a[$1]=$0} END {for (i in a) print a[i]}' myFile.txt | sort -n
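
For illustration, here is a hypothetical run. The original sample file is not shown above, so the data below is made up; it assumes a comma-separated file whose first field is numeric:

$ cat myFile.txt
1,first
2,first
1,second
3,first
2,second

$ nawk -F',' '{a[$1]=$0} END {for (i in a) print a[i]}' myFile.txt | sort -n
1,second
2,second
3,first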

Awesome, vgersh! Can you please explain the command?

Using awk's associative arrays... read in the file; for every record/line, create/update the associative array indexed by the value of the first field, with the value being the record/line itself. The last update of an entry in the array will be done for the LAST record for a given index [the FIRST field in a record/line].

After processing ALL the records/lines in the file ['END' block of 'awk']... iterate through the previously populated array 'a' using the iterator 'i' [the first field from the original file] and output the value for a given index [the original record/line].

Because the final array iteration does not guarantee the ORDER of the entries, 'sort -n' the output - the sorting is numeric and is done based on the 'FIRST' column.
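
To spell it out, here is the same one-liner written as a readable script with comments. This is a sketch only; the comma field separator and the numeric first field are assumptions carried over from the one-liner above:

nawk -F',' '
{
    # For each record/line, store the whole line in the associative
    # array "a", keyed by the first field.  A later line with the same
    # first field overwrites the earlier one, so only the LAST line
    # for each key survives.
    a[$1] = $0
}
END {
    # After the whole file has been read, print the surviving lines.
    # The iteration order of "for (i in a)" is unspecified.
    for (i in a)
        print a[i]
}' myFile.txt | sort -n    # put the output back in numeric order on the leading (first) field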

Hope it's clearER :wink:

Thanks, vgersh, for your timely help. It will definitely save a lot of time.

Hi Vgersh,

How can we search and replace a pattern in a file without opening it, and also replace it?

Thanks, :slight_smile:
Sam.

First, pls open a NEW thread.
Second, define what you mean by 'without opening a file'?