Problem with parsing a large file

Hi All,

Following is the sample file

and following is the output desired

That is, the last entry for each unique first field is required.

My solution is as follows

However, the original file has around a million entries and around 100,000 unique first fields, so this solution will take a very long time to execute.

Is there a better and faster way of doing it?

Regards,
Gaurav

nawk -F',' '{a[$1]=$0} END {for (i in a) print a[i]}' myFile.txt | sort -n
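
For illustration, here is a hypothetical run. The original sample file is not shown above, so the data below is made up; it assumes a comma-separated file whose first field is numeric:

$ cat myFile.txt
1,first
2,first
1,second
3,first
2,second

$ nawk -F',' '{a[$1]=$0} END {for (i in a) print a[i]}' myFile.txt | sort -n
1,second
2,second
3,first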

Awesome, vgersh! Can you please explain the command?

Using awk's associative arrays... read in the file; for every record/line, create/update the associative array indexed by the value of the first field, with the value being the record/line itself. The last update of an entry in the array will be done for the LAST record for a given index [the FIRST field in a record/line].

After processing ALL the records/lines in the file ['END' block of 'awk']... iterate through the previously populated array 'a' using the iterator 'i' [the first field from the original file] and output the value for a given index [the original record/line].

Because the final array iteration does not guarantee the ORDER of the entries, 'sort -n' the output - the sorting is numeric and is done based on the 'FIRST' column.
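
To spell it out, here is the same one-liner written as a readable script with comments. This is a sketch only; the comma field separator and the numeric first field are assumptions carried over from the one-liner above:

nawk -F',' '
{
    # For each record/line, store the whole line in the associative
    # array "a", keyed by the first field.  A later line with the same
    # first field overwrites the earlier one, so only the LAST line
    # for each key survives.
    a[$1] = $0
}
END {
    # After the whole file has been read, print the surviving lines.
    # The iteration order of "for (i in a)" is unspecified.
    for (i in a)
        print a[i]
}' myFile.txt | sort -n    # put the output back in numeric order on the leading (first) field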

Hope it's clearER :wink:

Thanks, vgersh, for your timely help. It will definitely save a lot of time.

Hi Vgersh,

How can we search and replace a pattern in a file without opening it, and also replace it?

Thanks, :slight_smile:
Sam.

First, pls open a NEW thread.
Second, define what you mean by 'without opening a file'?