Sort a flat file by the 3rd column in Perl

Hello guys,

I want to sort a flat file numerically by its third column and save the result under a different file name.

Input:

 
9924873|20111114|00000000000013013|130|13|10/15/2010 12:36:22|W860944|N|00
9924873|20111114|00000000000013009|130|09|10/15/2010 12:36:22|W860944|N|00
9924873|20111114|00000000000029207|292|07|05/29/2001 10:35:32|DADS_JAMESA|N|00

Expected output:

 
9924873|20111114|00000000000013009|130|09|10/15/2010 12:36:22|W860944|N|00
9924873|20111114|00000000000013013|130|13|10/15/2010 12:36:22|W860944|N|00
9924873|20111114|00000000000029207|292|07|05/29/2001 10:35:32|DADS_JAMESA|N|00

Thanks a lot, guys!

Please show what you have tried so far, thanks.

Following on from zaxxon's reasonable request, try coding up the following and come back with specific questions about whatever is causing you issues.

  • open the file
  • slurp records into an array
  • call Perl's sort with a comparison block that applies the numeric comparison operator ( <=> ) to the 3rd field of $a and $b
  • write the sorted array to the new file.

It's much more rewarding if you learn something in the process :wink:

You guys are absolutely right.
Here is what I came up with. I don't have any working knowledge of Perl.

 
#!/usr/bin/env perl
use strict;
use warnings;
my %link_strengths;
while (<>) {
chomp;
my @rec = split /\|/; 
my $key = $rec[2]; 
# my ($link, $strength) = split /\s+/;
$link_strengths{$key} = $_;
}
my @sorted = sort {
$link_strengths{$a} <=> $link_strengths{$b}
} keys %link_strengths;
for my $link (@sorted) {
print "$link: $link_strengths{$link}\n";
}

I am confused about how to store the third field, and also about how to apply the numeric comparison operator to the 3rd field of $a and $b.

Please help. If you could point me to a URL, it would be a great help!

In shell ..

$ sort -t'|' -k3,3n < infile

Thanks, but I need it in Perl!

#!/usr/bin/perl
use strict;
use warnings;

open (my $data , '<', $ARGV[0])|| die "could not open $ARGV[0]:\n$!";
my @array=(<$data>);
my @sorted=sort {(split(/\|/,$a))[2]<=>(split(/\|/,$b))[2]} @array;
print @sorted;

The "magic" line here is the sort routine: for each record we use split to create an unnamed temporary list, pick out its third element ( [2] ), and compare the two values with the numeric comparison operator ( <=> , also known as the spaceship operator ). Perl's sort function is extremely powerful precisely because its first argument can be a comparison block or subroutine (have a look at perldoc -f sort ).
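Since the original question also asked for the result to be stored under another name, here is a minimal sketch of the same sort with the output written to a second file. The sub name sort_by_third_field and the two-argument command line (input file, output file) are assumptions made for illustration, not something from the posts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the records sorted numerically by their third pipe-delimited field.
sub sort_by_third_field {
    my @records = @_;
    return sort { (split /\|/, $a)[2] <=> (split /\|/, $b)[2] } @records;
}

# Hypothetical usage: perl sort3.pl infile outfile
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;

    open(my $in, '<', $in_name) or die "could not open $in_name: $!";
    my @records = <$in>;    # slurp; each record keeps its trailing newline
    close($in);

    open(my $out, '>', $out_name) or die "could not open $out_name: $!";
    print {$out} sort_by_third_field(@records);
    close($out) or die "could not close $out_name: $!";
}
```

Note that split runs on every comparison, which is fine for small files; for very large inputs it may be worth extracting the key once per record instead.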

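The hash-based method you are trying to use above can also work for this data, provided the sort compares the keys (the third field) rather than the stored records. However, it depends on the third field being unique to each record, since a repeated value would overwrite the earlier record in the hash.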

#!/usr/bin/perl
use strict;
use warnings;

my (%link_strength);
open (my $data , '<', $ARGV[0])|| die "could not open $ARGV[0]:\n$!";
my @array=(<$data>);
for (@array){
    $link_strength{$1}=$_  if /(?:[^|]+\|){2}([^|]+)/;
}
print $link_strength{$_} for (sort {$a<=>$b} keys %link_strength);

$ cat f40
9924873|20111114|00000000000013013|130|13|10/15/2010 12:36:22|W860944|N|00
9924873|20111114|00000000000013009|130|09|10/15/2010 12:36:22|W860944|N|00
9924873|20111114|00000000000029207|292|07|05/29/2001 10:35:32|DADS_JAMESA|N|00
$ perl -lne 'push @x,$_; END {print for (sort {substr($a,17,17) <=> substr($b,17,17)} @x)}' f40
9924873|20111114|00000000000013009|130|09|10/15/2010 12:36:22|W860944|N|00
9924873|20111114|00000000000013013|130|13|10/15/2010 12:36:22|W860944|N|00
9924873|20111114|00000000000029207|292|07|05/29/2001 10:35:32|DADS_JAMESA|N|00

tyler_durden

Thanks a lot !!!!!!

Thanks a lot, Skrynesaver, not only for solving the issue but also for the explanation.

Could you please help me read this line?
$link_strength{$1}=$_ if /(?:[^|]+\|){2}([^|]+)/;

Is it storing the 3rd column as the hash value?

And how are the keys generated in the last line?
print $link_strength{$_} for (sort {$a<=>$b} keys %link_strength);

No, it's actually creating a hash entry that uses the 3rd field as the key and the entire record as the value.

At this point we have slurped the entire file into an array, and because we are stepping through the array ( for (@array){ ), the default variable $_ holds the entire record.

In the regex, (?:[^|]+\|){2} skips over the first two fields without capturing them, and the first capturing parenthesis ([^|]+) matches the third field. So if the record matches the pattern, the third field is stored in $1.

We now store the record in the hash with the 3rd field as key.

We then retrieve the keys of the hash (field 3), sort them numerically, and process the list in that order. For each key we print the value, which is the stored record.
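To watch these steps happen, here is a small sketch that runs the same regex over the three sample records from the first post and prints each captured key before printing the records in sorted order. The inline @array and the @ordered_keys variable are only there to make the example self-contained; the real script reads the records from a file.

```perl
use strict;
use warnings;

# The three sample records from the question, inlined for illustration.
my @array = (
    "9924873|20111114|00000000000013013|130|13|10/15/2010 12:36:22|W860944|N|00\n",
    "9924873|20111114|00000000000013009|130|09|10/15/2010 12:36:22|W860944|N|00\n",
    "9924873|20111114|00000000000029207|292|07|05/29/2001 10:35:32|DADS_JAMESA|N|00\n",
);

my %link_strength;
for (@array) {
    # (?:[^|]+\|){2} consumes fields 1 and 2 without capturing them;
    # ([^|]+) captures field 3 into $1.
    if (/(?:[^|]+\|){2}([^|]+)/) {
        print "key: $1\n";
        $link_strength{$1} = $_;    # 3rd field is the key, record is the value
    }
}

# Sort the keys numerically, then print the record stored under each key.
my @ordered_keys = sort { $a <=> $b } keys %link_strength;
print $link_strength{$_} for @ordered_keys;
```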

As I said above, this method depends on the 3rd field being unique to each record. You could modify it to use the record as the key and the extracted 3rd field as the value ( $link_strength{$_}=$1 if /(?:[^|]+\|){2}([^|]+)/; ). This would mean that each record would have to be unique, but the link strengths could be the same in several records. You would then cycle through the hash with a sort function in place, something like for (sort {$link_strength{$a} <=> $link_strength{$b}} keys %link_strength){ ... }
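A rough sketch of that record-as-key variant might look like the following. The short records are made up so that two of them share the same third field, and the "or $a cmp $b" tie-break is my own addition to make the output order predictable when link strengths are equal.

```perl
use strict;
use warnings;

# Made-up records; the first and last share the same third field (0020).
my @array = ("a|x|0020|1\n", "b|x|0003|2\n", "c|x|0020|3\n");

my %link_strength;
for (@array) {
    # Key is the whole record, value is its third field.
    $link_strength{$_} = $1 if /(?:[^|]+\|){2}([^|]+)/;
}

# Sort the records (hash keys) by their stored third field;
# fall back to a string comparison of the records on ties.
my @sorted = sort { $link_strength{$a} <=> $link_strength{$b} or $a cmp $b }
             keys %link_strength;
print for @sorted;
```

Both records with 0020 survive here, whereas with the 3rd field as key the later one would have replaced the earlier one.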

Thanks a lot, everyone, for helping me.

Could you please let me know how to write the sorted array out to a file? After sorting the data in the two files, I need to do a file comparison, so the arrays need to be stored in files before comparing.

I have tried the code below, but it's not working:

 
open my $fh1, '<', @sorted or die "Can't open $file1: $!";
print $fh3;

You guys have helped me a lot! Thanks a ton for that!

The call to the open function is incorrect. Have a look at the online Perl documentation:

open - perldoc.perl.org

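Since you are opening a file, the file name goes in the third argument (not the @sorted array), and to write to the file the mode has to be '>' rather than '<'. Also, $file1 is never assigned, and $fh3 is undefined in the print statement; print should be given the handle you opened, followed by the data to write.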
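As a minimal sketch of what a corrected version could look like, assuming @sorted already holds the sorted records: the two stand-in lines and the temporary file from File::Temp are placeholders so the example runs on its own, and a real script would use its own output file name in $file1.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @sorted = ("line one\n", "line two\n");    # stand-in for the sorted records

# tempfile() is used only so the sketch runs anywhere; substitute
# the real output file name for $file1.
my (undef, $file1) = tempfile();

# Three-argument open: handle, mode ('>' to write), file name.
open(my $fh1, '>', $file1) or die "Can't open $file1 for writing: $!";
print {$fh1} @sorted;
close($fh1) or die "Can't close $file1: $!";

# Read the file back to confirm what was written.
open(my $in, '<', $file1) or die "Can't open $file1 for reading: $!";
my @read_back = <$in>;
close($in);
print @read_back;

unlink $file1;    # remove the temporary file
```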

tyler_durden

Hello Tyler,

Thanks a lot for the explanation.

What I am trying to do here is copy a sorted array to a file and then compare it with another file.

My objective is to compare two huge files, so I want to sort them first and then compare them. (Will that speed up the comparison?)

You guys have already shown me how to sort the input file into an array, so do I have to copy the array to a file and then compare, or can I compare the two arrays directly?

Please find the attached script for file comparison.

In the while loop, is it possible to pass the array instead of a file?

Please help. :wall: