Displaying the first field if the second field matches the pattern using Perl

royalibrahim · November 4, 2012, 8:20pm

Hi,

I am trying to use the Perl command below to print the first field when the second field matches a given pattern:

perl -lane 'open F, "< myfile"; for $i (<F>) {chomp $i; if ($F[1] =~ /patt$/) {my $f = (split(" ", $i))[0]; print "$f";}} close F' dummy_file

I know I can achieve the same result with the approach from the thread "Displaying lines of a file where the second field matches a pattern", but I am still curious why this code does not return the expected result and how to correct it. Any help, please?

elixir_sinari · November 4, 2012, 10:10pm

The code you've provided does what it's supposed to do.

First of all, you are opening the file myfile, reading it completely, and then closing the associated filehandle, once for each line in the file dummy_file.

For every line from dummy_file in which the 2nd field $F[1] (split on whitespace by the -a run-time switch) matches the pattern patt anchored at the end, the first element of the list slice (split(" ", $i))[0], obtained by splitting each line from myfile on whitespace, is displayed.

What exactly are you trying to do? Please elaborate with input and output samples.

royalibrahim · November 5, 2012, 9:11pm

Thank you, but the dummy file is just an empty file to complete the syntax (otherwise Perl reads from stdin). The file actually in use is "myfile", whose contents I am looping over by reading from the filehandle 'F'. The whitespace splitting is done on myfile's contents, not on the dummy file's.

Let's say my input, the contents of myfile, is:

aa bb cc
1a 2a 3a

I would like to get the output as:
aa

where 'bb' is the pattern to be matched.

balajesuri · November 5, 2012, 9:23pm

perl -lane '$F[1] =~ /bb$/ && print $F[0]' myfile

OR

awk '$2 ~ /bb$/ {print $1}' myfile

elixir_sinari · November 5, 2012, 9:26pm

That's as simple as:

perl -lane 'print $F[0] if $F[1] =~ /bb$/' myfile

You don't need any dummy file.
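As a quick check, the following sketch recreates the sample input from the question in a scratch directory (the use of mktemp is only an assumption for the demonstration) and runs the Perl and awk commands on it.

```shell
cd "$(mktemp -d)"
printf 'aa bb cc\n1a 2a 3a\n' > myfile

# -n loops over the lines of myfile, -a splits each line into @F,
# -l strips the newline on input and adds one back on print.
perl_out=$(perl -lane 'print $F[0] if $F[1] =~ /bb$/' myfile)
awk_out=$(awk '$2 ~ /bb$/ {print $1}' myfile)

echo "$perl_out"   # aa
echo "$awk_out"    # aa
```

Note that /bb$/ only anchors the end of the field, so a second field such as abb would match too; /^bb$/ or $F[1] eq "bb" would restrict it to an exact match.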

royalibrahim · November 6, 2012, 1:56am

Thank you, balajesuri and elixir_sinari, for the simplified code.

OK, now I get the idea and I have revised my code as below:

perl -lane 'open F, "< f1"; for $i (<F>) {chomp $i; if ($i =~ /$F[1]$/) {my $f = (split(" ", $i))[0]; print "$f";}} close F' f2

Here I am printing the first field of any line in file f1 that ends with the string from the second column of a line in file f2 (that string is used as a pattern anchored at the end of the line).

The input contents are:

$ cat f1
aa b c patt
11 2 3 4

$ cat f2
This patt
1234 xxxx

Now it's working like a charm and I am getting the expected result.

elixir_sinari · November 6, 2012, 2:12am

Of course, yes. You did it, right?

Why would perl complain when what you were doing was syntactically correct but logically wrong?

Never use a for (or foreach) loop to read a file. A for loop evaluates <F> in list context, so all records from the file (as delimited by the input record separator) are read in at once and stored in memory as a list. Your loop then iterates over the elements of this list. With a large file, you might hit your system's memory limits.

A while loop is much, much better, as it reads one record at a time.

balajesuri · November 6, 2012, 11:52am

@royalibrahim:

perl -n -e '#code' file

is roughly equivalent to

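#! /usr/bin/perl
# Open the file, run the code once per line (each line is in $_), then close it.
open F, "< file" or die "Cannot open file: $!";
while (<F>) {
    #code
}
close F;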

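So, if you really want to do it explicitly on the command line by opening a filehandle yourself, you can do it this way (note that #code needs its own line here; on a single line, the comment would swallow the closing brace and close F):

perl -e 'open F, "< file"; while (<F>) {
    #code
} close F'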