1. Pattern.txt - It contains a large number of patterns to be matched.
cat Pattern.txt
Ram
Shyam
Mohan
Jhon
I have another file which has the actual data; its fields are delimited by one or more spaces.
2. Content.txt
cat Content.txt
1@GU00450012@Ram @@@@ bla1 lba2.
2@GU11950004@David @@@@ uss Ram
3@GU11950004@Shyam @@@@ uss rupa
etc etc
Now I need to find the patterns in content.txt, but only in the first field. I tried using
grep -F -f pattern.txt content.txt
It returns rows like
2@GU11950004@David @@@@ uss Ram
because the row contains the pattern 'Ram' somewhere.
It seems to work, but it looks for the pattern all over the line. I need to restrict the search to the first field only.
I know I can store the patterns in an awk array using
NR==FNR
but not sure how to search each of them in content.txt in first field only.
Notice my awkcode line assumes the pattern to match is preceded by @ and must match to end of field. Take a look at the commented out awkcode line if it should match just on the name regardless of where located.
Using the commented out one, the following lines would match for Masters:
awk '
FNR==NR {                      # prevents loading Content.txt into array s
    s[$0]                      # load Pattern.txt file into array s
    next                       # move to process next line of Pattern.txt
}
{
    for (p in s) {             # iterate each pattern
        if (match($1, p)) {    # check pattern for match against first field
            print              # print record if match is found
            next               # stop pattern iteration for this record, match was found already
        }
    }
}' Pattern.txt Content.txt
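Since Pattern.txt is said to be large, looping over every pattern for every record may get slow. If the name is always the last @-separated piece of the first field (an assumption based on the three sample lines, not something the poster confirmed), an exact array lookup is one possible alternative. A minimal sketch; the scratch directory and the names `names`, `part` and `exact` are only for the demo:

```shell
# Sample data copied from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' Ram Shyam Mohan Jhon > "$dir/Pattern.txt"
printf '%s\n' \
  '1@GU00450012@Ram @@@@ bla1 lba2.' \
  '2@GU11950004@David @@@@ uss Ram' \
  '3@GU11950004@Shyam @@@@ uss rupa' > "$dir/Content.txt"

# Assumption: the name is the last @-separated piece of the first field.
exact=$(awk '
FNR == NR { names[$0]; next }      # first file: remember every pattern
{
    n = split($1, part, "@")       # break the first field on @
    if (part[n] in names) print    # exact lookup, no loop over patterns
}' "$dir/Pattern.txt" "$dir/Content.txt")

printf '%s\n' "$exact"
rm -rf "$dir"
```

Unlike match(), this compares the whole name, so a pattern such as Ram would not hit a field ending in @Ramesh.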
Perl version
Save as search.pl and run as perl search.pl Pattern.txt Content.txt
#!/usr/bin/perl
# search.pl
# Perl facilities to help avoid errors
use strict;
use warnings;
# file names to obtain from command line
my $pattern_file = shift or die "usage: $0 pattern_file context_file\n";
my $context_file = shift or die "usage: $0 pattern_file context_file\n";
# open pattern file for read
open my $fh, '<', $pattern_file or die "cannot open $pattern_file: $!\n";
# load pattern file into an array
my @patterns = <$fh>;
# dismiss patterns file handle
close $fh;
# remove the newline at end of each record
chomp(@patterns);
# open context file for read
open $fh, '<', $context_file or die "cannot open $context_file: $!\n";
# iterate line by line through the context file
LINE: while (<$fh>) {
    # obtain the first field; skip blank lines
    my ($field) = split;
    next LINE unless defined $field;
    # search field for pattern; move to next line if match found
    for my $p (@patterns) {
        if ($field =~ /$p/) {
            print;
            # a bare next here would only advance to the next pattern
            next LINE;
        }
    }
}
# dismiss context file handle
close $fh;
I have been trying to solve this through sed.
inline!
sed -n -e '/\@{sed -e '1p' pattern.txt}/p' content.txt
I also tried curly braces in many other combinations, but I just can't get it working.
I like the idea of passing the result of one sed to another with this sub {sed} convention. This is the code I found that put me in this direction
sed -e '/<TEXT1>/{r File1' -e 'd}' File2
It is a particular example, which is not exactly what I need, but I tried to modify it to fit this one.
No go!
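A likely reason the nested attempt fails: sed does not run another command found inside its own script, and the inner quotes end the outer single-quoted string early. Nesting of that kind is normally done by the shell with $(...) inside double quotes. A minimal sketch using the sample data from the first post; the scratch directory and the variable names `first` and `one` are only for the demo:

```shell
# Sample data copied from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' Ram Shyam Mohan Jhon > "$dir/pattern.txt"
printf '%s\n' \
  '1@GU00450012@Ram @@@@ bla1 lba2.' \
  '2@GU11950004@David @@@@ uss Ram' \
  '3@GU11950004@Shyam @@@@ uss rupa' > "$dir/content.txt"

# The shell, not sed, expands $(...), so the inner sed runs first.
first=$(sed -n '1p' "$dir/pattern.txt")

# Double quotes let $first expand before the outer sed sees its script.
one=$(sed -n "/@$first/p" "$dir/content.txt")

printf '%s\n' "$one"
rm -rf "$dir"
```

This still handles only the first line of pattern.txt, same as the original attempt.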
---------- Post updated at 04:15 PM ---------- Previous update was at 04:01 PM ----------
Earlier I was able to do it with xargs, but I would still prefer sed only.
sed -n '1p' < pattern.txt | xargs -I output sed -n '/\@output/p' content.txt
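For a sed-only route that covers every line of pattern.txt rather than just the first, one option is to let sed write a second sed script and then run it. A rough sketch, assuming the names contain no slashes or regex metacharacters and that the first field is followed by a space; the scratch directory, the script name names.sed and the variable `all` are made up for the demo:

```shell
# Sample data copied from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' Ram Shyam Mohan Jhon > "$dir/pattern.txt"
printf '%s\n' \
  '1@GU00450012@Ram @@@@ bla1 lba2.' \
  '2@GU11950004@David @@@@ uss Ram' \
  '3@GU11950004@Shyam @@@@ uss rupa' > "$dir/content.txt"

# Turn each name into a sed command such as: /^[^ ]*@Ram /p
sed 's|.*|/^[^ ]*@& /p|' "$dir/pattern.txt" > "$dir/names.sed"

# Run the generated script; -n prints only lines that one of the commands matched.
all=$(sed -n -f "$dir/names.sed" "$dir/content.txt")

printf '%s\n' "$all"
rm -rf "$dir"
```

The `^[^ ]*` part keeps the match inside the first space-delimited field, so the David line with a trailing Ram is not printed.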
Actually, it's a grep version. The sed command just makes sure grep is working on the first field by adding the needed regex parts (no spaces up to the name, trailing space).
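The command being described is not shown here, so the following is only a guess at its shape: sed wraps each name in the regex parts mentioned (no spaces up to the name, trailing space) and grep reads the result as its pattern file. The file name anchored.txt and the variable `hits` are hypothetical, and names are assumed to contain no regex metacharacters:

```shell
# Sample data copied from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' Ram Shyam Mohan Jhon > "$dir/Pattern.txt"
printf '%s\n' \
  '1@GU00450012@Ram @@@@ bla1 lba2.' \
  '2@GU11950004@David @@@@ uss Ram' \
  '3@GU11950004@Shyam @@@@ uss rupa' > "$dir/Content.txt"

# Rewrite each name as an anchored regex, e.g. Ram becomes: ^[^ ]*@Ram<space>
sed 's/.*/^[^ ]*@& /' "$dir/Pattern.txt" > "$dir/anchored.txt"

# No -F here: the patterns are now regular expressions, not fixed strings.
hits=$(grep -f "$dir/anchored.txt" "$dir/Content.txt")

printf '%s\n' "$hits"
rm -rf "$dir"
```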
I had the impression the result required was the @Ram line, but I see all Ram should print.
sed -n '1p' < a.txt | xargs -I output sed -n '/output/p' b.txt
So this small adjustment prints all Ram in all forms.
Hope this helps.
I learned a whole lot from your problem, about /dev/stdout and the r (read) option in sed. Mostly I'm now aware of them and am trying to use them in examples.