print line if 2nd field exists in text

I have 2 files: the first has 3 fields separated by ||| and the second is plain text.

I want to copy the lines from the first file if the 2nd field is present anywhere in the text file. This is what I've tried, but I'm new to awk and shell scripting in general so it's kinda broken.

#!/bin/awk -f
BEGIN 
{
   FS=" ||| ";
}

FILENAME=="./input/text" 
{
   # for all lines
   for (i; i < NR; i++) {
   # for all records on the current line
      for (j, j < NF; j++) {
         # if the 2nd field is seen in the text, increase its count
         if ($2 == $j) {
            seen[$2]++;
         }
      }
   }
}


# If the 2nd expression was seen at least once, print its corresponding line
{
   if ( seen[$2] > 0 ) {
      print $1 " \|\|\| " $2 " \|\|\| " $3;
   }
}

Please post a few lines from each file.

1st file
1 ||| abc kek ||| 2
1 ||| def foo ||| 3
2 ||| ghi bleh blah ||| 4
3 ||| jkl ||| 5

2nd file (text)
bla bla bla jkl bla bla bla
def foo test 123. abc kek

Expected output
1 ||| abc kek ||| 2
1 ||| def foo ||| 3
3 ||| jkl ||| 5

Sick command, but it seems to work :)

sed 's/ ||| /|/g' file1 | while read line ; do grep -qw "$(echo $line | awk -F "|" '{print $2}')" file2 && echo $line ; done| sed 's/|/ ||| /g'

Thanks, it works perfectly! I'm just not sure I understand how it operates. Can you elaborate?

The first sed command just collapses " ||| " into "|" to make the awk statement easier; the last sed changes "|" back to " ||| ".

$(echo $line | awk -F "|" '{print $2}')

This part extracts the middle field of each line of your input file:
abc kek
def foo
..
..

An easier way, perhaps, is to store that output in a variable and grep for the variable; the inline version is just shorter.

So you grep (-q for quiet, so grep prints nothing) for the middle field of each file1 line in file2. If grep's return code is 0, the string was found and the whole line is printed; that is what this part does:

&& echo $line

And this is repeated line by line.

cheers
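
For what it's worth, the "write it to a variable" variant mentioned above could look roughly like the sketch below. The names filter_lines, records, text and key are made up for this example, and it uses the " [|][|][|] " field separator directly instead of the two sed passes. Like the one-liner, it still treats the field as a grep pattern and relies on grep -w, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Sketch of the same loop with the middle field kept in a variable.
# filter_lines, records, text and key are hypothetical names.
filter_lines() {
    records=$1   # the " ||| "-delimited file (file1 in this thread)
    text=$2      # the plain-text file (file2 in this thread)
    while IFS= read -r line; do
        # Extract the 2nd field of the current line.
        key=$(printf '%s\n' "$line" | awk -F ' [|][|][|] ' '{print $2}')
        # Skip lines without a 2nd field; an empty pattern matches everything.
        [ -n "$key" ] || continue
        # -q: no output, only the exit status; -w: match whole words only.
        if grep -qw -- "$key" "$text"; then
            printf '%s\n' "$line"
        fi
    done < "$records"
}
```

Called as filter_lines file1 file2, it prints the matching lines of file1 unchanged.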

awk alone will do; there's no need to pipe together several commands that do similar things.

awk   'BEGIN{FS=" [|][|][|] ";s="";}
FNR==NR{
 for ( i=1;i<=NF;i++){
  s=s" "$i
 } 
 next
}match(s,$2 )' file2 file1

output

# ./test.sh
1 ||| abc kek ||| 2
1 ||| def foo ||| 3
3 ||| jkl ||| 5

hey using your field separator " [|][|][|] ", there is just one pipe in my script ^^

while read line ; do grep -qw "$(echo $line | awk -F " [|][|][|] " '{print $2}')" file2 && echo $line ; done < file1

but I agree with you, yours is much faster

With this method, you have to run grep and awk once per line of file1, so 500 times each if file1 has 500 lines. It gets worse when file2 is large, because every one of those grep runs scans file2 again.

Hey,

First of all, thanks a bunch, that script is allowing me to get rid of a lot of unnecessary data in my files. I've been using the script from ghostdog74 and it works great for the most part but it gets stuck when the expression has reserved signs like the ones listed below (and probably some others).

? | + -

Any way to force those characters to be considered as text?

I'm thinking about just changing all special characters by adding a "\" before them, and then undo that change when the processing is done. Is that optimal?

Your sample file doesn't contain those special characters, right? Then show one that does.

1st file
1 ||| 123/ eb ||| 513
2 ||| k + 12 ||| 51
3 ||| | ||| 5
4 ||| a ) ||| 7
5 ||| bla ? ||| 123
6 ||| foobar ||| 543
7 ||| foobar2 ||| 12346

2nd file (text)
bla bla k + 12 bla jkl bla | bla bla
def foo test 123/ eb . abc kek a )
bla ? abcdefg

Expected output
1 ||| 123/ eb ||| 513
2 ||| k + 12 ||| 51
3 ||| | ||| 5
4 ||| a ) ||| 7
5 ||| bla ? ||| 123

awk   'BEGIN{FS=" [|][|][|] ";s="";}
FNR==NR{
 for ( i=1;i<=NF;i++){
  s=s" "$i
 } 
 next
}
{
  s2=$2
  # put a backslash before every regex special character
  gsub(/[][\\.^$*+?(){}|]/,"\\\\&",s2)
}
match(s,s2 ) 
' file2 file1
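
A possible alternative that avoids escaping altogether: awk's index() searches for a literal substring, so nothing in the 2nd field is read as a regex operator. This is only a sketch; match_literal is a made-up wrapper name, and like the match() version it does substring matching, not whole-word matching.

```shell
#!/bin/sh
# Literal (non-regex) matching with index().
# match_literal is a hypothetical helper name for this example.
match_literal() {
    # $1: plain-text file, $2: " ||| "-delimited file
    awk 'BEGIN { FS = " [|][|][|] " }
    FNR == NR {            # first file: collect the whole text
        s = s " " $0       # keep a space between joined lines
        next
    }
    # index() returns the position of $2 in s, or 0 if it is absent.
    # The pattern has no action, so matching lines are printed as-is.
    $2 != "" && index(s, $2)' "$1" "$2"
}
```

Usage would be match_literal file2 file1, with the same argument order as the awk command above.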

perl:

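use strict;
my %hash;
# file2.txt must be the |||-delimited file here, since each line is split on |||;
# the plain text is read from the DATA section below
open my $fh,"<","file2.txt" or die "Cannot open file2.txt: $!";
while(<$fh>){
	chomp;
	my @tmp=split(/\|{3}/,$_);
	# trim whitespace around the 2nd field
	$tmp[1]=~s/^\s*//;
	$tmp[1]=~s/\s*$//;
	$hash{$tmp[1]}=$_;	# key: 2nd field, value: whole line
}
close $fh;
while(<DATA>){
	foreach my $item(keys %hash){
		if (/$item/){	# $item is interpreted as a regex
			print $hash{$item},"\n" ;
			delete $hash{$item};	# print each line only once
		}
	}
}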
__DATA__
bla bla bla jkl bla bla bla 
def foo test 123. abc kek jkl

@summer cherry: the OP's file can contain special characters, and /$item/ will interpret them as regex operators.