Based on column in file1, find match in file2 and print matching lines

file1:

file2:

I need to find matches for any lines in file1 that appear in file2. Desired output is '>' plus the file1 term, followed by the line after the match in file2 (so the title is a little misleading):

This is honestly beyond what I can do without spending the whole night on it, so I'm hoping someone out there is feeling altruistic.

If "169_(-)_comp100001_c0_seq1" is always the third colon-delimited field in file2, and the ID is always preceded by two underscore-terminated parts ("169_(-)_" before "comp100001_c0_seq1"), then try this; otherwise adjust the command to fit your input:

$ awk -F: 'NR == FNR { arr[$0] = 1; next } { sub(/^[^_]+_[^_]+_/, "", $3); if ($3 in arr) { print ">" $3; if ((getline) > 0) print } }' file1 file2
>comp1000362_c0_seq1
QSLPFPPNYISLSHAGTLSVNPCTAYRLLKDFVSLSTGDFIIQNGANSGVGRVVIQLCKA
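$ cat temp.sh
#!/bin/sh
# For each ID in file1, print ">ID" and the line after its first match in file2.
while IFS= read -r pattern; do
  [ -n "$pattern" ] || continue   # skip blank lines in file1
  # -F: fixed string, -m 1: stop at first match, -n: prefix the line number
  line_number=$(grep -F -m 1 -n -- "$pattern" file2 | cut -d : -f 1)
  [ -n "$line_number" ] || continue   # no match in file2
  echo ">$pattern"
  sed -n "${line_number}{n;p;q;}" file2
done < file1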
$ ./temp.sh
>comp1000362_c0_seq1
QSLPFPPNYISLSHAGTLSVNPCTAYRLLKDFVSLSTGDFIIQNGANSGVGRVVIQLCKA
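
If you want to try the idea without the real data, here is a minimal sketch that builds two tiny files and runs a single awk pass over them. The sample IDs, the sequences and the header layout (ID in the third colon-separated field, preceded by two "xxx_" parts) are assumptions made up for illustration, not the asker's actual files.

```shell
#!/bin/sh
# Hypothetical sample data: this layout (ID in the third colon-separated
# field of file2) is an assumption, not the real input.
tmp=$(mktemp -d)
printf '%s\n' 'comp1_c0_seq1' 'comp3_c0_seq1' > "$tmp/file1"
printf '%s\n' \
  'a:b:169_(-)_comp1_c0_seq1' 'AAAA' \
  'a:b:170_(+)_comp2_c0_seq1' 'CCCC' \
  'a:b:171_(-)_comp3_c0_seq1' 'GGGG' > "$tmp/file2"

# Pass 1 (file1): remember every ID.
# Pass 2 (file2): strip the two leading "xxx_" parts from field 3 and,
# on a match, print ">ID" followed by the next line of file2.
out=$(awk -F: '
  NR == FNR { ids[$0]; next }
  {
    id = $3
    sub(/^[^_]+_[^_]+_/, "", id)
    if (id in ids && (getline seq) > 0) print ">" id "\n" seq
  }' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With the sample data above this prints ">comp1_c0_seq1", "AAAA", ">comp3_c0_seq1", "GGGG", one per line. Unlike the grep/sed loop, it reads file2 only once, which should matter if file1 is long.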

I'm hoping you also feel altruistic. :)