Match 2 patterns together

How can I quickly print the lines of a data file that contain both patterns from a row of another file? Maybe awk can do it much faster than bash.

Patternfile

ID1 PAT11 PAT12
ID1 PAT21 PAT22
ID2 PAT31 PAT32

datafile

headerline
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT31rf3fffffPAT32efgreggeeeeggge
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT41rf3fffffPAT32efgreggeeeeggge
fgegegPAT21.ewdwd88weded((gfefggegrg!///*..ttttuuu.PAT22uuhhggggg
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg====

The outputs must be split by the ID (col1) that the patterns belong to.

Outputs

ID1

headerline
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg
fgegegPAT21.ewdwd88weded((gfefggegrg!///*..ttttuuu.PAT22uuhhggggg
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg====


ID2

headerline
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT31rf3fffffPAT32efgreggeeeeggge

My attempt in bash is very slow:

while read pat
do
while read data
do
if  grep -q $pat[1] $data
if  grep -q $pat[2] $data
echo $data >> $pat[0]
fi
fi
done < datafile
done < patfile

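A few points:

  • You don't need the inner loop: grep takes file names as arguments, not strings. If you want to search a string, you have to pass it on standard input.
  • You seem to be using an array, but it doesn't work like this. read pat stores the whole line in one variable, and $pat[1] expands to the value of $pat followed by the literal text [1].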
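To illustrate the first point, here is a minimal sketch of testing a string (rather than a file) against two patterns by feeding it to grep on standard input. The helper name has_both is made up for this example, and the sample line is taken from the datafile in the question.

```shell
# has_both LINE PAT1 PAT2 (hypothetical helper):
# succeeds only if LINE matches both patterns.
has_both() {
    # grep treats non-option arguments as file names,
    # so the string has to arrive on standard input.
    printf '%s\n' "$1" | grep -q -- "$2" &&
    printf '%s\n' "$1" | grep -q -- "$3"
}

line='fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg'
if has_both "$line" PAT11 PAT12; then
    echo "both patterns found"
fi
```

This still starts two grep processes per test, so calling it for every line of a large file would remain slow; it only shows how a string is passed to grep.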

Since your pattern file is whitespace-delimited, read can split each row into its fields directly:

while read -r id pat1 pat2
do
  echo "$id" >> results_file   # print ID
  echo >> results_file         # print blank line
  grep -- "$pat1" datafile | grep -- "$pat2" >> results_file   # print lines matching both patterns
done < patfile
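The loop above collects everything in a single results_file. If one output file per ID with the header line on top is wanted, as in the question, the same idea could be extended roughly as follows. This is only a sketch: the function name split_by_id is invented here, and it assumes the pattern file has no blank lines and that the IDs are usable as file names.

```shell
# split_by_id PATFILE DATAFILE (hypothetical helper):
# writes one file per ID into the current directory.
split_by_id() {
    patfile=$1
    datafile=$2

    # Start every ID file afresh with the header line of the data file.
    awk '{ print $1 }' "$patfile" | sort -u | while read -r id
    do
        head -n 1 "$datafile" > "$id"
    done

    # Append the data lines that contain both patterns of each row.
    # "|| true" keeps a row without matches from counting as a failure.
    while read -r id pat1 pat2
    do
        tail -n +2 "$datafile" | grep -- "$pat1" | grep -- "$pat2" >> "$id" || true
    done < "$patfile"
}
```

It would be called as split_by_id patfile datafile. Note that the lines in each ID file end up grouped by pattern row rather than in datafile order, and the data file is read once per pattern row.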

Try

awk     'FNR==NR        {SP[$1,NR]=$2".*"$3; ID[$1]
                         next
                        }
         FNR==1         {for (i in ID) print > i
                         next
                        }
                        {for (s in SP) if ($0 ~ SP[s]) {split (s, FN, SUBSEP); print > FN[1]}
                        }
        ' patfile datafile 
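The script above builds the regular expression PAT1.*PAT2, so it expects the first pattern to appear before the second and treats any regex metacharacters in the patterns as special. If the patterns are plain strings that may occur in either order, a variant using index() might look like this. It is a sketch under those assumptions; the wrapper name match_literal is made up for the example.

```shell
# match_literal PATFILE DATAFILE (hypothetical wrapper):
# literal, order-independent matching; writes one file per ID
# into the current directory.
match_literal() {
    awk '
        # First file: remember ID and both patterns of every row.
        FNR == NR { n++; id[n] = $1; p1[n] = $2; p2[n] = $3; ids[$1]; next }

        # First line of the data file: copy the header into every ID file.
        FNR == 1  { for (i in ids) print > (i); next }

        # Remaining data lines: print to the ID file of each row
        # whose two patterns both occur somewhere in the line.
        {
            for (k = 1; k <= n; k++)
                if (index($0, p1[k]) && index($0, p2[k]))
                    print > (id[k])
        }
    ' "$1" "$2"
}
```

Run as match_literal patfile datafile. As with the original, a line that satisfies two rows of the same ID is printed twice to that ID's file.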