How can I quickly print the lines of a data file that contain both patterns from a row of another file? Maybe awk
can do this much faster than bash.
Patternfile
ID1 PAT11 PAT12
ID1 PAT21 PAT22
ID2 PAT31 PAT32
datafile
headerline
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT31rf3fffffPAT32efgreggeeeeggge
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT41rf3fffffPAT32efgreggeeeeggge
fgegegPAT21.ewdwd88weded((gfefggegrg!///*..ttttuuu.PAT22uuhhggggg
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg====
The output must be split by the ID (column 1) that the patterns belong to, and each ID's output starts with the header line.
Expected output
ID1
headerline
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg
fgegegPAT21.ewdwd88weded((gfefggegrg!///*..ttttuuu.PAT22uuhhggggg
fgegegPAT11.ewdwd88weded((gfefggegrg!///*...PAT12uuhhggggg====
ID2
headerline
rgthhrhhhhhtnjttntjjtjtjtjtjtjtjjjtPAT31rf3fffffPAT32efgreggeeeeggge
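One way to get this layout is a single awk process that reads patfile into arrays and then makes one pass over datafile. The sketch below is a minimal illustration under some assumptions, not a definitive implementation. It assumes the first line of datafile is the header, that every patfile row has exactly three whitespace-separated fields (ID, pattern 1, pattern 2), and that the patterns are literal strings. For that reason it uses index() rather than regex matching. The function name print_grouped is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each ID, the header line, then the data lines
# that contain both patterns of at least one patfile row with that ID.
# Patterns are treated as literal substrings (index), not as regexes.
print_grouped() {
    local patfile=$1 datafile=$2
    awk '
        # First file (patfile): store every row and the order IDs first appear in.
        NR == FNR {
            if (!($1 in seen)) { seen[$1] = 1; ids[++nid] = $1 }
            id[++np] = $1; p1[np] = $2; p2[np] = $3
            next
        }
        # Second file (datafile): line 1 is assumed to be the header.
        FNR == 1 { header = $0; next }
        # Test every other data line against all pattern rows.
        {
            for (i = 1; i <= np; i++) {
                k = id[i]
                # last[k] == FNR means this line was already kept for ID k.
                if (last[k] != FNR && index($0, p1[i]) && index($0, p2[i])) {
                    last[k] = FNR
                    out[k] = out[k] $0 "\n"
                }
            }
        }
        # Print the groups in patfile order, each starting with the header.
        END {
            for (j = 1; j <= nid; j++)
                printf "%s\n%s\n%s", ids[j], header, out[ids[j]]
        }
    ' "$patfile" "$datafile"
}
```

Usage: `print_grouped patfile datafile`. IDs come out in the order they first appear in patfile, and matching lines keep their datafile order. A line that matches two rows of the same ID is printed only once. Because the output is grouped, all matches are held in memory until the end, which may matter for a very large datafile. The NR == FNR test also assumes patfile is not empty.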
My attempt in bash is very slow:
# patfile columns: ID, pattern 1, pattern 2
while read -r id pat1 pat2
do
    while IFS= read -r data
    do
        if grep -q -- "$pat1" <<< "$data" && grep -q -- "$pat2" <<< "$data"
        then
            printf '%s\n' "$data" >> "$id"
        fi
    done < datafile
done < patfile
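A faster replacement for the nested loops could keep the attempt's one-output-file-per-ID behavior but read datafile only once inside a single awk process. That avoids launching grep for every combination of data line and pattern row. This is a sketch under some assumptions: the header is on line 1, patfile rows have three fields, and the patterns are literal strings matched with index(). The name split_by_id and its optional output-directory argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes matching datafile lines to one file per ID
# (named after the ID, inside outdir), each file starting with the header.
# Patterns are treated as literal substrings (index), not as regexes.
split_by_id() {
    local patfile=$1 datafile=$2 outdir=${3:-.}
    awk -v outdir="$outdir" '
        # First file (patfile): store every row.
        NR == FNR { id[++np] = $1; p1[np] = $2; p2[np] = $3; next }
        # Header line: start each ID file with it (">" truncates on first open).
        FNR == 1 {
            for (i = 1; i <= np; i++)
                if (!(id[i] in started)) { started[id[i]] = 1; print > (outdir "/" id[i]) }
            next
        }
        # Other data lines: append to the file of every ID whose row matches.
        {
            for (i = 1; i <= np; i++) {
                k = id[i]
                # Skip if this line was already written for ID k.
                if (last[k] != FNR && index($0, p1[i]) && index($0, p2[i])) {
                    last[k] = FNR
                    print > (outdir "/" k)
                }
            }
        }
    ' "$patfile" "$datafile"
}
```

With the sample data, `split_by_id patfile datafile` should leave files ID1 and ID2 in the current directory, each starting with headerline. The output directory must already exist. Each ID file is truncated when awk first opens it, so a rerun overwrites earlier results. Some awk implementations limit how many output files can be open at once, so with many distinct IDs, close() calls may be needed.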