I have the following problem. I have an original file (org.txt) that looks like this:
module v_1(.....)
//arbitrary number of text lines
endmodule
module v_2(....)
//arbitrary number of text lines
endmodule
module v_3(...)
//arbitrary number of text lines
endmodule
module v_4(...)
//arbitrary number of text lines
endmodule
module v_5(...)
//arbitrary number of text lines
endmodule
I have another file, keywords.txt, that lists the names of the modules I want from org.txt. It looks like this:
v_2
v_3
v_5
What I want to do is extract modules from org.txt based on keywords.txt and store them in a new file (filter.txt). In this case, I would extract only modules v_2, v_3 and v_5 and store them in filter.txt as:
module v_2(....)
//arbitrary number of text lines
endmodule
module v_3(...)
//arbitrary number of text lines
endmodule
module v_5(...)
//arbitrary number of text lines
endmodule
How can I do that using sed, awk, or any other tool? Note that this is a toy example. The actual files are huge in terms of number of lines.
$ cat module.awk
BEGIN {
    # Load the list of keywords. Testing for > 0 avoids an endless loop
    # if keywords.txt cannot be read (getline returns -1 on error).
    while ((getline < "keywords.txt") > 0) ARR[++C] = $1
    close("keywords.txt")
}

# Only run this code block when not printing.
!P {
    for (N in ARR)
        if (match($0, "module[ \t]+" ARR[N] "[ \t]*\\(")) {
            P = 1           # Start printing lines
            delete ARR[N]   # Delete this item to make the loop faster next time
            break           # End the loop early once we find one item
        }
}

# Print all lines when P is nonzero.
P

# Stop printing when we find a line with 'endmodule'.
/endmodule/ { P = 0 }
$ cat data
module v_1(.....)
//arbitrary text
endmodule
module v_2(....)
//arbitrary text
endmodule
module v_3(...)
//arbitrary text
endmodule
module v_4(...)
//arbitrary text
endmodule
module v_5(...)
//arbitrary text
endmodule
$ awk -f module.awk data
module v_2(....)
//arbitrary text
endmodule
module v_3(...)
//arbitrary text
endmodule
module v_5(...)
//arbitrary text
endmodule
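Since the real files are huge, one possible variation is to store the keywords as array keys and look each module name up directly, rather than looping over every keyword for each line. What follows is only a sketch under some assumptions that are not stated in the question: each module header starts its line with the word "module", the module name is the second whitespace-separated field, keywords.txt is not empty, and "endmodule" starts its own line. The sample files, their contents, and the variable names want, name and printing are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample inputs, recreated here so the sketch runs on its own.
# Run it in a scratch directory: it overwrites org.txt, keywords.txt, filter.txt.
cat > org.txt <<'EOF'
module v_1(a)
//text 1
endmodule
module v_2(b)
//text 2
endmodule
module v_3(c)
//text 3
endmodule
EOF
printf 'v_1\nv_3\n' > keywords.txt

awk '
    # First file (keywords.txt): remember each keyword as an array key.
    NR == FNR { want[$1]; next }

    # Second file (org.txt): on a module header, cut the name out of
    # field 2 and check whether it is one of the wanted keywords.
    /^[ \t]*module[ \t]/ {
        name = $2
        sub(/[^A-Za-z0-9_$].*/, "", name)   # drop "(" and anything after it
        printing = (name in want)
    }

    # Print every line while inside a wanted module.
    printing

    # Stop after the closing line of the module has been printed.
    /^[ \t]*endmodule/ { printing = 0 }
' keywords.txt org.txt > filter.txt
```

With this layout each input line costs one regex test and at most one array lookup, so the running time should not grow with the number of keywords. Unlike the script above, it compares the whole module name, so a keyword such as v_1 would not select a module named v_10.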