Copy a range of lines from a file based on keywords from another file

Hi Guys,

I have the following problem. I have an original file (org.txt) that looks like this:

module v_1(.....)
//arbitrary number of text lines
endmodule

module v_2(....)
//arbitrary number of text lines
endmodule

module v_3(...)
//arbitrary number of text lines
endmodule

module v_4(...)
//arbitrary number of text lines
endmodule

module v_5(...)
//arbitrary number of text lines
endmodule

I have another file, keywords.txt, that lists keywords for org.txt and looks like this:

v_2
v_3
v_5

What I want to do is extract modules from org.txt based on keywords.txt and store them in a new file (filter.txt). In this case, I would extract only modules v_2, v_3 and v_5 and store them in filter.txt as:

module v_2(....)
//arbitrary number of text lines
endmodule

module v_3(...)
//arbitrary number of text lines
endmodule

module v_5(...)
//arbitrary number of text lines
endmodule

How can I do that using sed and awk, or any other cool way? Note that this is a toy example. The actual files are huge in terms of number of lines.

I think the sed code I am using could help you, provided you know the size of each block of text that you are extracting.

You could try:

 sed -n "/$Var1/,/$Var2/p" < inputfile | head -n "$blocksize" > outputfile

Here Var1 and Var2 are the patterns that mark the start and end of the range you are searching for, and blocksize is the number of lines you wish to read.
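
To see how a sed range behaves on the sample from the question, here is a minimal sketch. The helper name print_range and the shortened sample lines are assumptions made for illustration only. The range stops at the first line matching the end pattern, so in this sketch the block length does not have to be known in advance.

```shell
# print_range is a hypothetical helper name used only for this sketch.
# Usage: print_range START_PATTERN END_PATTERN < file
print_range() {
    # Double quotes let the shell expand $1 and $2 inside the sed script;
    # inside single quotes sed would receive the literal text instead.
    sed -n "/$1/,/$2/p"
}

# Sample input shortened from the question; only the v_2 block is selected.
printf '%s\n' 'module v_1(.....)' '//text a' 'endmodule' '' \
              'module v_2(....)' '//text b' '//text c' 'endmodule' |
    print_range 'module v_2' 'endmodule'
```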

Hi Andy
There is no range concept here. Please see the post again as there could be any number of lines between module v_* and endmodule.

$ cat module.awk

BEGIN {
        # Load the list of keywords
        # "> 0" ends the loop at end of file and also if the file cannot be opened
        while((getline <"keywords.txt") > 0) ARR[++C]=$1
}

# Only run this codeblock when not printing.
!P {
        for(N in ARR)
        if(match($0, "module[ \t]+" ARR[N] "[ \t]*\\("))
        {
                P=1 # Start printing lines
                delete ARR[N]; # delete this item to make the loop faster next time
                break; # End the loop early once we find one item
        }
}

# Print all lines when P is nonzero.
P

# Stop printing when we find a line with 'endmodule'.  After a printed module, also print a blank line.
/endmodule/ { if (P) printf("\n"); P=0; }

$ cat data

module v_1(.....)
//arbitrary text
endmodule

module v_2(....)
//arbitrary text
endmodule

module v_3(...)
//arbitrary text
endmodule

module v_4(...)
//arbitrary text
endmodule

module v_5(...)
//arbitrary text
endmodule

$ awk -f module.awk data


module v_2(....)
//arbitrary text
endmodule

module v_3(...)
//arbitrary text
endmodule

module v_5(...)
//arbitrary text
endmodule
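
Since the real files are huge, a variant that looks each module name up in an array, rather than looping over every keyword for every line, may be worth trying. This is only a sketch: the function name extract_modules and the variable names are made up for illustration, and it assumes keywords.txt is not empty and that each module line has the form "module NAME(".

```shell
# extract_modules is a hypothetical helper: extract_modules KEYWORDS SOURCE
extract_modules() {
    awk '
    # First file (keywords): remember each keyword as an array index.
    NR == FNR { if ($1 != "") want[$1] = 1; next }

    # Second file: on a "module NAME(" line, pull out NAME and look it up.
    !printing && /^[ \t]*module[ \t]/ {
        name = $0
        sub(/^[ \t]*module[ \t]+/, "", name)   # drop the leading "module "
        sub(/[ \t]*\(.*/, "", name)            # drop "(" and everything after it
        if (name in want) printing = 1
    }

    # Print every line while inside a wanted module.
    printing

    # At endmodule, stop printing; add a blank line after a printed module.
    /endmodule/ { if (printing) print ""; printing = 0 }
    ' "$1" "$2"
}

# Usage: extract_modules keywords.txt org.txt > filter.txt
```

The array lookup costs about the same however many keywords there are, which may matter when keywords.txt is long.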

Corona688

Dude! You are awesome. I am running the script now; I shall let you know when it finishes.

Hats off to you. Thanks so much again.

Warm Regards,

If it's just freezing, make sure keywords.txt is in the same directory you run the command in.
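
A getline loop can spin forever on a missing file: getline returns -1 when the file cannot be opened, and a loop written as while(getline < file) treats -1 as true. Below is a minimal sketch of a guarded loop; the helper name count_keywords and the error message are invented for this demonstration.

```shell
# count_keywords is a hypothetical helper used only for this demonstration.
# It prints the number of lines read, or fails if nothing could be read.
count_keywords() {
    awk -v file="$1" 'BEGIN {
        n = 0
        # getline returns 1 for each line read, 0 at end of file,
        # and -1 when the file cannot be opened.
        while ((getline line < file) > 0) n++
        if (n == 0) {
            print "no keywords read from " file > "/dev/stderr"
            exit 1
        }
        print n
    }'
}

# Usage: count_keywords keywords.txt
```

The same "> 0" comparison can be used in the BEGIN block of any script that loads a keyword list this way.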

It's warm because it's from the heart :wink: