copy range of lines in a file based on keywords from another file

kaaliakahn · January 25, 2012, 11:01am

Hi Guys,

I have the following problem. I have an original file (org.txt) that looks like this:

module v_1(.....)
//arbitrary number of text lines
endmodule

module v_2(....)
//arbitrary number of text lines
endmodule

module v_3(...)
//arbitrary number of text lines
endmodule

module v_4(...)
//arbitrary number of text lines
endmodule

module v_5(...)
//arbitrary number of text lines
endmodule

I have another file, keywords.txt, that lists the keywords for org.txt:

v_2
v_3
v_5

What I want to do is extract the modules named in keywords.txt from org.txt and store them in a new file (filter.txt). In this case, I would extract only modules v_2, v_3 and v_5 and store them in filter.txt as:

module v_2(....)
//arbitrary number of text lines
endmodule

module v_3(...)
//arbitrary number of text lines
endmodule

module v_5(...)
//arbitrary number of text lines
endmodule

How can I do that using sed, awk, or any other cool way? Note that this is a toy example; the actual files have a huge number of lines.

Andy82 · January 25, 2012, 11:21am

I think the sed code I am using could help you, provided you know the size of each block of text you are extracting.

You could try:

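 sed -n "/$Var1/,/$Var2/p" < inputfile | head -n "$blocksize" > outputfile

Here Var1 and Var2 are the patterns that start and end the range, and blocksize is the number of lines you wish to read. The sed script is in double quotes so that the shell expands the variables; inside single quotes sed would see the literal text $Var1.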

kaaliakahn · January 25, 2012, 11:28am

Hi Andy,
There is no fixed block size here. Please read the post again: there could be any number of lines between module v_* and endmodule.
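That said, a sed range does not actually need a block size: an address pair such as /start/,/end/ keeps printing until the end pattern matches, however many lines that takes. A minimal sketch, assuming every header starts a line as `module NAME(` and every block ends with a line beginning `endmodule`, that builds one such range per keyword and runs them all in a single pass; `extract_modules` and the `sample_*` file names are hypothetical:

```sh
#!/bin/sh
# extract_modules KEYWORDS ORIG (hypothetical helper name):
# prints every module block of ORIG whose name is listed in KEYWORDS.
extract_modules() {
  script=$(mktemp) || return 1
  # One range command per keyword: v_2 becomes /^module v_2(/,/^endmodule/p
  sed 's|.*|/^module &(/,/^endmodule/p|' "$1" > "$script"
  # -n prints only what the ranges select; each range stops at the next
  # endmodule line, however long the block is.
  sed -n -f "$script" "$2"
  rm -f "$script"
}

# Toy data shaped like org.txt and keywords.txt (sample file names are made up).
printf 'v_2\nv_3\nv_5\n' > sample_keywords.txt
for n in 1 2 3 4 5; do
  printf 'module v_%s(...)\n//arbitrary text\nendmodule\n\n' "$n"
done > sample_org.txt

extract_modules sample_keywords.txt sample_org.txt > sample_filter.txt
```

On the real files this would be `extract_modules keywords.txt org.txt > filter.txt`. The keywords are pasted into the regexes unescaped, so names containing characters such as `.` or `[` would need escaping first, and unlike the awk answer no blank line is printed between blocks.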

Corona688 · January 25, 2012, 11:30am

$ cat module.awk

BEGIN {
        # Load the list of keywords. getline returns -1 if keywords.txt
        # cannot be opened, so test for > 0 to avoid looping forever.
        while ((getline < "keywords.txt") > 0) ARR[++C]=$1
        close("keywords.txt")
}

# Only run this codeblock when not printing.
!P {
        for(N in ARR)
        if(match($0, "module[ \t]+" ARR[N] "[ \t]*\\("))
        {
                P=1 # Start printing lines
                delete ARR[N]; # delete this item to make the loop faster next time
                break; # End the loop early once we find one item
        }
# Print all lines when P is nonzero.
} P

# Stop printing when a selected block reaches 'endmodule', and print a blank
# line after it (only for selected blocks, so skipped modules add nothing).
P && /endmodule/ { P=0; printf("\n"); }

$ cat data

module v_1(.....)
//arbitrary text
endmodule

module v_2(....)
//arbitrary text
endmodule

module v_3(...)
//arbitrary text
endmodule

module v_4(...)
//arbitrary text
endmodule

module v_5(...)
//arbitrary text
endmodule

$ awk -f module.awk data


module v_2(....)
//arbitrary text
endmodule

module v_3(...)
//arbitrary text
endmodule

module v_5(...)
//arbitrary text
endmodule
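Since the real files are huge, one possible refinement (not part of the answer above) is to replace the loop over every keyword with a hash lookup. A sketch, assuming headers look like `module NAME(...)` with NAME as the second whitespace-separated field; it uses the common two-file awk idiom `NR == FNR` to load the keyword file first. `extract_modules_hash` and the `sample_*` names are hypothetical:

```sh
#!/bin/sh
# extract_modules_hash KEYWORDS ORIG (hypothetical helper name):
# prints every module block of ORIG whose name is listed in KEYWORDS.
extract_modules_hash() {
  awk '
  # NR == FNR holds only while reading the first file: store each keyword
  # as an array index so that later membership tests are one hash lookup.
  NR == FNR { want[$1]; next }

  # Outside a selected block, inspect headers such as "module v_2(...)".
  !P && $1 == "module" {
      name = $2
      sub(/\(.*/, "", name)               # "v_2(...)" -> "v_2"
      if (name in want) P = 1
  }

  P                                       # print every line of a selected block
  P && /^endmodule/ { P = 0; print "" }   # end the block with a blank line
  ' "$1" "$2"
}

# Toy data shaped like org.txt and keywords.txt (sample file names are made up).
printf 'v_2\nv_3\nv_5\n' > sample_keywords.txt
for n in 1 2 3 4 5; do
  printf 'module v_%s(...)\n//arbitrary text\nendmodule\n\n' "$n"
done > sample_org.txt

extract_modules_hash sample_keywords.txt sample_org.txt > sample_filter.txt
```

One caveat of NR == FNR: if the keyword file is empty, awk treats the module file as keywords too and prints nothing.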

kaaliakahn · January 25, 2012, 11:36am

Corona688

Dude! You are awesome. I am running the script now. When it finishes, I shall let you know.

Hats off to you. Thanks so much again.

Warm Regards,

Corona688 · January 25, 2012, 11:48am

If it's just freezing, make sure keywords.txt is in the same directory you run the command in.
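The freeze has a concrete cause: awk's getline returns -1 when a file cannot be opened, and a loop written as `while(getline <"keywords.txt")` treats -1 as true, so it never ends. A minimal sketch of a loader that tests for `> 0` and reports the problem instead; `load_keywords` is a hypothetical helper name:

```sh
#!/bin/sh
# load_keywords FILE (hypothetical helper): loads keywords the way a
# BEGIN block would, then prints them one per line.
load_keywords() {
  awk -v kw="$1" 'BEGIN {
      # getline returns 1 per record, 0 at end of file, -1 on error.
      while ((status = (getline < kw)) > 0) ARR[++C] = $1
      if (status < 0) {
          print "cannot open " kw > "/dev/stderr"
          exit 2
      }
      for (i = 1; i <= C; i++) print ARR[i]
  }'
}

# A readable file prints its keywords; a missing one fails with status 2.
printf 'v_2\nv_3\n' > sample_keywords.txt
load_keywords sample_keywords.txt
```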

kaaliakahn · January 25, 2012, 11:50am

It's warm because it's from the heart.