Grepping text block by block using a for loop

anushree.a · September 24, 2012, 4:54am

Hi buddies,
I need your help once again.

I have a file with bunches of lines; each bunch starts with a fixed pattern and ends with another fixed pattern.
I want to use these fixed start and end patterns to select one bunch at a time.

The input file is as follows.

Hi welcome
blah blah blah
blah blah blah
Bye**
Hi welcome
blah  blah
blah  blah
blah  blah
blah  blah
blah  blah
Bye**
Hi welcome
blah 
blah 
Bye**
Hi welcome
blah blah blah
blah blah blah
blah blah blah
blah blah blah
blah blah blah
Bye**
Hi welcome
blah blah blah
Bye**

I tried using awk '/Hi welcome/,/Bye**/' inputfile.txt to select the text from Hi welcome to Bye**. However, it selects the complete document, maybe because the whole document also starts with "Hi welcome" and ends with "Bye**". What I am trying to do is get the text block by block (from Hi welcome to Bye** is one block) into a temp file, using a for loop.

Please help. It's a little urgent.

Thank you.
Anu.
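For reference, the range pattern does restart at every "Bye**": awk prints from a "Hi welcome" line through the next "Bye**" line and then waits for the next "Hi welcome". Because the blocks in this file sit back to back, the concatenated output looks exactly like the whole document. The sketch below (show_blocks is a hypothetical helper name; it assumes a POSIX awk and writes the asterisks as \*\*, since a bare ** in a regex is undefined in POSIX EREs) prints a marker before each block so the boundaries become visible:

```shell
# show_blocks FILE (hypothetical helper name): print every
# "Hi welcome" ... "Bye**" block of FILE with a marker line in front,
# showing that the range pattern starts a new match for each block.
show_blocks() {
  awk '/^Hi welcome/,/^Bye\*\*/ {
         if (/^Hi welcome/) { n++; print "--- block " n " ---" }
         print
       }' "$1"
}
# Usage: show_blocks inputfile.txt
```

Each marker shows where a new match of the range begins, i.e. the command was already splitting the file block by block.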

pamu · September 24, 2012, 5:03am

Try this....

awk '{if($0 ~ /^Hi welcome/){ s=$0}else{if($0 !~ /Bye\*\*/){if(s != "") { s=s"\n"$0}}else{s=s"\n"$0; print s"\n"}}}' file > temp_file
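Spread over several lines with comments, the logic of the one-liner above looks roughly like this sketch (blocks_to_file is a hypothetical wrapper name). Unlike the one-liner, it also empties s after each block and ignores a "Bye**" that has no opening "Hi welcome":

```shell
# blocks_to_file IN OUT (hypothetical wrapper name): collect each
# "Hi welcome" ... "Bye**" block in s and write it to OUT, followed by
# an empty line, when its "Bye**" line is reached.
blocks_to_file() {
  awk '
    /^Hi welcome/ { s = $0; next }        # start a new block
    /Bye\*\*/     { if (s != "") print s "\n" $0 "\n"; s = ""; next }
    s != ""       { s = s "\n" $0 }       # inside a block: append the line
  ' "$1" > "$2"
}
# Usage: blocks_to_file file temp_file
```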

anushree.a · September 24, 2012, 5:11am

Hi Pamu,
The solution is not working. The output file is generated, but it is zero MB.

pamu · September 24, 2012, 5:13am

What is the output of this?

awk '{if($0 ~ /^Hi welcome/){ s=$0}else{if($0 !~ /Bye\*\*/){if(s != "") { s=s"\n"$0}}else{s=s"\n"$0; print s"\n"}}}' file

RudiC · September 24, 2012, 5:21am

I don't think you can use awk to process parts of files in a loop, at least not without additional measures. Try this suggestion to create several .tmp files that you can loop through afterwards:

awk     '/^Hi welcome/ {++fn}
         {print > (fn ".tmp")}
         /Bye\*\*/ {close (fn ".tmp")}
        ' infile

You could even leave out the /Bye\*\*/ line if there aren't too many files open at once...
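Looping through the generated files afterwards could look like the sketch below (loop_tmp_files is a hypothetical name). It counts upward from 1.tmp instead of using a *.tmp glob, because a glob sorts 10.tmp before 2.tmp:

```shell
# loop_tmp_files (hypothetical name): visit 1.tmp, 2.tmp, ... in numeric
# order, as written by the awk command above, and stop at the first
# number that has no file.
loop_tmp_files() {
  i=1
  while [ -f "$i.tmp" ]; do
    echo "Block: $i"
    cat "$i.tmp"          # replace with your own per-block processing
    i=$((i + 1))
  done
}
```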

anushree.a · September 24, 2012, 5:30am

Hi Pamu, once I pressed the "Enter" key, it returned to the $ prompt without giving any output on the screen.

Hi RudiC, the solution you gave works fine, but I have a file with more than a million records to process, so it is difficult to follow your suggestion. Is there any better way of doing it?

Please help
Anu.

RudiC · September 24, 2012, 5:34am

Well, here's a solution to use in a for loop (in bash!). It will not be too performant on large files, as awk always scans through the entire file!

for ((i=1;i<=5;i++))
do
echo Block: $i
awk     '/^Hi welcome/ {++fn}
     {if (fn==blockno) print}
    ' blockno=$i test
done

Redirect the output if you're happy with the result.
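Two changes to the loop above might help, sketched here as a hypothetical print_blocks function (POSIX sh rather than bash's for ((...)) syntax): the number of blocks is counted with grep -c instead of being hard-coded as 5, and awk exits once it has passed the wanted block. Each run still rereads the file from the top, but no longer to the end:

```shell
# print_blocks FILE (hypothetical name): print every block of FILE,
# one awk run per block, as in the loop above.
print_blocks() {
  nblocks=$(grep -c '^Hi welcome' "$1")   # one "Hi welcome" line per block
  i=1
  while [ "$i" -le "$nblocks" ]; do
    echo "Block: $i"
    awk '/^Hi welcome/ { ++fn }
         fn > blockno  { exit }           # past the wanted block: stop reading
         fn == blockno { print }
        ' blockno="$i" "$1"
    i=$((i + 1))
  done
}
# Usage: print_blocks test
```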

pamu · September 24, 2012, 6:05am

rudic:

It will not be too performant on large files, as awk always scans through the entire file!
for ((i=1;i<=5;i++))
do
echo Block: $i
awk     '/^Hi welcome/ {++fn}
   {if (fn==blockno) print}
   ' blockno=$i test
done

Hi Rudic,

Using a for loop will take too much time, because awk reads the file from the start on every iteration. Instead of a for loop, we can use awk directly.

Try this:

awk '{if($0 ~ /^Hi welcome/){a++; s=$0}else{if($0 !~ /Bye\*\*/){if(s != "") { s=s"\n"$0}}else{s=s"\n"$0; print "Block : "a"\n"s"\n"}}}' file

Another approach:

awk '{if($0 ~ /Bye\*\*/){a++;s=s"\n"$0;print "Block : "a"\n"s"\n";s=""}else{if(s){s=s"\n"$0}else{s=$0}}}' file
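Since the original goal was to have each block in a temp file, one at a time, awk can also do that in a single pass: it rewrites one temp file per block and runs a command on it as soon as the block is complete. In this sketch the function name each_block and the file name block.tmp are hypothetical, and the per-block command is passed as the second argument. system() starts a new shell for every block, which adds some cost on a file with many blocks:

```shell
# each_block FILE CMD (hypothetical name): single pass over FILE; each
# "Hi welcome" ... "Bye**" block is written to block.tmp (hypothetical
# file name) and CMD is run with block.tmp as its last argument.
each_block() {
  awk -v tmp="block.tmp" -v cmd="$2" '
    { print > tmp }                 # copy the current line into the temp file
    /^Bye\*\*/ {                    # the block is complete
      close(tmp)                    # flush it; the next "print > tmp" truncates
      system(cmd " " tmp)           # run the per-block command on the file
    }
  ' "$1"
}
```

For example, each_block inputfile.txt 'wc -l' would report the line count of every block, one block at a time.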

Lem · September 24, 2012, 6:11am

If you don't have any escape sequences (backslashes) in the "blah blah blah" lines, try this (written in bash):

par=""
while IFS= read line; do
par="${par}${line}\n"
if [ "$line" = "Bye**" ]; then
 par=${par%\\n}
 echo -e "${par}" >tmpfile
 cat tmpfile # or do whatever you like with it
 par=""
fi
done <inputfile

--
Bye
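If the blocks can contain backslashes, the echo -e restriction can be avoided by writing each line to the temp file as soon as it is read. A sketch of that variant follows (blocks_via_read is a hypothetical name; tmpfile is the file name from the loop above); it uses read -r so backslashes are kept. Like any shell read loop it handles every line in the shell itself, so it may be slow on a file with a million records:

```shell
# blocks_via_read FILE (hypothetical name): write each block to tmpfile,
# process it (here just cat) when "Bye**" is reached, then empty tmpfile.
blocks_via_read() {
  : > tmpfile                           # start with an empty temp file
  while IFS= read -r line; do
    printf '%s\n' "$line" >> tmpfile    # printf leaves backslashes alone
    if [ "$line" = "Bye**" ]; then
      cat tmpfile                       # or do whatever you like with it
      : > tmpfile                       # empty it for the next block
    fi
  done < "$1"
}
# Usage: blocks_via_read inputfile
```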

anushree.a · September 24, 2012, 6:35am

Dear RudiC,

Sorry, this may sound foolish, but where do I enter the input file name, and what will the output file name be? I am also wondering where the "Bye**" pattern is placed in the script.

RudiC · September 24, 2012, 6:42am

Sorry, the input filename I used is "test"; please replace it with yours. The output is sent to stdout; you can use the redirection ">" to send it to e.g. "tempfile" or whatever filename you like. We don't need the "Bye**" pattern if the input file is structured like you posted: "Bye**" immediately followed by the next "Hi welcome".
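If the real file can have lines between a "Bye**" and the next "Hi welcome", the end pattern does matter. A sketch of the awk part that also honours "Bye**" (print_block and the inblock flag are hypothetical additions, not RudiC's code):

```shell
# print_block N FILE (hypothetical name): print only block N of FILE,
# skipping any lines that fall outside a "Hi welcome" ... "Bye**" pair.
print_block() {
  awk '/^Hi welcome/ { ++fn; inblock = 1 }
       inblock && fn == blockno { print }
       /^Bye\*\*/ { inblock = 0 }        # block closed: ignore lines until the next start
      ' blockno="$1" "$2"
}
```

For example, print_block 3 test > tempfile would write the third block to tempfile.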

anushree.a · September 24, 2012, 6:46am

Wow, Lem, RudiC and Pamu,

Thanks for your efforts and prompt help, which was badly needed.
Special thanks to Lem; it worked exactly the way I wanted it to.

Thank you once again.
God bless you all
Take care
Anu.

RudiC · September 24, 2012, 6:46am

@pamu: I recognized this and commented on it, but the requester asked for the blocks to be fed into a for loop: