How to extract some parts of a file to create some outfile

Hi All,
I am very new to programming and need some help.
I have an input file like this:

Number of disabled taxa: 9
Loading mapping file: ncbi.map
Load mapping:
taxId2TaxLevel: 469951
--- Subsample reads (20%): 66680 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253
+++++++++++End of summary for file: B-Red-sum.txt
--- Subsample reads (20%): 67037 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890
+++++++++++End of summary for file: B-Red-sum.txt

::::: and so on

I want to create output files like this:
outfile1.txt (everything from the line after "Taxonomy:" up to, but not including, the "+++++++++++End" line), with no spaces in front of the lines, then outfile2.txt for the next block, and so on.

So the desired output will be:
outfile1.txt
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253

outfile2.txt
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890

and so on.

Can anybody please help me with this?

I tried some code like this, but it didn't work out.
--------------------------------------------------------------------------
#!/bin/tcsh
if $#argv != "1" then
echo "Usage: process-file-script 1st-output-file-as-inputfile"
exit 0
endif

FIL_NM=$1

str=""
cat $FIL_NM | while read LINE
do
if [ "`echo $LINE | awk '{print $1}'`" = "+++++++++++Begin" ] ; then
n=1
c=1
fi
if [ "`echo $LINE |grep Gamma`"] ; then
NEW_FIL_NM=$FIL_NM"_"$n.txt"
fi

fi
if [ "`echo $LINE | awk '{print $1}'`" = "+++++++++++End" ] ; then
n=0
fi
done
--------------------------------------------------------
Please help...
Many thanks in advance...
Best wishes,
Mitra

nawk '
    /^Taxonomy/ {p=6;close(out);out="output" ++cnt ".txt";next}
    p &&p-- { print > out }' myInputFile

If you have Python, here's an alternative solution:

f=0;i=0
for line in open("file"):
    line=line.strip()
    if line.startswith("+++++++++++"):
        f=0
        o.close()
    if "Taxonomy:" in line:
        f=1;i=i+1
        o=open("out_"+str(i)+".txt","w")
        continue  # skip the "Taxonomy:" header itself
    if f:
        o.write(line+"\n")  # works in both Python 2 and 3

Hello ghostdog74,
Thanks for your reply. I am sorry, I forgot to mention that the blocks in my input file do not always have only 6 lines; I just copied a few lines as a sample. The number of lines varies from 100 to 200, so the program has to read up to the "+++++++++++End" line.

Thanks a lot,
Mitra.

And here's a perl solution:

$
$
$ cat input.txt
Number of disabled taxa: 9
Loading mapping file: ncbi.map
Load mapping:
taxId2TaxLevel: 469951
--- Subsample reads (20%): 66680 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253
+++++++++++End of summary for file: B-Red-sum.txt
--- Subsample reads (20%): 67037 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890
+++++++++++End of summary for file: B-Red-sum.txt
::::: and so on
$
$
$
$ perl -ne '{$/=""; $i=1;
>   while (/^Taxonomy:.(.*?)\+{11}/msgi) {
>     open(OUT,">outfile".$i++.".txt"); print OUT $1; close(OUT);
>   }}' input.txt
$
$
$ cat outfile1.txt
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253
$
$
$ cat outfile2.txt
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890
$
$

tyler_durden

nawk '
   /^Taxonomy/ {p++;close(out);out="output" ++cnt ".txt";next}
   /^[+]+End/ { p=0}
   p { print > out }' myInputFile

Well, I am not sure I follow you, but I see the other solutions include "End". If "+++++++++++" alone is not unique to the end marker, you can add "End" to the test:

....
if line.startswith("+++++++++++End"): 
....

Hello durden_tyler,
your Perl code works. Thanks a lot. But there is still one problem.
As I said, in my input file there is a varying number of spaces in front of the desired lines.
Is there any way to get rid of these spaces directly?
Now it gives:

mitra:~ mitra$ cat outfile1.txt
          Gammaproteobacteria: 2767
       Alphaproteobacteria: 4123
         Deltaproteobacteria: 1343
                         Epsilonproteobacteria: 26
     Betaproteobacteria: 397
                        unclassified Proteobacteria: 48
                  Spirochaetes (class): 15
        Nitrospira (class): 1
        Bacilli: 25
  Not assigned: 1445
  No hits: 220253

Thank you very much for your help.
Best Wishes,
Mitra.

Sorry, I don't know why all the spaces disappear here. But there are several spaces (not the same number on every line) in front of the desired lines.

Hallo ghostdog74,
I will try this modification and see if it works.
Thank you very much.
Best,
Mitra.

Here's one way to do it:

perl -ne '{$/=""; $i=1;
  while (/^Taxonomy:.(.*?)\+{11}/msgi) {
    $x = $1; $x =~ s/(^|\n)\s+/\1/g;
    open(OUT,">outfile".$i++.".txt"); print OUT $x; close(OUT);
  }}' input.txt

Testing on sample data:

$ 
$ cat input.txt
Number of disabled taxa: 9
Loading mapping file: ncbi.map
Load mapping:                 
taxId2TaxLevel: 469951        
--- Subsample reads (20%): 66680 of 334386
Processing: tree-from-summary             
Running tree-from-summary algorithm       
Taxonomy:                                 
    Gammaproteobacteria: 2767             
Alphaproteobacteria: 4123                 
  Deltaproteobacteria: 1343               
     Epsilonproteobacteria: 26            
 Not assigned: 1445                       
    No hits: 220253                       
+++++++++++End of summary for file: B-Red-sum.txt
--- Subsample reads (20%): 67037 of 334386       
Processing: tree-from-summary                    
Running tree-from-summary algorithm
Taxonomy:
      Gammaproteobacteria: 2809
  Alphaproteobacteria: 4001
        Deltaproteobacteria: 1208
    Epsilonproteobacteria: 15
Not assigned: 299
    No hits: 461890
+++++++++++End of summary for file: B-Red-sum.txt
::::: and so on
$
$ perl -ne '{$/=""; $i=1;
  while (/^Taxonomy:.(.*?)\+{11}/msgi) {
    $x = $1; $x =~ s/(^|\n)\s+/\1/g;
    open(OUT,">outfile".$i++.".txt"); print OUT $x; close(OUT);
  }}' input.txt
$
$
$ cat outfile1.txt
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253
$
$ cat outfile2.txt
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890
$
$

Hope that helps,
tyler_durden

____________________________________________________________
"This is your life and it's ending one minute at a time."

The spaces disappear here because you do not enclose your file data or code within the "code" tags. (Notice how the actual code posted by the forum members has a nice little box around it with the title "Code:" at the top.)

If you sandwich the desired text within "code" tags, without any space between "code", "]", "[" and "/" :

[ code ] <your_text_here> [ / code ]

then the leading spaces will be preserved.

Alternatively, if you are feeling too lazy to type the "code" tags yourself, you can do this:
(a) select the desired text, and
(b) click on the "#" icon in your Message Box right above the response area.
The dynamic script associated with the web page will insert the "code" tags for you.

HTH,
tyler_durden


Hello durden_tyler,
First, I want to thank you for your help. Thanks a lot. I am very new to scripting. Can you please explain the part (.*?)\+{11}/msgi of your code in this thread?

Actually, I am trying to learn, so it would be really helpful. One more question: how can I make this script executable?

My try was:
#!/usr/bin/perl -w

$#ARGV==1 or die "Usage: 2ndprocess-script 1st-output-file-as-inputfile\n";

$input=shift;

perl -ne '{$/=""; $i=1;
while (/^Taxonomy:.(.*?)\+{11}/msgi) {
$x = $1; $x =~ s/(^|\n)\s+/\1/g;
open(OUT,">outfile".$i++.".txt"); print OUT $x; close(OUT);
}}' $1;

-----------------------------
which didn't work.
Can you please help me to learn this?
Thank you very much once again.
Have a nice time.
Best wishes,
Mitra
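
On the pattern /^Taxonomy:.(.*?)\+{11}/msgi, read piece by piece: ^Taxonomy: matches the header at the start of a line (the m flag lets ^ match after every newline, not only at the start of the input). The single . consumes the character after the header, normally the newline (the s flag lets . match a newline). (.*?) captures as little as possible, so the capture stops at the first following run of pluses rather than the last one in the file. \+{11} is eleven literal + characters. The g flag repeats the match for every block and i ignores case. Below is a rough Python sketch of the same idea, not a translation of the one-liner itself; the name extract_blocks is made up for illustration, and it assumes every block ends with a line of eleven pluses.

```python
import re

# Same pattern as the Perl one-liner: re.M ~ /m, re.S ~ /s, re.I ~ /i.
# findall() plays the role of /g.
BLOCK = re.compile(r"^Taxonomy:.(.*?)\+{11}", re.M | re.S | re.I)

def extract_blocks(text):
    """Return one string per Taxonomy block, with indentation removed."""
    blocks = []
    for body in BLOCK.findall(text):
        lines = [ln.strip() for ln in body.splitlines()]
        blocks.append("\n".join(ln for ln in lines if ln) + "\n")
    return blocks

sample = (
    "Processing: tree-from-summary\n"
    "Taxonomy:\n"
    "    Gammaproteobacteria: 2767\n"
    "  No hits: 220253\n"
    "+++++++++++End of summary for file: B-Red-sum.txt\n"
    "Taxonomy:\n"
    "Gammaproteobacteria: 2809\n"
    "+++++++++++End of summary for file: B-Red-sum.txt\n"
)

for number, block in enumerate(extract_blocks(sample), start=1):
    print("outfile%d.txt" % number)
    print(block, end="")
```

If the ? is dropped from (.*?), the capture becomes greedy and the first match runs all the way to the last line of pluses, so everything lands in one block.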

If you want to use Perl, here's another version that is more "understandable", as it relies less on regular expressions.

$i=0;
while (<>){
 chomp;
 if (/\+*End of summary for file/ ){
    $f=0;close(FH);next;
 }    
 if (/Taxonomy:/ ) { 
     open(FH,">>","output_".$i++) or die "Cannot open for writing:$!\n";
     $f=1; next;
 }
 if ($f) { 
    s/^\s+//g; #get rid of spaces in front
    print FH $_."\n";
  }
}

To use the script:

# perl myscript.pl file

Dear ghostdog74,
My main problem is that I am very new to programming and am trying to learn, so I am not familiar with either Perl or Python; both are new to me. Can you please help me understand how to make these files executable, like a script? With the other replies too, the code works when I type it directly in the terminal, but in all cases I am still unable to turn it into an executable script that takes an input file as $1.
Can you or anyone else please help me with this?
Thanks a lot for your help.
With best regards,
Mitra.

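The Perl code below should help you somewhat.

open my $fh, "<", "a.txt" or die "Cannot open a.txt: $!\n";
my ($flag, $n) = (0, 0);
while (<$fh>) {
	if (/Taxonomy:/) {
		$n++;
		my $file = sprintf("outfile%s.txt", $n);
		open FH, ">", $file or die "Cannot open $file: $!\n";
		$flag = 1;
		next;
	}
	if (/^\s*\++End/) {	# only the end-of-summary marker closes a block
		$flag = 0;
		close FH;
		next;
	}
	s/^[ \t]+//;		# drop leading spaces and tabs
	print FH $_ if $flag == 1;
}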

Dear All,
Thanks for your replies, codes and advices.
My main problem is that I am very new to programming and am trying to learn, so I am not familiar with either Perl or Python; both are new to me. Can anybody please help me understand how to make these files executable, like a script that I can call afterwards? Suppose I name the script code.perl or code.anything else.
Every time, I want to run ./code.perl input.txt
My first try was:

#!/usr/bin/perl -w

$#ARGV==1 or die "Usage: 2ndprocess-script 1st-output-file-as-inputfile\n";

$name=shift;

$inputfile="`pwd`/$name";

perl -ne '{$/=""; $i=1;
  while (/^Taxonomy:.(.*?)\+{11}/msgi) {
    $x = $1; $x =~ s/(^|\n)\s+/\1/g;
    open(OUT,">outfile".$i++.".txt"); print OUT $x; close(OUT);
  }}' inputfile;

and 2nd try was:

#!/usr/bin/perl -w

$#ARGV==1 or die "Usage: 2ndprocess-script 1st-output-file-as-inputfile\n";

$name=shift;

$inputfile="`pwd`/$name";

open $fh,"<", $inputfile;
my ($flag,$n)=(0,0);
while(<$fh>){
	if(/Taxonomy:/){
		$n++;
		$file=sprintf("outfile%s.txt",$n);
		open FH,"+>$file";
		$flag=1;
		next;
	}
	if(/\++/){
		$flag=0;
		next;
	}
	print FH $_ if $flag==1;
}

But neither of them worked in the desired way.
Can anybody please help me?
With best regards and many thanks,
Mitra.
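
A few things go wrong in both tries. perl -ne '...' is a shell command, so it cannot be pasted inside a Perl script. $#ARGV is the index of the last argument, so with one file name it is 0, not 1. And backticks are not executed inside a double-quoted Perl string, so "`pwd`/$name" is not a usable path (a relative file name works as it is). The general recipe is the same in any scripting language: put the interpreter on the first line (#!...), read the file name from the argument list, and run chmod +x on the script once. Here is a minimal sketch in Python, since it was suggested earlier in the thread. The script name split_taxonomy.py and the function names are only placeholders, and it assumes each block starts at a "Taxonomy:" line and ends at a "+++++++++++End" line.

```python
#!/usr/bin/env python3
import sys

def split_summary(path, prefix="outfile"):
    """Write each Taxonomy block of `path` to prefix1.txt, prefix2.txt, ...

    Returns the number of files written.
    """
    count = 0
    out = None                                  # no block open yet
    with open(path) as infile:
        for line in infile:
            text = line.strip()                 # removes indentation and newline
            if text.startswith("Taxonomy:"):
                count += 1
                out = open("%s%d.txt" % (prefix, count), "w")
            elif text.startswith("+++++++++++End"):
                if out is not None:
                    out.close()
                    out = None
            elif out is not None and text:
                out.write(text + "\n")
    if out is not None:                         # input ended without an End marker
        out.close()
    return count

def main(argv):
    if len(argv) != 2:                          # argv[0] is the script name itself
        print("Usage: %s inputfile" % argv[0], file=sys.stderr)
        return 1
    print("wrote %d file(s)" % split_summary(argv[1]))
    return 0

if __name__ == "__main__":
    main(sys.argv)
```

Saved as split_taxonomy.py, it would be used like this:

    chmod +x split_taxonomy.py
    ./split_taxonomy.py input.txt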

ghostdog74,
Thank you for your help. Your last script works, but it still produces files with spaces in front of the lines. How can I get rid of the spaces?
The output looks like:
mitra:testNextPart mitra$ more output_0

  Gammaproteobacteria: 2767
        Alphaproteobacteria: 4123
          Deltaproteobacteria: 1343
          Epsilonproteobacteria: 26
        Betaproteobacteria: 397
        unclassified Proteobacteria: 48
          Elusimicrobium: 2
        candidate division WWE1: 9
          Flavobacteria: 2358
          Sphingobacteria: 136
          Bacteroidia: 162
          environmental samples: 21
          Chlorobia: 77
        Planctomycetacia: 40
        Spirochaetes (class): 15
        Nitrospira (class): 1
        Bacilli: 25
  Not assigned: 1445
  No hits: 220253

Sorry to disturb you again and again.
Thanks a lot.
With best regards,
Mitra.

Dear All,
I was trying the script below to get rid of the spaces in front of the lines (see the previous post).

#!/usr/bin/perl -w

$#ARGV==0 or die "Usage: 2ndprocess-megan-script 1st-output-file-as-inputfile\n";

$i=1;
while (<>){
chomp;
   
 if (/Taxonomy:/ ) { 
     $x = $1; $x =~ s/^\s+|\s+$//g;
     open(OUT,">>","output_".$i++) or die "Cannot open for writing:$!\n";
     $f=1; next;
 }
 
 if (/\+*End of summary for file/ ){
    $f=0;close(OUT);next;
 }
 if ($f) { print OUT $_."\n";}
}

But it's not working.

Can anybody please help me to get the output in this form:
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Betaproteobacteria: 397
unclassified Proteobacteria: 48
Elusimicrobium: 2
candidate division WWE1: 9
Flavobacteria: 2358
Sphingobacteria: 136
Bacteroidia: 162
environmental samples: 21
Chlorobia: 77
Planctomycetacia: 40
Spirochaetes (class): 15
Nitrospira (class): 1
Bacilli: 25
Not assigned: 1445
No hits: 220253

Thanks a lot.
Best,
Mitra.
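
In the script above the strip sits in the wrong place. $x = $1 runs only on the "Taxonomy:" line, $1 is undefined there because /Taxonomy:/ has no capturing parentheses, and $x is never printed anyway. The leading spaces have to be removed from each line just before it is written, that is, in the if ($f) branch. A small sketch of that ordering in Python follows; iter_blocks is a made-up name, and it assumes the indentation consists of spaces or tabs.

```python
def iter_blocks(lines):
    """Yield each Taxonomy block as a list of lines without indentation."""
    block = None                      # None means "not inside a block"
    for line in lines:
        if "Taxonomy:" in line:
            block = []                # header seen: start collecting
        elif "End of summary for file" in line:
            if block is not None:
                yield block
            block = None
        elif block is not None:
            cleaned = line.strip()    # strip here, on the line being kept
            if cleaned:
                block.append(cleaned)

report = [
    "Taxonomy:",
    "          Gammaproteobacteria: 2767",
    "\t\tSpirochaetes (class): 15",
    "  No hits: 220253",
    "+++++++++++End of summary for file: B-Red-sum.txt",
]
for block in iter_blocks(report):
    print("\n".join(block))
```

The same move applies to the Perl version: the substitution belongs on $_ inside the branch that prints, not on a separate variable in the header branch.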