Print Range Only Once Per File

svermill · July 25, 2011, 12:26pm

Scenario:

Each of several .txt files contains the following (but perhaps with some minor variations due to the code version running on the devices from which the text was extracted):

<output omitted>
SWITCH1#show proc cpu hist
                                                              
    1     111111111111111     11111     11111     11111     11
    3999995555511111333336666622222666661111144444111116666633
100                                                           
 90                                                           
 80                                                           
 70                                                           
 60                                                           
 50                                                           
 40                                                           
 30                                                           
 20       *****                                               
 10 *****************************************     ************
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5    
               CPU% per second (last 60 seconds)
                                                              
    1111111111111111111111111111111121111721111111111111671111
    5384363853456143653472465347484512537703585444536337004347
100                                                           
 90                                                           
 80                                      *                    
 70                                      *               *    
 60                                      *              **    
 50                                      *              **    
 40                                      #              **    
 30                                      #              **    
 20 * *  * **  **   **  *  **  * * ** * *#* ***   * *  **#   *
 10 ##########################################################
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5    
               CPU% per minute (last 60 minutes)
              * = maximum CPU%   # = average CPU%
                                                                          
    7222227227777287222375286232282222172272262687777227223226272272227226
    0280027203063046112658409100100011990171091324474418151608247111003502
100                                                                       
 90                                                                       
 80       *    *  **    *  *     *     *  *     *  *   *                  
 70 *     *  **** **    *  **    *     *  *  *  *****  *     * *  *   *   
 60 *     *  **** **    ** **    *     *  *  * ******  *     * *  *   *  *
 50 *     *  **** **    ** **    *     *  *  * ******  *     * *  *   *  *
 40 *     *  **** **   *** **    *     *  *  * ******  *     * *  *   *  *
 30 * *   *  **** **   *** ** *  *     *  *  * ******  * *** * ** *   ** *
 20 **********************************************************************
 10 ######################################################################
   0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
             0    5    0    5    0    5    0    5    0    5    0    5    0 
                   CPU% per hour (last 72 hours)
                  * = maximum CPU%   # = average CPU%

SWITCH1#show run
Building configuration...
<output omitted>

I wanted to parse the CPU section out of each file, so I tried something like this:

awk '/^100/,/^Building/' *.txt

Unfortunately, a line beginning with "100" occurs again not too far down in the file from the first occurrence. "Building" only occurs once and thus I get my first snippet and then most of the rest of the output following the second instance of my first string.

Ideally I would end on the second "average CPU%" (not sure why there aren't three) but that looked to be a challenge for me so I thought maybe I could live with just matching on a line beginning with "Building" to end the range, and then figure out a way to not print that line or the one preceding it. Not pretty but I was trying to break the problem down into manageable pieces. At the very least, can someone please explain how to match a range one time and then print nothing further from that individual file?

As always, many thanks!

(Note: I see that spacing didn't survive the pasting process but I think the intent is still clear.)
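For the narrower question of matching a range only once per file, one possible awk sketch (not taken from the thread) replaces the range pattern with a small per-file state variable. The function name `first_range` and the variable `state` are made up for illustration; the patterns `^100` and `^Building` are the ones from the post. It assumes the end line should still be printed, as a range pattern would do.

```shell
# Hypothetical helper name; pass it the files to scan.
first_range() {
    awk '
        FNR == 1             { state = 0 }   # new file: 0 = before, 1 = inside, 2 = done
        state == 0 && /^100/ { state = 1 }   # only the first "100" line opens the block
        state == 1 {
            print
            if ($0 ~ /^Building/) state = 2  # close the block for the rest of this file
        }
    ' "$@"
}

# Usage: first_range *.txt
```

Because the state is reset whenever FNR returns to 1, a later line beginning with "100" in the same file is ignored, while the next file starts fresh.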

shamrock · July 25, 2011, 12:58pm

how about good old ex...

ex -s +'/^100/,/^Building/-3 p | q!' file

svermill · July 25, 2011, 1:25pm

Not familiar with ex but here's what I saw on my Mac:

$ ex -s +'/^100/,/^Building/-3 p | q!' Switch1.txt

^Z
[2]+ Stopped ex -s +'/^100/,/^Building/-3 p | q!' Switch1.txt

So basically no output until I did 'CTRL-Z'. Thanks, though!

shamrock · July 25, 2011, 2:24pm

Does switching to a heredoc work...

ex -s in_file <<EOF > out_file
/^100/,/^Building/-3 p
wq
EOF

svermill · July 25, 2011, 9:30pm

Well, this certainly has promise. My out file looks pretty good:


100                                                           ^M
 90                                                           ^M
 80                                                           ^M
 70                                                           ^M
 60                                                           ^M
 50                                                           ^M
 40      *****                                                ^M
 30      *****                         *****                  ^M
 20 **********************************************************^M
 10 **********************************************************^M
   0....5....1....1....2....2....3....3....4....4....5....5....^M
             0    5    0    5    0    5    0    5    0    5    ^M
               CPU% per second (last 60 seconds)^M
                                                              ^M
    2324333323252323332323232333232323233323232323332323232333^M
    6567054485676464636365646445667575764674656365646566678765^M
100                                                           ^M
 90                                                           ^M
 80                                                           ^M
 70                                                           ^M
 60            *                                              ^M
 50    *       *                                              ^M
 40  * * *   * *    *    *     * * * * * *   *   **  * * * ***^M
 30 ***********#**********************************************^M
 20 ##########################################################^M
 10 ##########################################################^M
   0....5....1....1....2....2....3....3....4....4....5....5....^M
             0    5    0    5    0    5    0    5    0    5    ^M
               CPU% per minute (last 60 minutes)^M
              * = maximum CPU%   # = average CPU%^M
                                                                          ^M
    3333333733364333333333553333333533373343333333353333344644484447555655^M
    7788787188682777678686297797778078668826576668686789922047709892020101^M
100                                                                       ^M
 90                                                                       ^M
 80                                    *                       *          ^M
 70        *   *                       *                       *   *      ^M
 60        *   *           *           *           *       *   *   *   *  ^M
 50        *   *          **       *   *           *       * *************^M
 40 **********************************************************************^M
 30 **********************************************************************^M
 20 ######################################################################^M
 10 ######################################################################^M
   0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.^M
             0    5    0    5    0    5    0    5    0    5    0    5    0 ^M
                   CPU% per hour (last 72 hours)^M
                  * = maximum CPU%   # = average CPU%^M

I'm just slowly coming up to speed on basic awk, though, and know nothing about ex or even heredocs. I was hoping to use some of what I've learned of awk to add some additional detail (such as printing the second field of the 'hostname Switch1' line from the in file or the in file name itself (which equals the hostname, actually, so it's six of one/half dozen of the other)). Is that same sort of thing possible here? Is it as easy as invoking "FILENAME" in awk?

Thanks so much!

---------- Post updated at 01:04 PM ---------- Previous update was at 12:47 PM ----------

Just realized too that I don't seem to be able to do *.txt as my in file. I have to repeat this over a couple of hundred .txt files on an ongoing basis. Is there an equivalent to that with this approach?

Thanks again!
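When a command only handles one input file at a time, the usual shell answer to "*.txt" is a for loop. The sketch below is an assumption-laden illustration rather than anything posted in the thread: it uses awk instead of ex, the function name `extract_each` is hypothetical, and so is the `.cpu` output suffix. The same loop shape should apply to any one-file-at-a-time command.

```shell
# Hypothetical helper: writes one output file per input file (Switch1.txt -> Switch1.cpu).
extract_each() {
    for f in "$@"; do
        awk '
            /^100/,/^Building/ { print }   # first "100" line through "Building"
            /^Building/        { exit }    # stop here so a later "100" cannot reopen the range
        ' "$f" > "${f%.txt}.cpu"
    done
}

# Usage: extract_each *.txt
```

The `${f%.txt}` expansion strips the extension in the shell itself, which is one way to get at the host name without awk's FILENAME. Note that this relies on "Building" appearing only after the first "100" line, as described earlier in the thread.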

---------- Post updated at 03:06 PM ---------- Previous update was at 01:04 PM ----------

I was thinking about this over lunch. Maybe the range approach is not what I want. Maybe I want to match the first line that begins with "100" and print that line. Print every line thereafter until I reach the first line that ends with "CPU%." Print that line too. Continue to print every line until I once again find a line that ends with "CPU%." Print that line and exit, move to next file. I wonder if that's actually simpler than trying to solve this with the range function?
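The steps described above could be sketched in awk roughly as follows. This is only an illustration under stated assumptions: the function name `cpu_graphs` and the variables `state` and `legends` are invented here, and the optional `\r` in the pattern is there because the captured files appear to carry carriage returns (the ^M characters in the earlier output).

```shell
# Hypothetical helper name; pass it the files to scan.
cpu_graphs() {
    awk '
        FNR == 1             { state = 0; legends = 0 }  # reset for every input file
        state == 0 && /^100/ { state = 1 }               # first line starting with "100"
        state == 1 {
            print
            # stop after the second line that ends in "CPU%" (optionally followed by CR)
            if ($0 ~ /CPU%\r?$/ && ++legends == 2) state = 2
        }
    ' "$@"
}

# Usage: cpu_graphs *.txt
```

Lines such as "CPU% per second (last 60 seconds)" do not count, because they contain "CPU%" but do not end with it. As described, the two digit rows above the first "100" line are not part of the output.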

---------- Post updated at 05:59 PM ---------- Previous update was at 03:06 PM ----------

So the simplest - and in retrospect best - solution was simply to print the range in between the two 'show' commands in the input files:

	ls *.txt | xargs awk '/show proc cpu hist/,/show run/'

All that I need to do now is figure out how to print everything in between those two lines but not including those two lines. And I need to get the host or file name printed for each iteration (which I'm pretty confident I can figure out).
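Printing what lies between two marker lines without the markers themselves is often done with a flag instead of a range pattern. The following is a minimal sketch under that assumption; the function name `between_shows` and the variable `inside` are made up for illustration. Since awk accepts several file operands, *.txt can be handed to it directly.

```shell
# Hypothetical helper name; pass it the files to scan.
between_shows() {
    awk '
        FNR == 1             { inside = 0 }   # do not carry state across files
        /show run/           { inside = 0 }   # closing line: switch off before printing
        inside               { print }
        /show proc cpu hist/ { inside = 1 }   # opening line: printing starts on the next line
    ' "$@"
}

# Usage: between_shows *.txt
```

The order of the rules does the work: the closing test runs before the print rule and the opening test runs after it, so neither marker line is ever printed.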

---------- Post updated at 07:30 PM ---------- Previous update was at 05:59 PM ----------

OK, I just need one final piece. I realize that this is probably very inelegant, but it almost works and I've once again lost an entire day to a single script!

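#!/bin/ksh
# Print the CPU history block from every .txt file, then drop the two 'show' lines.
ls *.txt | xargs awk '
    # At the start of each new file, keep its name without the ".txt" extension.
    FILENAME != last {
        fn = FILENAME
        sub(/\.txt$/, "", fn)    # escaped dot and $ so only a trailing ".txt" is removed
        last = FILENAME
    }

    /show proc cpu hist/,/show run/ { print $0 }
' | awk '!/show proc cpu hist/ && !/show run/' > "CPU History"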

The above includes a piece given to me by agama yesterday to strip off the ".txt" extension from FILENAME. However, I haven't been able to figure out how to use what's left in the way that I want. Before each block of CPU history, I'd like to print "FILENAME:". In yesterday's case, I was building a .csv, so I wanted the filename on every single line of output. In this case I'm not building a CSV and will import the text directly into a report. So I'd like to figure out how to print the filename plus ":" once and then the CPU history block. The closest I've come was to actually awk out $2 of 'hostname Switch1' but that comes in the input file after the CPU history block, so it was printing below instead of above. And for some reason my ":" was actually printing on a new line below the hostname?? Is this more difficult than it sounds or should I be able to accomplish this easily?

Thanks.
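One way the last piece might be handled is to print the stripped file name at the moment the opening 'show' line is seen, so that it lands above the block. This is a hedged sketch, not a confirmed answer from the thread: the function name `cpu_history` and the variables `host` and `inside` are hypothetical. The stray line break before the ":" is consistent with a trailing carriage return on the hostname field (the ^M characters seen earlier), though that is a guess; the sketch strips such a character from every line.

```shell
# Hypothetical helper name; pass it the files to scan.
cpu_history() {
    awk '
        { sub(/\r$/, "") }                   # drop a trailing carriage return, if any
        FNR == 1 {                           # first line of each file
            inside = 0
            host = FILENAME
            sub(/\.txt$/, "", host)          # Switch1.txt becomes Switch1
        }
        /show run/           { inside = 0 }  # closing line is not printed
        inside               { print }
        /show proc cpu hist/ { inside = 1; print host ":" }  # header replaces the opening line
    ' "$@"
}

# Usage: cpu_history *.txt > "CPU History"
```

This does the range selection, the marker removal, and the header in a single awk pass, so the second awk in the pipeline is no longer needed. FILENAME is whatever path was given on the command line, so running it from the directory that holds the files keeps the header to the bare host name.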