Looking to improve the output of this awk one-liner

I came up with the following awk one-liner last night to gather some data, and it works pretty well (apologies, I'm quite new to awk and don't know how to pretty-print it). You can see its output below.

 awk '{if ($8 == 41015 && $21 == "requests") arr["Requests "$1" "substr($2,0,5)]+=$20;if ($8 == 41015 && $22 == "requests") arr["Requests "$1" "substr($2,0,5)]+=$21;else if ($8 == 41100) arr["Deletes "$1" "substr($2,0,5)]+=1;if ($8 == 41015) arr["Batches "$1" "substr($2,0,5)]+=1};END{for (i in arr) print i,arr[i]}'  example.log 

My input file for this example: (this is uniq'd but there are 75 lines total)

07/19/13 07:50:27.890   D CN M Proxy   MGR_PROXY    41100  @BOX-98765  Manager DoD with THING asset 1508772769 of home 7793004 found in 0 retries
07/19/13 07:50:28.247   I CN M Proxy   USER_OPER    41015  @12345  Schedule recording request: THING recording requested asset 7474656, channel XXX, 60 requests
07/19/13 07:53:04.319   I CN M Proxy   USER_OPER    41015  @54321  Schedule recording request: THING recording requested asset 61263854, channel XX HD, 1 requess
07/19/13 07:53:04.319   I CN M Proxy   USER_OPER    41015  @54321  Schedule recording request: THING recording requested asset 61263854, channel XX HD, 1 requests

and my output is:

Batches 07/19/13 07:50 25
Requests 07/19/13 07:50 1500
Deletes 07/19/13 07:50 25
Batches 07/19/13 07:53 25
Requests 07/19/13 07:53 25

The code logic is pretty simple:

 * Check the 8th column, which denotes the log line type
 * If it is 41015 (a recording request)
    + increment the batch counter by one.
    + find the column with the number of requests and add that value to the requests counter
 * If it is 41100
    + increment the deletion counter by one.
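
As a rough sketch, the steps above map onto one awk pattern-action block per log type. This version only keeps running totals (no per-minute keys) and assumes the request count is always the second-to-last field, which holds for the sample lines in this post but is an assumption about the full log:

```shell
# Sample lines copied from this post.
log='07/19/13 07:50:27.890   D CN M Proxy   MGR_PROXY    41100  @BOX-98765  Manager DoD with THING asset 1508772769 of home 7793004 found in 0 retries
07/19/13 07:50:28.247   I CN M Proxy   USER_OPER    41015  @12345  Schedule recording request: THING recording requested asset 7474656, channel XXX, 60 requests
07/19/13 07:53:04.319   I CN M Proxy   USER_OPER    41015  @54321  Schedule recording request: THING recording requested asset 61263854, channel XX HD, 1 requess
07/19/13 07:53:04.319   I CN M Proxy   USER_OPER    41015  @54321  Schedule recording request: THING recording requested asset 61263854, channel XX HD, 1 requests'

out=$(printf '%s\n' "$log" | awk '
    # 41015 is a recording request: one more batch, plus its request count.
    # Assumption: the count is the second-to-last field, $(NF-1).
    $8 == 41015 { batches++; requests += $(NF-1) }
    # 41100 is a deletion.
    $8 == 41100 { deletes++ }
    END { print "batches=" (batches+0), "requests=" (requests+0), "deletes=" (deletes+0) }
')
printf '%s\n' "$out"   # prints: batches=3 requests=62 deletes=1
```

Each `pattern { action }` pair runs only on lines where the pattern is true, so the 8th column is tested once per log type.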

My primary objective is to format the output as a CSV that I can just send off as a report, like this (the headers are illustrative; I'm not looking to actually print them out, unless I can):

#Date,Time,Reqs,Bats,Dels
07/19/13,07:50,1500,25,25
07/19/13,07:53,24,25,

My secondary objective is to clean up the code. For example, having to check the 8th column twice for 41015 to increment both counters seems wasteful.

Any advice is welcome, but please keep in mind this is my first time doing anything more complex than awk '{print $2,$4,$8}' file, so I'd appreciate explanations as well as code snippets.

That's only a 'one-liner' because the line refuses to wrap in code tags ;) Better to break it where it matters and see what you're doing. I like two-liners, three-liners.

I have no idea where you're pulling that 1500 from, so I'll assume your output is unrelated.

awk 'BEGIN { SUBSEP=","; OFS="," }
        { sub(/:[^:]*$/, "", $2); } # Strip the seconds off the time
        $8 == 41015 { BATS[$1,$2]++ ; D[$1,$2]++ ; REQS[$1,$2] += $(NF-1) }
        $8 == 41100 { DELS[$1,$2]++; D[$1,$2]++ }
        END {
                print "#Date,Time,Reqs,Bats,Dels"
                for(X in D) print X, REQS[X]+0, BATS[X]+0, DELS[X]+0 }' inputfile
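
For anyone new to the tricks used in that snippet, here is a toy sketch of two of them on made-up data (the input lines and the array names `seen` and `big` are purely illustrative). Writing `arr[$1,$2]` stores the key as `$1 SUBSEP $2`, so setting `SUBSEP=","` makes the key itself print as CSV; adding `+0` turns a missing count into `0` instead of an empty string:

```shell
out=$(printf '%s\n' 'a b 3' 'a b 4' 'c d 2' | awk '
    BEGIN { SUBSEP = ","; OFS = "," }
    { seen[$1,$2]++ }            # key is stored as "a,b", "c,d", ...
    $3 > 3 { big[$1,$2]++ }      # only set for some keys
    END { for (k in seen) print k, seen[k], big[k]+0 }
' | sort)
printf '%s\n' "$out"
# prints:
# a,b,2,1
# c,d,1,0
```

The `sort` is there because `for (k in arr)` visits keys in no guaranteed order.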

You can use gawk's profiling support to pretty-print your awk program (recent versions expose this as the --pretty-print option). Check the gawk manual for more details:

man gawk

I added a few lines to your code to generate the desired output; adjust as required:

awk '
        BEGIN {
                print "#DateTime,Reqs,Bats,Dels"
        }
        {
                if ($8 == 41015 && $21 == "requests") {
                        arr["Requests," $1 " " substr($2, 0, 5)] += $20
                }
                if ($8 == 41015 && $22 == "requests") {
                        arr["Requests," $1 " " substr($2, 0, 5)] += $21
                } else {
                        if ($8 == 41100) {
                                arr["Deletes," $1 " " substr($2, 0, 5)] += 1
                        }
                }
                if ($8 == 41015) {
                        arr["Batches," $1 " " substr($2, 0, 5)] += 1
                }
        }


        END {
                for (i in arr) {
                        n = split ( i, V, "," )
                        if ( V[1] == "Batches" )
                                B[V[2]] = arr[i]
                        if ( V[1] == "Requests" )
                                R[V[2]] = arr[i]
                        if ( V[1] == "Deletes" )
                                D[V[2]] = arr[i]
                        T[V[2]]
                }
                for ( k in T )
                        print k,R[k],B[k],D[k]

        }
' OFS=, example.log
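
The END block above relies on packing the counter type into the array key and unpacking it later with split(). A stripped-down sketch of just that step, on invented three-column input (type, key, value) rather than the real log, might look like this:

```shell
out=$(printf '%s\n' 'Batches x 2' 'Requests x 7' 'Batches y 1' | awk '
    BEGIN { OFS = "," }
    { arr[$1 "," $2] += $3 }                 # composite key, e.g. "Batches,x"
    END {
        for (i in arr) {
            split(i, V, ",")                 # V[1] = type, V[2] = key
            if (V[1] == "Batches")  B[V[2]] = arr[i]
            if (V[1] == "Requests") R[V[2]] = arr[i]
            T[V[2]]                          # remember every key seen
        }
        for (k in T) print k, R[k]+0, B[k]+0
    }
' | sort)
printf '%s\n' "$out"
# prints:
# x,7,2
# y,0,1
```

Merely referencing `T[V[2]]` is enough to create the element, which is why that line has no assignment.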

Yeah, one of the downsides of only learning bits and pieces of different languages is I never learned how to construct a proper program, so everything is a one-liner to me. It's kind of fun though.

The 1500 is because, like I said, my input is uniq'd and there are really 75 lines (or 18 of each line). So, where it's saying

awk '{if ($8 == 41015 && $21 == "requests") arr["Requests "$1" "substr($2,0,5)]+=$20;if ($8 == 41015 && $22 == "requests") arr["Requests "$1" "substr($2,0,5)]+=$21; [...]

It's adding up the column with the number of requests which in the full input file adds up to 1500.
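
To illustrate why the one-liner has to test both $21 and $22: the channel name can be one word or two, which shifts the count column. A quick check on two of the sample lines from the first post shows the field count changing while the second-to-last field, `$(NF-1)`, still holds the number (this assumes the line always ends with the count followed by one word, as it does in the samples):

```shell
log='07/19/13 07:50:28.247   I CN M Proxy   USER_OPER    41015  @12345  Schedule recording request: THING recording requested asset 7474656, channel XXX, 60 requests
07/19/13 07:53:04.319   I CN M Proxy   USER_OPER    41015  @54321  Schedule recording request: THING recording requested asset 61263854, channel XX HD, 1 requests'

# Print the number of fields and the second-to-last field for each line.
out=$(printf '%s\n' "$log" | awk '{ print NF, $(NF-1) }')
printf '%s\n' "$out"
# prints:
# 21 60
# 22 1
```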

Either way, your code looks pretty sweet and I'm going to try to wrap my head around it after I've had a chance to rest. Thanks, as always, Corona688. Just when I think I'm getting good, you come through and kick me back to the kids' table :).