awk: total size per file type in a txt file

I have a large file containing access data from a web server. Each line of the file has the format:

IP Client - - [date-hour] "Command Path Protocol" code size "-" "client software"

For example, in the first line the IP Client is 67.195.37.107, the date-hour is [11/Jan/2009:04:30:58 +0200], the Command Path Protocol is "GET /papers/ISO4nm.pdf HTTP/1.0", the code is 200, the size is 306676, and the client software is "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)".
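With awk's default whitespace field splitting, the parts of each line land in fixed fields as long as the request is a plain "Command Path Protocol" triple: $1 is the client IP, $4 and $5 hold the date-hour, $6 to $8 the quoted command, path and protocol, $9 the code and $10 the size. A minimal sketch to check that mapping, assuming every line follows the format above; the function name show_fields is only illustrative:

```shell
#!/bin/sh
# show_fields (illustrative name): print IP, path, status code and size per line.
# Assumes awk's default whitespace splitting on the access-log format above:
# $1 = client IP, $7 = path, $9 = code, $10 = size.
# With no file arguments, awk reads standard input.
show_fields() {
    awk '{ printf "%s %s %s %s\n", $1, $7, $9, $10 }' "$@"
}
```

Usage: show_fields data.txt. For the first sample line this prints the IP, /papers/ISO4nm.pdf, 200 and 306676.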

cat data.txt

67.195.37.107 - - [11/Jan/2009:04:30:58 +0200] "GET /papers/ISO4nm.pdf HTTP/1.0" 200 306676 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
64.56.66.112 - - [11/Jan/2009:04:31:19 +0200] "GET /aiai2009/registration.html HTTP/1.0" 200 4630 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts)"
64.56.66.112 - - [11/Jan/2009:04:31:20 +0200] "GET /index.html HTTP/1.0" 200 4045 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts)"
67.195.37.107 - - [11/Jan/2009:04:32:15 +0200] "GET /papers/ISO4nm.jpg HTTP/1.0" 200 306676 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"

I want to print the total size of the files per file type (pdf, html, jpg) using awk or sed. The file has hundreds of lines; I have posted only some of them.

Help...

Start with this:

awk -F"[. ]" '{t[$11]+=$15}END{for(i in t) printf "%s %s\n",i,t[i]}' infile
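Splitting on both dots and spaces works for the sample because every IP has four dot-separated parts and every path contains exactly one dot, so $11 is the extension and $15 the size. A hypothetical path such as /papers/v1.2/ISO4nm.pdf would shift those fields. A sketch of a more defensive variant, assuming the lines always follow the format in the question (with the default whitespace split the path is $7 and the size is $10); the function name size_by_type is only illustrative:

```shell
#!/bin/sh
# size_by_type (illustrative name): total the size field per file extension.
# Assumes each line follows the access-log format from the question, so that
# with awk's default whitespace splitting the path is $7 and the size is $10.
size_by_type() {
    awk '
    {
        path = $7
        sub(/[?].*/, "", path)          # drop a query string, if any
        sub(/.*\//, "", path)           # keep only the last path segment
        if (match(path, /\.[^.]+$/))    # extension after the last dot
            ext = substr(path, RSTART + 1)
        else
            ext = "(none)"              # e.g. "/" or a name without a dot
        if ($10 ~ /^[0-9]+$/)           # skip "-" sizes (no body sent)
            total[ext] += $10
    }
    END {
        for (e in total)
            printf "%s %.0f\n", e, total[e]   # %.0f avoids int overflow in some awks
    }' "$@"
}
```

Usage: size_by_type data.txt. Because for (e in total) visits the keys in no particular order, pipe the result through sort if you want a stable listing; for the four sample lines the totals are html 8675, jpg 306676 and pdf 306676.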