Classwork (group project)

Use and complete the template provided. The entire template must be completed. If you don't, your post may be deleted!

  1. The problem statement, all variables and given/known data:

Using the example.log file in your home directory, write a bash script that will collect the following information:

  1. Date range of the log. This would be the first (oldest) and last (newest) date in the file.
  2. Total number of hits during the date range. 1 hit = 1 line
  3. Total number of unique visitors. A unique visitor = a unique IP address.
  4. Top 10 users, and the number of times they visited.
  5. Five most popular resources accessed, and the number of times they were accessed.
  6. Ten most visited URLs, and the number of times they were accessed.
  7. Number of visitors using Internet Explorer, broken down by version.
  8. Number of people using Firefox, broken down by version.
  9. Number of people using any other browser.

THE FORMAT OF THE LOG:

Each line in the log file is equal to one visit, or hit. The log is a tab-delimited file with 6 columns of data. It will be necessary to extract data from a specific column and break it down further. The columns in the log file are in the order below, and contain the following data:

  1. IP address of the visitor.

  2. Username of the visitor

  3. Date and time the user visited.

  4. The access method, resource, and protocol used.

  5. URL accessed by the user.

  6. The "User Agent" string containing browser and other system information from the user.

  2. Relevant commands, code, scripts, algorithms:
    • bash
    • head
    • tail
    • cut
    • tr
    • sed
    • wc
    • sort
    • uniq
    • grep
    • printf
    cut & sort) multiple times in one command chain

  3. The attempts at a solution (include all code and scripts):

 $ head -n1 example.log | cut -d'h' -f1
 $ tail -n1 example.log | cut -d'h' -f1

(These are for the date range.)

grep -i "date" example.log | wc -l

(This is for the top hits, but we believe it isn't completely correct.)

We also know that the sort and awk commands can be of use, but we don't know how to put them together in one line.

cat /path/to/example.log |awk '{print $1}' | sort |uniq -c |sort -n |tail 

(IP addresses, but we're not sure this is correct either.)

  4. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):

Pace University, New York, New York, United States. Professor Thomas Murphy, RH134
Note: Without school/professor/course information, you will be banned if you post here! You must complete the entire template (not just parts of it).

---------- Post updated at 10:54 AM ---------- Previous update was at 10:52 AM ----------

So it won't let me link the course =( and it also won't let me link the data for this assignment. Does anyone know a workaround? It says I need at least 5 posts first.

---------- Post updated at 10:55 AM ---------- Previous update was at 10:54 AM ----------

192.168.28.168	user143	[08/May/2010:09:52:52]	"GET /NoAuth/js/scriptaculous/scriptaculous.js?load=effects,controls HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168	user147	[08/May/2010:09:52:52]	"GET /NoAuth/js/prototype/prototype.js HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168	user174	[08/May/2010:09:52:52]	"GET /NoAuth/js/ahah.js HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168	user82	[08/May/2010:09:52:52]	"GET /NoAuth/js/titlebox-state.js HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168	user14	[08/May/2010:09:52:52]	"GET /NoAuth/css/validation.css HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168	user129	[08/May/2010:09:52:52]	"GET /NoAuth/js/util.js HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168	user162	[08/May/2010:09:52:52]	"GET /NoAuth/css/print.css HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168	user35	[08/May/2010:09:52:52]	"GET /NoAuth/css/web2/main-squished.css HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.149.163	user44	[08/May/2010:09:51:30]	"GET /index.html HTTP/1.1"	"http://www.example.com/Ticket/Display.html?id=236821&results=54058c6bb77364e805a28b05cf401789"	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 1.0.3705;)"
192.168.149.163	user137	[08/May/2010:09:51:30]	"GET / HTTP/1.1"	"http://www.example.com/Ticket/Display.html?id=236821&results=54058c6bb77364e805a28b05cf401789"	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 1.0.3705;)"
192.168.149.163	user15	[08/May/2010:09:51:13]	"GET /Ticket/Display.html?id=236821&results=54058c6bb77364e805a28b05cf401789 HTTP/1.1"	"http://www.example.com/Ticket/Update.html?Action=Comment&id=236821"	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 1.0.3705;)"
192.168.149.163	user101	[08/May/2010:09:51:12]	"POST /Ticket/Update.html HTTP/1.1"	"http://www.example.com/Ticket/Update.html?Action=Comment&id=236821"	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 1.0.3705;)"
192.168.170.132	user195	[08/May/2010:09:43:52]	"GET /index.html HTTP/1.1"	"http://www.example.com/Ticket/Display.html?id=238759&results=16189b033b19ffdba5b07b0dddc11b85"	"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)"
192.168.170.132	user43	[08/May/2010:09:43:52]	"GET / HTTP/1.1"	"http://www.example.com/Ticket/Display.html?id=238759&results=16189b033b19ffdba5b07b0dddc11b85"	"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)"
192.168.170.132	user38	[08/May/2010:09:43:41]	"GET /Ticket/Display.html?id=238759&results=16189b033b19ffdba5b07b0dddc11b85 HTTP/1.1"	"http://www.example.com/Ticket/Update.html?Action=Comment&id=238759"	"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)"
192.168.170.132	user192	[08/May/2010:09:43:41]	"POST /Ticket/Update.html HTTP/1.1"	"http://www.example.com/Ticket/Update.html?Action=Comment&id=238759"	"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)"
192.168.170.132	user58	[08/May/2010:09:43:12]	"GET /Ticket/Update.html?Action=Comment&id=238759 HTTP/1.1"	"http://www.example.com/Ticket/Display.html?id=238759"	"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)"
192.168.170.132	user11	[08/May/2010:09:43:07]	"GET /Ticket/Display.html?id=238759 HTTP/1.1"	"http://www.example.com/index.html"	"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)"

Here is a little snippet of what the data I'm working with looks like.

  • With the sample given, the assumption behind your answer to the first question (that the first line is the oldest and the last line is the newest) is wrong. On top of that, the delimiter for cut should be reconsidered.
  • Your attempt to find the top hits (whatever this means) looks for the string constant "date"; not sure this is what you want.
  • Your attempt to answer question 3 can be simplified, and it doesn't address the core of the question.
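
For question 3, one possible simplification is to take the first tab-separated field, keep a single copy of each address, and count the remaining lines. This is only a minimal sketch, assuming the layout from the posted sample; the three-line `sample` variable is shortened, made-up test data and not the real example.log:

```shell
# Hypothetical three-line sample in the same tab-delimited layout (only 3 fields shown).
sample=$(printf '%s\t%s\t%s\n' \
  '192.168.28.168' 'user143' '[08/May/2010:09:52:52]' \
  '192.168.28.168' 'user147' '[08/May/2010:09:52:52]' \
  '192.168.149.163' 'user44' '[08/May/2010:09:51:30]')

# Field 1 is the IP address; sort -u keeps one copy of each; wc -l counts them.
# tr -d ' ' strips the padding that some wc implementations add.
unique_visitors=$(printf '%s\n' "$sample" | cut -f1 | sort -u | wc -l | tr -d ' ')
echo "$unique_visitors"
```

On the real file you would feed example.log to cut instead of the sample variable.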

What about questions 4 - 9?
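
For the "top N" style questions, a common counting idiom is sort | uniq -c | sort -rn | head. Here is a rough sketch for question 4, assuming the username is field 2 as described in the log format; the `sample` data is made up and shortened to two fields for illustration:

```shell
# Hypothetical sample: IP address and username only, tab-separated.
sample=$(printf '%s\t%s\n' \
  '192.168.149.163' 'user44' \
  '192.168.149.163' 'user15' \
  '192.168.170.132' 'user44')

# cut -f2   : pull out the username column
# sort      : group identical names together so uniq can see them
# uniq -c   : prefix each distinct name with its count
# sort -rn  : order by count, highest first
# head -n 10: keep the top ten
ranked=$(printf '%s\n' "$sample" | cut -f2 | sort | uniq -c | sort -rn | head -n 10)
printf '%s\n' "$ranked"
```

The same chain should work for other columns by changing the field number, though the request and User Agent fields need to be broken down further first.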

  1. Date range of the log. Hope they are sorted

Check the manual page for cut using

$ man cut

cut -f 
     -f list
             The list specifies fields, separated in the input by the field
             delimiter character (see the -d option).  Output fields are
             separated by a single occurrence of the field delimiter character.

So if we don't use -d and just use -f, then each word is separated by a space.

 $ tail -n1 exp.log | cut -f 3 

This gets the date, but with the [] brackets around it. Can we use the 'tr' command now?
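
Using tr -d to delete the brackets is one way to do it. A minimal sketch, assuming the log is ordered newest-first as in the posted snippet (so the last line holds the oldest date); the temporary file and its two shortened lines are stand-ins for example.log:

```shell
# Hypothetical two-line stand-in for example.log, newest entry first.
log=$(mktemp)
printf '%s\t%s\t%s\n' \
  '192.168.28.168' 'user143' '[08/May/2010:09:52:52]' \
  '192.168.170.132' 'user11' '[08/May/2010:09:43:07]' > "$log"

# Field 3 is the timestamp; tr -d '[]' deletes both bracket characters.
newest=$(head -n1 "$log" | cut -f3 | tr -d '[]')
oldest=$(tail -n1 "$log" | cut -f3 | tr -d '[]')
echo "$oldest to $newest"

rm -f "$log"
```

If the real file turns out not to be sorted, head and tail alone will not give the right range.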

That should be: by a TAB character.

     -d delim
             Use delim as the field delimiter character instead of the tab character.

To use a single space: -d " "

I tested the same log (copy and paste) with only the -f option, and it worked for me with spaces.

That was on FreeBSD with /usr/bin/cut.

According to the FreeBSD 11.0 man page:

     -d delim
             Use delim as the field delimiter character instead of the tab character.

     -f list
             The list specifies fields, separated in the input by the field delimiter character
             (see the -d option).  Output fields are separated by a single occurrence of the
             field delimiter character.

$ printf "foo\tbar boo \tbaz\tbae\n" | cut -f3
baz
$

--

The log file sample is TAB separated, so that is why it works in this case.
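
To see the difference between the two delimiters side by side, here is a small sketch that reuses the same test string; the variable names are just for illustration:

```shell
# Same test string as above: TABs between fields, spaces inside the second field.
line=$(printf 'foo\tbar boo \tbaz\tbae')

# Default delimiter (TAB): field 3 is "baz".
by_tab=$(printf '%s\n' "$line" | cut -f3)

# Explicit single-space delimiter: the line now splits at the two spaces,
# so field 2 is the text between them.
by_space=$(printf '%s\n' "$line" | cut -d ' ' -f2)

echo "tab: $by_tab"
echo "space: $by_space"
```

With -d ' ' the TABs are treated as ordinary characters, which is why a space delimiter gives unexpected fields on a TAB-separated log.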

Got it ... Thanks