What I want is for them all to be on the same line in a table, like this:
FILENAME      HITS  UNIQUE HITS
food.hits     2034  245
mags.hits     2000  435
adverts.hits  1456  344
# the HITS column also needs to be in descending order, as shown above
awk '{print $1}' HITS
# awk splits fields on white space (spaces/tabs) by default,
# so you need to specify the column separator, e.g.
awk -F: '{print $1}' HITS
awk '{print $2}' HITS | sort -n | wc -l
# no need for awk and sort here, as neither changes the number of lines of data; just use:
wc -l < HITS
awk '{print $2}' HITS | uniq | wc -l
# uniq only removes adjacent duplicate lines, so it misses duplicates in unsorted data; better to use:
awk -F: '{print $2}' HITS | sort -u | wc -l
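To see the difference between uniq and sort -u for yourself, here is a small sketch on made-up input (the three values are purely illustrative and not taken from your HITS file):

```shell
# Made-up input: "a" appears on lines 1 and 3, so the duplicates are not adjacent
printf 'a\nb\na\n' | uniq | wc -l      # counts 3 lines: uniq leaves both "a" lines in place
printf 'a\nb\na\n' | sort -u | wc -l   # counts 2 lines: sorting first lets the duplicate be dropped
```

Some versions of wc pad the number with leading spaces, but the counts should be 3 and 2 respectively.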
Personally I would use one awk script to generate all of the results, something like:
sort -t : -k 1,1 -k 2,2 HITS | awk -F: '
# assign values to variables for readability, count a hit
{ file=$1; ip=$2; hits[file]++ }
# initialise prevfile when reading the first line
NR==1 { prevfile=file }
# if it is a new file, reset the previous IP
file != prevfile { previp="" }
# if the ip is different to the previous IP, count a unique hit
ip != previp { uniquehits[file]++ }
# save previous ip and file name for future reference
{ previp=ip; prevfile=file }
# output the results
END { for (file in hits) { print file,hits[file],uniquehits[file] } }
'
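This won't produce exactly the format you wanted. You can use printf for that, plus a final sort to put the hits in descending order (awk's for-in loop returns the keys in no particular order), but I'll leave that part as an exercise for you!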
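For anyone who wants to check their answer to that exercise, here is one possible sketch. It assumes the input holds colon-separated file:ip records, as the script above does. The sample data, the temporary file, the column widths and the report function name are all made up for illustration.

```shell
# Hypothetical sample data: one "file:ip" record per line (made up for illustration)
HITS=$(mktemp)
cat > "$HITS" <<'EOF'
food.hits:10.0.0.1
food.hits:10.0.0.1
food.hits:10.0.0.2
mags.hits:10.0.0.3
mags.hits:10.0.0.3
adverts.hits:10.0.0.4
EOF

# report FILE: print a header, then one aligned row per file, most hits first
# ("report" is a hypothetical helper name, not part of the original script)
report() {
    printf '%-15s %8s %12s\n' FILENAME HITS 'UNIQUE HITS'
    sort -t : -k 1,1 -k 2,2 "$1" | awk -F: '
        # same counting logic as the script above
        { file=$1; ip=$2; hits[file]++ }
        NR==1 { prevfile=file }
        file != prevfile { previp="" }
        ip != previp { uniquehits[file]++ }
        { previp=ip; prevfile=file }
        # printf pads each column to a fixed width
        END { for (file in hits) printf "%-15s %8d %12d\n", file, hits[file], uniquehits[file] }
    ' | sort -k 2,2nr    # numeric, descending sort on the HITS column
}

report "$HITS"
```

The header is printed before the pipeline starts, so it stays on top and is not mixed in with the rows by the final sort. With the sample data above, food.hits (3 hits, 2 unique) should come first, then mags.hits (2 hits, 1 unique), then adverts.hits (1 hit, 1 unique).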
This gives a better explanation of the problems I have. It is somewhat irrelevant now because I misinterpreted my requirements, but it still helps me. Thanks again.