Awk output issues.

amatuer_lee_3 · May 15, 2008, 11:06pm

I have the following code:

awk '{print $1}' HITS                          # Prints column one of the HITS file. Column one is filenames.

awk '{print $2}' HITS | sort -n | wc -l        # Prints column two, sorts it numerically and outputs the line count. Column two is IP addresses.

awk '{print $2}' HITS | uniq | wc -l           # Counts the unique entries in column two. Column two is IP addresses.

I know my code is not right, and my results are not listed how I want them. They are just displayed one after the other on separate lines,

like this:

hits/adverts.hits:248.204.125.183
hits/mags.hits:87.114.172.31
hits/adverts.hits:34.220.19.30
hits/food.hits:185.227.145.86
hits/food.hits:213.225.8.140
hits/mags.hits:83.222.98.178
hits/food.hits:118.195.119.35
10345
245

What I want is for them all to be in a table that looks like this:

FILENAME            HITS    UNIQUE HITS
food.hits           2034            245
mags.hits           2000            435
adverts.hits        1456            344

# The HITS column also needs to be in descending order, as shown above.

ilan · May 16, 2008, 12:04am

Maybe you should provide some input data, which would clarify your requirement.

-ilan

amatuer_lee_3 · May 16, 2008, 4:02am

There won't be any input data; I just want the function to display the results.

Annihilannic · May 16, 2008, 8:18pm

Did you know it's spelt 'amateur'?

Some comments regarding your existing code:

awk '{print $1}' HITS

# you need to specify the field separator, because awk splits fields
# on white space (spaces/tabs) by default, e.g.

awk -F: '{print $1}' HITS
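A quick way to see the difference, using one record copied from the output in the question. There is no whitespace in the line, so without -F: awk treats the whole line as $1, which is why the original command printed full filename:IP lines. (The variable name record is just for illustration.)

```shell
# One record in the filename:IP format shown in the question
record='hits/food.hits:185.227.145.86'

# Default separator (whitespace): the line contains no spaces,
# so the whole record ends up in $1
printf '%s\n' "$record" | awk '{print $1}'      # hits/food.hits:185.227.145.86

# With -F: the colon splits the record into filename ($1) and IP ($2)
printf '%s\n' "$record" | awk -F: '{print $1}'  # hits/food.hits
printf '%s\n' "$record" | awk -F: '{print $2}'  # 185.227.145.86
```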

awk '{print $2}' HITS  | sort -n | wc -l

# no need for awk and sort, as they don't change the number of lines of data; just use:

wc -l < HITS
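To check that claim, here is a small sketch. It assumes a throwaway file (HITS_SAMPLE, a hypothetical name) holding the seven lines shown in the question in place of the real HITS file, and compares the original pipeline with a plain wc -l.

```shell
# Hypothetical stand-in for HITS: the seven sample lines from the question
HITS_SAMPLE=$(mktemp)
trap 'rm -f "$HITS_SAMPLE"' EXIT
cat > "$HITS_SAMPLE" <<'EOF'
hits/adverts.hits:248.204.125.183
hits/mags.hits:87.114.172.31
hits/adverts.hits:34.220.19.30
hits/food.hits:185.227.145.86
hits/food.hits:213.225.8.140
hits/mags.hits:83.222.98.178
hits/food.hits:118.195.119.35
EOF

# awk emits one line per input line and sort only reorders them,
# so the line count coming out of the pipeline is unchanged...
via_pipeline=$(awk '{print $2}' "$HITS_SAMPLE" | sort -n | wc -l)

# ...which means counting the file directly gives the same number
direct=$(wc -l < "$HITS_SAMPLE")

echo "pipeline: $via_pipeline, direct: $direct"
```

Both counts are 7 here. Note that some wc implementations pad the number with leading spaces.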

awk '{print $2}' HITS | uniq | wc -l

# uniq only removes adjacent duplicate lines, so on unsorted data it
# misses repeated IPs; better to use:

awk -F: '{print $2}' HITS | sort -u | wc -l
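A minimal illustration of the uniq pitfall. The three lines below use made-up IP addresses (10.0.0.x), where the same IP appears twice but not on adjacent lines.

```shell
# Hypothetical sample: 10.0.0.1 repeats, but the repeats are not adjacent
sample='hits/food.hits:10.0.0.1
hits/food.hits:10.0.0.2
hits/food.hits:10.0.0.1'

# uniq only collapses *adjacent* duplicates, so all 3 lines survive
via_uniq=$(printf '%s\n' "$sample" | awk -F: '{print $2}' | uniq | wc -l)

# sort -u sorts first, so the repeated 10.0.0.1 is counted once: 2 lines
via_sort_u=$(printf '%s\n' "$sample" | awk -F: '{print $2}' | sort -u | wc -l)

echo "uniq: $via_uniq  sort -u: $via_sort_u"
```

Note this counts unique IPs across the whole file, not per file; the script below handles the per-file count.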

Personally I would use one awk script to generate all of the results, something like:

sort -t : -k 1,1 -k 2,2 HITS | awk -F: '
        # assign values to variables for readability, count a hit
        { file=$1; ip=$2; hits[file]++ }
        # initialise prevfile when reading the first line
        NR==1 { prevfile=file }
        # if it is a new file, reset the previous IP
        file != prevfile { previp="" }
        # if the ip is different to the previous IP, count a unique hit
        ip != previp { uniquehits[file]++ }
        # save previous ip and file name for future reference
        { previp=ip; prevfile=file }
        # output the results
        END { for (file in hits) { print file,hits[file],uniquehits[file] } }
'

This won't output exactly the format you wanted; you can use printf() for that, but I'll leave that part as an exercise for you!
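For reference, here is one possible way to finish that exercise, sketched under a few assumptions. It keeps the counting logic of the script above, but formats the END output with printf, strips the leading directory from each filename (so hits/food.hits becomes food.hits), and pipes the rows through sort -k 2,2nr so the HITS column is in descending order. HITS_SAMPLE and hits_report are hypothetical names. The sample file holds the seven lines from the question plus one repeated visit to hits/food.hits from 185.227.145.86, added only so that the unique count differs from the hit count.

```shell
# Hypothetical stand-in for HITS: the question's sample lines plus one repeat
HITS_SAMPLE=$(mktemp)
trap 'rm -f "$HITS_SAMPLE"' EXIT
cat > "$HITS_SAMPLE" <<'EOF'
hits/adverts.hits:248.204.125.183
hits/mags.hits:87.114.172.31
hits/adverts.hits:34.220.19.30
hits/food.hits:185.227.145.86
hits/food.hits:213.225.8.140
hits/mags.hits:83.222.98.178
hits/food.hits:118.195.119.35
hits/food.hits:185.227.145.86
EOF

# hits_report FILE: per-file hit counts and unique-IP counts,
# one formatted row per file, highest HITS first
hits_report() {
  sort -t : -k 1,1 -k 2,2 "$1" | awk -F: '
    { file = $1; ip = $2; hits[file]++ }      # count every hit
    NR == 1 { prevfile = file }
    file != prevfile { previp = "" }          # new file: reset previous IP
    ip != previp { uniquehits[file]++ }       # IP changed: a unique hit
    { previp = ip; prevfile = file }
    END {
      for (file in hits) {
        name = file
        sub(/.*\//, "", name)                 # drop the directory, e.g. "hits/"
        printf "%-20s %10d %12d\n", name, hits[file], uniquehits[file]
      }
    }' | sort -k 2,2nr                        # numeric, descending on HITS
}

# Print the header separately so it is not mixed into the numeric sort
printf '%-20s %10s %12s\n' FILENAME HITS 'UNIQUE HITS'
hits_report "$HITS_SAMPLE"
```

With this sample, food.hits comes out on top with 4 hits and 3 unique hits, followed by adverts.hits and mags.hits with 2 and 2 each.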

amatuer_lee_3 · May 16, 2008, 8:23pm

Thanks very much. Yeah, I do know it's spelt wrong; I noticed after I clicked confirm on my username. I just can't be bothered to change it.

Can I refer you to my recent post:

http://www.unix.com/shell-programming-scripting/65529-using-uniq-awk.html

This gives a better explanation of the problems I have. It is somewhat irrelevant now because I misinterpreted my requirements, but this still helps me. Thanks again.