In this line we only need to consider the components marked in bold above, so basically: /liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh : is called the URL
200: is called the response code.
h=abcdefgh : is called the query string.
I am trying to write a script which does the following:
1.) Count of each URL that appears 10000 or more times with a non-successful response code (basically anything other than a 200, 206 or 304 response code) and does not contain any of the following patterns in the URL: '/F200%5E*', '/F0%5E*' and '/F100%5E*'
2.) Count of each URL, excluding the query string, that is 800 characters in length and does not contain any of the following patterns in the URL: '/F200%5E*', '/F0%5E*' and '/F100%5E*'
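A minimal awk sketch of both counts, assuming each log line carries the URL (path plus query string) in one field and the response code in another; the field numbers, file name, and sample data below are hypothetical and would need adjusting to the real log format (I've also read "800 characters in length" as a minimum):

```shell
#!/bin/sh
# Hypothetical sample log: field 1 = URL (with query string), field 2 = response code.
{
  yes '/liveupdate-aka.symantec.com/1340071490jtun_nav2k8enn09m25.m25?h=abcdefgh 404' \
    | head -n 10000
  echo '/some/other/file.m25?h=ijklmnop 200'   # successful, so excluded from count 1
  echo '/F200%5Efoo 500'                       # matches an excluded pattern
} > sample.log

# 1.) URLs with >= 10000 non-200/206/304 responses, skipping the excluded patterns.
awk '$2 != 200 && $2 != 206 && $2 != 304 &&
     $1 !~ /\/F200%5E/ && $1 !~ /\/F0%5E/ && $1 !~ /\/F100%5E/ { n[$1]++ }
     END { for (u in n) if (n[u] >= 10000) print n[u], u }' sample.log

# 2.) Count per URL with the query string stripped, keeping only URLs
#     at least 800 characters long, again skipping the excluded patterns.
awk '{ u = $1; sub(/\?.*/, "", u)
       if (length(u) >= 800 &&
           u !~ /\/F200%5E/ && u !~ /\/F0%5E/ && u !~ /\/F100%5E/) m[u]++ }
     END { for (u in m) print m[u], u }' sample.log
```

With the sample data above, only the first awk prints anything, since no URL in the sample reaches 800 characters.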
Sometimes, for speed, I break up the sed and use a long pipe of mixed sed and grep commands to speed things up and multiprocess, since the last 5 lines of this sed are essentially "grep -v". Putting the best eliminator first speeds things up. For many gzipped files, on bash and /dev/fd/# UNIXes, you can go parallel by dividing the files into (#cores x 2) lists (assuming 50% I/O-bound processing) and replacing the first 'sort' with:
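The exact 'sort' replacement isn't reproduced here, but the file-splitting idea can be sketched as follows; the logs/ directory, the two tiny sample files, and the one-URL-per-line format are all assumptions made just for the demo:

```shell
#!/bin/sh
# Build a couple of hypothetical gzipped logs to split across workers.
mkdir -p logs
printf '/a 500\n/a 500\n' | gzip > logs/one.gz
printf '/a 500\n/b 404\n' | gzip > logs/two.gz

# (#cores x 2) worker lists, falling back to 2 cores if getconf is unavailable.
CORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
JOBS=$((CORES * 2))

# Divide the file list round-robin into $JOBS sublists.
ls logs/*.gz | awk -v n="$JOBS" '{ print > ("list." (NR % n)) }'

# One background pipeline per sublist; each emits partial per-URL counts.
for list in list.*; do
  xargs gzip -dc < "$list" \
    | awk '{ n[$1]++ } END { for (u in n) print n[u], u }' > "counts.${list#list.}" &
done
wait

# Merge the partial counts and sort by frequency.
awk '{ n[$2] += $1 } END { for (u in n) print n[u], u }' counts.* | sort -rn
```

The merge step works because each partial file is already in "count URL" form, so summing field 1 keyed on field 2 recombines the per-worker tallies.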