A script needs to be created to collect all HTTP GET requests containing a particular string

Hi friends,

A script needs to be created to collect all HTTP GET requests that contain a particular string, say abcd.gif, in the URL path, along with the IP address of the client that issued the request. The source of this data is the web server logs. Each script execution should extract the client IP address and timestamp and record them either in a file or in a DB.
The logs look like this:

10.252.33.251 - - [13/Jul/2012:05:17:46 -0400] "GET /keepalive.html HTTP/1.1" 200 299 
10.254.17.140 - - [13/Jul/2012:05:17:48 -0400] "GET /webapp/wcs/stores/servlet/KioskGiftRegistryMainView?catalogId=10051&langId=-1&storeId=10151 HTTP/1.1" 200 7881 
................................... 
10.252.33.251 - - [13/Jul/2012:05:20:46 -0400] "GET /keepalive.html HTTP/1.1" 200 299 
10.254.17.140 - - [13/Jul/2012:05:20:49 -0400] "GET /wcsstore/GiftRegistryStorefrontAssetStore/KioskArea/images/abcd.gif?1342171249161 HTTP/1.1" 200 799 
10.252.33.252 - - [13/Jul/2012:05:20:50 -0400] "GET /keepalive.html HTTP/1.1" 200 299 
Here we need to capture this line:
10.254.17.140 - - [13/Jul/2012:05:20:49 -0400] "GET /wcsstore/GiftRegistryStorefrontAssetStore/KioskArea/images/abcd.gif?1342171249161 HTTP/1.1" 200 799

and record the IP address 10.254.17.140 and the timestamp 13/Jul/2012:05:20:49 in a file.

Any help will be greatly appreciated.
Regards,
Surendra

awk '/abcd[.]gif/{print $1$4}' access_log | tr "[" " " >> yourlogfile
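
For reference, here is a slightly more explicit sketch of the same idea. It matches the pattern only against the request path and strips the leading bracket inside awk, so tr is not needed. The function name extract_hits and the file name hits.txt are made up for illustration, and the field positions assume the log layout shown in the sample lines.

```shell
# Hypothetical helper: print "IP timestamp" for each request whose path contains abcd.gif.
# Assumed layout (from the sample lines): $1 = client IP, $4 = "[dd/Mon/yyyy:hh:mm:ss", $7 = URL path.
extract_hits() {
    awk '$7 ~ /abcd[.]gif/ { print $1, substr($4, 2) }' "$1"
}

# Usage (file names are placeholders):
#   extract_hits access_log >> hits.txt
```

Writing [.] instead of a bare dot keeps the pattern from also matching names such as abcdXgif.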

This is not AIX-related; moving the thread.

Hi,

Thanks for your response.

I want to add two more conditions to the extract. I need only the lines with abcd.gif that are HTTP GET requests, so the strings HTTP and GET must be part of the condition.
In addition, the request must have status 200; in the access logs it sometimes also shows up as HTTP/1.1" 404. So effectively the conditions to include are abcd.gif, HTTP, GET, and 200.

10.254.17.139 - - [09/Jul/2012:09:53:03 -0400] "GET /wcsstore/GiftRegistryStorefrontAssetStore/KioskArea/images/abcd.gif?1341841982900 HTTP/1.1" 200 799
10.252.33.252 - - [09/Jul/2012:09:53:06 -0400] "GET /keepalive.html HTTP/1.1" 200 299
10.254.17.139 - - [09/Jul/2012:09:53:09 -0400] "GET /wcsstore/GiftRegistryStorefrontAssetStore/KioskArea/images/abcd.gif?1341841988900 HTTP/1.1" 200 799
10.252.33.252 - - [09/Jul/2012:09:53:11 -0400] "GET /keepalive.html HTTP/1.1" 200 299
10.254.17.139 - - [09/Jul/2012:09:53:15 -0400] "GET /wcsstore/GiftRegistryStorefrontAssetStore/KioskArea/images/abcd.gif?1341841994901 HTTP/1.1" 200 799

Thanks again.

Regards,
Surendra

That's only a trivial modification of what you've been given, you know. Have you tried anything yourself?

awk '/HTTP/ && /GET/ && /abcd[.]gif/ { print $1$4 }' access.log | tr "[" " "
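
The command above does not yet check the status code. A minimal sketch that also applies the 200 condition by testing individual fields could look like the following. It assumes the layout of the sample lines, where field 6 is the method with its opening quote, field 7 the path, field 8 the protocol with its closing quote, and field 9 the status. The function name extract_ok_gets and the file names are placeholders.

```shell
# Hypothetical helper: keep only successful (200) HTTP GET requests for abcd.gif.
# Assumed field positions, taken from the sample log lines:
#   $1 = client IP    $4 = "[timestamp"    $6 = "GET with leading quote
#   $7 = URL path     $8 = HTTP/1.1"       $9 = status code
extract_ok_gets() {
    awk '$6 == "\"GET" && $7 ~ /abcd[.]gif/ && $8 ~ /^HTTP\// && $9 == 200 {
        print $1, substr($4, 2)    # drop the leading "[" from the timestamp
    }' "$1"
}

# Usage (file names are placeholders):
#   extract_ok_gets access.log >> yourlogfile
```

Testing fields rather than the whole line avoids false hits, for example a 404 line whose byte count happens to be 200. If the user field in your logs can contain spaces, the field numbers shift and this sketch would need adjusting.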