sed / awk to get specific word in line

erlanq · March 21, 2012, 3:15am

I have an HTTP log from which I want to extract the words that follow a specific "tag". This is a sample line from the log:

98,POST,200 OK,www.facebook.com,Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1,/ajax/updatestatus.php?__a=1,datr=P_H1TgjTczCHxiGwdIF5tvpC; lu=Si1fMkcrU2SInpY8tk_7tAnw; c_user=728445064; xs=61%3Ab9ee26a8f2fc53efb960a6bd6c1c0042%3A0%3A1328685197; presence=EDvFA22A2EtimeF1328685268EuserFA2728445064A2EstateFDutF1328685268207EvisF1EvctF0H0EblcF0EsndF1ODiFA21014445485A2C_5dEfFA21014445485A2EuctF1328685227EsF0CEchFDp_5f728445064F15CC; p=3; act=1328685304452%2F17%3A2; _e_0Hb1_9=%5B%220Hb1%22%2C1328685304455%2C%22act%22%2C1328685304452%2C17%2C%22http%3A%2F%2Fwww.facebook.com%2Fajax%2Fupdatestatus.php%22%2C%22f%22%2C%22submit%22%2C%22wall%22%2C%22r%22%2C%22%2Fmeemelati%22%2C%7B%22ft%22%3A%7B%7D%2C%22gt%22%3A%7B%22profile_owner%22%3A%22731612557%22%2C%22ref%22%3A%22mf%22%7D%7D%2C0%2C0%2C0%2C0%2C16%5D; x-src=%2Fajax%2Fupdatestatus.php%7Cprofile_stream_composer,548,application/x-www-form-urlencoded; charset=UTF-8,0,application/x-javascript; charset=utf-8,gzip,chunked,post_form_id=a012a7a073bc1d990a6c449643ea4570&fb_dtsg=AQDnwj7O&xhpc_composerid=ux0ih_13&xhpc_targetid=731612557&xhpc_context=profile&xhpc_fbx=1&xhpc_timeline=&xhpc_ismeta=1&xhpc_message_text=Hello%20Londoners&xhpc_message=Hello%20Londoners&composertags_place=102173726491792&composertags_place_name=&composer_predicted_city=102173726491792&composer_session_id=1328685296&is_explicit_place=&composertags_city=102173726491792&disable_location_sharing=false&nctr[_mod]=pagelet_wall&lsd&post_form_id_source=AsyncRequest&__user=728445064&phstamp=16581681101191065579516,<EOH>

Once awk finds the tag "xhpc_message_text=", it should output "Hello Londoners", decoding the URL percent-encoding along the way (e.g. "%20" becomes a space in this sample).
The value should end at the next "&", i.e. just before "&xhpc_message".

Thanks for any suggestions on how to solve this.

daWonderer · March 21, 2012, 3:23am

With awk you can use the function 'index' to get the position of the tag.
'substr' can then cut from that position to the end of the line.
Another 'index' call finds the position of the next '&'.
That gives you the start and end positions of the expected result.
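The 'index'/'substr' steps above could look roughly like the following minimal sketch. The function name extract_tag is hypothetical, the tag is hard-coded, and it assumes the tag appears at most once per line and is not the tail of some longer parameter name; the value is taken up to the next '&', or to the end of the line if no '&' follows.

```shell
# Hypothetical helper: print the raw value of xhpc_message_text for each
# input line that contains it, using only index() and substr().
extract_tag() {
  awk -v tag="xhpc_message_text=" '
    {
      start = index($0, tag)                  # 0 when the tag is absent
      if (start == 0) next                    # skip lines without the tag
      rest = substr($0, start + length(tag))  # text after the tag
      stop = index(rest, "&")                 # next "&", 0 if none
      if (stop > 0) rest = substr(rest, 1, stop - 1)
      print rest                              # value is still URL-encoded
    }'
}

# Usage: extract_tag < file
```

On the sample line this should print Hello%20Londoners; the percent-encoding still has to be decoded in a separate step.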

erlanq · March 21, 2012, 3:28am

OK, I'll try that.

rangarasan · March 21, 2012, 3:29am

Hi,

Try this one,

awk -F'&xhpc_message_text=' 'NF > 1 { l = substr($2, 1, match($2, "&") - 1); gsub(/%20/, " ", l); print l }' file

Cheers,
Ranga:)

itkamaraj · March 21, 2012, 3:33am

You can retrieve the value with awk; the harder part is converting the URL encoding back to ASCII.

$ nawk -F\& '{for(i=1;i<=NF;i++)if($i~/xhpc_message_text/){split($i,a,"=");print a[2]}}' test.txt
Hello%20Londoners

In Perl this is easy to do; let me know if you are interested in a Perl solution.
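The nawk command above still prints the value URL-encoded. If you would rather stay in awk than switch to Perl, a decoder along these lines could be piped after it. This is a minimal sketch, assuming form-style encoding ('+' for space, %XX hex escapes); the function name url_decode is hypothetical, and it avoids gawk-only functions such as strtonum by looking hex digits up in a string.

```shell
# Hypothetical helper: percent-decode each input line in portable awk
# (no gawk-only strtonum). "+" is treated as a space, as in form data.
# LC_ALL=C is meant to keep sprintf("%c") working on single bytes.
url_decode() {
  LC_ALL=C awk '
    function decode(s,    out, i, n, c, hi, lo) {
      gsub(/\+/, " ", s)                # form encoding: "+" means space
      out = ""; i = 1; n = length(s)
      while (i <= n) {
        c = substr(s, i, 1)
        if (c == "%" && i + 2 <= n) {
          hi = index(HEX, toupper(substr(s, i + 1, 1))) - 1
          lo = index(HEX, toupper(substr(s, i + 2, 1))) - 1
          if (hi >= 0 && lo >= 0) {     # valid %XX escape
            out = out sprintf("%c", hi * 16 + lo)
            i += 3
            continue
          }
        }
        out = out c                     # copy everything else unchanged
        i++
      }
      return out
    }
    BEGIN { HEX = "0123456789ABCDEF" }
    { print decode($0) }'
}
```

Appended to the command above (nawk ... test.txt | url_decode), it should turn Hello%20Londoners into Hello Londoners. Invalid escapes such as %zz are passed through unchanged; how multi-byte escapes like %C3%A9 come out may differ between awk implementations.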

erlanq · March 21, 2012, 3:43am

itkamaraj, thank you, I like your solution...

itkamaraj · March 21, 2012, 3:50am

$ perl -F'&' -lane 'for (@F) { if (/^xhpc_message_text=/) { my (undef, $v) = split /=/, $_, 2; $v =~ tr/+/ /; $v =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg; print $v } }' input.txt
Hello Londoners