Extract key words and print their values

Input file (HTTP request log file):

GET /dynamic_branding_playlist.fmil?domain=915oGLbNZhb&pluginVersion=3.2.7_2.6&pubchannel=usa&sdk_ver=2.4.6.3&width=680&height=290&embeddedIn=http%3A%2F%2Fviewster.com%2Fsplash%2FOscar-Videos-1.aspx%3Futm_source%3Dadon_272024_113535_24905_24905%26utm_medium%3Dcpc%26utm_campaign%3DUSYME%26adv%3D573900%26req%3D5006e9ce1ca8b26347b88a7.1.825&sdk_url=http%3A%2F%2Fdivaag.vo.llnwd.net%2Fo42%2Fhttp_only%2Fviewster_com%2Fv25%2Fyume%2F&viewport=42

Output file:

domain          sdk_version
915oGLbNZhb   2.4.6.3

There are thousands of log lines similar to the example above, so I need a way to extract the values of domain and sdk_version. The positions of domain and sdk_version are not fixed: sometimes they appear in the second field, sometimes in the last field (if split by &).

Could anyone help me with this problem? Thanks so much in advance :)

GNU sed version 4.2.1

$less samplelog 
GET /dynamic_branding_playlist.fmil?domain=915oGLbNZhb&pluginVersion=3.2.7_2.6&pubchannel=usa&sdk_ver=2.4.6.3&width=680&height=290&embeddedIn=http%3A%2F%2Fviewster.com%2Fsplash%2FOscar-Videos-1.aspx%3Futm_source%3Dadon_272024_113535_24905_24905%26utm_medium%3Dcpc%26utm_campaign%3DUSYME%26adv%3D573900%26req%3D5006e9ce1ca8b26347b88a7.1.825&sdk_url=http%3A%2F%2Fdivaag.vo.llnwd.net%2Fo42%2Fhttp_only%2Fviewster_com%2Fv25%2Fyume%2F&viewport=42
GET /dynamic_branding_playlist.fmil?domain=915oGLbNZhb&pluginVersion=3.2.7_2.6&pubchannel=usa&sdk_ver=2.4.6.3&width=680&height=290&embeddedIn=http%3A%2F%2Fviewster.com%2Fsplash%2FOscar-Videos-1.aspx%3Futm_source%3Dadon_272024_113535_24905_24905%26utm_medium%3Dcpc%26utm_campaign%3DUSYME%26adv%3D573900%26req%3D5006e9ce1ca8b26347b88a7.1.825&sdk_url=http%3A%2F%2Fdivaag.vo.llnwd.net%2Fo42%2Fhttp_only%2Fviewster_com%2Fv25%2Fyume%2F&viewport=42
GET /dynamic_branding_playlist.fmil?domain=915oGLbNZhb&pluginVersion=3.2.7_2.6&pubchannel=usa&sdk_ver=2.4.6.3&width=680&height=290&embeddedIn=http%3A%2F%2Fviewster.com%2Fsplash%2FOscar-Videos-1.aspx%3Futm_source%3Dadon_272024_113535_24905_24905%26utm_medium%3Dcpc%26utm_campaign%3DUSYME%26adv%3D573900%26req%3D5006e9ce1ca8b26347b88a7.1.825&sdk_url=http%3A%2F%2Fdivaag.vo.llnwd.net%2Fo42%2Fhttp_only%2Fviewster_com%2Fv25%2Fyume%2F&viewport=42

$printf "domain sdk_version\n"; sed -e 's/.*domain=\([0-9a-zA-Z]\+\)&.*sdk_ver=\([.0-9]\+\).*/\1 \2/' -e 's/ /\n/2' samplelog

domain sdk_version
915oGLbNZhb 2.4.6.3
915oGLbNZhb 2.4.6.3
915oGLbNZhb 2.4.6.3
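
The sed command above only matches when domain= comes before sdk_ver= on the line, while the question says the positions are not fixed. One possible way around that, sketched here with awk rather than sed, is to split the query string on & and pick the two keys by name. The function name extract_params and the shortened sample lines are made up for illustration, and the sketch assumes nothing follows the query string on the line.

```shell
# Order-independent sketch: split the query string on "&" and look the keys up by name.
# extract_params is a hypothetical helper; it reads log lines on stdin.
extract_params() {
    awk '
    {
        domain = ""; sdk = ""
        q = $0
        sub(/^[^?]*\?/, "", q)            # drop everything up to the first "?"
        n = split(q, kv, "&")             # one array element per key=value pair
        for (i = 1; i <= n; i++) {
            if (kv[i] ~ /^domain=/)  domain = substr(kv[i], 8)   # text after "domain="
            if (kv[i] ~ /^sdk_ver=/) sdk    = substr(kv[i], 9)   # text after "sdk_ver="
        }
        if (domain != "" && sdk != "") print domain, sdk
    }'
}

# Two shortened, made-up lines with the parameters in different orders.
printf '%s\n' \
  'GET /dynamic_branding_playlist.fmil?domain=915oGLbNZhb&pubchannel=usa&sdk_ver=2.4.6.3&viewport=42' \
  'GET /dynamic_branding_playlist.fmil?sdk_ver=2.4.6.3&width=680&domain=915oGLbNZhb' |
  extract_params
# prints "915oGLbNZhb 2.4.6.3" twice
```

On the real file this would be run as `extract_params < samplelog`. Lines that lack either parameter are skipped instead of being printed unchanged.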

Correction:
The "-e 's/ /\n/2'" expression was left in by mistake, as noted by Scrutinizer. It is not needed.

$printf "domain sdk_version\n"; sed 's/.*domain=\([0-9a-zA-Z]\+\)&.*sdk_ver=\([.0-9]\+\).*/\1 \2/' samplelog

sed 's/.*domain=\([^&]*\).*sdk_ver=\([^&]*\).*/\1 \2/' infile

--
Note: \+ is GNU sed only (it is not needed here, as * would suffice because of greedy matching). Also, \n cannot be used in the replacement part in standard sed (nor is it needed in this case); you would need an actual newline preceded by a backslash ( \ ).
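
As a rough illustration of both points in the note, the sketch below uses only POSIX sed features: the interval \{1,\} stands in for \+, and the newline in the replacement is written as a backslash followed by an actual line break. The function names extract_posix and split_at_space and the shortened sample line are made up for this example.

```shell
# POSIX spelling of the GNU command: \{1,\} ("one or more") replaces \+.
extract_posix() {
    sed 's/.*domain=\([0-9a-zA-Z]\{1,\}\)&.*sdk_ver=\([.0-9]\{1,\}\).*/\1 \2/'
}

# Newline in the replacement, written portably as backslash + literal newline.
split_at_space() {
    sed 's/ /\
/'
}

# Shortened, made-up log line in the same shape as the sample.
line='GET /dynamic_branding_playlist.fmil?domain=915oGLbNZhb&pubchannel=usa&sdk_ver=2.4.6.3&viewport=42'

printf '%s\n' "$line" | extract_posix                    # prints: 915oGLbNZhb 2.4.6.3
printf '%s\n' "$line" | extract_posix | split_at_space   # prints the two values on separate lines
```

The second function is only there to show the portable newline syntax; the extraction itself does not need it.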
