Extract pattern from text

Hi all,

I have a text file and need to extract every occurrence of D 8888 44 and D 8888 43 together with the field that follows it.

=",g("en")];f._sn&&(f._sn= "og."+f._sn);for(var n in f)l.push("&"),l.push(g(n)),l.push("="),l.push(g(f[n]));l.push("&emsg=");l.push(g(d.name+":"+d.message));var m=l.join("");Ea(m)&&(m=m.substr(0,2E3));c=m;var r=window.gbar.logger._aem(a,c);ia(r)}}catch(z){}}var Ea=function(a){return 2E3c?Math.max(0,a.length+c):c;c(function(){var a=function(f){for(var g=f.parentElement,d=null,e=0;ewindow.gbar&&gbar.eli&&gbar.eli()Google+SearchImagesMapsPlayYouTubeNewsGmailMoreDriveCalendarTranslateBooksShoppingBloggerFinancePhotosVideosDocsEven more �Account OptionsSign inSearch settingsWeb Historywindow.gbar&&gbar.elp&&gbar.elp()�AllImagesVideosNewsShoppingMapsBooks._Bu,._Bu a:link,._Bu a:visited,a._Bu:link,a._Bu:visited{color:#808080}._kBb{color:#61C}.ellip{overflow:D 8888 43 BBBBBBBBBBBBBB sis;white-space:nowrap}Search OptionsAny countryCountry: the UKAny timePast hourPast 24 hoursPast weekPast monthPast yearAll resultsVerbatim7  | xxxxxxxxxxxxxxxxxxxxxx/2016/03/xxxxxxxxxxxl-19032016.htmlCached15 hours ago ... D 8888 44 AAAAA4FFBBBBBB ; Y OptionsAny country D 8888 44 
CCCCCCCCCCCCCC 
inkOpt,a._Bu:visited{c ink,a._Bu:visited{c D 8888 43 EEEEEEEEEEEEEE
OptionsAny country D 8888 43 
FFFFFFFFFFFFFFFFF

The output should look like this:

D 8888 43 BBBBBBBBBBBBBB
D 8888 44 AAAAA4FFBBBBBB
D 8888 44 CCCCCCCCCCCCCC
D 8888 43 EEEEEEEEEEEEEE
D 8888 43 FFFFFFFFFFFFFFFFF

Thank you very much

So far I have tried:

cat txt | sed 's/D/\n/g' | grep "^ 8888" | awk '/8888/ { print;getline;print}'
 8888 43 BBBBBBBBBBBBBB sis;white-space:nowrap}Search OptionsAny countryCountry: the UKAny timePast hourPast 24 hoursPast weekPast monthPast yearAll resultsVerbatim7 | xxxxxxxxxxxxxxxxxxxxxx/2016/03/xxxxxxxxxxxl-19032016.htmlCached15 hours ago ...
 8888 44 AAAAA4FFBBBBBB ; Y OptionsAny country
 8888 44
 8888 43 EEEEEEEEEEEEEE
 8888 43
 8888 43

Are those <new line>s real or just artefacts due to you NOT using code tags? If artefacts, try

grep -o "D 8888 4[43] [^ ]*" file4 
D 8888 43 BBBBBBBBBBBBBB
D 8888 44 AAAAA4FFBBBBBB
D 8888 44 CCCCCCCCCCCCCC
D 8888 43 EEEEEEEEEEEEEE
D 8888 43 FFFFFFFFFFFFFFFFF

Why is the EEEEE line missing in your output sample?


Nearly

grep -o "D 8888 4[43] [^ ]*" txt
D 8888 43 BBBBBBBBBBBBBB
D 8888 44 AAAAA4FFBBBBBB
D 8888 44
D 8888 43 EEEEEEEEEEEEEE
D 8888 43

Two values are missing (CCCCCCCCCCCCCC and FFFFFFFFFFFFFFFFF).
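
The short matches happen because grep works line by line: when the marker ends a line and the value starts the next one, [^ ]* has nothing left to match. One possible workaround, as a minimal sketch, is to join the lines before grep sees them. The sample file below is made up to mirror the layout in the question, and -o is assumed to be available (GNU and BSD grep have it):

```shell
# Hypothetical sample file: two of the wanted values sit on the line
# after their "D 8888 4x" marker, as in the question.
sample=$(mktemp)
printf '%s\n' \
  'overflow:D 8888 43 BBBBBBBBBBBBBB sis; ... D 8888 44 AAAAA4FFBBBBBB ; Y country D 8888 44 ' \
  'CCCCCCCCCCCCCC ' \
  'ink,a._Bu:visited{c D 8888 43 EEEEEEEEEEEEEE' \
  'OptionsAny country D 8888 43 ' \
  'FFFFFFFFFFFFFFFFF' > "$sample"

# tr -s turns every newline into a space and squeezes runs of spaces,
# so marker and value end up on one line, separated by a single space.
result=$(tr -s '\n' ' ' < "$sample" | grep -o 'D 8888 4[43] [^ ]*')
printf '%s\n' "$result"
rm -f "$sample"
```

This loads the whole file as one line, which is fine for a saved web page but may not suit very large inputs.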

try:

awk '$1=$1' OFS="\n" infile | awk 'l ~ /D 8888 4[34] ./ {sub(".*D 8888 4", "D 8888 4", l) ;print l; l="";} {l=l $1 " ";}
END {if (l ~ /D 8888 4[34] ./) {sub(".*D 8888 4", "D 8888 4", l) ;print l;}}'

Superb, works awesome. Thanks ever so much!

Got Perl?

perl -0ne 'while(/(D\s8{4}\s4[43])\s(\w+)/g){print "$1 $2\n"}' stinkefisch.input
D 8888 43 BBBBBBBBBBBBBB
D 8888 44 AAAAA4FFBBBBBB
D 8888 44 CCCCCCCCCCCCCC
D 8888 43 EEEEEEEEEEEEEE
D 8888 43 FFFFFFFFFFFFFFFFF
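
If neither grep -o nor Perl is at hand, the same idea can probably be done in a single awk pass that walks the whitespace-separated words and remembers how far into "D 8888 4x" it has got. This is only a sketch: the sample file and the variable names (state, code, tok) are made up for illustration, and it assumes the value is always the very next word after the marker, even across a line break.

```shell
# Hypothetical sample input with the same layout as the question's file.
sample=$(mktemp)
printf '%s\n' \
  'overflow:D 8888 43 BBBBBBBBBBBBBB sis; ... D 8888 44 AAAAA4FFBBBBBB ; Y country D 8888 44 ' \
  'CCCCCCCCCCCCCC ' \
  'ink,a._Bu:visited{c D 8888 43 EEEEEEEEEEEEEE' \
  'OptionsAny country D 8888 43 ' \
  'FFFFFFFFFFFFFFFFF' > "$sample"

# state: 0 = idle, 1 = word ended in "D", 2 = then "8888", 3 = then "43"/"44".
# awk variables persist across input lines, so a marker at the end of one
# line still picks up its value from the start of the next.
result=$(awk '
{
  for (i = 1; i <= NF; i++) {
    tok = $i
    if (state == 3)                         { print "D 8888 " code " " tok; state = 0 }
    else if (state == 2 && tok ~ /^4[34]$/) { code = tok; state = 3 }
    else if (state == 1 && tok == "8888")   { state = 2 }
    else if (tok ~ /D$/)                    { state = 1 }
    else                                    { state = 0 }
  }
}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the Perl one-liner, which keeps only \w characters, this prints the whole next word, so trailing punctuation would be kept.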