Efficiency of grep vs. sed?

ever · September 13, 2010, 9:51am

Hello Everyone!

I am a newbie. I'd like to extract key lines from a big text file using a regular expression. The file is nearly 22 MB.

grep or sed? Which is the better choice, i.e. the more efficient one?

Or is there some other best practice?

Thank you in advance.

Ever:)

fpmurphy · September 13, 2010, 10:00am

Either will work. There is not much difference in speed if you write an efficient regular expression.
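
As a rough illustration of what "efficient" can mean here, the sketch below compares a loose regex, a fixed-string search and an anchored regex on a made-up three-line file. The file contents and the `a.b` pattern are invented for the example. When the key is a literal string, `-F` lets grep skip regex interpretation, and an anchor such as `^` narrows where a match can start; whether either makes a measurable difference on a 22 MB file would need timing on the real data.

```shell
# Made-up sample data, for illustration only.
sample=$(mktemp)
printf '%s\n' 'a.b literal' 'axb regex only' 'prefix a.b' > "$sample"

# Plain regex: '.' matches any character, so "a.b" and "axb" both match.
regex_count=$(grep -c 'a.b' "$sample")

# Fixed string: -F treats the pattern literally, so only "a.b" matches.
fixed_count=$(grep -c -F 'a.b' "$sample")

# Anchored regex: only lines that start with the literal text "a.b" match.
anchored_count=$(grep -c '^a\.b' "$sample")

printf 'regex=%s fixed=%s anchored=%s\n' "$regex_count" "$fixed_count" "$anchored_count"
rm -f "$sample"
```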

Corona688 · September 13, 2010, 1:15pm

Unless you need to process lots of them or process within a short time limit, 22 megs really isn't that big these days anyway.

joeyg · September 13, 2010, 1:19pm

I would typically use GREP for your request.
In my mind:
GREP is to select out lines based on criteria
SED is a string editor to change lines
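
To make the distinction concrete, here is a minimal sketch on a made-up three-line file (the contents and the `key=` pattern are invented for illustration). Both tools can select lines, but only sed changes them.

```shell
# Made-up sample data, for illustration only.
sample=$(mktemp)
printf '%s\n' 'alpha key=1' 'beta' 'gamma key=2' > "$sample"

# grep selects the lines that match a regular expression.
selected=$(grep 'key=[0-9]' "$sample")

# sed can select the same lines: -n suppresses the default output
# and the p command prints only the lines that match.
selected_sed=$(sed -n '/key=[0-9]/p' "$sample")

# sed's usual job is editing: rewrite "key=" to "id=" where it occurs
# and pass every other line through unchanged.
edited=$(sed 's/key=/id=/' "$sample")

printf '%s\n' "$selected" '---' "$edited"
rm -f "$sample"
```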

durden_tyler · September 13, 2010, 2:47pm

So, just for kicks, I created a file of size 22 MB approx., consisting of random characters laid out in 129 cols X 180224 rows.
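
For anyone who wants to repeat the test: the exact generation command is not shown in this post, so the following is only one possible way to build a similar file. The function name `make_random_file` and its arguments are hypothetical. It assumes `/dev/urandom` is available and writes lines of printable, non-space characters.

```shell
# Hypothetical helper: write ROWS lines of COLS random printable characters to OUT.
# Usage: make_random_file ROWS COLS OUT
make_random_file() {
    rows=$1
    cols=$2
    out=$3
    # tr keeps only printable non-space bytes, fold cuts the stream into
    # fixed-width lines, and head stops after the requested number of rows.
    LC_ALL=C tr -dc '[:graph:]' < /dev/urandom | fold -w "$cols" | head -n "$rows" > "$out"
}

# A file of the size described above would be 128 characters plus a newline
# per line, 180224 lines:
#   make_random_file 180224 128 testfile_22mb.txt
```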

$
$ wc testfile_22mb.txt
  180224   180224 23248896 testfile_22mb.txt
$
$

Sample data -

$
$ head testfile_22mb.txt
v^zEh6p7qQn($_%8UMl@\)z3rZy#b"JJtf;n"S]#}^y^i3X(5U$7xv7OWU"e/):\f>kZGjL5OnMQ5)a9d?T@MSU9.8{Dye\CSJZbB#El_GSM*AY=pbiXX1Wf%Jj>:Zru
eyL(/Y^=5TyG(8d4hHl"T;.c4`(v7[L#4Jjf!-`i80/5cC[T'}D\Q\4Pv<.Xq}&%9@\g%P"i9|8lOS$Ge`sOV9vytDvj'2JoWif2;tBI1O"0isnG0UCb_NUG:n{|`{Wi
9fN.^iYE?}q9ol#b0MUDctWnV>U_JFWhlzug!tglJPq,t!nzw'l_MEYZd:YF|jc:-z`3&8"E^thk!IWE'd.g_-Cf'x]dqhE9S#2`L0bEPiyC3Bjf[^8rsTjeB>v9Q357
i8FHk4>TyTKzB<*=!nJ7\DyXe4;[?:u9yN"aP9Q3AE}b>%@q%}d0?\(uh29NTICsYE>izM*-J;vjieLh}`fx<XTgHY'RAH\5aD="H9$eHy7GCcZrM4kufnvh&XQNl7wt
S_Yf)AJtpw\m\\'XU`pJn3r+L(,+v3Y(u6OC%cX#dQH<}K:pb{PyWm:Kmf#2Uq1L{ox=:[w(FnX+lDm+ge&qy++$c)&RS+>;:&P!_^"Ubci[>uKC!HG^iiWY.C^}U&q<
l4Nx%\[Mkm:3`m).+70lmi|IQY|J6-jjyEh\,`\C?X%4z2kO/4UqgElA\^im%%9WLQw[O1,>;8`+-@j"\*ND/-a`c*P$LiU%V82Ocl;x)iK@&tJmKMNxMoH>D@7)jX.$
q&Io2SODUy;ERj=Stbk0,Yx^^Qf{9$[|?13=9Z<=x?fk61%E`t9dClB]}ZGc#xinEZ`U1U@vq8')fSQ&u{G;H+wg3@=F,FQ'p1d=(0MyBo/rX\ej+YGVDM(|S4/,Plw`
zJo9#Qc`t}Rf1NoC4;,@8*Cp_P=!%p86>i34{}2u?FGCkE/)R9!BxQSJ]?m4OMqj|@Xxk64ytR^F`{.kmT5I;PLAwOBxRu@xD#Q9:06_<YhUy'Kr6}VG}-2`E`:Ge^Bn
BT*%'!tLV`wX]qo2<1Q0cfv+=;UrWSj,2G-1j7z97Sy9aj93|M2X{|}mCNs[MZ^"Qs%Wt]CO/j?CyTlo]&>\6gxR^=S|c9.6G}y3m;32[j3e.0f\(pC9n9FN`LO;N)T$
AXY"pLblBaA8ztKWIXGK#|6N?REvu%F53;G$3n:JJI7kx>Q9<h29}^CnvmO<!?=*9LI?|:L^Fd{=U8^f^`)ej@|D0Ifp`G(R5=Hx6z!T/'>d3pf^vD1zG@BN29d'i&`t
$
$

And I tested the existence of pattern "XYZ" with grep, sed, awk and Perl.
Time taken, in descending order (slowest first), is -

$
$
$ time perl -ne 'print if /XYZ/' testfile_22mb.txt 1>/dev/null
real    0m0.982s
user    0m0.015s
sys     0m0.015s
$
$ time sed -n '/XYZ/p' testfile_22mb.txt 1>/dev/null
real    0m0.771s
user    0m0.624s
sys     0m0.062s
$
$ time awk '/XYZ/' testfile_22mb.txt 1>/dev/null
real    0m0.300s
user    0m0.186s
sys     0m0.031s
$
$ time grep "XYZ" testfile_22mb.txt 1>/dev/null
real    0m0.152s
user    0m0.077s
sys     0m0.015s
$
$

With the output sent to /dev/null, that is a few microseconds less than if I had actually printed the results; each command matches and prints 31 lines.

So, not much difference if done once. But it will add up if you are doing this a gazillion times inside a loop.
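
If the loop is over many patterns against the same file, one way to avoid paying the scan cost each time is to hand grep all the patterns at once with `-f`. A minimal sketch, using made-up data and pattern files:

```shell
# Made-up data and pattern files, for illustration only.
data=$(mktemp)
patterns=$(mktemp)
printf '%s\n' 'aXYZb' 'nothing' 'fooABC' 'QRS here' > "$data"
printf '%s\n' 'XYZ' 'ABC' 'QRS' > "$patterns"

# Loop shape: one full scan of the data file per pattern.
loop_hits=0
while IFS= read -r pat; do
    n=$(grep -c -- "$pat" "$data")
    loop_hits=$((loop_hits + n))
done < "$patterns"

# Single-pass shape: grep reads every pattern from the file given to -f
# and scans the data once.
single_hits=$(grep -c -f "$patterns" "$data")

printf 'loop=%s single=%s\n' "$loop_hits" "$single_hits"
rm -f "$data" "$patterns"
```

The two counts agree here because no line matches more than one pattern. A line matching several patterns is counted once by the single pass but once per pattern by the loop.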

tyler_durden

frank_rizzo · September 13, 2010, 8:40pm

Just a minor correction: sed is a stream editor.