Taking a specific value from a log file

Dear community,
I have a file containing some log lines like:

185413.854: [GC 3938735K->3100312K(4089472K), 0.0124750 secs]
185456.748: [GC 3897187K(4089472K), 0.5681710 secs]
185457.631: [GC 3940519K->3101353K(4089472K), 0.0107800 secs]
185467.213: [GC 3873521K(4089472K), 1.1164290 secs]
185468.913: [GC 3940265K->3102224K(4089472K), 0.0114570 secs]
185472.378: [GC 3369975K(4089472K), 0.4640150 secs]
185479.944: [GC 3940134K->3101807K(4089472K), 0.0154030 secs]
185482.828: [GC 3338284K(4089472K), 0.0680050 secs]
185486.855: [GC 3660673K(4089472K), 0.9342110 secs]
185490.946: [GC 3940426K->3101326K(4089472K), 0.0120510 secs]
185497.580: [GC 3649545K(4089472K), 0.7692390 secs]
185501.771: [GC 3940088K->3101499K(4089472K), 0.0123540 secs]
185501.787: [GC 3101727K(4089472K), 0.0061810 secs]
185511.343: [GC 3851751K(4089472K), 1.1011740 secs]
185513.458: [GC 3940411K->3101902K(4089472K), 0.0117240 secs]
185501.787: [GC 3101727K(4089472K), 0.0061810 secs]
185516.603: [GC 3361385K(4089472K), 0.4643180 secs]

What I need to do is extract the value following the "->" from the last line that contains "->". I tried:

# tail gclog.txt | grep "-" | awk -F"[>K]" '/->/{print $3}'
3101807
3101326
3101499
3101902 

But I need only the last one, 3101902.

I hope that's clear, thanks.
Lucas

You need only 3101902 because it is the last entry?

$ awk 'match ($0, "->") {gsub (/^.*>|[^0-9].*$/, ""); TMP=$0} END{print TMP}' file
3101902
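For readers puzzling over that one-liner, here is the same logic spelled out with comments (a sketch, not RudiC's original; the sample input is piped in rather than read from a file):

```shell
# two sample lines in the thread's format, fed to an annotated
# equivalent of the awk one-liner above
printf '%s\n' \
  '185501.771: [GC 3940088K->3101499K(4089472K), 0.0123540 secs]' \
  '185516.603: [GC 3361385K(4089472K), 0.4643180 secs]' |
awk '
  /->/ {                            # only lines containing "->"
      gsub(/^.*>|[^0-9].*$/, "")    # strip up to ">" and from the first non-digit
      last = $0                     # remember the most recent match
  }
  END { print last }                # after all input, print the last one kept
'
# → 3101499
```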

Depending on your OS you could also give a try to :

tac gclog.txt | sed '/-/!d;s/.*>//;s/K.*//;q'

or

tail -r gclog.txt | sed '/-/!d;s/.*>//;s/K.*//;q'

This should be faster because it starts from the end of the file (tac comes with GNU coreutils; tail -r is the BSD equivalent).

Note that using the tail command without additional options, you assume the line you want to extract is among the last ten lines of the file, which may not be the case if the number of subsequent "uninteresting" lines exceeds 9.
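If you do want to keep tail, one safeguard (a sketch; the window size 100 and the file name gclog.txt are arbitrary) is to widen its window with -n and let awk keep only the final match:

```shell
# a tiny sample in the thread's format (written to ./gclog.txt for the demo)
cat > gclog.txt <<'EOF'
185501.771: [GC 3940088K->3101499K(4089472K), 0.0123540 secs]
185513.458: [GC 3940411K->3101902K(4089472K), 0.0117240 secs]
185516.603: [GC 3361385K(4089472K), 0.4643180 secs]
EOF

# widen tail's window with -n so the last "->" line cannot scroll out of it;
# awk overwrites v on every match and prints the final value at END
tail -n 100 gclog.txt | awk -F'[>K]' '/->/{v=$3} END{print v}'
# → 3101902
```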


Both of them work perfectly! Thanks RudiC and ctsgnb!

# awk 'match ($0, "->") {gsub (/^.*>|[^0-9].*$/, ""); TMP=$0} END{print TMP}' $lastgc
or
# tac $lastgc | sed '/-/!d;s/.*>//;s/K.*//;q'

By the way, I believe the second one is faster; am I wrong? I'm asking because the GC log sometimes becomes huge!

That is an interesting question. awk needs to go through the entire file to be sure it has picked the last wanted number. With tac , you could take the first match and quit. BUT - tac still needs to locate the last, second-to-last, etc. lines before it can present them to sed .

And if there are many non-matching trailing lines, this may become slower and slower.

Can you time the solutions with a couple of input files and report back?
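For anyone who wants to reproduce such a timing, here is one way to build a large test file (a sketch; the name big.log and the line count are made up) with a single matching line buried under many non-matching trailing lines:

```shell
# one "->" line, then half a million trailing lines without "->"
printf '185513.458: [GC 3940411K->3101902K(4089472K), 0.0117240 secs]\n' > big.log
yes '185516.603: [GC 3361385K(4089472K), 0.4643180 secs]' | head -n 500000 >> big.log

# tac|sed must wade through all the trailing lines before its first match
time tac big.log | sed '/-/!d;s/.*>//;s/K.*//;q'
# awk reads the whole file regardless
time awk 'match ($0, "->") {gsub (/^.*>|[^0-9].*$/, ""); TMP=$0} END{print TMP}' big.log
```

Both commands should print 3101902; only the timings differ.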

EDIT: I created a huge file and did the timing:

$ time   awk 'match ($0, "->") {gsub (/^.*>|[^0-9].*$/, ""); TMP=$0} END{print TMP}' file 
3101902

real    0m0.081s
user    0m0.072s
sys     0m0.008s
$ time tac file | sed '/-/!d;s/.*>//;s/K.*//;q'
3101902

real    0m0.008s
user    0m0.000s
sys     0m0.000s

Very evident: one order of magnitude difference in execution time. In fact, with strace you can see that awk opens the file and reads, reads, reads:

open("file", O_RDONLY)                  = 3   
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffeff21e78) = -1 ENOTTY (Inappropriate ioctl for device)
read(3, "185413.854: [GC 3938735K->310031"..., 4096) = 4096 
read(3, "570 secs]\n185472.378: [GC 336997"..., 4044) = 4044
read(3, "185490.946: [GC 3940426K->310132"..., 4096) = 4096 
read(3, "185513.458: [GC 3940411K->310190"..., 4096) = 4096 

while tac does an lseek(..., SEEK_END), then lseek(..., SEEK_SET), and starts from the rear:

open("file", O_RDONLY)                  = 3
lseek(3, 0, SEEK_END)                   = 5495040
. . . 
lseek(3, 5488640, SEEK_SET)             = 5488640
read(3, "14570 secs]\n185472.378: [GC 3369"..., 6400) = 6400
. . .

EDIT 2:

tac file | awk 'match ($0, "->") {gsub (/^.*>|[^0-9].*$/, ""); print; exit}'

is as fast!


Great RudiC, I'll use tac for sure.
And thanks again for your feedback!