awk: fetch numbers after a word

ahmedwaseem2000 · July 14, 2010, 1:25pm

Hi,

I want to fetch all the numbers that follow a given word; the number of digits can vary. How can I do that?

Below is an example of the data and the expected output.

Sample data:

03 xxxx occurs 1090 times.
04 aslkja occurs 10 times.

I want to fetch 10 and 1090 separately.

durden_tyler · July 14, 2010, 1:50pm

If "occurs" is the word in your sample data and you want the longest run of digits that follows it, then:

$ cat f2
03 xxxx occurs 1090 times.
04 aslkja occurs 10 times.
$ awk '{x=gensub(/.*occurs ([0-9]+) .*/,"\\1",1); print x}' f2
1090
10

tyler_durden

Franklin52 · July 14, 2010, 2:08pm

Or is this sufficient for your purpose?

awk '{print $(NF-1)}' file

anbu23 · July 14, 2010, 2:12pm

durden_tyler:

$ awk '{x=gensub(/.*occurs ([0-9]+) .*/,"\\1",$0); print x}' f2

What is the gensub function?

Franklin52 · July 14, 2010, 2:40pm

It's a gawk-specific function:

gensub - The GNU Awk User's Guide

ahmedwaseem2000 · July 15, 2010, 7:02am

Thanks, guys! However, gensub doesn't work, so I replaced it with gsub, and that still doesn't work. Also, the number is not always the second-to-last field, so I can't use the $(NF-1) option either. What other ways are there to do this?
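gsub() cannot simply stand in for gensub() here: sub() and gsub() modify their target in place and return the number of substitutions, and their replacement text supports & (the whole match) but not \1-style groups. A minimal sketch for awks without gensub, assuming the number is the run of digits right after the last "<word> " on the line, is to copy $0 and delete what surrounds the number. The helper name digits_after is hypothetical, and the word is used as a regular expression, so it should not contain regex metacharacters.

```shell
#!/bin/sh
# Sketch: print the digits that follow a word using only POSIX awk
# (sub() on a copy of the line instead of the gawk-only gensub()).
# digits_after is a hypothetical helper name; input is read from stdin.
digits_after() {
  awk -v w="$1" '{
    s = $0
    # delete everything up to and including the last "<word> " on the line
    if (sub(".*" w " ", "", s)) {
      sub(/[^0-9].*/, "", s)   # keep only the leading run of digits
      if (s != "") print s
    }
  }'
}

printf '%s\n' '03 xxxx occurs 1090 times.' '04 aslkja occurs 10 times.' | digits_after occurs
```

For the original sample lines this prints 1090 and 10; lines without the word print nothing.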

Franklin52 · July 15, 2010, 7:56am

gensub is gawk-specific. Did you use gawk?

Please post a more representative example of your input file.

ahmedwaseem2000 · July 15, 2010, 8:08am

Yes, I tried gawk, and it's not installed. Here is an example:

askd sslkajdf OCCURS 10 Times.
a;lkjsfdj alkjsfd OCCURS 100 times depending on XYZ.
al;ksfjas OCCURS 10.
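Since the number is not always the second-to-last field, another option that works in any POSIX awk is to scan the fields for the word and take the next one. This is a sketch assuming the word and the number are separate whitespace-delimited fields; tolower() makes the comparison case-insensitive so both occurs and OCCURS are accepted, and the helper name num_after_field is hypothetical.

```shell
#!/bin/sh
# Sketch: find the word among the fields (case-insensitively) and print
# the digits at the start of the following field. num_after_field is a
# hypothetical helper name; input is read from stdin.
num_after_field() {
  awk -v w="$1" '{
    for (i = 1; i < NF; i++)
      if (tolower($i) == tolower(w)) {
        v = $(i + 1)
        sub(/[^0-9].*/, "", v)   # strip trailing text such as the "." in "10."
        if (v != "") print v
        break                    # only the first occurrence on each line
      }
  }'
}

num_after_field OCCURS <<'EOF'
askd sslkajdf OCCURS 10 Times.
a;lkjsfdj alkjsfd OCCURS 100 times depending on XYZ.
al;ksfjas OCCURS 10.
EOF
```

On the three sample lines this prints 10, 100 and 10.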

Franklin52 · July 15, 2010, 8:16am

Maybe something like this?

sed 's/.*OCCURS \([^ .]*\).*/\1/' file
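The command above prints lines without OCCURS unchanged. If those lines should produce no output, a common variant is sed -n with the p flag, which prints only lines where the substitution succeeded. This sketch also requires at least one digit after OCCURS ([0-9][0-9]* in a basic regular expression); occurs_sed is a hypothetical wrapper name.

```shell
#!/bin/sh
# -n turns off automatic printing; the p flag prints the line only when
# the substitution matched. occurs_sed is a hypothetical wrapper name.
occurs_sed() {
  sed -n 's/.*OCCURS \([0-9][0-9]*\).*/\1/p'
}

printf '%s\n' 'askd sslkajdf OCCURS 10 Times.' 'no number on this line' 'al;ksfjas OCCURS 10.' | occurs_sed
```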

ahmedwaseem2000 · July 15, 2010, 8:25am

Yes, it works with sed. I was looking for an awk solution, though, because this is going to be part of a larger awk script.

radoulov · July 15, 2010, 9:02am

You could use the match function:

awk '{
  wl = length(w) + 1                     # length of the word plus one space
  # the word, exactly one space, then one or more digits
  if (match($0, w " [0-9]+"))
    print substr($0, RSTART + wl, RLENGTH - wl)
  }' w=OCCURS infile

With a slight modification, you could also handle multiple occurrences on the same line.
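One way that modification might look (a sketch, not necessarily what was meant above) is to keep searching in the rest of the line after each match. It assumes the pattern w " [0-9]+" (the word, exactly one space, at least one digit), and all_after_word is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print every number that follows the word, not just the first.
# all_after_word is a hypothetical helper name; input is read from stdin.
all_after_word() {
  awk -v w="$1" '{
    rest = $0
    wl = length(w) + 1                       # the word plus one space
    while (match(rest, w " [0-9]+")) {
      print substr(rest, RSTART + wl, RLENGTH - wl)
      rest = substr(rest, RSTART + RLENGTH)  # continue after this match
    }
  }'
}

echo 'A OCCURS 10 times, B OCCURS 200 times.' | all_after_word OCCURS
```

This prints 10 and 200 on separate lines.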

ahmedwaseem2000 · July 15, 2010, 10:22am

It's working perfectly. A breakdown of the match statement would be very nice!
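A short breakdown, as a sketch: match(s, re) returns the position of the leftmost-longest match of re in s (0 if there is none) and sets RSTART to that position and RLENGTH to the length of the match (RSTART 0 and RLENGTH -1 on failure). substr(s, start, len) then cuts out part of s; skipping length(w) + 1 characters removes the word and one space. The hypothetical helper show_match below uses the pattern w " [0-9]+" (exactly one space, at least one digit) and prints those values for one of the sample lines.

```shell
#!/bin/sh
# show_match is a hypothetical helper: it prints what match() reports
# for a single line, looking for the word OCCURS.
show_match() {
  printf '%s\n' "$1" | awk -v w=OCCURS '{
    wl = length(w) + 1                  # the word plus the space after it
    if (match($0, w " [0-9]+")) {
      print "RSTART=" RSTART, "RLENGTH=" RLENGTH
      print "matched: [" substr($0, RSTART, RLENGTH) "]"
      print "number:  [" substr($0, RSTART + wl, RLENGTH - wl) "]"
    } else
      print "no match"
  }'
}

show_match 'al;ksfjas OCCURS 10.'
```

For this line, OCCURS starts at character 11 and "OCCURS 10" is 9 characters long, so the output is RSTART=11 RLENGTH=9, then [OCCURS 10], then [10].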

rdcwayx · July 15, 2010, 8:12pm

grep -io "occurs [0-9][0-9]*" urfile | awk '{print $2}'

kurumi · July 15, 2010, 9:00pm

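#!/bin/bash
while IFS= read -r LINE
do
  case "$LINE" in
    *"OCCURS "*)
      LINE=${LINE##*OCCURS }      # drop everything up to the last "OCCURS "
      echo "${LINE%%[!0-9]*}"     # keep only the leading digits ("10." -> "10")
      ;;
  esac
done < "file"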