awk fetch numbers after the word

Hi,

I want to fetch the number that follows a given word. The number of characters can vary. How can I do that?

Below is an example of the data and the expected output.

sample data

03 xxxx occurs 1090 times.
04 aslkja occurs 10 times.

I want to fetch 1090 and 10 separately.

If "occurs" is the word in your sample data, and you want to fetch the longest string of digits after it, then:

$
$
$ cat f2
03 xxxx occurs 1090 times.
04 aslkja occurs 10 times.
$
$ awk '{x=gensub(/.*occurs ([0-9]+) .*/,"\\1",1); print x}' f2
1090
10
$
$

tyler_durden

Or is this sufficient for your purpose?

awk '{print $(NF-1)}' file

What is the gensub function?

It's a gawk-specific function:

gensub - The GNU Awk User's Guide

Thanks guys! However, gensub doesn't work here, so I replaced it with gsub, and that doesn't work either. The number is not always the second-to-last field, so I can't use the $(NF-1) option. What other ways are there to do this?

gensub is gawk-specific. Did you use gawk?

Post a better example of your input file.

Yes, I tried gawk, but it's not installed. Here is the example:

askd sslkajdf OCCURS 10 Times.
a;lkjsfdj alkjsfd OCCURS 100 times depending on XYZ.
al;ksfjas OCCURS 10.

Maybe something like this?

sed 's/.*OCCURS \([^ .]*\).*/\1/' file

Yes, it works with sed. I was looking for an awk solution, though, as this is going to be part of a larger awk script.
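If gensub is not available, the same idea as the sed command can be sketched with plain sub, which any POSIX awk should have. Unlike gensub, sub has no backreferences: it edits a variable in place and returns the number of replacements, which is probably why swapping in gsub alone did not work. The function name extract_with_sub is made up for illustration, and the sketch assumes the word is OCCURS followed by exactly one space.

```shell
# Hypothetical helper: print the number after "OCCURS " on each matching line.
extract_with_sub() {
  awk '/OCCURS [0-9]/ {
    s = $0
    sub(/.*OCCURS /, "", s)   # drop everything up to and including the word
    sub(/[^0-9].*/, "", s)    # drop everything from the first non-digit onward
    print s
  }'
}

printf '%s\n' \
  'askd sslkajdf OCCURS 10 Times.' \
  'a;lkjsfdj alkjsfd OCCURS 100 times depending on XYZ.' \
  'al;ksfjas OCCURS 10.' | extract_with_sub
```

Because .* is greedy, a line containing the word more than once is cut at its last occurrence.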

You could use the match function:

awk '{
  wl = length(w) + 1
  if (match($0, w " *[0-9]*")) 
    print substr($0, RSTART + wl, RLENGTH - wl)
  }' w=OCCURS infile

With a slight modification, you could also handle multiple occurrences on the same line.
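One possible shape for that modification, as a sketch only: keep calling match on the remainder of the line until nothing more is found. The function name extract_all_after is hypothetical, and the regex here is tightened to one space and at least one digit so that a bare word without a number prints nothing.

```shell
# Hypothetical helper: print every number that follows word $1, one per line.
extract_all_after() {
  awk -v w="$1" '{
    wl = length(w) + 1                 # the word plus the single space after it
    s = $0
    while (match(s, w " [0-9]+")) {
      print substr(s, RSTART + wl, RLENGTH - wl)
      s = substr(s, RSTART + RLENGTH)  # continue searching after this match
    }
  }'
}

printf '%s\n' 'x OCCURS 10 and y OCCURS 200.' | extract_all_after OCCURS
```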

It's working perfectly. A dissection of the match statement would be very nice!
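A rough dissection, as I understand it: match(s, re) returns the position where re first matches in s (0 if there is no match), and as a side effect sets RSTART to that position and RLENGTH to the length of the matched text. The regex is built by string concatenation, so with w=OCCURS it becomes "OCCURS *[0-9]*". substr then skips the word and one space (wl characters) and keeps the rest of the match. The sketch below repeats the same logic with comments; the wrapper name extract_after is made up, and it passes the word with -v instead of a trailing w=OCCURS argument.

```shell
# Hypothetical wrapper around the match-based script, annotated step by step.
extract_after() {
  awk -v w="$1" '{
    wl = length(w) + 1              # 7 for OCCURS: six letters plus one space
    if (match($0, w " *[0-9]*"))    # true if the word is found; sets RSTART, RLENGTH
      # start wl characters into the match, keep what is left of it (the digits)
      print substr($0, RSTART + wl, RLENGTH - wl)
  }'
}

# Here the match "OCCURS 100" starts at RSTART=19 with RLENGTH=10,
# so substr($0, 26, 3) is taken.
printf '%s\n' 'a;lkjsfdj alkjsfd OCCURS 100 times depending on XYZ.' | extract_after OCCURS
# prints 100
```

Note that the offset assumes exactly one space after the word; with several spaces the extra ones would end up in the output.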

grep -o "occurs [0-9]*" urfile | awk '{print $2}'

#!/bin/bash
while read -r LINE
do
 case "$LINE" in
  *OCCURS*)
     LINE=${LINE##*OCCURS }
     echo "${LINE%% *}"
     ;;
 esac
done <"file"