grep or awk problem, unable to extract numbers

baghera · August 30, 2007, 7:30am

Hi, I'm having trouble getting some numbers out of an HTML file. I have several HTML logs that contain lines like this:

nerdnerd, how_old_r_u:45782<br>APPLY: <hour_second> Verification succeded

This is part of what I've extracted from an HTML file, but all I really want is the number in the middle. When using awk I get:

how_old_r_u:45782<br>APPLY:

since there is a space at each end, which awk treats as a field separator.

I also tried grep "[0-9]", but it prints every line that contains a digit, so I get the whole line again. Is there any command that can retrieve only the numbers?

vino · August 30, 2007, 7:41am

The pattern is not very clear. But you can try

grep -oE "[[:digit:]]{1,}" input.txt

If that does not satisfy your requirement, perhaps this.

sed -n -e "s/.*:\([0-9]*\).*/\1/p" input.txt
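One thing to watch with the sed form: `.*:` is greedy, so on the sample line it runs to the last colon (the one after APPLY), and `[0-9]*` then matches nothing, leaving an empty line. A minimal sketch of a variant that anchors on the label instead is below. It assumes the `how_old_r_u:` label appears on every line of interest; the function name `extract_age` is made up for illustration.

```shell
# Sample lines taken from the thread.
line1='nerdnerd, how_old_r_u:45782<br>APPLY: <hour_second> Verification succeded'
line2='how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet'

# Hypothetical helper: match the (assumed stable) label, capture the
# digits right after it, and print only lines where that succeeded.
extract_age() {
    sed -n 's/.*how_old_r_u:\([0-9][0-9]*\).*/\1/p'
}

result1=$(printf '%s\n' "$line1" | extract_age)
result2=$(printf '%s\n' "$line2" | extract_age)
printf '%s\n%s\n' "$result1" "$result2"
```

Requiring at least one digit (`[0-9][0-9]*`) means lines without a number after the label are skipped rather than printed as blanks.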

baghera · August 31, 2007, 6:34am

vino:

The pattern is not very clear. But you can try
grep -oE "[[:digit:]]{1,}" input.txt
If that does not satisfy your requirement, perhaps this.
sed -n -e "s/.*:\([0-9]*\).*/\1/p" input.txt

But what if there are more numbers on that line, for example:

how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet

It seems when I run the command

grep -oE "[[:digit:]]{1,}" input.txt

I also get the other numbers. Is there some way to get only 45782?

fazliturk · August 31, 2007, 6:58am

cut -f2 -d: inputfile | sed 's/[^0-9]//g'

charbel · August 31, 2007, 8:10am

Is that number always exactly 5 digits?
If yes, you can use awk to print just that substring:

code:
cat input.txt|awk 'BEGIN {FS=":"} {print substr($2,1,5)}'

this may help.....

ghostdog74 · August 31, 2007, 8:25am

No need for cat; awk can read the file directly.

awk 'BEGIN {FS=":"} {print substr($2,1,5)}' input.txt
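If the number is not always 5 digits long, the same field split can be kept and the second field trimmed at its first non-digit character instead. This is only a sketch under the assumption that the wanted number always directly follows the first colon; `first_number` is a made-up name.

```shell
# Sample line from the thread.
line='how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet'

# Hypothetical helper: split on ":" as in the posts above, then drop
# everything in field 2 from its first non-digit character onward.
first_number() {
    awk -F: '{ sub(/[^0-9].*/, "", $2); print $2 }'
}

awk_result=$(printf '%s\n' "$line" | first_number)
printf '%s\n' "$awk_result"
```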

vino · August 31, 2007, 1:29pm

baghera:

But if there is more numbers on that line for example:

how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet

It seems when I run the command
grep -oE "[[:digit:]]{1,}" input.txt
I also get the other numbers is there some way to get only 45782?

Which is why the sed alternative was provided. Did you try that? Does it give you what you are looking for?

cassj · August 31, 2007, 4:42pm

Give this a shot:

sed 's/[^0-9]/\ /g;s/\  */\t/g;s/^[ \t]*//;s/[ \t]*$//;/^$/d' file.txt

I'm sure there's a more elegant way to do this, but this seems to work okay.

# Breakdown of what does what

#1. The "sed" command itself
sed

#2. Replace everything but numbers with a space globally
's/[^0-9]/\ /g;

#3. Substitute a single tab for each run of one or more spaces globally
s/\  */\t/g;

#4. Remove all leading and trailing white space
s/^[ \t]*//;s/[ \t]*$//;

#5. Delete all blank lines
/^$/d'

#6. The file to be processed.
file.txt
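
Note that this approach keeps every run of digits on the line, including the ones inside t3 and t4, so the first column still has to be picked out afterwards. The sketch below shows the same idea with tr instead of the exact sed command above (the `\t` escape is understood by GNU sed but is not guaranteed elsewhere). It separates the numbers with spaces rather than tabs, and the variable names are made up for illustration.

```shell
# Sample line from the thread.
line='how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet'

# Turn every run of non-digit characters (newlines excepted) into a
# single space, then trim the leading and trailing space.
all_numbers=$(printf '%s\n' "$line" | tr -cs '0-9\n' ' ' | sed 's/^ *//;s/ *$//')

# Keep only the first number on the line.
first=$(printf '%s\n' "$all_numbers" | cut -d' ' -f1)

printf '%s\n%s\n' "$all_numbers" "$first"
```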