Hi, I'm having trouble getting some numbers out of an HTML file. I have several HTML logs that contain lines like this:
nerdnerd, how_old_r_u:45782<br>APPLY: <hour_second> Verification succeded
This is part of what I've extracted from an HTML file, but all I really want is the number in the middle. When using awk I get:
how_old_r_u:45782<br>APPLY:
since there is a space at each end, which awk treats as a field separator.
I also tried grep "[0-9]", but that prints every line containing a digit, so I get the whole line again. Is there any command that can retrieve only the numbers?
vino
August 30, 2007, 7:41am
2
The pattern is not very clear. But you can try
grep -oE "[[:digit:]]{1,}" input.txt
If that does not satisfy your requirement, perhaps this.
sed -n -e "s/.*:\([0-9][0-9]*\).*/\1/p" input.txt
vino:
The pattern is not very clear. But you can try
grep -oE "[[:digit:]]{1,}" input.txt
If that does not satisfy your requirement, perhaps this.
sed -n -e "s/.*:\([0-9][0-9]*\).*/\1/p" input.txt
But if there are more numbers on that line, for example:
how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet
It seems when I run the command
grep -oE "[[:digit:]]{1,}" input.txt
I also get the other numbers. Is there some way to get only 45782?
cut -f2 -d: inputfile | sed 's/[^0-9]//g'
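One caveat with stripping every non-digit: if the second field ever contained another digit, it would be glued onto the result. A minimal sketch that keeps only the leading digits of the field instead; the name `leading_digits` and the second test line are made up for illustration and do not come from the logs.

```shell
#!/bin/sh
# leading_digits is a hypothetical helper name, not an existing tool.
# It takes the second colon-separated field and drops everything from
# the first non-digit onward, so only the leading number remains.
leading_digits() {
    cut -d: -f2 | sed 's/[^0-9].*//'
}

# Line posted in the thread:
printf '%s\n' 'how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet' | leading_digits
# Invented variant with a stray digit inside the second field:
printf '%s\n' 'how_old_r_u:45782<br>APPLY2: done' | leading_digits
```

To run it on a real log, something like `leading_digits < inputfile` should do.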
Is that number always exactly 5 digits long?
If yes, you can use awk and print only that substring.
code:
cat input.txt|awk 'BEGIN {FS=":"} {print substr($2,1,5)}'
this may help.....
charbel:
Is that number always exactly 5 digits long?
If yes, you can use awk and print only that substring.
code:
cat input.txt|awk 'BEGIN {FS=":"} {print substr($2,1,5)}'
this may help.....
no need for cat.
awk 'BEGIN {FS=":"} {print substr($2,1,5)}' input.txt
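If the number is not always 5 digits long, substr($2,1,5) would pick up trailing markup (a 3-digit value followed by <br> would come out as 123<b). A possible variant, assuming the number still sits at the very start of the second colon-separated field, uses awk's match() so the length does not matter. The function name and the shorter sample line are invented for this sketch.

```shell
#!/bin/sh
# number_after_first_colon is a hypothetical helper name.
# match() sets RSTART and RLENGTH to the run of digits at the start of $2,
# so the number may have any length; lines without such a run print nothing.
number_after_first_colon() {
    awk -F: 'match($2, /^[0-9]+/) { print substr($2, RSTART, RLENGTH) }'
}

# Line posted in the thread:
printf '%s\n' 'how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet' | number_after_first_colon
# Invented line with a shorter number:
printf '%s\n' 'how_old_r_u:123<br>APPLY: ok' | number_after_first_colon
```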
vino
August 31, 2007, 1:29pm
7
baghera:
But if there are more numbers on that line, for example:
how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet
It seems when I run the command
grep -oE "[[:digit:]]{1,}" input.txt
I also get the other numbers. Is there some way to get only 45782?
Which is why the sed alternative was provided. Did you try that? Does it give you what you are looking for?
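If the wanted number always follows the how_old_r_u: label, as it does in both sample lines, the sed expression could also be tied to that label so that other colons and numbers on the line are ignored. This is only a sketch under that assumption; `extract_age` is a made-up name.

```shell
#!/bin/sh
# extract_age is a hypothetical helper. It assumes the wanted number always
# comes directly after the literal label "how_old_r_u:" and requires at
# least one digit there; lines without the label print nothing (-n ... p).
extract_age() {
    sed -n 's/.*how_old_r_u:\([0-9][0-9]*\).*/\1/p'
}

# Both sample lines from the thread:
printf '%s\n' \
    'nerdnerd, how_old_r_u:45782<br>APPLY: <hour_second> Verification succeded' \
    'how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet' | extract_age
```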
cassj
August 31, 2007, 4:42pm
8
Give this a shot:
sed 's/[^0-9]/\ /g;s/\ \ */\t/g;s/^[ \t]*//;s/[ \t]*$//;/^$/d' file.txt
I'm sure there's a more elegant way to do this, but this seems to work okay. Note that \t is understood as a tab by GNU sed; other seds may not support it.
# Breakdown of what does what
#1 . The "sed" command itself
sed
#2 . Replace everything but digits with a space, globally
's/[^0-9]/\ /g;
#3 . Substitute a single tab for each run of one or more spaces, globally
s/\ \ */\t/g;
#4 . Remove all leading and trailing white space
s/^[ \t]*//;s/[ \t]*$//;
#5 . Delete all blank lines
/^$/d'
#6 . The file to be processed.
file.txt
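As for a shorter way to pull out every number, one option might be tr, which avoids the GNU-specific \t. Note that this sketch changes the layout: it prints one number per output line instead of tab-separated numbers per input line. `numbers_only` is a made-up name.

```shell
#!/bin/sh
# numbers_only is a hypothetical helper name.
# tr -c complements the set 0-9 (every non-digit, newlines included),
# maps those characters to a newline, and -s squeezes repeated newlines.
# The sed step drops the empty line left when input starts with a non-digit.
numbers_only() {
    tr -cs '0-9' '\n' | sed '/^$/d'
}

# Line posted in the thread:
printf '%s\n' 'how_old_r_u:45782<br>APPLY:[30000,t3,t4]:Plummet' | numbers_only
```

On a real log this would be called as `numbers_only < file.txt`. Like the sed pipeline, it returns all numbers on the line, not just the first one.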