Extract pattern from text line

vampirodolce · October 14, 2008, 7:26am

Gents,
from these sample lines:

ZUCR.MI ZUCCHI SPA RISP NC 2,5000 6 ott 0,0000
ZV.MI ZIGNAGO VETRO 3,6475 16:36 Up 0,0075

is it possible to get this:

ZUCR.MI 2,5000
ZV.MI 3,6475

i.e. the first field, a separator and the first decimal number?
(In Europe we use commas as the decimal separator, so €1.000,00 = one thousand.)

Regards.

jim_mcnamara · October 14, 2008, 7:30am

awk '{ printf ("%s ",$1)
         for(i=2; i<=NF; i++)
           { if(index($i,",")>0) {print $i; next} }
       } ' inputfilename

danmero · October 14, 2008, 7:54am

awk '{printf ("%s ",$1);for(i=2;++i<=NF;){if(int($i)>0){print $i;next}}}' file

vampirodolce · October 14, 2008, 8:24am

Thank you. The second command seems to work, but not in all cases. For example, the last lines in the source file are:

VVE.MI VIAGGI DEL VENTAGLIO 0,2500 13 ott Up 0,0088
XPR.MI EXPRIVIA 0,7653 13 ott Up 0,1210
ZUC.MI ZUCCHI SPA 1,4010 13 ott Up 0,0010
ZUCR.MI ZUCCHI SPA RISP NC 2,5000 6 ott 0,0000
ZV.MI ZIGNAGO VETRO 3,6500 13 ott Up 0,0100

and the output of the command is:

VVE.MI 13
XPR.MI 13
ZUC.MI 1,4010
ZUCR.MI 2,5000
ZV.MI 3,6500

So in the first 2 cases the output is the day of the month (13) and not the price (0,2500 or 0,7653).
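
A likely explanation, assuming standard awk string-to-number conversion: int() reads only the leading numeric prefix of a field and stops at the comma, so "0,2500" becomes 0 and fails the `int($i)>0` test, while the day of the month "13" passes it. A minimal sketch that shows the conversion (show_int is a made-up helper name, not part of the thread):

```shell
# show_int is a hypothetical helper: print each argument next to awk's int() of it.
show_int() {
  for v in "$@"; do
    awk -v s="$v" 'BEGIN { print s, int(s) }'
  done
}

# Prices below 1 convert to 0; "13" and "1,4010" convert to 13 and 1.
show_int "0,2500" "13" "1,4010"
```

Any price below 1 is therefore skipped, and the next field that converts to a positive integer is printed instead.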

joeyg · October 14, 2008, 8:31am

Perhaps not elegant, but it appears to work...

> sample="ZUCR.MI ZUCCHI SPA RISP NC 2,5000 6 ott 0,0000"
> s2=`echo $sample | cut -d" " -f1` ; s3=`echo $sample | cut -d" " -f2- | tr -d "[:alpha:]" | tr -s " " | tr " " "\n" | grep "," | head -1 | tail -1` ; echo $s2 $s3 
ZUCR.MI 2,5000
>

joeyg · October 14, 2008, 8:48am

> cat file10
VVE.MI VIAGGI DEL VENTAGLIO 0,2500 13 ott Up 0,0088
XPR.MI EXPRIVIA 0,7653 13 ott Up 0,1210
ZUC.MI ZUCCHI SPA 1,4010 13 ott Up 0,0010
ZUCR.MI ZUCCHI SPA RISP NC 2,5000 6 ott 0,0000
ZV.MI ZIGNAGO VETRO 3,6500 13 ott Up 0,0100

> cat calc_file10 
while read sample
   do
   s2=`echo $sample | cut -d" " -f1`  
   s3=`echo $sample | cut -d" " -f2- | tr -d "[:alpha:]" | tr -s " " | tr " " "\n" | grep "," | head -1 | tail -1` 
   echo $s2 $s3 
done <file10

> calc_file10 
VVE.MI 0,2500
XPR.MI 0,7653
ZUC.MI 1,4010
ZUCR.MI 2,5000
ZV.MI 3,6500
>

radoulov · October 14, 2008, 8:58am

perl -nle'$,=" ";print/(\S*).*?(\d+,\d*)/' filename

awk '{
  for (i=1; i<=NF; i++)
    if ($i ~ /^[0-9][0-9]*,/) {
      print $1, $i
      break
    }
}' filename

vampirodolce · October 14, 2008, 9:46am

That's the simplest solution and it definitely seems to work. One question though: the whole script (still partial, actually) does a lot of text manipulation, with plenty of 'grep', 'tr', 'head', 'tail' and, of course, several pipes. The source data is downloaded from Yahoo Finance via 'lynx -dump' and does not exceed 250 lines. After dumping the 5 web pages, CPU usage goes up to 100% for almost a minute (on a Pentium 4, 3 GHz). I didn't expect this task to be so CPU-intensive on a modern PC. To be honest, I haven't had a chance to test it under Linux yet; I am using a Win2K workstation with Cygwin. What is your opinion on this?

tayyabq8 · October 14, 2008, 10:16am

Why don't you try radoulov's solutions? Both are more efficient.

vampirodolce · October 14, 2008, 11:17am

You're right, it took less than one second. How is this possible?
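
One likely explanation, not confirmed in the thread: a while-read loop built from cut, tr, grep, head and tail starts around ten short-lived processes for every input line, and process creation is known to be comparatively slow under Cygwin, whereas awk or perl handles the whole file in a single process. A sketch of the single-process approach wrapped in a function (first_price is a hypothetical name; the regex requires digits on both sides of the comma, which is slightly stricter than the pattern posted earlier):

```shell
# first_price: hypothetical wrapper; reads lines on stdin and prints the
# first field plus the first later field that looks like a comma decimal.
first_price() {
  awk '{
    for (i = 2; i <= NF; i++)
      if ($i ~ /^[0-9]+,[0-9]+$/) {   # digits, comma, digits
        print $1, $i
        break                         # stop at the first match on this line
      }
  }'
}

# Two of the sample lines from the thread, fed through a single awk process.
first_price <<'EOF'
VVE.MI VIAGGI DEL VENTAGLIO 0,2500 13 ott Up 0,0088
ZUCR.MI ZUCCHI SPA RISP NC 2,5000 6 ott 0,0000
EOF
```

With 250 input lines, the loop version would launch a few thousand processes, while this version launches one.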