Hi,
I have a url.txt file; I need to check each URL in it and grep some data from its page source.
url.txt
domain.com
domain2.com
domain3.com
.....
All of the sites' page sources contain these patterns:
"web=pattern1"
"net++pattern2"
"office**pattern3"
I need this output:
domain.com: pattern1,pattern2,pattern3
domain2.com: pattern1,pattern2,pattern3
If a pattern is missing, print "zero" in its place:
domain.com: pattern1,zero,pattern3
domain2.com: pattern1,pattern2,zero
sea
2
What have you tried so far?
I cannot get it to work for multiple URLs and multiple patterns. Thanks.
my code:
wget -q www.domain.com -O - | grep -o -E -m 1 '"web=([^"#]+)"' | cut -d'=' -f2
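The single-URL attempt above can be extended with a loop over url.txt and a small helper that falls back to "zero" when a marker is absent. This is a minimal sketch assuming bash; `extract` and `report` are hypothetical helper names, and it assumes each marker appears at most once per page:

```shell
#!/bin/bash
# extract RAW_MARKER ESCAPED_MARKER PAGE
# Prints the value captured between the marker and the closing quote,
# or "zero" if the marker is not found.
extract() {
    local m
    m=$(printf '%s\n' "$3" | grep -o -m 1 -E "\"$2[^\"]*\"" | head -n 1)
    if [ -n "$m" ]; then
        m=${m%\"}                          # drop the trailing quote
        printf '%s' "${m:$(( ${#1} + 1 ))}" # drop the leading quote + marker
    else
        printf 'zero'
    fi
}

# report DOMAIN: fetch the page once and print the requested line.
report() {
    local page
    page=$(wget -q "$1" -O -)
    printf '%s: %s,%s,%s\n' "$1" \
        "$(extract 'web='      'web='         "$page")" \
        "$(extract 'net++'     'net\+\+'      "$page")" \
        "$(extract 'office**'  'office\*\*'   "$page")"
}

# Process every domain in url.txt (network access required):
# while read -r d; do report "$d"; done < url.txt
```

The marker is passed twice because `+` and `*` must be escaped for `grep -E`, while the raw form gives the prefix length to strip.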
RudiC
4
Well, give this a try:
wget -i url.txt -O - |
awk '/<(link rel="canonical"|base) href/ {
        if (L++) {for (i=1; i<=3; i++)
                     {printf "%s%s", DL, P[i]?P[i]:"zero"; DL=","}
                  printf "\n"
        }
        delete P; DL=""
        gsub (/href="http:\/\/|\/"\/*>/, ""); printf "%s: ", $NF
     }
     match ($0, /"web=[^"]*"/)       {P[1]=substr($0,RSTART+5,RLENGTH-6)}
     match ($0, /"net\+\+[^"]*"/)    {P[2]=substr($0,RSTART+6,RLENGTH-7)}
     match ($0, /"office\*\*[^"]*"/) {P[3]=substr($0,RSTART+9,RLENGTH-10)}
     END {for (i=1; i<=3; i++)
             {printf "%s%s", DL, P[i]?P[i]:"zero"; DL=","}
          printf "\n"}
    '
and report back. Note that `+` and `*` are regex metacharacters, so they must be escaped in the patterns, and the `substr` offsets need `RSTART` because the marker will rarely start at column 1. Finding the domain in the downloaded HTML may be trickier than assumed above.
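The `match()`/`RSTART`/`RLENGTH` extraction at the heart of that script can be checked offline on a fabricated sample line (a sketch of the mechanism, not the full script):

```shell
# Offline check: one sample line with the "web=" and "office**" markers
# present and "net++" absent, expecting "alpha,zero,gamma".
printf '%s\n' 'junk "web=alpha" junk "office**gamma" junk' |
awk '{
    split("", P)                                        # clear the result array
    if (match($0, /"web=[^"]*"/))       P[1] = substr($0, RSTART+5, RLENGTH-6)
    if (match($0, /"net\+\+[^"]*"/))    P[2] = substr($0, RSTART+6, RLENGTH-7)
    if (match($0, /"office\*\*[^"]*"/)) P[3] = substr($0, RSTART+9, RLENGTH-10)
    for (i = 1; i <= 3; i++)
        printf "%s%s", (i > 1 ? "," : ""), (P[i] ? P[i] : "zero")
    print ""
}'
```

Each `substr` skips the opening quote plus the marker (hence the `RSTART` offset) and trims the closing quote (hence the `RLENGTH` reduction).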