Script to open the specified URL in a browser from a text file

Hi All,

here I am stuck and need some help with the following.

Could you be more specific? What do you want to achieve by opening the URL in a browser, and which browser do you want to open this URL in?

If you just want to download something from a URL, you could use wget on your box without any browser.

I have a text file which has URLs in it.

So I want to write a script that reads each URL from the text file and opens it if it is correct and exists; if a URL does not exist, the script should comment it out.

Example:
I saved a file named slist, which contains data such as

http://yahoommail.com/

http://gmmaaiill.com/ and so on.

From this slist file, which holds the URLs, the script should open each of the URLs and check whether they are valid and opening; if a URL is not valid (maybe some 404 Page Not Found error or some other problem), then it should be commented out in the slist file, as shown below.
#htp:/gmail.com/

#http://yahoommail.com/
http://gmail.com/
#http://gmmaaiill.com/

I tried:

wget -i ./sitelist

and it displayed:
$ wget -i ./sitelist
------------------------------------------------------------------------
./sitelist: Invalid URL htp:/gmail.com/: Unsupported scheme
--07:04:31-- http://yahoomail.com/
=> `index.html'
Resolving yahoomail.com... 216.109.112.135, 66.94.234.13
Connecting to yahoomail.com|216.109.112.135|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://mail.yahoo.com/ [following]
--07:04:32-- http://mail.yahoo.com/
=> `index.html'
Resolving mail.yahoo.com... 209.73.168.74
Connecting to mail.yahoo.com|209.73.168.74|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://login.yahoo.com/config/login_verify2?&.src=ym [following]
--07:04:32-- https://login.yahoo.com/config/login_verify2?&.src=ym
=> `login_verify2?&.src=ym'
Resolving login.yahoo.com... 209.73.168.74
Connecting to login.yahoo.com|209.73.168.74|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=>                                 ] 26,138        --.--K/s

07:04:32 (44.43 MB/s) - `login_verify2?&.src=ym' saved [26138]

--07:04:32-- http://gmail.com/
=> `index.html'
Resolving gmail.com... 72.14.253.83, 64.233.171.83, 64.233.161.83
Connecting to gmail.com|72.14.253.83|:80... connected.
HTTP request sent, awaiting response... 302 Found
Cookie coming from gmail.com attempted to set domain to google.com
Location: http://mail.google.com/mail/ [following]
--07:04:32-- http://mail.google.com/mail/
=> `index.html'
Resolving mail.google.com... 209.85.147.19, 209.85.147.18, 209.85.147.83
Connecting to mail.google.com|209.85.147.19|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://www.google.com/accounts/ServiceLogin?service=mail&passive=true
&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3Dl&l
tmpl=default&ltmplcache=2 [following]
--07:04:33-- https://www.google.com/accounts/ServiceLogin?service=mail&passive=
true&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3
Dl&ltmpl=default&ltmplcache=2
=> `ServiceLogin?service=mail&passive=true&rm=false&continue=http:%2F
%2Fmail.google.com%2Fmail%2F?ui=html&zy=l&ltmpl=default&ltmplcache=2'
Resolving www.google.com... 72.14.253.103, 72.14.253.99, 72.14.253.147, ...
Connecting to www.google.com|72.14.253.103|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16,131 (16K) [text/html]

100%[====================================>] 16,131 --.--K/s

07:04:33 (110.83 KB/s) - `ServiceLogin?service=mail&passive=true&rm=false&contin
ue=http:%2F%2Fmail.google.com%2Fmail%2F?ui=html&zy=l&ltmpl=default&ltmplcache=2'
saved [16131/16131]

--07:04:33-- http://gmmail.com/
=> `index.html'
Resolving gmmail.com... 206.207.87.4
Connecting to gmmail.com|206.207.87.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=>                                 ] 32,276       172.51K/s

07:04:34 (172.20 KB/s) - `index.html' saved [32276]

--07:04:34-- http://yahoommail.com/
=> `index.html.1'
Resolving yahoommail.com... 216.109.112.135, 66.94.234.13
Connecting to yahoommail.com|216.109.112.135|:80... connected.
HTTP request sent, awaiting response... 404
07:04:34 ERROR 404: (no description).

FINISHED --07:04:34--
Downloaded: 74,545 bytes in 3 files

------------------------------------------------------------------------

So please suggest how I can get the output I described above, with the bad URLs commented out in the file.

Grep the wget output for "Invalid URL" and "Host not found" (better, take case into consideration) and you get the results for all the wrong sites.
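
For instance, a small case-insensitive filter along those lines could look like this; it assumes you captured the wget messages to a file first (wget prints them on standard error, e.g. wget -i sitelist 2> wget.log, where wget.log is just a placeholder name):

#!/usr/bin/perl
# Print only the wget log lines that indicate a bad URL, ignoring case.
use strict;
use warnings;

while (my $line = <>) {
    print $line if $line =~ /Invalid URL|Host not found/i;
}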

Another issue you have here is that sites like Gmmail.com and http://yahoommail.com/ DO EXIST, but in your case these have to be considered wrong ones. For such popular sites, maintain the valid URLs in a file and check against it.

If you use Perl to do it, it would require a simple while() loop and a call to the LWP module to get the HTTP headers. It is likely more reliable, as it does not need to parse the "messy" output of wget or other similar command-line tools.
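
For example, a minimal sketch of that idea, assuming the list is kept in a file named slist and the LWP module is installed (the 10-second timeout and the OK/BROKEN labels are arbitrary choices):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Minimal sketch: read the URLs from slist and report which ones respond.
my $ua = LWP::UserAgent->new(timeout => 10);

open my $fh, '<', 'slist' or die "Cannot open slist: $!";
while (my $url = <$fh>) {
    chomp $url;
    next if $url =~ /^\s*(#|$)/;      # skip blank lines and already-commented URLs
    my $response = $ua->head($url);   # fetch the headers only
    if ($response->is_success) {
        print "OK      $url\n";
    } else {
        print "BROKEN  $url (", $response->status_line, ")\n";
    }
}
close $fh;

Commenting out the broken URLs in the file itself is just ordinary Perl file I/O on top of this.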

Hi cbkihong,

To be honest, I am not good at Perl scripting; I am just a beginner, but I can understand a script written in Perl with a little bit of difficulty.

So will you let me know the script (maybe in Perl, as you said), and if possible also suggest how I can do it using a shell script instead?

My intention is to read URLs from a file (whatever URLs they may be); the URLs which are not opening, i.e. which give a 404 Page Not Found error and so on, should be commented out in the file with a # symbol.

For the correct URLs: nothing needs to be done (except that the script should validate whether the URL is working or not).

For the wrong URLs: the script should place a # symbol in front of the URL in the file.
Example: # http://yahoommail.com/

Kindly help me out with a script, please.

I believe the practice of this forum is not to encourage excessive spoon-feeding, but instead to give you pointers to explore on your own, so that you have an opportunity to learn.

The HTTP HEAD method is frequently used instead of GET to fetch the headers without downloading the message body. That seems to fit what you want, but you can use GET if you like.

Look at this for example:
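
Something along these lines, using LWP::UserAgent and an HTTP::Request built with the HEAD method (the URL is only a placeholder and the status handling is just illustrative):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua  = LWP::UserAgent->new;
my $url = 'http://gmail.com/';                 # placeholder URL

# A HEAD request returns only the status line and headers, not the body.
my $request  = HTTP::Request->new(HEAD => $url);
my $response = $ua->request($request);

print $url, ' => ', $response->code, ' ', $response->message, "\n";
print "$url looks broken\n" unless $response->is_success;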

I think that should be enough to get you going on the HTTP part. For the file-writing part, you should read some Perl documentation on file I/O for further information.

Please try to assemble something on your own first and post it for advice before asking for more code. Thanks.