How to judge whether a URL is valid or not using awk

As the title says.
Thanks.

So how do you define a valid url?

I am using a C++ HTML parser to extract links from web pages,
but there are many abnormal URLs in the results.
For example:
http://:http://www.g.cn
or
http://123/a.html

In your first example, there are non-ASCII characters in the URL. Do you consider it invalid for that reason, or because a URL should not contain "http" twice?

In your second example, is the problem that there is no "." in the section between "//" and the first "/"?

Are these the only rules for your request?

Yes. Can you give me any ideas?
Thanks.

$ cat urfile
http://:http://www.g.cn
http://www.google.com/ab.html
http://123/a.html

$ grep -E -iv "http.*http|\/\/[0-9a-z]*\/" urfile
http://www.google.com/ab.html
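
Since the question asks for awk, the same filter can be sketched in awk. This is only a rough equivalent of the grep line, assuming the two rules agreed above are the only ones: "http" must not occur twice, and the part between "//" and the next "/" must not be made up of letters and digits alone. The file name urfile is the sample file shown above, recreated here so the snippet runs on its own; the variable name result is just for illustration.

```shell
# Recreate the sample input from the post above.
printf '%s\n' \
  'http://:http://www.g.cn' \
  'http://www.google.com/ab.html' \
  'http://123/a.html' > urfile

# Lowercase each line for matching, then skip it if "http" occurs twice
# or if the host between "//" and the next "/" is only letters and digits
# (which means it contains no dot). Print everything else unchanged.
result=$(awk '{
  u = tolower($0)
  if (u ~ /http.*http/) next
  if (u ~ /\/\/[0-9a-z]*\//) next
  print
}' urfile)
echo "$result"
```

Like the grep version, this only rejects the two known bad shapes. It does not check that the rest of the URL is well formed.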
awk '{ if ($0 ~ /^http:\/\/www\./) print $0, "valid"; else print $0, "invalid" }' abc.txt

http://www.abc.com valid
http://abc.com invalid
http://123/a.html invalid
http://..:http://www.g.cn invalid
https://www.abc.com invalid

HTH,
PL
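
If URLs without "www" and https URLs should also count as valid, one possible variation is to pull out the host and test it directly. This is a minimal sketch, not a full URL validator. It assumes the rules discussed earlier in the thread: the scheme appears only once, and the host consists of dot-separated labels of ASCII letters, digits and hyphens with at least one dot. The function name check_urls is made up for this example.

```shell
# Hypothetical helper: label each input line as valid or invalid.
check_urls() {
  awk '{
    url = $0
    verdict = "invalid"
    # The line must start with http:// or https://.
    if (match(url, /^https?:\/\//)) {
      rest = substr(url, RLENGTH + 1)   # everything after the scheme
      split(rest, parts, "/")
      host = parts[1]                   # text up to the first "/"
      # Reject a second scheme, and require a dotted ASCII host name.
      if (rest !~ /https?:/ && host ~ /^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/)
        verdict = "valid"
    }
    print url, verdict
  }' "$@"
}

result=$(printf '%s\n' \
  'http://www.abc.com' \
  'http://abc.com' \
  'http://123/a.html' \
  'http://..:http://www.g.cn' \
  'https://www.abc.com' | check_urls)
echo "$result"
```

With these assumptions, http://abc.com and https://www.abc.com come out as valid, unlike in the output above. The check looks only at the shape of the host, so a host such as 123.456 would still pass.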