Grep in regex

Hello guys,
I am writing a bash script that uses a regex to check for valid URLs in a file.
This is my input file:

http://www.yahoo.commmmmm
http://www.google.com
https://www.gooogle.co
www.test6.co.in
www.gmail.com
www.google.co
htt://www.money.com
http://eeeess.google.com
https:/ww.test.c.in

#my script

URL=$(grep -E -o "^(http(s)?://)?+(w{3}\.)+([a-z0-9]{1,64}\.)+\w{2,3}" "$path")

My output is:

http://www.yahoo.com
http://www.google.com
https://www.gooogle.co
www.test6.co.in
www.gmail.com
www.google.co

Here it is trimming http://www.yahoo.commmmmm to http://www.yahoo.com.
Please help me out with this.

You asked it to match at most 3 characters with {2,3}, and -o prints only the part of the line that matched.
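
To illustrate the point about {2,3}, here is a minimal sketch that anchors the pattern at the end of the line with $, so a line with extra trailing characters is dropped instead of being cut short. The names pattern and strict_urls are made up for this example, and [a-z] replaces \w (a GNU extension) on the assumption that only lowercase letters are wanted in the last label.

```shell
#!/usr/bin/env bash
# Pattern from the question, tidied, with a trailing $ added (an assumption
# about the intent): the whole line has to match, not just a prefix of it.
pattern='^(https?://)?w{3}\.([a-z0-9]{1,64}\.)+[a-z]{2,3}$'

# Hypothetical helper: print only the lines of file $1 that match completely.
# -o is not needed any more because a match always covers the whole line.
strict_urls() {
    grep -E "$pattern" "$1"
}
```

With this, strict_urls "$path" no longer prints anything for http://www.yahoo.commmmmm. It still only checks the shape of the name, not whether the name exists.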

I think you are over-simplifying the issue. I don't think there is any way to know for certain whether a name exists using a regular expression alone. You cannot just assume that the last part of a domain name (the Top Level Domain) is only two or three characters long.

List of Internet top-level domains - Wikipedia

You might have to extract the domain name from the full URL and perform a GET against the real site to see if you can connect. That might be the only way.
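
A rough sketch of that idea, assuming bash and that curl is installed. The function names extract_domain and site_responds are invented for illustration, and the 5 second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host part of a URL using parameter expansion.
extract_domain() {
    local d=${1#*://}   # drop a leading scheme such as http:// if present
    d=${d%%/*}          # drop any path after the host name
    d=${d%%:*}          # drop a port number if present
    printf '%s\n' "$d"
}

# Hypothetical helper: succeed if a GET to the URL receives any HTTP response.
# Without --fail, curl exits 0 even for a 404, so this only tests connectivity.
site_responds() {
    curl --silent --max-time 5 --output /dev/null "$1"
}

# Example use (needs network access):
#   site_responds "http://$(extract_domain 'http://www.google.com/')" && echo "reachable"
```

Note that extract_domain does not validate the scheme, so htt://www.money.com still yields www.money.com.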

  • There may be a formal list of names of the TLDs
    • Each of those may have a list of valid names below them
      • Each of those may have a list of valid names below them
        • Each of those may have a list of valid names below them
          • Each of those may have a list of valid names below them ..............

You can see the problem. The list (if you could even build one) would be huge and would change frequently. Perhaps a DNS query would give you enough though.

# host exits non-zero when the lookup fails, so test it directly
if host "$extracted_domain_name" >/dev/null
then
   echo "DNS entry exists"
else
   echo "It is an invalid domain"
fi
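
To run the same check over a whole file, something along these lines might do. It assumes a file with one already-extracted domain name per line. The function name check_domains and the RESOLVER variable are invented for this sketch; RESOLVER only exists so that the lookup command can be swapped out, and it defaults to host.

```shell
#!/usr/bin/env bash
# Hypothetical override for the lookup command; defaults to host.
RESOLVER=${RESOLVER:-host}

# Hypothetical helper: report for each domain name in file $1 whether it resolves.
check_domains() {
    local domain
    while IFS= read -r domain; do
        [ -z "$domain" ] && continue          # skip blank lines
        if "$RESOLVER" "$domain" >/dev/null 2>&1; then
            echo "$domain: DNS entry exists"
        else
            echo "$domain: It is an invalid domain"
        fi
    done < "$1"
}
```

Called as check_domains domains.txt (a hypothetical file name), this prints one line per name. A successful lookup only shows that the name has a DNS entry, not that a web server answers there.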

Does that help?

Robin