Grep expression between double quotes

I need a quick expression to pull out all the data in a text file that looks like "http:// some random url etc". It should grab any string that begins with "http:// and ends with ". There are other double quotes in the file, but I only want the strings that start with "http:// and run up to the closing quote, for each instance of that.

I should mention the double quotes are actually in the file, I wasn't adding them myself.

How about:

grep -o '"http://[^"]*"' infile

Yes, perfect. Thank you!

---------- Post updated 12-18-09 at 08:32 AM ---------- Previous update was 12-17-09 at 07:30 PM ----------

Could someone please explain why it worked? I thought [^"] would mean to grab a string that begins with a double quote. I also see the double quotes enclosed in single quotes. I would love to hear a rundown of how it all worked.

[^"] means "any character other than a quote", just like [^q9] means "any character other than q or 9". [^"]* means "zero or more characters that aren't quotes", and so [^"]*" means "any number of non-quotes followed by a quote".
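To see the bracket expression at work on its own, here is a small sketch. It assumes a grep that supports -o (GNU grep, for example); the sample strings and the helper names not_q9 and upto_quote are made up for illustration.

```shell
# [^q9]* : runs of characters that are neither q nor 9
not_q9() { grep -o '[^q9]*'; }
printf '%s\n' 'abcq123945' | not_q9
# prints abc, 123 and 45, one per line

# [^"]*" : any number of non-quotes followed by one quote
upto_quote() { grep -o '[^"]*"'; }
printf '%s\n' 'say "http://a.b.c" now' | upto_quote
# prints two lines:  say "   and   http://a.b.c"
```

The second result shows why the full pattern also needs the leading "http:// part: without it, the match simply runs from wherever it starts up to the next quote.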

You may want to google 'regular expression'. If you want to become proficient in Unix, regex, as it is called, is a very important tool. It has spilled over into Windows programming in the past few years as well.

You can also use egrep or grep; the result here will be the same.

egrep ""http//.*"" 
or
egrep "\"http//.*\"" 
or
egrep '"http//.*"'

This is not correct. You may want to search for "greedy matching", and look up the -o option. Also your quoting alternatives will prohibit the regex from being properly evaluated.
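On the quoting point, a quick way to check what the shell actually hands to egrep is to print the argument instead of running the command. This is only a sketch: show_arg is a made-up helper, and the patterns are copied as posted above (they lack the colon after http).

```shell
set -f  # turn off filename expansion so the unquoted text is left alone
show_arg() { printf '%s\n' "$1"; }

show_arg ""http//.*""      # http//.*     the empty "" pairs vanish, no quotes reach egrep
show_arg "\"http//.*\""    # "http//.*"   quotes survive
show_arg '"http//.*"'      # "http//.*"   quotes survive
set +f
```

So the first form loses the double quotes entirely, while the second and third pass them through. All three still use .* and so remain greedy.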

I think this one works perfectly:
grep -o '"http://.*"' input_file
Can anyone give a failing case?

No, that will only work if there is one address per line and there are no further " characters on that line. You may want to search for "greedy matching".

Please give more explanation, and some examples, for a good response.

Sure:

$ echo '"http://a.b.c" blablabla "http://c.d.e"' |grep -o '"http://.*"'
"http://a.b.c" blablabla "http://c.d.e"

$ echo '"http://a.b.c" blablabla "http://c.d.e"' |grep -o '"http://[^"]*"'
"http://a.b.c"
"http://c.d.e"

$ echo '"http://a.b.c" blablabla ""' |grep -o '"http://.*"'
"http://a.b.c" blablabla ""

$ echo '"http://a.b.c" blablabla ""' |grep -o '"http://[^"]*"'
"http://a.b.c"

If you do not limit greedy matching, grep will try to find the longest match possible, hence the use of [^"] instead of the dot (.).

Scrutinizer, you are wrong; see below.
I am using SunOS server2 5.10 Generic_118833-36 sun4u sparc SUNW,Netra-210

bash-3.00$ echo '"http://abc" blablabla "http://cvd"' | grep '"http://.*"'
Output: "http://abc" blablabla "http://cvd"

And when using the second way:

bash-3.00$ echo '"http://abc" blablabla "http://cvd"' | grep '"http://[^"]*"'

Output: "http://abc" blablabla "http://cvd"

Both ways are acting greedy.

Aaiaz, that is a peculiar conclusion given that I posted my output above that proved my point.

Besides: you left out the -o option! Without it you always print the entire line.
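A side-by-side sketch of the difference, assuming a grep that supports -o (the variable names line and pat are just for illustration):

```shell
line='"http://abc" blablabla "http://cvd"'
pat='"http://[^"]*"'

# Without -o: grep prints every line that contains a match, in full
without_o=$(printf '%s\n' "$line" | grep "$pat")
printf '%s\n' "$without_o"

# With -o: grep prints only the matched parts, one per line
with_o=$(printf '%s\n' "$line" | grep -o "$pat")
printf '%s\n' "$with_o"
```

The first command echoes the whole input line, which can look like a greedy match even though the pattern itself is not greedy.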

The -o option is not supported by the grep command on my OS, Sun Solaris.

Hi aaiaz, if your grep does not support the -o option, then you have to either download a grep that does or use a sed/awk/shell script. It cannot be accomplished with a regular grep statement.

ok thanks man

But how can I achieve this using sed or awk? It still isn't straightforward.

Hi aaiaz, it was a bit complicated, but I've come up with this as a grep -o replacement:

sed 's|"http://[^"]*"|&\n|g' infile | sed -n 's|.*\("http://[^"]*"\).*|\1|p'

-or-

pat='"http://[^"]*"'
sed "s|$pat|&\n|g" infile | sed -n "s|.*\($pat\).*|\1|p"

Or use Perl:

perl -nle 'print $1 while m|("http://[^"]*")|g' infile

Scrutinizer, it is not working:

bash-3.00$ echo '"http://a.b.c" blablabla "http://b.b.f"' | sed 's|\"http://[^\"]*\"|&\n|' | sed -n 's|.*\("http://[^"]*"\).*|\1|p'

Output:
"http://b.b.f"

And radoulov, unfortunately I do not have Perl.
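A likely cause of the sed result (this is an assumption about the Solaris sed, not something shown in the thread): \n in the replacement part of s is an extension, and a traditional sed inserts a plain n instead of a newline, so the line is never split and the greedy .* in the second sed keeps only the last URL. The attempt above also drops the g flag. A sketch of a more portable form uses a backslash followed by a real newline; extract_urls_sed is a made-up name.

```shell
extract_urls_sed() {
  # First sed: put a line break after every quoted http:// string
  # (backslash + literal newline is the portable way to insert one).
  # Second sed: print only the quoted URL found on each resulting line.
  sed 's|"http://[^"]*"|&\
|g' | sed -n 's|.*\("http://[^"]*"\).*|\1|p'
}

printf '%s\n' '"http://a.b.c" blablabla "http://b.b.f"' | extract_urls_sed
```

With a sed that honors the newline, this should print "http://a.b.c" and "http://b.b.f" on separate lines.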

Ha! A Solaris machine without a Perl interpreter?

Try with nawk:

nawk -F\" '{ 
  for (i=0; ++i<=NF;)
    if ($i ~ /^http:\/\//) print FS $i FS
  }' infile

All my examples assume the URLs do not span more than one line; otherwise you'll need something different.
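One thing to watch with the field-splitting approach: with " as the separator, only the even-numbered fields are the text between quotes, so an unquoted http:// at the start of a line would also be printed by the loop above. A variant sketch that looks at even fields only, assuming the quotes on each line are balanced (extract_urls_awk is a made-up name; use nawk in place of awk on Solaris):

```shell
extract_urls_awk() {
  awk -F'"' '{
    # Fields 2, 4, 6, ... hold the text that sat between a pair of quotes
    for (i = 2; i <= NF; i += 2)
      if ($i ~ /^http:\/\//) print FS $i FS
  }'
}

printf '%s\n' 'http://bare "http://a.b.c" x "ftp://no" "http://c.d.e"' | extract_urls_awk
# prints "http://a.b.c" and "http://c.d.e", skipping the unquoted http://bare
```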

Or:

nawk  '/^http:/&&$0=RS$0RS' RS=\" infile