Find line containing string in a file.

cbreiny · November 17, 2010, 2:14am

Hello. I have a large file that contains a lot of gibberish and also a lot of http addresses. How can I read the file, extract the http addresses, and write each one on its own line in another file?

It looks something like this.

gibberishgibberishgibberishgibberishgibberish"http://yadayada.com"gibberishgibberishgibberishgibberish"http://yadayada2.com"gibberishgibberish

I just want to know how to extract the http addresses from the file, and write them to another file.

Thanks in advance.

Scrutinizer · November 17, 2010, 2:22am

Do you have grep -o ?

grep -o 'http://[^"]*' infile
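To cover the "write them to another file" part of the question, here is a minimal sketch that wraps the same grep call and redirects its output. The function name extract_urls_grep and its two arguments are placeholders of mine, not anything from the thread, and -o is an extension (GNU and BSD grep have it) rather than a POSIX option.

```shell
# Hypothetical helper: $1 is the input file, $2 is the output file.
extract_urls_grep() {
    # -o prints only the matching part of each line, one match per line.
    # [^"]* stops the match just before the closing double quote.
    grep -o 'http://[^"]*' "$1" > "$2"
}
```

Called as extract_urls_grep infile outfile, this should leave one address per line in outfile.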

Ygor · November 17, 2010, 4:03am

GNU grep...

grep -o 'http://[^"]*' file1

Scrutinizer · November 17, 2010, 4:25am

awk '/http:/' RS=\" infile
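For anyone wondering why this works: setting RS to a double quote makes awk treat every quote-delimited chunk as a separate record, so each URL becomes its own record and the pattern prints it. A small sketch of the same idea follows. The function name extract_urls_awk is a placeholder, and the ^ anchor is my own tightening that assumes each address starts right after the opening quote, so gibberish that merely contains "http:" is skipped.

```shell
# Hypothetical helper: reads the named files, or stdin if none are given.
extract_urls_awk() {
    # -v RS='"' splits the input into records at every double quote.
    # /^http:/ prints only the records that begin with http:
    awk -v RS='"' '/^http:/' "$@"
}
```

Usage would be extract_urls_awk infile > outfile.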

michaelrozar17 · November 17, 2010, 4:29am

Similarly, with sed:

sed '/http/s/.*\(http[^"]*\).*\(http[^"]*\).*/\1\n\2/g' inputfile > outfile

Scrutinizer · November 17, 2010, 4:38am

@michael. That works only if there are exactly two URLs on a line. For a universal solution, sed needs two stages, for example:

sed 's/http:/\n&/g' infile | sed -n '/http:/s/".*//p'

or

sed 's/"/\n"\n/g' infile | sed -n '/http:/p'
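Both variants rely on \n in the replacement text producing a newline, which GNU sed does but which may not hold for every sed. As a rough sketch of the same two-stage idea (split at the quotes first, then filter), the split can be done with tr instead. The function name extract_urls_tr is a placeholder, and the ^ anchor assumes every address starts right after an opening quote.

```shell
# Hypothetical helper: $1 is the input file; results go to stdout.
extract_urls_tr() {
    # Stage 1: replace every double quote with a newline.
    # Stage 2: keep only the lines that begin with http:
    tr '"' '\n' < "$1" | grep '^http:'
}
```

Usage would be extract_urls_tr infile > outfile. Because the split happens before the filter, it does not matter how many URLs are on each input line.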

michaelrozar17 · November 17, 2010, 5:03am

Hmm... yes, you are right. Thank you.

durden_tyler · November 17, 2010, 9:10am

Or with Perl -

perl -lne 'print $1 while(/"(http:.*?)"/g)' your_file

tyler_durden