cbreiny
November 17, 2010, 2:14am
Hello. I have a large file that contains a lot of gibberish and also a lot of http addresses. How can I read the file, pull out the http addresses, and write them into another file, one address per line?
It looks something like this.
gibberishgibberishgibberishgibberishgibberish"http://yadayada.com"gibberishgibberishgibberishgibberish"http://yadayada2.com"gibberishgibberish
I just want to know how to extract the http addresses from the file and write them to another file.
Thanks in advance.
Do you have grep -o?
grep -o 'http://[^"]*' infile
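A minimal end-to-end sketch of the grep -o approach, assuming the addresses are always wrapped in double quotes as in the sample, and using infile and outfile as placeholder names. Since -o prints each match on its own line, redirecting stdout gives one address per line in the new file.

```shell
#!/bin/sh
# Small test file modeled on the sample above (file names are placeholders).
printf '%s\n' 'gibberish"http://yadayada.com"gibberish"http://yadayada2.com"gibberish' > infile

# -o prints only the matched text, one match per output line.
# [^"]* stops each match at the closing double quote.
grep -o 'http://[^"]*' infile > outfile

cat outfile
# http://yadayada.com
# http://yadayada2.com
```

If the real file also has https:// links, the pattern would need widening; the sketch only covers the http:// case from the question.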
Ygor
November 17, 2010, 4:03am
With GNU grep:
grep -o 'http://[^"]*' file1
awk '/http:/' RS=\" infile
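A sketch of how the awk one-liner behaves on the sample, again with infile and outfile as placeholder names. Setting RS to a double quote makes every quoted string its own record, so the assumption is that each URL sits alone between a pair of quotes; a stray "http:" in the unquoted gibberish would print that whole chunk.

```shell
#!/bin/sh
# Sample input modeled on the question ("infile"/"outfile" are placeholders).
printf '%s\n' 'gibberish"http://yadayada.com"gibberish"http://yadayada2.com"gibberish' > infile

# RS='"' is the same assignment as RS=\" above: every double quote ends a record.
# The pattern /http:/ prints only records containing "http:", each followed by
# the default output record separator (a newline).
awk '/http:/' RS='"' infile > outfile

cat outfile
```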
Similarly with sed:
sed '/http/s/.*\(http[^"]*\).*\(http[^"]*\).*/\1\n\2/g' inputfile > outfile
@michael: That works only if there are exactly two URLs on a line. For a general solution, sed needs two stages, for example:
sed 's/http:/\n&/g' infile | sed -n '/http:/s/".*//p'
Or:
sed 's/"/\n"\n/g' infile | sed -n '/http:/p'
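A sketch running both two-stage sed pipelines on the sample so their output can be compared. It assumes GNU sed, because "\n" in the replacement text is a GNU extension; infile, out1 and out2 are placeholder names.

```shell
#!/bin/sh
# Assumes GNU sed ("\n" in a replacement is a GNU extension).
printf '%s\n' 'gibberish"http://yadayada.com"gibberish"http://yadayada2.com"gibberish' > infile

# Variant 1: start a new line before every "http:", then keep only lines
# containing "http:" and cut each one off at its first double quote.
sed 's/http:/\n&/g' infile | sed -n '/http:/s/".*//p' > out1

# Variant 2: put each double quote on a line of its own, so every quoted
# string lands alone on a line, then print only the lines with "http:".
sed 's/"/\n"\n/g' infile | sed -n '/http:/p' > out2

cat out1 out2
```

Both variants handle any number of URLs per line, unlike the single-stage command that hard-codes two capture groups.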
Hmm... yes, you are right. Thank you.
Or with Perl:
perl -lne 'print $1 while(/"(http:.*?)"/g)' your_file
tyler_durden