Extracting the columns containing URLs from a text file

csim_mohan · July 16, 2014, 10:49am

I have a file like this:

Timestamp       URL                    Text
1331635241000   http://example.com     Peoples footage at www.test.com,http://example4.com
1331635231000   http://example1.net    crack the nuts http://example6.com
1331635280000   http://example2.net    Loving this

Each column is tab-separated. I need to extract only the URLs from columns 2 and 3; if a field contains no URL, it should be left empty. For example, I want a result like this:

URL                    Text
http://example.com     www.test.com,http://example4.com
http://example1.net    http://example6.com
http://example2.net
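To make the example easy to reproduce, the sample input can be recreated with printf. This is a sketch that assumes exactly one tab between fields, as described above; the file name "file" is taken from the command in the question.

```shell
# Recreate the sample input. Assumption: a single tab separates fields.
printf 'Timestamp\tURL\tText\n' > file
printf '1331635241000\thttp://example.com\tPeoples footage at www.test.com,http://example4.com\n' >> file
printf '1331635231000\thttp://example1.net\tcrack the nuts http://example6.com\n' >> file
printf '1331635280000\thttp://example2.net\tLoving this\n' >> file

# Sanity check: each line should report 3 tab-separated fields.
awk -F'\t' '{ print NF }' file
```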

I tried this script:

awk 'BEGIN {FS="\t"} {print $2,$3}' file | grep -oP '(((http|https|ftp|gopher)|mailto)[.:][^ >"\t]*|www\.[-a-z0-9.]+)[^ .,;\t>">\):]'

This script gives me all the URLs in a single column, without the header. Any suggestions on how to fix this?
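One possible approach, offered as a sketch rather than a definitive fix: `grep -o` prints every match on its own line, which is why the row boundaries and the column split are lost. Doing the extraction inside awk keeps one output line per input line. The function name `extract_urls` is hypothetical, and the URL pattern is an assumption (a token starting with `http://`, `https://`, `ftp://`, `gopher://`, `mailto:` or `www.` and running up to the next space, comma or tab); adjust it to your data. Multiple URLs in one field are joined with commas.

```shell
# Hypothetical helper: print columns 2 and 3 keeping only their URLs.
# Reads the files given as arguments, or stdin if none are given.
extract_urls() {
    awk '
        BEGIN { FS = OFS = "\t" }

        # Return every URL found in s, joined with commas ("" if none).
        # Assumed pattern: scheme or www. prefix up to a space, comma or tab.
        function urls(s,    out, sep, re) {
            re = "((https?|ftp|gopher)://|mailto:|www\\.)[^ ,\t]+"
            out = ""
            while (match(s, re)) {
                sep = (out == "") ? "" : ","
                out = out sep substr(s, RSTART, RLENGTH)
                s = substr(s, RSTART + RLENGTH)   # continue after this match
            }
            return out
        }

        NR == 1 { print $2, $3; next }   # keep the header row unchanged
        { print urls($2), urls($3) }     # one output row per input row
    ' "$@"
}
```

Usage: `extract_urls file`. A row whose Text field has no URL (such as "Loving this") still produces a line; its second column is simply empty, as in the desired output.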