How to extract only relevant part from a sentence?

Hello All,

I have a file with details such as those below. How do I extract only the
host and port?

eg: dbs.ads.com 1521

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbs.ads.com)(PORT=1521))(CONNECT_DATA=(SID=vug)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=captain.adsoe.com)(PORT=1521))(CONNECT_DATA=(SID=x10)))
(DESCRIPTION =(ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = 11.69.21.37)(PORT = 1521)))(CONNECT_DATA = (SERVICE_NAME = o901.ads.com)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ap60.as.com)(PORT=1521))

TIA,
John

Hello JohnJacobChacko,

The following may help. It will work as long as your Input_file has the same format as the sample input shown.

awk '{gsub(/.*HOST/,X,$0);gsub(/\)\).*/,X,$0);gsub(/\)\(/," ",$0);gsub(/PORT| = |=/,X,$0);print $0}'  Input_file

Output will be as follows.

dbs.ads.com 1521
captain.adsoe.com 1521
11.69.21.37 1521
ap60.as.com 1521
 

EDIT: Adding a sed solution as well.

sed -e 's/.*HOST\| = \|=\|)).*//g;s/)(PORT/ /;' Input_file

Thanks,
R. Singh

This may help you:

[akshay@localhost tmp]$ cat file
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbs.ads.com)(PORT=1521))(CONNECT_DATA=(SID=vug)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=captain.adsoe.com)(PORT=1521))(CONNECT_DATA=(SID=x10)))
(DESCRIPTION =(ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = 11.69.21.37)(PORT = 1521)))(CONNECT_DATA = (SERVICE_NAME = o901.ads.com)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ap60.as.com)(PORT=1521))
[akshay@localhost tmp]$ awk '/HOST/{h=$2}/PORT/{p=$2}h && p{print h,p; h=p=""}' FS='='  RS=')' file
dbs.ads.com 1521
captain.adsoe.com 1521
 11.69.21.37  1521
ap60.as.com 1521

When I tried RavinderSingh13's sed suggestion with the supplied sample input, I got the output:

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbs.ads.com =1521))(CONNECT_DATA=(SID=vug)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=captain.adsoe.com =1521))(CONNECT_DATA=(SID=x10)))
(DESCRIPTION =(ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = 11.69.21.37  = 1521)))(CONNECT_DATA = (SERVICE_NAME = o901.ads.com)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ap60.as.com =1521))

His suggestion might work with GNU sed, but it doesn't work with the sed on OS X. You might want to try:

sed 's/.*HOST *= *\([.[:alnum:]]*\).*PORT *= *\([[:digit:]]*\).*/\1 \2/' file

which, with a standard sed, produces the output:

dbs.ads.com 1521
captain.adsoe.com 1521
11.69.21.37 1521
ap60.as.com 1521

With GNU sed, the following should work:

sed --posix 's/.*HOST *= *\([.[:alnum:]]*\).*PORT *= *\([[:digit:]]*\).*/\1 \2/' file
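A possible variation, offered as a sketch rather than a drop-in replacement: with sed -E (extended regular expressions, accepted by reasonably recent GNU and BSD/OS X sed) the grouping parentheses need no backslashes, and -n together with the p flag prints only the lines that actually matched, so lines without a HOST/PORT pair are dropped instead of being passed through unchanged. The function name extract_host_port is made up for illustration, and the pattern assumes HOST comes before PORT on the same line, as in the sample input.

```shell
# extract_host_port is a hypothetical helper name, not part of any tool.
# Reads the named files, or standard input when no file is given.
extract_host_port() {
  # -n            : print nothing by default
  # -E            : extended regular expressions
  # ([^) ]+)      : host, up to the closing parenthesis or a space
  # ([0-9]+)      : port digits
  # trailing p    : print only lines where the substitution succeeded
  sed -n -E 's/.*HOST *= *([^) ]+) *\).*PORT *= *([0-9]+).*/\1 \2/p' "$@"
}
```

Usage would look like: extract_host_port file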

An alternative to sed or awk.

cat johnjacobchacko.file
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbs.ads.com)(PORT=1521))(CONNECT_DATA=(SID=vug)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=captain.adsoe.com)(PORT=1521))(CONNECT_DATA=(SID=x10)))
(DESCRIPTION =(ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = 11.69.21.37)(PORT = 1521)))(CONNECT_DATA = (SERVICE_NAME = o901.ads.com)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ap60.as.com)(PORT=1521))
Has nothing of Possible interest.
Here is another line with nothing of interest.
perl -nle '/H\w+\W+(.+)\)\(P\w+\W+(\d+)/ and print "$1 $2";' johnjacobchacko.file
dbs.ads.com 1521
captain.adsoe.com 1521
11.69.21.37 1521
ap60.as.com 1521

awk -F'[)]|[ \t]*=[ \t]*' '/HOST/{h=$2} /PORT/{print h, $2}' RS=\( file

or

awk -F'[=)]' '/HOST/{h=$2} /PORT/ {print h, $2}' RS=\( file

if you don't mind spurious spaces in the output
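
If the spurious spaces do matter, one way to keep the simpler field separator is to trim each value before printing. This is only a sketch: the wrapper name host_port and the awk function trim are invented here, and it assumes the same input layout as the sample (each HOST followed by its PORT).

```shell
# host_port is a hypothetical wrapper name used for illustration only.
# Reads the named files, or standard input when no file is given.
host_port() {
  awk -F'[=)]' -v RS='(' '
    # trim: strip leading and trailing blanks and tabs from a string
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /HOST/ { h = trim($2) }            # remember the host of this address
    /PORT/ { print h, trim($2) }       # print it once the port is seen
  ' "$@"
}
```

Usage would look like: host_port file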

--
sed:

sed 's/.*HOST[^=]*= *\([^)]*\).*PORT[^=]*= *\([^)]*\).*/\1 \2/' file