How to grep for a word in xml?

Hi,

I have the below tag/s in my xml.

<foreign-server name="MOHTASHIM_SERVER">

What would be the easiest way to extract MOHTASHIM_SERVER without the double quotes "" from the above tag?

Desired Output:

MOHTASHIM_SERVER

Hello mohtashims,

Could you please try the following and let me know if this helps.

awk '{match($0,/\".*\"/);print substr($0,RSTART+1,RLENGTH-2)}'  Input_file

Thanks,
R. Singh

I need to get the value only for <foreign-server name=

whereas your suggestion looks like it picks up everything inside double quotes.

Can you please check?

Hello mohtashims,

Could you please try the following and let me know if this helps you.

awk '{LEN=length("<foreign-server name=");match($0,/<foreign-server name=\".*\"/);print substr($0,RSTART+LEN+1,RLENGTH-LEN-2)}'  Input_file

Thanks,
R. Singh

Hi.

R. Singh's solution seemed to work for me:

echo '<foreign-server name="MOHTASHIM_SERVER">' |
awk '{match($0,/\".*\"/);print substr($0,RSTART+1,RLENGTH-2)}'

produced:

MOHTASHIM_SERVER

on a system:

OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.4 (jessie) 
awk GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2-p3, GNU MP 6.0.0)

Best wishes ... cheers, drl

I will only be able to test this sometime later and update this thread.

---------- Post updated at 11:29 AM ---------- Previous update was at 07:17 AM ----------

It works, but there are several blank lines before and after the result, while I just need the result with no blank lines above or below it.

Desired Output:

Current Output:

Hello mohtashims,

Could you please try the following and let me know how it goes.

awk '{LEN=length("<foreign-server name=");match($0,/<foreign-server name=\".*\"/);VAL=substr($0,RSTART+LEN+1,RLENGTH-LEN-2);if(VAL){print VAL}}'  Input_file

Thanks,
R. Singh

I don't think you need the length function for a string constant, and the match should be in the pattern, not in the action part, to reliably eliminate lines without the search string:

awk 'match($0,/<foreign-server name=\".*\"/) {print substr($0,RSTART+22,RLENGTH-23)}' file
MOHTASHIM_SERVER
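
To see why the placement of match() matters, here is a small sketch. It assumes a hypothetical two-line sample in which only the second line holds the tag; the function names in_action and in_pattern are made up for the illustration.

```shell
# Hypothetical sample: only the second line contains the tag.
sample='<other-tag id="1"/>
<foreign-server name="MOHTASHIM_SERVER">'

# match() inside the action: the action runs for every input line,
# so the non-matching first line still reaches print.
in_action() {
    printf '%s\n' "$sample" |
    awk '{match($0,/<foreign-server name=".*"/); print substr($0,RSTART+22,RLENGTH-23)}'
}

# match() as the pattern: the action runs only when match() returns non-zero.
in_pattern() {
    printf '%s\n' "$sample" |
    awk 'match($0,/<foreign-server name=".*"/) {print substr($0,RSTART+22,RLENGTH-23)}'
}

in_action | wc -l    # counts the lines printed, including the empty one
in_pattern           # prints only the extracted name
```

The first call counts 2 lines, because the line without the tag still prints an empty string; the second call prints only MOHTASHIM_SERVER.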

Thank you Rudi. I used length() because I don't want to hardcode values such as +22 and -23 in the code. My thinking was that we could then search for any string and its length would be taken care of automatically. Please do correct me if I am wrong here.
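
A rough sketch of that idea, with no numeric offsets at all. The function name extract_attr and the variables tag, attr, prefix, start, rest and qpos are hypothetical, and it assumes the attribute is the first one in the tag, separated from the tag name by exactly one space.

```shell
# Print the value of attribute $2 in tag $1 for each line that has it.
extract_attr() {
    awk -v tag="$1" -v attr="$2" '
    BEGIN { prefix = "<" tag " " attr "=\"" }    # e.g. <foreign-server name="
    {
        start = index($0, prefix)                # 0 when the prefix is absent
        if (start == 0) next
        rest = substr($0, start + length(prefix))
        qpos = index(rest, "\"")                 # position of the closing quote
        if (qpos > 0) print substr(rest, 1, qpos - 1)
    }'
}

echo '<foreign-server name="MOHTASHIM_SERVER">' | extract_attr foreign-server name
```

Because index() does a plain string search, the tag and attribute names are not interpreted as regular expressions.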

Thanks,
R. Singh

mohtashims,
Assuming that there is no more than one "foreign-server" tag on each line in your input file, you might also try this slight modification to RavinderSingh13's code:

awk '
BEGIN {	LEN1 = length("<foreign-server name=\"")
	LEN2 = length("\">")
}
match($0, /<foreign-server name="[^"]*">/) {
	print substr($0, RSTART + LEN1, RLENGTH - LEN1 - LEN2)
}'  Input_file

which should also work if other text containing quotes follows the "foreign-server" tag on lines containing that tag.

Neither of these suggestions will work if more than one "foreign-server" tag appears on a single line, but they will fail in different ways.
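
If several tags on one line do have to be handled, one possible approach is to loop over match() and cut off the part of the line that has already been processed. This is only a sketch; the function name all_names and the variable line are made up for the example.

```shell
# Print every foreign-server name, even when several tags share a line.
all_names() {
    awk '
    BEGIN { LEN1 = length("<foreign-server name=\"")
            LEN2 = length("\">")
    }
    {
        line = $0
        while (match(line, /<foreign-server name="[^"]*">/)) {
            print substr(line, RSTART + LEN1, RLENGTH - LEN1 - LEN2)
            line = substr(line, RSTART + RLENGTH)    # keep only the unprocessed tail
        }
    }'
}

echo '<foreign-server name="A_SERVER"><foreign-server name="B_SERVER">' | all_names
```

For the sample line above this prints A_SERVER and B_SERVER on separate lines.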

If someone wants to try either of these suggestions on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

RudiC,
I agree that the BEGIN section (and the LEN1 and LEN2 variables) aren't needed, but they might make it easier for someone less experienced to understand the calculations going on in the substring operation.

Ravinder,
Note that when an ERE is delimited by / characters, " characters in the ERE don't need to be escaped.
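
A quick way to check this is to run both forms against the same sample line (the shell variable names line, escaped and plain are just for illustration):

```shell
line='<foreign-server name="MOHTASHIM_SERVER">'

# Same ERE, once with escaped quotes and once without.
escaped=$(echo "$line" | awk 'match($0, /\".*\"/) {print substr($0, RSTART+1, RLENGTH-2)}')
plain=$(echo "$line" | awk 'match($0, /".*"/) {print substr($0, RSTART+1, RLENGTH-2)}')

[ "$escaped" = "$plain" ] && echo "same result: $plain"
```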


As the length of this thread has shown, handling XML is not trivial unless it's very simple and regular XML. Also, posting "prettier" data than your real input is liable to get you non-working solutions, because it's much easier to write for pretty XML than for ugly XML.

If we throw out "simple", you can use my yanx.awk library like:

awk -f yanx.awk -e 'TAG=="FOREIGN-SERVER" { print ARGS["NAME"];  delete ARGS["NAME"]; }' ORS="\n" filename

Use nawk on Solaris.

If this doesn't work, please show your actual input so I can figure out why.