XML: Find Parent tag based on search of Child tag

Hi,
I have an XML file. I need to find the distinct parent tags under which a particular child tag appears. I am using bash.

Below is the sample XML file:

<?xml version="1.0" encoding="utf-8"?>
<root>
	<result>
		<parent></parent>
		<report_number>1</report_number>
		<actual_completion_date>2009-03-20</actual_completion_date>
		<upon_reject>Cancel all future Tasks</upon_reject>
		<division>
			<d_value>IIT</d_value>
			<link>www.yahoo.com</link>
		</division>
		<opened_by>
			<d_value>Roberta</d_value>
			<link>www.google.com</link>
		</opened_by>
	</result>
	<result>
		<parent></parent>
		<report_number>2</report_number>
		<actual_completion_date>2009-02-20</actual_completion_date>
		<upon_reject>Cancel all future Tasks</upon_reject>
		<sys_domain>
			<d_value>global</d_value>
			<link>www.msn.com</link>
		</sys_domain>
	</result>	
</root>

I am looking to search for "d_value" tags and find their distinct parent tags. This tag shows up under different parent tags, as shown above.
For the example above, the output would be <division>, <opened_by> and <sys_domain>, as these three are the parent tags containing a <d_value> child tag.

@decostaronny1, welcome. The XML posted does not validate. Please validate it (for example with https://codebeautify.org/xmlvalidator), then edit the post and replace the snippet with well-formed XML. Thanks.

Furthermore, the forum is primarily a collaboration: post your challenge along with your attempt(s), including failed or incomplete ones, so the team can review them and give suggestions (which may include complete solutions, suggested alternatives, etc.).
You state you are using bash but haven't supplied that code; please do. There are tools written specifically to handle XML, and bash certainly isn't one of them. Have you researched any of those tools?

Thanks

I have updated the XML snippet in my query above.
I tried the command below. It gives me the whole block, including the parent tag. If the output can somehow be filtered, I can try to grab just the parent tag.

I am open to any suggestions or better solutions.

xmllint --xpath "//*[name()='d_value']/.." test.xml

This solution does not give the result you requested. On my Mac:

MacStudio $ bash test.sh
<division>
			<d_value>IIT</d_value>
			<link>www.yahoo.com</link>
		</division>
<opened_by>
			<d_value>Roberta</d_value>
			<link>www.google.com</link>
		</opened_by>
<sys_domain>
			<d_value>global</d_value>
			<link>www.msn.com</link>
		</sys_domain>

Here is a bash solution that is longer (more code) but easier to read, and it gives the output you requested on my Mac:

#!/bin/bash

# Extract distinct parent tags of <d_value>
xmllint --format input.xml | \
  grep -B 1 "<d_value>" | \
  grep -v "<d_value>" | \
  grep "<" | \
  sed -E 's/.*(<[^>/]+>).*/\1/' | \
  sort | \
  uniq

Testing

$ bash test.sh
<division>
<opened_by>
<sys_domain>
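
One caveat with the grep -B 1 pipeline: it assumes the parent's opening tag sits on the line directly before <d_value>, i.e. that <d_value> is the first child. If, say, <link> came first, the pipeline would print <link> instead of the parent. Below is a rough sketch of one way around that, using awk to keep a stack of the currently open tags. The function name parents_of is made up for illustration, and this is not a real XML parser: it assumes well-formed input with no comments, no CDATA sections, and no ">" inside attribute values.

```shell
#!/bin/bash

# parents_of TAG: read XML on stdin and print each distinct parent of <TAG>,
# in order of first appearance. Hypothetical helper, sketch only.
parents_of() {
  awk -v child="$1" '
    BEGIN { RS = "<"; FS = ">" }            # one record per "<...", $1 is the tag text
    {
      tag = $1
      if (tag ~ /^[?!]/) next               # XML declaration or DOCTYPE
      if (tag ~ /^\//) { depth--; next }    # closing tag: pop the stack
      selfclosing = (tag ~ /\/$/)
      sub("[ \t\r\n/].*$", "", tag)         # drop attributes and a trailing "/"
      if (tag == "") next                   # text before the first "<"
      if (tag == child && depth > 0 && !seen[stack[depth]]++)
        print "<" stack[depth] ">"
      if (!selfclosing) stack[++depth] = tag
    }
  '
}

# Demo: <d_value> is the second child of <division> here.
parents_of d_value <<'EOF'
<root>
  <division><link>www.yahoo.com</link><d_value>IIT</d_value></division>
  <sys_domain><d_value>global</d_value></sys_domain>
</root>
EOF
```

The demo prints <division> and <sys_domain>. Against the sample file from the question it would be called as: parents_of d_value < test.xml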

Ruby gives a "less interesting" (in my view) approach than bash, but here it is, since you asked. You might like it:

require 'nokogiri'

# Parse the XML file
doc = Nokogiri::XML(File.read('input.xml'))

# Find and print distinct parent tags of <d_value>
doc.xpath('//d_value/..').map(&:name).uniq.each do |parent|
  puts "<#{parent}>"
end

The bash script looks nicer in my view, only because I like code that is simple and easy to read and modify without a lot of "thinking"; the bash code also does not require an external library.

Anyway, here is the Ruby example output, FYI only:

$ ruby test.rb
<division>
<opened_by>
<sys_domain>

Note: Personally, I'm not a fan of "one-liners", but of course anyone is free to post any solution(s) they wish :)

A pipe to grep works for me:

xmllint --xpath "//*[name()='d_value']/.." test.xml | grep "^[^[:space:]]"
xmllint --xpath "//d_value/.." test.xml | grep "^[^[:space:]]"

To eliminate duplicates, add a pipe to sort -u or awk 's[$0]++==0'.
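
The two de-duplication options behave slightly differently, which a minimal sketch can show. The lines fed in below are hypothetical stand-ins for the xmllint | grep output (not real xmllint output), and the function names are made up for illustration. sort -u returns the unique lines in sorted order, while the awk filter keeps the first occurrence of each line in its original order.

```shell
#!/bin/bash

# Hypothetical sample lines standing in for the xmllint | grep output.
tags() {
  printf '%s\n' '<sys_domain>' '<division>' '<sys_domain>' '<opened_by>' '<division>'
}

# Unique lines, sorted (LC_ALL=C makes the ordering locale-independent).
dedupe_sorted() { LC_ALL=C sort -u; }

# Print a line only the first time it is seen; input order is preserved.
dedupe_ordered() { awk 's[$0]++==0'; }

echo "sort -u:"
tags | dedupe_sorted

echo "awk:"
tags | dedupe_ordered
```

With this input, sort -u lists <division>, <opened_by>, <sys_domain>, while awk lists <sys_domain>, <division>, <opened_by>.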

Thank you Neo. It is working perfectly.
