How to pull multiple XML tags from the same XML file in shell?

I'm searching for the names of TV shows in the XML file I've attached at the end of this post. What I'm trying to do now is pull out/list the data from each of the <SeriesName> tags throughout the document. Currently, I'm only able to get the data from the first instance of that XML field using the following:

cat My.xml | awk -F'</?SeriesName>' ' { print $2 } '

I'm missing something, obviously. I also tried and failed with sed and xmllint, although I'm not as fluent with those commands as I am with cat and awk. Open to suggestions. I'm on a Mac running Unix.

Now, here's a version of the XML file I'm using, not parsed, so it prints all nice-nice in the window:

<?xml version="1.0" encoding="UTF-8"?>
<Data><Series><seriesid>71862</seriesid><language>en</language><SeriesName>Chappelle's Show</SeriesName><banner></banner><Overview>"Chappelle's Show" takes comedian Dave Chappelle's own personal joke book and brings it to life, with episodes consisting of sketches, man-on-the-street pieces, and pop culture parodies introduced by Dave in a stand-up format in front of a studio audience. Chappelle's unique point-of-view on the world provides a hilarious, defiant and sometimes dangerous look at American culture, including music, movies, television, advertising, current events, and everyday life situations.</Overview><FirstAired>2003-1-22</FirstAired><IMDB_ID></IMDB_ID><zap2it_id></zap2it_id><AliasNames></AliasNames><id>71862</id></Series><Series><seriesid>257136</seriesid><language>en</language><SeriesName>Dave Chappelle</SeriesName><banner>/banners/graphical/257136-g.jpg</banner><Overview>Dave Chappelle's career started while he was in high school at Duke Ellington School of the Arts in Washington, DC where he studied theatre arts. At the age of 14, he began performing stand-up comedy in nightclubs. Shortly after graduation, he moved to New York City where he quickly established himself as a major young talent. At the age of 19, Chappelle made his film debut in Robin Hood: Men in Tights (1993). Chappelle then starred in the short-lived sitcom, Buddies (1996) and had a featured role in The Nutty Professor (1996).</Overview><FirstAired>1998-1-9</FirstAired><IMDB_ID></IMDB_ID><zap2it_id></zap2it_id><AliasNames></AliasNames><id>257136</id></Series><Series><seriesid>76837</seriesid><language>en</language><SeriesName>Challenge of the SuperFriends</SeriesName><banner></banner><Overview>Banded together from remote galaxies, 13 of the most sinister villains of all time - the Legion of Doom! Dedicated to a single object, the conquest of the universe! Only one group dares to challenge this inter-galactic threat - The SuperFriends! 
Challenge is a sequel to the two earlier Super Friends shows. It drops Zan and Jayna from the previous incarnation and makes several "guest stars" full members of the Justice League, giving us a membership of Superman, Aquaman, Wonder Woman, Batman, Robin, Hawkman, Flash, Green Lantern, Black Vulcan, Samurai, and Apache Chief. Pitted against them are some real villains - the Legion of Doom: 13 of the most powerful supervillains from "remote galaxies" (well, 12 + Black Manta... :) ). Who are they? Lex Luthor, Brainiac, Solomon Grundy, the Riddler, the Scarecrow, Bizarro, Cheetah, Black Manta, Giganta, Gorilla Grodd, Sinestro, Captain Cold, and the Toyman. Okay, everyone but Brainiac, Sinestro, and Bizarro was from Earth (most of the ga</Overview><FirstAired>1978-9-9</FirstAired><IMDB_ID></IMDB_ID><zap2it_id></zap2it_id><AliasNames></AliasNames><id>76837</id></Series><Series><seriesid>76420</seriesid><language>en</language><SeriesName>Challenge of the GoBots</SeriesName><banner></banner><Overview>Transforming robots from the planet Gobotron wage war across the galaxy: the heroic Guardians and the evil Renegades. The Guardians, led by the heroic Leader-1, must battle the evil of Cy-Kill and his Renegades! Together with UNECOM's Matt, A.J. and Nick, they'll save the Earth and Gobotron!</Overview><FirstAired>1984-10-29</FirstAired><IMDB_ID></IMDB_ID><zap2it_id></zap2it_id><AliasNames></AliasNames><id>76420</id></Series><Series><seriesid>366596</seriesid><language>en</language><SeriesName>CHALLENGER</SeriesName><banner></banner><Overview></Overview><FirstAired></FirstAired><IMDB_ID></IMDB_ID><zap2it_id></zap2it_id><AliasNames></AliasNames><id>366596</id></Series><Series><seriesid>364973</seriesid><language>en</language><SeriesName>Challenger Disaster: Lost Tapes</SeriesName><banner></banner><Overview>Challenger Disaster: Lost Tapes follows the story of the Space Shuttle Challenger and its crew, specifically Christa McAuliffe, the first civilian to be launched into space. 
McAuliffe was a teacher from Concord, N.H. She was chosen from thousands of applicants to expand the understanding of the nation's school children about space and the next generation of interplanetary travel. But her dreams - and those of NASA - were tragically cut short when the Challenger exploded just after liftoff in front of a live television audience. The events of the days leading up to the disaster are detailed in this unique film, which uses no narration and no interviews. Instead the story is told solely with reports of journalists covering the story, extensive recordings from the NASA team, and interviews with McAuliffe and others who were part of this one-of-a-kind mission. Using rarely seen images and audio recordings, this show takes viewers behind the scenes of this compelling and historic story in a way never before seen.</Overview><FirstAired>2016-1-25</FirstAired><IMDB_ID></IMDB_ID><zap2it_id></zap2it_id><AliasNames></AliasNames><id>364973</id></Series><Series><seriesid>351481</seriesid><language>en</language><SeriesName>Challenging Taboos</SeriesName><banner></banner><Overview>Smashing stereotypes, breaking taboos and challenging stigma - this playlist takes you on a journey through history, culture and geography.</Overview><FirstAired>2016-9-15</FirstAired><IMDB_ID></IMDB_ID><zap2it_id></zap2it_id><AliasNames></AliasNames><id>351481</id></Series><Series><seriesid>350777</seriesid><language>en</language><SeriesName>Challenge Accepted </SeriesName><banner></banner><Overview></Overview><FirstAired>2018-4-30</FirstAired><IMDB_ID></IMDB_ID><zap2it_id></zap2it_id><AliasNames></AliasNames></Series></Data>

Hi
try this

awk -F '>' '/^SeriesName/ {print $2}' RS='<' file

Hello hungryd,

Could you please try following.

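awk '
{
  # Repeat while the rest of the line still holds a <SeriesName> tag
  while(match($0,/<SeriesName>[^<]*/)){
    # Skip the 12 characters of "<SeriesName>" and print only the text
    print substr($0,RSTART+12,RLENGTH-12)
    # Drop everything up to the end of this match, then search again
    $0=substr($0,RSTART+RLENGTH)
  }
}
'   Input_file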

Thanks,
R. Singh


It's a simple XPath query with xmlstarlet:

xmlstarlet sel -t -v "//SeriesName" data.xml

Chappelle's Show
Dave Chappelle
Challenge of the SuperFriends
Challenge of the GoBots
CHALLENGER
Challenger Disaster: Lost Tapes
Challenging Taboos
Challenge Accepted 

Search the web or this forum for the keyword XPath to find tutorials and more info on the topic.

Thank you Nezabudka. This is great AND includes the line breaks. I wasn't seeing those with some of the other XML parsing tool options I'd experimented with in shell including:

xmllint  --xpath "//SeriesName" file.xml

Thank you

--- Post updated at 11:09 AM ---

Can you please just explain to me the meaning/syntax of the

RS='<'

call you've placed in the larger line of code? That I've not seen before and would like to know more. Thank you again, Nezabudka.

xmllint does not separate the results with newlines; xmlstarlet does.
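
If your xmllint still runs the results together, one possible workaround is to split its output on the tag itself. This is only a sketch: it assumes the old output is a single line of the form <SeriesName>A</SeriesName><SeriesName>B</SeriesName>, split_names is a made-up helper name, and the printf stands in for the real xmllint call so the example runs without the XML file.

```shell
# split_names: hypothetical helper; reads concatenated <SeriesName> elements on stdin.
split_names() {
  # With the opening/closing tag as field separator, the names land in the
  # even-numbered fields ($2, $4, ...), so loop over them instead of printing only $2.
  awk -F'</?SeriesName>' '{ for (i = 2; i <= NF; i += 2) print $i }'
}

# Stand-in for: xmllint --xpath "//SeriesName" file.xml
printf '%s\n' '<SeriesName>CHALLENGER</SeriesName><SeriesName>Challenging Taboos</SeriesName>' | split_names
```

The same loop over even-numbered fields is also what the awk -F'</?SeriesName>' attempt in the first post was missing: print $2 prints only one name per input line, however many tags that line holds.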

LESS=+/"^\s*RS\s" man awk
      RS          The input record separator, by default a newline.
...

By default RS is '\n'; here it has been changed to '<'.
You can picture this as if every '<' character in the text were replaced by the end-of-line character '\n',
so the character that follows each '<' becomes the start of the next line (record).
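
To see this on a small input, here is a rough sketch. The one-line sample in the xml variable is made up for the illustration (same shape as the posted file); the first awk call just numbers the records, the second is the original command reading from a pipe instead of a file.

```shell
# Sample input, made up for this illustration (same shape as the posted XML).
xml='<Series><SeriesName>CHALLENGER</SeriesName></Series>'

# Show each record awk sees once '<' is the record separator.
printf '%s' "$xml" | awk '{ print NR ": " $0 }' RS='<'

# Only the record that starts with SeriesName matches the pattern; splitting
# that record on '>' leaves the tag name in $1 and the text in $2.
printf '%s' "$xml" | awk -F '>' '/^SeriesName/ { print $2 }' RS='<'
```

Each record now begins right after a '<', which is why the pattern can be anchored with ^SeriesName, and why the closing tag (a record starting with /SeriesName) does not match.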
Glad that helped.

--- Post updated 02-01-20 at 00:08 ---


But xmllint is native on macOS and xmlstarlet is not:
Fink - Package Database - Package xmlstarlet (Command-line XML manipulation tool)

And while I'm open to installing additional tools, @Stomp, my preference - both for speed and convenience - is to use native tools in the OS stack. However, I still owe you some props as I believe I learned about xmllint from YOU in another post on these boards, so thank ye, sir.


Ok, you really appreciate native tools? Me too :wink:

Good news. As I wrote before (libxml2 change xpath result separator), the change to newline separators was made in the libxml2 source code as of Sep 2018.

I just compiled libxml2 and verified that it works as expected:

$ xmllint-current --xpath "//SeriesName/text()" data.xml

Chappelle's Show
Dave Chappelle
Challenge of the SuperFriends
Challenge of the GoBots
CHALLENGER
Challenger Disaster: Lost Tapes
Challenging Taboos
Challenge Accepted 

So it's only a matter of time until it flows downstream into the major distributions. But if you need to stay compatible with older versions, you are better off taking another approach.

...and I like hearing that my efforts contributed to your knowledge.
