XML: parsing of the Google contacts XML file

I am trying to parse the Google contacts XML file with tools like xmllint, and I even delved into XSL stylesheets with xsltproc, but I am getting nowhere.

I cannot supply a sample file as it contains private data, but you can download your own contacts using this script:

#!/bin/sh

# imports Google Contacts
# imported data is stored in contacts.xml file (current directory)

# You will need curl and xmllint tools

LOGIN="your.login@gmail.com"
PASSW="your_passw"

# --data-urlencode keeps special characters in the login or password intact
AUTH=$(curl --silent https://www.google.com/accounts/ClientLogin \
  --data-urlencode "Email=$LOGIN" \
  --data-urlencode "Passwd=$PASSW" \
  -d accountType=GOOGLE \
  -d service=cp \
  -d Gdata-version=3.0 | grep '^Auth')

# the URL is quoted so the shell does not treat '?' as a glob character
curl --silent -o /tmp/contacts.tmp \
  --header "Authorization: GoogleLogin auth=${AUTH#*=}" \
  --header "GData-Version: 3.0" \
  "https://www.google.com/m8/feeds/contacts/default/full?max-results=5"

# pretty-print the Google output
xmllint --format /tmp/contacts.tmp > contacts.xml

I can get the root node:

$ xmllint --xpath '/' contacts.xml

But it fails when I try the first node below root: <feed>

$ xmllint --xpath '/feed' contacts.xml 
XPath set is empty

Please understand that you need to supply a sample file if you expect people to help you. Simply obscure your private data or make up some replacement data.

Here you go. A bit tedious to obscure a Google contacts file. This one contains 3 records.

I found this link reporting a similar problem and solution suggesting 2 different approaches. So I thought to share it with you, not sure if it will help.

Thanks for the hint. I have seen that post, but it is about a different problem.

There is something wrong with that Google XML file, or with the way I access it. When I grep for a record in the xmllint shell, it returns wildcards instead of the node path:

$ xmllint --shell cts.xml 
/ > grep Arthur
/*/*[16]/*[5] : tan        9 Arthur M.
/*/*[16]/gd:name/gd:fullName : tan        9 Arthur M.
/*/*[16]/gd:name/gd:givenName : ta-        6 Arthur
/ > 

My understanding is that it should have returned a full path to the node. Something like:

/feed/entry/gd:name/gd:fullName

Could it be that the file is corrupt? Running xmllint --debug doesn't report anything abnormal, though.

But I could be on to something:

xmllint --valid cts.xml
cts.xml:2: validity error : Validation failed: no DTD found !
tp://schemas.google.com/g/2005" gd:etag="W/"A0AFRHc4eit7I2A9WhNVEkU.""
                                                                               ^

But all my other test files, which work fine with xmllint, produce the same error on validation...

Again, I am getting nowhere...

No, your sample file is not corrupt. It is valid XML. It just has no DTD - which is fine.

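It declares several namespaces, including a default one (Atom) on the <feed> element, and that is why your xmllint query fails: in XPath 1.0 an unprefixed name such as /feed only matches elements that are in no namespace.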
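To illustrate the namespace point with xmllint alone, here is a minimal sketch. It assumes an xmllint that supports --xpath (the option already used earlier in this thread). The sample file and its contents are made up for illustration; only the Atom and Google Data namespace URIs are real. Since a bare /feed looks for an element in no namespace, the expression tests local-name() instead:

```shell
#!/bin/sh
# Made-up stand-in for contacts.xml (hypothetical data, real namespace URIs)
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <entry>
    <gd:name>
      <gd:fullName>Arthur M.</gd:fullName>
      <gd:givenName>Arthur</gd:givenName>
      <gd:familyName>M.</gd:familyName>
    </gd:name>
  </entry>
</feed>
EOF

# local-name() compares the element name without its namespace,
# so no prefix has to be registered; string() yields the text of
# the first matching node.
FULLNAME=$(xmllint --xpath "string(/*[local-name()='feed']/*[local-name()='entry']/*[local-name()='name']/*[local-name()='fullName'])" "$SAMPLE")

echo "$FULLNAME"
rm -f "$SAMPLE"
```

This ignores namespaces entirely, which is fine for a quick look but would also match same-named elements from any other namespace.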

Could xsltproc be able to extract data despite that multiple namespace thing?

Yes, here is an example XSL stylesheet which outputs the name elements:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:atom="http://www.w3.org/2005/Atom"
                xmlns:gd="http://schemas.google.com/g/2005"
                version="1.0">

   <xsl:output method="text" />

   <xsl:template match="atom:feed">
      <xsl:apply-templates select="atom:entry" />
   </xsl:template>

   <xsl:template match="atom:entry">
      <xsl:apply-templates select="gd:name" />
   </xsl:template>

   <xsl:template match="gd:name">
      FULLNAME: <xsl:value-of select="gd:fullName" />
      GIVENNAME: <xsl:value-of select="gd:givenName" />
      FAMILYNAME: <xsl:value-of select="gd:familyName" />
      <xsl:text>
</xsl:text>
   </xsl:template>

   <xsl:template match="*"/>

</xsl:stylesheet>

which produces the following output from your supplied XML:

      FULLNAME: Arthur M.
      GIVENNAME: Arthur
      FAMILYNAME: M.

      FULLNAME: Eric D.
      GIVENNAME: Eric
      FAMILYNAME: D.

      FULLNAME: Jack Ppppppp
      GIVENNAME: Jack
      FAMILYNAME: Ppppppp
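
For anyone wondering how such a stylesheet is applied: xsltproc takes the stylesheet as its first argument and the XML document as its second. Below is a condensed sketch. The file names, the sample data, and the shortened stylesheet (full names only, one per line) are all made up for illustration.

```shell
#!/bin/sh
WORK=$(mktemp -d)

# Made-up sample feed with two contacts (hypothetical data)
cat > "$WORK/sample.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <entry><gd:name><gd:fullName>Arthur M.</gd:fullName></gd:name></entry>
  <entry><gd:name><gd:fullName>Eric D.</gd:fullName></gd:name></entry>
</feed>
EOF

# Shortened stylesheet: the atom: and gd: prefixes are bound to the
# same namespace URIs the feed uses, so the paths match.
cat > "$WORK/names.xsl" <<'EOF'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:atom="http://www.w3.org/2005/Atom"
                xmlns:gd="http://schemas.google.com/g/2005"
                version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="atom:feed/atom:entry/gd:name">
      <xsl:value-of select="gd:fullName"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

# stylesheet first, document second
NAMES=$(xsltproc "$WORK/names.xsl" "$WORK/sample.xml")

echo "$NAMES"
rm -r "$WORK"
```

The prefix used in the stylesheet (atom here) is arbitrary; what has to match the document is the namespace URI it is bound to.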

Great! Thanks.

Now I only have to take a deep breath and dive into XSL. It looks like black magic to me. It took me a couple of months to get used to CSS, and now XSL...

If you know a good step by step tutorial...

Merry Xmas

XSLT is basically a declarative pattern-matching language with some functional language concepts. If you have no experience with declarative languages, it may take you a while to get your head around the concepts.

If you wish to learn XSLT, you should also study XPath at the same time as they are frequently used together.
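
As a small way to experiment with XPath on a namespaced file, the xmllint shell has a setns command that binds a prefix to a namespace URI for the expressions that follow. The sketch below assumes that command is available in your xmllint build; the sample file is made up, and the prefixes atom and gd are arbitrary choices.

```shell
#!/bin/sh
# Made-up stand-in for the contacts file (hypothetical data)
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <entry>
    <gd:name>
      <gd:fullName>Arthur M.</gd:fullName>
      <gd:givenName>Arthur</gd:givenName>
    </gd:name>
  </entry>
</feed>
EOF

# Feed the shell commands on stdin: register both prefixes, then
# dump the node selected by a fully prefixed path.
RESULT=$(printf '%s\n' \
  'setns atom=http://www.w3.org/2005/Atom' \
  'setns gd=http://schemas.google.com/g/2005' \
  'cat /atom:feed/atom:entry/gd:name/gd:givenName' \
  'quit' | xmllint --shell "$SAMPLE")

echo "$RESULT"
rm -f "$SAMPLE"
```

The same commands can be typed interactively at the "/ >" prompt, which makes it easy to try out path expressions before putting them into a stylesheet.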