I wanted to get my sitemap data into Google Sheets for a few content audit and statistics dashboards.
The IMPORTXML() function in Google Sheets sounded pretty self explanatory and the XPATH syntax seemed almost graspable.
But it just wouldn’t sodding work.
It turned out that this was because Google XML Sitemaps declare a namespace.
Once a namespace has been declared it’s no good saying you want //url/loc, you have to specifically ask for that data from the applicable namespace. But IMPORTXML() doesn’t support that.
The trick I eventually found was to use LOCAL-NAME with a search filter, which lets you pick out the loc data by searching for it by name regardless of namespace.
This is the working formula:
=IMPORTXML("https://www.example.com/sitemap.xml", "//*[local-name() ='url']/*[local-name() ='loc']")
You can use the same method to get any of the other data in the sitemap.
=IMPORTXML("https://www.example.com/sitemap.xml", "//*[local-name() ='url']/*[local-name() ='lastmod']")
Import more than one column from a sitemap
Update: Since first writing this I’ve found a formula which brings the whole sitemap in a multi-column table:
=importxml("http://example.com/sitemap.xml","//*[local-name() = 'url']")
I’m not sure if I simply hadn’t tried that before or whether Google Sheets has tweaked things recently. Alternatively, the only other option seems to be to bring in the columns one at a time and rely on them lining up.
PATH syntax uses a pipe to separate multiple queries but this creates just a single column. As this is my first use of XPATH I don’t know if this is a flaw of XPATH or Google Sheets, or a super important design feature.
=importxml("http://example.com/sitemap.xml","//*[local-name() ='url']/*[local-name() ='loc'] | //*[local-name() ='url']/*[local-name() ='lastmod']")
I’ve only been able to get around this by placing separate IMPORTXML formulas in adjacent cells (eg. Allowing for headers in row 1, put ‘loc’ formula in A2 and ‘lastmod’ in B2).
As long as the sitemap file is properly structured the rows should line up.
It’s really easy to import a Google XML Sitemap into Excel, but i wanted it in Google sheets so i could use this data alongside Google Analytics stats, which are much easier to import into Google Sheets.
I can now:
- Find unviewed pages by comparing the sitemap contents against Google Analytics pageviews
- Use the latest page edits in statistics dashboards