Categories
Spreadsheets

Import a Google Sitemap XML file into Google Sheets

I wanted to get my sitemap data into Google Sheets for a few content audit and statistics dashboards.

The IMPORTXML() function in Google Sheets sounded pretty self explanatory and the XPATH syntax seemed almost graspable.

=IMPORTXML("https://www.example.com/sitemap.xml", "//url/loc")

But it just wouldn’t sodding work.

It turned out that this was because Google XML Sitemaps declare a namespace.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Once a namespace has been declared it’s no good saying you want //url/loc, you have to specifically ask for that data from the applicable namespace. But IMPORTXML() doesn’t support that.

The trick I eventually found was to use LOCAL-NAME with a search filter, which lets you pick out the loc data by searching for it by name regardless of namespace.

This is the working formula:

=IMPORTXML("https://www.example.com/sitemap.xml", "//*[local-name() ='url']/*[local-name() ='loc']")

You can use the same method to get any of the other data in the sitemap.

=IMPORTXML("https://www.example.com/sitemap.xml", "//*[local-name() ='url']/*[local-name() ='lastmod']")

Import more than one column from a sitemap

Update: Since first writing this I’ve found a formula which brings the whole sitemap in a multi-column table:

=importxml("http://example.com/sitemap.xml","//*[local-name() = 'url']")

I’m not sure if I simply hadn’t tried that before or whether Google Sheets has tweaked things recently. Alternatively, the only other option seems to be to bring in the columns one at a time and rely on them lining up.

PATH syntax uses a pipe to separate multiple queries but this creates just a single column. As this is my first use of XPATH I don’t know if this is a flaw of XPATH or Google Sheets, or a super important design feature.

=importxml("http://example.com/sitemap.xml","//*[local-name() ='url']/*[local-name() ='loc'] | //*[local-name() ='url']/*[local-name() ='lastmod']")

Google Sheets showing values merged into one column

I’ve only been able to get around this by placing separate IMPORTXML formulas in adjacent cells (eg. Allowing for headers in row 1,  put ‘loc’ formula in A2 and ‘lastmod’ in B2).

As long as the sitemap file is properly structured the rows should line up.

Google Sheets using multiple formulas to create two columns

Why bother?

It’s really easy to import a Google XML Sitemap into Excel, but i wanted it in Google sheets so i could use this data alongside Google Analytics stats, which are much easier to import into Google Sheets.

I can now:

  • Find unviewed pages by comparing the sitemap contents against Google Analytics pageviews
  • Use the latest page edits in statistics dashboards

9 replies on “Import a Google Sitemap XML file into Google Sheets”

Thanks for sharing really good technique. I have a question, is there a way if we add an option to in the query part of importxml to exclude certain item if it matches the criteria? example

=importxml(importxml(“http://example.com/sitemap.xml”,”//*/chanel | not(//*/chanel/item=”safely”)

meaning if branches under //*/chanel/item consist word “safely” then skip that item.

Wow. Thank you for the information. Btw, as to your problem with adjacent cells, I was able to resolve it using transpose function.

I’ve got what I’m trying to to partially working.., can someone help me?

I’m trying to import the following the same guidelines above, but i’m having and issue.

——————–
Using this:
//*[local-name() =’url’]/*[local-name() =’image:image’]/*[local-name() =’image:loc’]

——————–
To try and import this:

http://evryng.com/gallery
… daily
… 0.75
… 2016-07-18


……
………… http://www.example.com/image.jpg
……
……
………… Gallery
……

What am I doing wrong?

Leave a Reply

Your email address will not be published. Required fields are marked *