Importing an XML Sitemap into Google Sheets is a really quick (and free) way to start carrying out content audits on your website.
The =IMPORTXML() function in Google sheets lets you load in data from a web page, but I found the syntax didn’t work how I’d expect for a sitemap XML file.
In this blog I’ll use the example of the Sitemap XML file of Bodleian Library which is at https://www.bodleian.ox.ac.uk/sitemap.xml
The working function is:
=importxml("https://www.bodleian.ox.ac.uk/sitemap.xml","//*[local-name() = 'url']")
This will load in the contents of the XML file, which is usually a URL and sometimes other information like the date it was last updated.
In this screenshot, I’ve typed in my own column headings. Not all sitemaps have the same content, or the same order.
IMPORTXML Syntax
=IMPORTXML(URL,XPATH Query)
The URL is the URL of your sitemap, and is a text string so you put it in double quotes.
The XPATH Query is also a string so stored in double quotes.
Extra tips
Filter the sitemap to a section of the website
You can filter using the XPATH Query but I usually prefer to combine it with the =QUERY() function in Google Sheets which I find easier to use.
Import a sitemap into a sheet called Sitemap, where this function will assume the URL is in Column A and Date Modified is in Column B
=query(Sitemap!A:B,"Select A, B where A starts with 'https://www.bodleian.ox.ac.uk/maps'")
Filter by date updated to get new pages
Using the QUERY formula (like the tip above) you can filter on the date column too.
=query(Sitemap!A:B,"Select A, B where B >= date '2024-10-20'")
This example only displays pages that have been edited since 20 October 2024. The date format is YYYY-MM-DD.
You can combine that with other date functions to get recent pages no matter the date.
=query(Sitemap!A:B,"Select A, B where B >= date '"&text(now()-3,"YYYY-MM-DD")&"'")
Create a Spreadsheet to act as a function library to save examples of your sitemap functions
Every time I’ve learned how to do something new with a Sitemap XML file in Google Sheets, I add it to a reference Spreadsheet I created so that I can easily find it and copy and paste the formula into whatever I’m working on.
Combine IMPORTXML and QUERY into a single function
When I’m learning something new in Google Sheets I like to separate out each part. In the examples above I’ve imported the sitemap using IMPORTXML into one sheet and then used QUERY to refer to it.
You can do it all in a single function, like this:
=query(importxml("https://www.bodleian.ox.ac.uk/sitemap.xml","//*[local-name() = 'url']"),"Select Col1, Col2 where Col1 starts with 'https://www.bodleian.ox.ac.uk/maps' label Col1 'URL', Col2 'Date Modified'")
In this example the Sitemap is being imported inside the QUERY function. Rather than referring to column letters A and B, you refer to Col1 and Col2 of the data. These names are case sensitive.
I’ve also added the labels using the QUERY function syntax. Your sitemap might have different columns, the date might not be Col2.
Copy my example
View my XML Sitemap Example in Google Sheets where you can copy it and change it as you need.
Leave a Reply