Import an XML Sitemap into Google Sheets

Importing an XML Sitemap into Google Sheets is a really quick (and free) way to start carrying out content audits on your website.

The =IMPORTXML() function in Google sheets lets you load in data from a web page, but I found the syntax didn’t work how I’d expect for a sitemap XML file.

In this blog I’ll use the example of the Sitemap XML file of Bodleian Library which is at https://www.bodleian.ox.ac.uk/sitemap.xml

The working function is:

=importxml("https://www.bodleian.ox.ac.uk/sitemap.xml","//*[local-name() = 'url']")

This will load in the contents of the XML file, which is usually a URL and sometimes other information like the date it was last updated.

In this screenshot, I’ve typed in my own column headings. Not all sitemaps have the same content, or the same order.

IMPORTXML Syntax

=IMPORTXML(URL,XPATH Query)

The URL is the URL of your sitemap, and is a text string so you put it in double quotes.

The XPATH Query is also a string so stored in double quotes.

Extra tips

Filter the sitemap to a section of the website

You can filter using the XPATH Query but I usually prefer to combine it with the =QUERY() function in Google Sheets which I find easier to use.

Import a sitemap into a sheet called Sitemap, where this function will assume the URL is in Column A and Date Modified is in Column B

=query(Sitemap!A:B,"Select A, B where A starts with 'https://www.bodleian.ox.ac.uk/maps'")

Filter by date updated to get new pages

Using the QUERY formula (like the tip above) you can filter on the date column too.

=query(Sitemap!A:B,"Select A, B where B >= date '2024-10-20'")

This example only displays pages that have been edited since 20 October 2024. The date format is YYYY-MM-DD.

You can combine that with other date functions to get recent pages no matter the date.

=query(Sitemap!A:B,"Select A, B where B >= date '"&text(now()-3,"YYYY-MM-DD")&"'")

Create a Spreadsheet to act as a function library to save examples of your sitemap functions

Every time I’ve learned how to do something new with a Sitemap XML file in Google Sheets, I add it to a reference Spreadsheet I created so that I can easily find it and copy and paste the formula into whatever I’m working on.

Combine IMPORTXML and QUERY into a single function

When I’m learning something new in Google Sheets I like to separate out each part. In the examples above I’ve imported the sitemap using IMPORTXML into one sheet and then used QUERY to refer to it.

You can do it all in a single function, like this:

=query(importxml("https://www.bodleian.ox.ac.uk/sitemap.xml","//*[local-name() = 'url']"),"Select Col1, Col2 where Col1 starts with 'https://www.bodleian.ox.ac.uk/maps' label Col1 'URL', Col2 'Date Modified'")

In this example the Sitemap is being imported inside the QUERY function. Rather than referring to column letters A and B, you refer to Col1 and Col2 of the data. These names are case sensitive.

I’ve also added the labels using the QUERY function syntax. Your sitemap might have different columns, the date might not be Col2.

Copy my example

View my XML Sitemap Example in Google Sheets where you can copy it and change it as you need.


Comments

11 responses to “Import an XML Sitemap into Google Sheets”

  1. Thanks for sharing really good technique. I have a question, is there a way if we add an option to in the query part of importxml to exclude certain item if it matches the criteria? example

    =importxml(importxml(“http://example.com/sitemap.xml”,”//*/chanel | not(//*/chanel/item=”safely”)

    meaning if branches under //*/chanel/item consist word “safely” then skip that item.

    1. never mind your “3 XML sitemap tricks” post answered my question.

  2. Thank you soo much for this – looked everywhere and thankfully found this.

  3. Marek avatar
    Marek

    Wow, thank you, Ben, very much. It helped me a lot!

    Marek

  4. Wow. Thank you for the information. Btw, as to your problem with adjacent cells, I was able to resolve it using transpose function.

  5. Len Clarke avatar
    Len Clarke

    Thank you – this was exactly what I needed!

  6. Brilliant! Thanks man!

  7. I’ve got what I’m trying to to partially working.., can someone help me?

    I’m trying to import the following the same guidelines above, but i’m having and issue.

    ——————–
    Using this:
    //*[local-name() =’url’]/*[local-name() =’image:image’]/*[local-name() =’image:loc’]

    ——————–
    To try and import this:

    http://evryng.com/gallery
    … daily
    … 0.75
    … 2016-07-18


    ……
    ………… http://www.example.com/image.jpg
    ……
    ……
    ………… Gallery
    ……

    What am I doing wrong?

  8. Right after I submitted that last one, I tried this:

    //*[local-name() =’url’]/*[local-name() =’image’]/*[local-name() =’title’]

    and it worked.

  9. hi there i really appricate it and thankful for your help wondering if you can help because i tried your method but its not working do you have any other suggestions that may help to get all the URLs from my sitemap

    1. Hi Iman, your link is to a ‘sitemap index’ file which is a group of sitemaps rather than an individual sitemap, it lists the URLs of sitemaps rather than the URLs of pages. That happens when you either choose to have different sitemaps for different groups of content, or when you have over 50,000 pages which is Google’s limit per individual sitemap and they have to be broken up.

      You can change the formula to load the content of the Sitemap Index file, which will bring back the URLs of the individual sitemaps

      =IMPORTXML("https://duo.com/sitemap_index.xml", "//*[local-name() ='sitemap']/*[local-name() ='loc']")

      Or point to one of your individual sitemaps:

      =IMPORTXML("https://duo.com/page-sitemap.xml", "//*[local-name() ='url']/*[local-name() ='loc']")

Leave a Reply

Your email address will not be published. Required fields are marked *