Categories
Spreadsheets

3 XML Sitemap tricks for Google Sheets

In the last post I described how I eventually managed to import a Google XML sitemap file into Google sheets. In short, this:

=importxml("http://example.com/sitemap.xml","//*[local-name() = 'url']")

I’ve been building on that recently and thought I’d share a few ways I’ve been using it.

  • Filtering the sitemap by path
  • Filtering the sitemap by date
  • Counting pages

Sitemap for a particular section (URL query)

Using QUERY ()

=query(importxml("http://example.com/sitemap.xml","//*[local-name() = 'url']"), "Select * where Col1 starts with 'http://example.com/path/'")

Using XPATH

=importxml("http://example.com/sitemap.xml","//*[local-name() ='url']/*[local-name() ='loc' and starts-with (text(),'http://example.com/path/')]")

Recent content

Current Month

=query(importxml("http://example.com/sitemap.xml","//*[local-name() = 'url']"), "Select Col1,Col2 where Col2 >= date '"&text(eomonth(today(),-1)+1,"yyyy-mm-dd")&"' order by Col2 label Col1 'loc', Col2 'lastmod'")

Last full calendar month

=query(importxml("http://example.com/sitemap.xml","//*[local-name() = 'url']"), "Select Col1,Col2 where Col2 >= date '"&text(eomonth(today(),-2)+1,"yyyy-mm-dd")&"' and Col2 <= date'"&text(eomonth(today(),-1),"yyyy-mm-dd")&"' order by Col2 label Col1 'loc', Col2 'lastmod'")

With this example keep in mind that a page modified during the last full calendar month might have been edited again since, during the current month, so would not appear.

To get around that you could delve into Google Sheets scripts and trigger the sheet to update on the first of every month. The same trigger could Copy and Paste as Values the sitemap content to store it permanently.

Content Age

This adds maths things using QUERY. Whereas the previous examples refer to Col1 [loc] and Col2 [lastmod] this creates a column DATEDIFF(now(),Col2) which gives you the difference (in days) between the current date and the lastmod date.

=QUERY(importxml("http://example.com/sitemap.xml","//*[local-name()='url']"),"Select Col1, Col2, datediff(now(),Col2) label Col1 'loc', Col2 'lastmod', datediff(now(),Col2) 'age'")

Google XML Sitemap with Age

Page counts

Adding a count of pages is another useful thing for content dashboards. As well as the tables produced in the examples above, you can use a COUNT() function in the QUERY forumla.

Getting page counts from a Google XML Sitemap

In the formulas below I add label count(Col1) ” which nulls the column title, giving you the response in a single cell. Otherwise you get a column heading and the value in the cell below it.

Count of pages in a section

=query(importxml("http://example.com/sitemap.xml","//*[local-name() ='url']/*[local-name() ='loc' and starts-with (text(),'http://example.com/about/news/')]"),"select count(Col1) label count(Col1) ''")

Count of pages edited this month

=query(importxml("http://example.com/sitemap.xml","//*[local-name() = 'url']"), "Select count(Col1) where Col2 >= date '"&text(eomonth(today(),-1)+1,"yyyy-mm-dd")&"' label count(Col1) ''")

Count of pages older that 365 days

=QUERY(importxml("http://example.com/sitemap.xml","//*[local-name()='url']"),"Select count(Col1) where datediff(now(),Col2) >365 label count(Col1) ''")

One reply on “3 XML Sitemap tricks for Google Sheets”

Leave a Reply

Your email address will not be published. Required fields are marked *