Categories
Blog

Creswell Crags Fundraising to Secure Their Future

Creswell Crags was quick to highlight the plight of museums and heritage organisations during the covid-19 crisis, with their “future survival cast into doubt.”

The attraction on the Derbyshire and Nottinghamshire border is famous for its long history including Ice Age rock art as well as the more recent witches marks.

Not only did they use their reaction speed to their advantage, they made use of their high profile patron, Alice Roberts to help them raise money through crowdfunding on JustGiving.

Existing content has also been bought to the fore. A video tour based on 3D laser scanning was released in October 2019 but has been reported on more recently in the Derbyshire Times: Take a virtual tour of Creswell Crags’ Witch Mark cave with amazing 3D video. This perhaps shows an increased appetite from the media to highlight this type of activity that people can do from home.

JustGiving: Help Save Creswell Crags

Categories
Lockdown Heritage

The Manchester Zoom Tour Series

Sceenshot of the Manchester Zoom Tour Series

Zoom doesn’t need an introduction, does it? Nor does Manchester and nor do Tours. What can I say then?

At the end of the day a video is just a video so it interests me how we do still perceive the ‘event’ in this idea.

It might the scarcity effect of the live event giving you the fear of missing out (though you get the recording.)

Perhaps it’s also in part because it recreates the social gathering. You’re not just watching a video, you are really going on a tour with other people.

Would I watch an hour long video about Manchester’s tunnels on YouTube for free? Probably not. Would I pay £6 to watch it live? Yes, as it turns out.

Find our more about Jonathan Schofield’s Manchester Zoom Tours

Categories
Blog

A historic lockdown

A recent survey by Historic England asked whether the current Coronavirus crisis might also result in better ways of working. Respondents said that the “enforced familiarity” with video conferencing and social media might lead to better ways of working.

I’m curious about what that might look like, and will be collecting examples of how heritage groups and organisations are using digital technology to open up access to heritage.

[novo-map id=1]

Categories
Spreadsheets

How to run a website metadata audit in Google Sheets

Google Sheets has a formula that lets you scrape values from a website: =importxml().

Get a list of your web pages (you can do this by importing your xml sitemap into Google Sheets) and then use the =importxml() formula to scrape values for things like metadata title and description, or on-page text like your Heading 1.

woometadata

Importxml(URL,Xpath)

Importxml has two parameters:

URL: is your web page address

XPATH: is the description of the content you’re requesting. w3schools.com has a pretty useful overview of the syntax.

Google Sheets importxml()

Fetch individual metadata values

Assuming you have the URL in column A, and column headings in row 1, the formulas you would use starting from row 2 are:

H1:

=importxml(A2,"//h1")

Browser Title

=importxml(A2,"//title")

Description

=importxml(A2,"//meta[@name='Description']/@content")

Keywords

=importxml(A2,"//meta[@name='Keywords']/@content)

There seems to be a hidden speed limit; I find that the first few hundred fields will populate really quickly but then it really slows down to just one cell updating every 10-20 seconds. Making a cup of tea or writing a blog post about it can help kill the time.

Fetch multiple values in one request

You can fetch multiple values in a single request. I presume this will help speed up the process but I don’t have it working reliably enough to use it in place of the above method.

Use a pipe separater in the XPATH value to request more than one item.

Fetching the H1 and Title in a single request can be done like this:

=importxml(A2,"//h1 | //title")

The first thing to be aware of is that this will bring each value in as a new row. To convert that to columns instead wrap the formula in =transpose():

=transpose(importxml(A2,"//h1 | //title"))

The problem I find with this method is that it gives the results in the order they appear on the page, not the order you request them.

For example the following formula doesn’t bring back keywords first:

=transpose(importxml(A1,"//meta[@name='Keywords']/@content | //h1 | //title | //meta[@name='Description']/@content"))

For me it returns Title, Description, Keywords, H1 because that’s the order in the HTML I’m scraping.

The bigger problem is that where a field doesn’t exist, its column doesn’t get skipped. Where keywords are blank it doesn’t leave a blank cell, it shifts the H1 into the column that contains keywords for other pages.

You can only really use this then for fields you know will be filled in.

My workflow for scraping metadata values

The value gets cached in Google Sheets for a few hours before it will fetch new content. Before this happens you should copy and paste as values to overwrite the formula with the content it fetched.

My process is:

  1. Get the URL list prepared, perhaps sorted by priority (such as an audit of most viewed pages, or all level 1 & 2 landing pages)
  2. Put the formulas in for the first page, and drag the formulas down to autofill about 50 rows.
  3. Wait for Google Sheets to fetch the values and then select all but the last row (e.g. rows 2-49 if you had filled the formula to row 50)
  4. Copy and Paste special: Paste values only
  5. Drag the formula from your last rown down about 50 rows
  6. Repeat 3-5

Categories
Blog

My method of filtering Analytics spam

There’s a growing problem of spam traffic in Google Analytics, but recently I’ve been able to eradicate most of ours with one very simple filter.

Filter Spam traffic from Google Analytics

Spammers are using the Analytics Measurement Protocol. Unlike the normal JavaScript implementation of Analytics, this is a server-side method. This means they can push data straight into your Analytics reporting without visiting your website.

Google analytics is a pretty open system. The account ID is all that’s needed to push data to your reports and these IDs are numeric so it’s easy for them to automate account IDs and have their servers push pageview and event data straight into your account.

Fake events recorded in Google Analytics

How to detect and remove spam

When something like a pageview is recorded in Analytics there are also a lot of secondary dimensions recorded that you can use to verify whether it’s spam or not.

Real and fake referrals in Google Analytics
I used the Hostname dimension. It records the domain that the visitor was viewing when triggering the pageview. By adding the hostname secondary dimension to pageview, referral and event reports I realised that this was missing for the fake spam data. In the screenshot you can see genuine referrals with the hostname set to historicengland.org.uk. The hostname for spam traffic was (not set).

When I first tried to filter traffic based on hostname I set up a filter to exclude hostnames matching “” (blank) or “(not set)” but I discovered that a blank hostname is not the same thing as a missing hostname.

I changed this to a filter which I think is working well so far:

Only include traffic where hostname equals .

Google Analytics View Filter

The full stop is a regular expression wildcard for any 1 character.

Another option would be to specify your domain name, but I felt that that approach was too specific. There are genuine reasons for using alternative hostnames, like Google Translate. Also in case the domain for our website did ever change then I didn’t want the Google Analytics set up to be so heavily customised that the rest of the team might change something which stops it working.

There are some other links which I used along the way to understand the issue, but they tended to rely on individually filtering out specific referral sites and I didn’t want to have to keep on top of that!

Categories
Spreadsheets

3 XML Sitemap tricks for Google Sheets

In the last post I described how I eventually managed to import a Google XML sitemap file into Google sheets. In short, this:

=importxml("http://example.com/sitemap.xml","//*[local-name() = 'url']")

I’ve been building on that recently and thought I’d share a few ways I’ve been using it.

  • Filtering the sitemap by path
  • Filtering the sitemap by date
  • Counting pages
Categories
Spreadsheets

Import a Google Sitemap XML file into Google Sheets

I wanted to get my sitemap data into Google Sheets for a few content audit and statistics dashboards.

The IMPORTXML() function in Google Sheets sounded pretty self explanatory and the XPATH syntax seemed almost graspable.

=IMPORTXML("https://www.example.com/sitemap.xml", "//url/loc")

But it just wouldn’t sodding work.