Monday, May 16, 2011

How To Build Agile SEO Tools Using Google Spreadsheets

In the past few years innovations on the web have made it incredibly easy for regular people like you and I to enter the world of coding. For example, right at the end of 2010 I started dabbling with Google Appengine and shipped a fully functional interactive site in 4 weeks. (Read how I built 7books in 4 weeks)
Of course, advances in technology have also made it easier for the pros to build awesome applications. Just look at SEOmoz’s Open Site Explorer which relies on Amazon Web Services.
So as SEOs we have a huge arsenal of tools that we can call upon for various different functions. A lot of these tools, services and platforms however either require learning a large amount of code or take a long time to build something bespoke. So in this post I’m going to talk about using Google Spreadsheets to build small, agile tools which can be built to match your exact needs.

Agile vs Scaleable

Before I dive into the technical details, a quick word on what I use Google Docs for. In my SEO-ninja toolset Google Docs are used for quick, agile tools. That means that if there’s a specific problem I need to overcome or some weird thing I’m testing I always turn to Google Docs first. That’s because I can build things quickly. They aren’t always robust, but if I’m only building a tool to solve a unique problem (as opposed to a problem I encounter all the time) then speed is of the essence. I don’t want to have to spend a lot of time building a tool I’m only going to use once. Or running a test that turns out to not give me the expected results. If you want to build scaleable tools then I suggest you leave it to the pros (though Appengine is a great place to start with building “real” tools).

Let’s start with the complete beginner

Ok, so you might be scared. I’m going to talk about writing functions and building tools. You’re going to get your hands dirty. But literally anyone can do this. You need no prior knowledge. None. This post should take the complete beginner to ninja in 5 easy steps. The steps I’m going to cover:
  1. Simple Web Scraping
  2. Advanced Web Scraping
  3. Google Docs Scripts – The Secret Sauce
  4. Script Triggers
  5. Putting It All Together

Lesson 1 – Simple Web Scraping

Ok, so the bedrock of using Google Spreadsheets for fun and profit is their nifty little function called ImportXML. Richard Baxter wrote a great intro post on ImportXML which you can check out. The basic premise is that using a function like this:
Gets us a list of blog post titles into a Google Spreadsheet like this:
Try it for yourself! Copy and paste that code into a blank Google Spreadsheet and see what happens :)
Don’t get scared! There’s lots of things you probably don’t understand so let’s walk through them for you. To read the full Google Help on ImportXML go here.
A standard function looks like this =importxml(“url”, “query”). So the URL can be explicit (like I typed above) or a reference file like this =importxml(A1, “query”) just like you would with a regular spreadsheet function. The query is an XPATH query. For a tutorial reference on XPATH here’s a good guide.
If you can’t be bothered reading that then here’s a few quick definitions (warning! hand-wavey!)
  • // – this means select all elements of the type
  • //h3 – this means select all h3 elements
  • [@class=''] – this means only select those elements that meet the criteria given
  • //h3[@class='storytitle'] – this means only select elements that look like: <h3 class=”storytitle”>Title</h3>

Walkthrough Example for Simple Web Scraping

So, now we’re getting to grips with the code let’s step through a practical example. A common SEO task is “how can I find as many blogs on niche X as possible”. So I google around and find a list of 100 blogs like this: It’s manual and time consuming having to click on each one to copy the link. I want to get the list of URLs into a spreadsheet as quick as possible.
1) First we take a look at the source code of the page and we see something like this:
2) We load up a Google Docs and fire up the importxml function. We can see that all the blogs are a separate li element so let’s try something like
(where A1 is the cell with the URL of the page). We get this back:
3) As you can see, it contains the blog names so we’re getting there. But our query is also getting a whole load of other stuff we don’t want. So let’s look in the code and see if we can isolate the list of blog items. I find the “inspect element” control in Google Chrome excellent for visualising this. As you hover over the code, it highlights the section of the page that applies to it.
4) We refine our guess to limit ourselves to the div with class ‘intro-box-wide’ using a query like
Which loosely translated says “fetch the div with that class and then select all li elements within it” which results in:
5) We’re nearly there! We now have a list of all the blog elements. The next step is to pull the URL. So we modify our function to be:
Which says, from the li elements select the href contents from the a element. This /a/@href is a very common thing to tag on the end of importxml functions so I suggest you memorise it. This results in:
And we’re done! If you want to look at the spreadsheet within Google Docs go here and make a copythen you can play around to your heart’s content :)

Lesson 2 – More Advanced Web Scraping

Ok, now we have the basics down let’s move on to some more fun activities. Of course, as soon as I get computers involved my thoughts turn to rank checking… This is a common task that we might want to do so let’s quickly discuss how to do that. Firstly we construct the search URL like this:
Where the query to search is in cell A2. Then we parse the Google URL using importxml like this:
I’m not going to break that down, hopefully you can figure out what I’m getting off the page. Again, check the source code for the page if you’re not sure what to write in your importxml function. Output like this:
Of course, since this is a common function we might want to roll it into a single cell like this:
=ArrayFormula(MATCH(1, FIND("",importxml(
I was going to add some explanation here as to what this formula does but actually it gets pretty complicated. Either you already know what an arrayforumla does (in which case it should be straightforward) or you don’t. In which case you probably just want to copy and paste for now :)
I should note at this stage that there is a limit for 50 importxml calls per spreadsheet which limits us from building a full web crawler but for most agile tools this is sufficient (especially when combined with scripts, see lesson 3).

Walkthrough Example for Complex Scraping

Cool, so now we have a bunch of moving parts let’s start combining things. Here’s a spreadsheet where you enter a keyword, 4 location variants and a domain and it gives you the ranking position for all the keywords that search suggest spits out for that domain. Nifty.
As always, have a poke around with the Google Doc to understand what’s happening.

Lesson 3 – Google Docs Scripts – The Secret Sauce

Now, all this is very well – we have functions which pull in data but it’s all a little “flat” if you know what I mean? Let’s try and jazz things up a little by making it MOVE. For anyone familiar with macros in excel, scripts function in a very similar way. Two big advantages here however are the ability to crawl URLs and also the ability to email you. Nice.
Google Scripts are very powerful and essentially allow you to build fully featured programs so I’m not going to go into massive detail here. There are great tutorials from Google already for example:
You can easily lose days of your life browsing through and playing with all the things that Google Scripts do. Here, I’m going to present a simple example to show you how agile this is. That’s the key here, building tools that fit your exact needs quickly and easily. Let’s imagine I want to quickly check a bunch of URLs for their tweet count to produce something like this:
What’s happening here is that I’m inputting a keyword, then pulling the top 10 results for that keyword from Google and then checking each of those URLs for the number of tweets they have (imagine perhaps you’re gathering data like for this post by SharkSEO). Let’s step through the thing you need to do:

Walkthrough Example for Google Scripts

The code I’m using in column C is this:
Where tweetCount is a predefined function that I’ve written in Google Scripts. Let’s fire up Google Scripts and take a look at it: 
(Note that you’ll need to grab your own free Backtype API Key to make this script work)
Once you’ve read through the Google Scripts tutorials above you should be fairly comfortable with how this works so I’m not going to step through it in detail. The parsing XML tutorial will likely come in handy.

Lesson 4 – Google Scripts Triggers

Ok, now for the magic. Google scripts are nice, but the real power comes from triggering these scripts in different situations. You can cause a script to trigger on any of the following:
  • The spreadsheet is opened
  • A form is submitted
  • A button is pressed
  • A specific time happens
The most useful here is the time-based trigger I think. Let’s take a quick look at writing a time-based script.

Walkthrough example of time-based trigger

Let’s again take a simple example. As I’m writing this post I know that I’m going to put it live soon, so let’s build a spreadsheet to check the keyword “seo tools” and see if perhaps QDF will push this post onto the first page at any point. How viciously referential :)
Step 1 – we write a simple spreadsheet to rank check against a particular keyword Step 2 – we write a script that tracks the rank and logs it in a new cell:
Step 3 – we create a time-based trigger to run the script every 30mins:
A few things to note:
  • I’ve used
    in the URL to generate a unique URL each time. Otherwise Google caches the data and you won’t get fresh data each time
  • Note the getRange and setValue functions – these are very useful to get your head around. See this tutorial.
The final result (you might have to scroll down for a while depending how long after I wrote this post you’re reading this!):

Lesson 5 – putting it all together

So, finally let’s put it all together in a fun example. I’ve created a form here where you can enter your city and your email address and my script will fetch some data and email it to you. Just like magic! Go ahead, try it out :)

Taking it further

The sky really is the limit when it comes to Google Scripts but I think that if you start doing any more heavy lifting than what I’ve done in this post you almost certainly want to start building in exception handling and learning to code properly (which I, to be clear, have very much not done!). That said, if you do fancy using Google Scripts there are all kinds of funky things it can do:
But for me, the real power is in hacking together things in a few minutes which gather the data I need so I can get back to getting stuff done. I’ll leave building real SEO tools to the pros for now.



  1. Wow. This is a great seo services and Google spreadsheets tutorial. You did a great job of walking me through the whole thing. I'm not much of a computer whiz and it took me a few times through it to get it right, but it's pretty awesome now that I have it. Thanks so much!

  2. This link in Lesson 1 is now a 404 -