September 06, 2007

Dynamic Google Custom Search Engines

For some reason, I haven't been tracking the Google Custom Search Engine blog, which is stupid, stupid, STUPID, because the service now provides plenty of searchfeedr-like components as Lego blocks. I half twigged this in Google Custom Search on the Fly (just like searchfeedr...err?) but didn't really grok its power until just now, whilst looking through the Google CSE documentation and this post on the CSE blog: Custom Search Engine APIs.

The post begins innocuously (?!) enough: "With our new Linked CSEs, we are exposing the API to create and control CSEs."

Then the hints start to drop: for example, using the original CSE, "[i]t was difficult to use other data sources such as iCal, RSS, Google Base, etc. to programmatically create CSEs."

True...

So get this: "[Y]ou can now specify your CSE using a ... parameter that points to a URL, anywhere on the web. You update this URL at your end and don't have to upload it or edit your CSE using our tools. The URL can take arguments to produce dynamic CSEs, based on the current page, the current user visiting your site, etc. ... You can use any script you want, or reference a static file, when creating your CSE."

Cool - so now I can define a "live" CSE using my own dynamic PHP script, say, and use it to power a custom search... that is, I could now build something like searchfeedr using a CSE...
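To get a feel for what such a "live" definition might look like, here's a rough sketch in Python rather than PHP. It's a guess at the shape of things, not Google's reference code: the `title` and `site` query parameters and the label name are made up for illustration, and apart from the <GoogleCustomizations> and <Annotations> elements the post names, the layout (CustomSearchEngine, Context, BackgroundLabels, Annotation with an `about` pattern, Label with `mode="FILTER"`) is my reading of the CSE XML format, so check it against the CSE documentation before relying on it.

```python
"""Sketch: a "live" CSE specification built from URL arguments.

Assumptions (not from the post): the query parameter names `title` and
`site`, the label name, and the XML layout below the
<GoogleCustomizations> root element.
"""
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

LABEL = "_cse_dynamic_example"  # hypothetical label tying annotations to the engine


def build_spec(query_string: str) -> str:
    """Return a CSE spec for e.g. 'title=News&site=bbc.co.uk&site=guardian.co.uk'."""
    args = parse_qs(query_string)
    title = args.get("title", ["Dynamic CSE"])[0]
    sites = args.get("site", [])

    root = ET.Element("GoogleCustomizations")
    cse = ET.SubElement(root, "CustomSearchEngine")
    ET.SubElement(cse, "Title").text = title
    context = ET.SubElement(cse, "Context")
    background = ET.SubElement(context, "BackgroundLabels")
    # As I understand the format, FILTER restricts results to pages carrying this label
    ET.SubElement(background, "Label", name=LABEL, mode="FILTER")

    annotations = ET.SubElement(root, "Annotations")
    for site in sites:
        # One annotation per site; the trailing /* matches every page under it
        ann = ET.SubElement(annotations, "Annotation", about=f"{site}/*")
        ET.SubElement(ann, "Label", name=LABEL)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_spec("title=UK+news&site=bbc.co.uk&site=guardian.co.uk"))
```

Run as a CGI script, you'd pass it os.environ["QUERY_STRING"] and print a Content-Type header before the XML; every distinct set of arguments then describes a different engine.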

"How does this work? With Linked CSEs, you designate a CSE specification URL with each search request (as a hidden form field in your search box HTML code). Google retrieves the CSE specification from the URL when your user searches in the CSE. We cache and refresh the results so that only the first search to your CSE incurs any delay. The flexibility to specify how your search engine should behave, just when your user is doing the query, using whatever data sources you want, opens up many possibilities."

Possibilities? For coders, maybe. But coders are still quite a niche community. Writing the link-scraping and feed-parsing scripts that grab the links to feed into a dynamically created CSE definition file is not likely to appeal to many people...
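For the link-scraping half of that chore, Python's standard library goes a fair way. A minimal sketch, assuming you want to annotate whole hosts (pattern host/*) linked from a page; the label name is a hypothetical placeholder, the Annotation/Label layout is my reading of the CSE format, and fetching the page (e.g. with urllib.request) is left out so the function just takes the HTML as a string:

```python
"""Sketch: scrape the links from a page and turn them into CSE annotations.
The label name is a hypothetical placeholder."""
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

LABEL = "_cse_scraped_example"  # hypothetical


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> attributes."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL; skip mailto: etc.
            url = urljoin(self.base_url, href)
            if urlparse(url).scheme in ("http", "https"):
                self.links.append(url)


def annotations_from_html(html: str, base_url: str) -> str:
    collector = LinkCollector(base_url)
    collector.feed(html)
    # De-duplicate hosts while keeping first-seen order
    hosts = dict.fromkeys(urlparse(u).netloc for u in collector.links)
    root = ET.Element("Annotations")
    for host in hosts:
        ann = ET.SubElement(root, "Annotation", about=f"{host}/*")
        ET.SubElement(ann, "Label", name=LABEL)
    return ET.tostring(root, encoding="unicode")
```

Annotating hosts rather than individual pages is a design choice: it widens the search to everything on the linked sites, which is closer to the searchfeedr idea than searching just the linked pages.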

But hang on:

You can use our makecse tool to generate CSEs from different sources of links:

You can combine multiple sources of links using our makeannotations tool and the <Include> tag. For example, it's easy to create a search engine from the links on the front pages of techmeme, slashdot and digg.

You can write your own tools to produce <Annotations /> XML from other data sources such as Google Calendar or iCal feeds, Google Base or any other structured source of information.

You can automatically generate any number of CSEs, each possibly tuned to a particular user. For example, we've created a sample that builds a CSE from a user's digg.com friend network and submissions using the Digg API. Try it out and view the source. This makes use of two simple python CGI scripts:

  • diggannos.py generates <Annotations> from the specified user's submitted stories

  • diggcse.py generates <GoogleCustomizations> from the specified user's friend network. For each friend, it generates an <Include> element pointing to the appropriate diggannos.py URL
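The two-script split is neat: one script describes a single user's links, the other stitches users together. Here's a rough sketch of that structure, with the Digg API calls swapped out for plain function arguments so it runs standalone. The ANNOS_URL location, the `user` parameter, the label name and the CustomSearchEngine/Context layout are my assumptions, not taken from Google's sample source; the <Annotations>, <GoogleCustomizations> and <Include> elements are the ones the post describes, though the `type`/`href` attributes on <Include> are from my reading of the CSE docs.

```python
"""Sketch of the two-script pattern described for the Digg sample, with the
Digg API replaced by plain arguments. ANNOS_URL, the `user` parameter and
the label are hypothetical."""
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlparse

ANNOS_URL = "http://example.com/cgi-bin/diggannos.py"  # hypothetical location
LABEL = "_cse_digg_example"  # hypothetical


def user_annotations(story_urls: list[str]) -> str:
    """diggannos.py role: <Annotations> for one user's submitted stories."""
    root = ET.Element("Annotations")
    for url in story_urls:
        p = urlparse(url)
        # Scheme-less pattern for the exact story page
        ann = ET.SubElement(root, "Annotation", about=p.netloc + p.path)
        ET.SubElement(ann, "Label", name=LABEL)
    return ET.tostring(root, encoding="unicode")


def friend_network_cse(user: str, friends: list[str]) -> str:
    """diggcse.py role: <GoogleCustomizations> with one <Include> per friend."""
    root = ET.Element("GoogleCustomizations")
    cse = ET.SubElement(root, "CustomSearchEngine")
    ET.SubElement(cse, "Title").text = f"Stories from {user}'s friends"
    background = ET.SubElement(ET.SubElement(cse, "Context"), "BackgroundLabels")
    ET.SubElement(background, "Label", name=LABEL, mode="FILTER")
    for friend in friends:
        # Each friend's annotations live at their own annotations URL
        href = f"{ANNOS_URL}?{urlencode({'user': friend})}"
        ET.SubElement(root, "Include", type="Annotations", href=href)
    return ET.tostring(root, encoding="unicode")
```

Presumably Google then fetches each included annotations URL itself, so adding a friend costs one more <Include> line rather than another round of scraping on your side.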

The post closes with this: "Linked CSEs are a very big step for Google Custom Search."

They're not wrong...

If only I had a clear week or two to be able to play with this...

Posted by ajh59 at September 6, 2007 01:02 PM