July 15, 2008

IWMW2008 Innovation Competition - Searching for Media Release Related News Stories

It seems the Institutional Web Management Workshop 2008: Innovation Competition is in need of entries, so I had a quick half-hour hack around the University of Aberdeen newsfeeds to see if I could come up with anything interesting (Brian Kelly hinted that if I used any OU info sources, no-one would take any notice because it would be seen as an 'official' OU development!;-)

Here's where I've got so far...

The University of Aberdeen (and many other universities) publish an RSS feed of their media releases. But how effective are these releases? That is, do the releases actually manage to generate any news coverage?

University of Aberdeen media releases: http://www.abdn.ac.uk/mediareleases/feed.xml (RSS format, limited to 10 items. How can I increase the number of items in this feed?)

One of the easiest ways of searching for news about an organisation is simply to run a search on Google News.


Conveniently, feeds for persistent (alerting) Google News searches are available, for example this Atom feed for a Google News search for university-of-aberdeen OR aberdeen-university location:uk over the last month.

That's useful in a different way, though, because it captures all recent mentions of the university in the UK press, not just those related to the media releases.

So here's a recipe that attempts to find news stories about the university that are also on the topics mentioned in the press releases. For convenience, I've specified the recipe as a working prototype in the form of a Yahoo Pipe.

Fetch the press release feed and run the description of each one through the Yahoo content analysis/term extraction tool.
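For anyone wanting to try this step outside Pipes, here's a minimal Python sketch. It parses the RSS items with the standard library and, because the Yahoo term extraction service can't be assumed to be available, swaps in a naive word-frequency extractor as a stand-in. The names `STOP_WORDS`, `parse_items` and `extract_terms` are illustrative, not part of any real API.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

FEED_URL = "http://www.abdn.ac.uk/mediareleases/feed.xml"

# Tiny illustrative stop-word list; a real extractor would use far more.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
              "with", "is", "are", "was", "were", "by", "at", "from", "that",
              "this", "it", "as", "be", "has", "have", "will", "its", "their"}


def fetch_feed(url=FEED_URL):
    """Download the raw RSS XML (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def parse_items(rss_xml):
    """Return a list of {title, link, description} dicts from RSS 2.0 XML."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


def extract_terms(text, max_terms=5):
    """Naive stand-in for term extraction: the most frequent content words."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop any HTML markup
    words = re.findall(r"[a-z][a-z'-]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 3)
    return [word for word, _ in counts.most_common(max_terms)]
```

In use, something like `[extract_terms(i["description"]) for i in parse_items(fetch_feed())]` gives one term list per media release. Frequency counting is much cruder than a proper term extraction service, so expect noisier queries.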


Use these terms as the basis of a search query, supplemented by additional search terms for ("University of Aberdeen" OR "Aberdeen University").
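A rough Python equivalent of this query-building step might look like the following. It assumes the extracted terms are simply space-separated (so a search engine treats them as implicitly ANDed); OR-ing them instead would be an equally plausible choice and would return more, looser matches. `build_query` and `UNIVERSITY_NAMES` are hypothetical names.

```python
# The two name variants used in the pipe, as exact-phrase searches.
UNIVERSITY_NAMES = ('"University of Aberdeen"', '"Aberdeen University"')


def build_query(terms, names=UNIVERSITY_NAMES):
    """Combine extracted terms with an OR'd clause naming the university."""
    name_clause = "(" + " OR ".join(names) + ")"
    return " ".join(list(terms) + [name_clause])
```

For example, `build_query(["cancer", "genes"])` returns `cancer genes ("University of Aberdeen" OR "Aberdeen University")`.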


Generate a set of Google News persistent search RSS URLs for UK news mentions over the last 3 months using the above search terms:
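The URL-building step can be sketched in Python with `urllib.parse.urlencode`. The endpoint and the `range` parameter below are placeholders, not Google's actual feed format (which has changed since 2008); only the `location:uk` query operator comes from the search described above. `news_feed_url` and `news_feed_urls` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Placeholder endpoint and parameter names: substitute the real
# news-search feed URL scheme you are working with.
NEWS_SEARCH_BASE = "https://news.example/search/feed"


def news_feed_url(query, region="uk", months=3):
    """Build one news-search feed URL restricted to a region and time window."""
    params = {"q": f"{query} location:{region}", "range": f"{months}m"}
    return NEWS_SEARCH_BASE + "?" + urlencode(params)


def news_feed_urls(queries, **kwargs):
    """One feed URL per media-release query."""
    return [news_feed_url(q, **kwargs) for q in queries]
```

`urlencode` takes care of escaping the quotes, parentheses and colons in the query, so the query string can be built in plain search syntax first.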


We're now going to replace the original feed items with a set of items returned from the contextualised/media release related news search:
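In code, this "replace each release with its search results" loop could be sketched as below. It assumes each release dict already carries a `query` string and that `search` is some function returning a list of story dicts with a `link` key; both are assumptions for illustration. The de-duplication by link is an addition of this sketch (the same story can turn up for several releases), roughly what a Pipes Unique operator would do.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]


def replace_with_results(releases: List[Item],
                         search: Callable[[str], List[Item]]) -> List[Item]:
    """Swap each media release for the news items its query returns."""
    results: List[Item] = []
    seen_links = set()
    for release in releases:
        for story in search(release["query"]):
            # Skip stories already returned for an earlier release.
            if story["link"] not in seen_links:
                seen_links.add(story["link"])
                results.append(story)
    return results
```

Passing `search` in as a function keeps the loop testable without network access.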


Finally, we use the news story date for the timestamp of each item.
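A small Python sketch of the timestamping step, assuming each news item carries an RSS 2.0 `pubDate` in RFC 822 format with an explicit timezone (Atom feeds use ISO 8601 dates instead, which would need different parsing). `email.utils.parsedate_to_datetime` is a standard-library function; sorting newest first is a choice made here, not part of the original pipe.

```python
from email.utils import parsedate_to_datetime


def timestamp_items(items):
    """Attach a datetime parsed from each item's pubDate, newest first."""
    stamped = [dict(item, published=parsedate_to_datetime(item["pubDate"]))
               for item in items]
    # Assumes every pubDate includes a timezone, so the datetimes compare.
    return sorted(stamped, key=lambda i: i["published"], reverse=True)
```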


This feed can then be used in the normal way - for example, here it is on a Dipity timeline:


What I need now, of course, is a timeline comparison tool, that lets me compare items on two separate, aligned timelines - one for the media releases, and one for the 'media release contextual news search'...

UPDATE: In the meantime, it's easy enough to merge a feed of the original media releases with the contextual (content-extracted terms) news search.
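The merge itself is simple enough to sketch in Python. This assumes both lists already have a mutually comparable `published` value (for example datetimes from the timestamping step); each item is tagged with a hypothetical `source` field so a timeline can style releases and news stories differently.

```python
def merge_feeds(releases, news):
    """Tag each item with its source and merge into one oldest-first list."""
    merged = [dict(item, source="media release") for item in releases]
    merged += [dict(item, source="news search") for item in news]
    # Oldest first suits a left-to-right timeline.
    return sorted(merged, key=lambda item: item["published"])
```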


By plotting this feed on a timeline, you can see whether a media release led to any resulting stories...


Of course, you'd get an even more comprehensive view if you just plotted a 'raw' news search for "Aberdeen University" OR "University of Aberdeen" (which is what this pipe does, as displayed on this timeline). Which is actually better... Hmm... But this post is about building semantic news filters, right?! (Err... ;-)

So to recap, the idea is to take a university media release news feed, automatically extract a set of keywords related to each media release, and then use those keywords as the basis for a set of queries on Google news for UK based news items related to those topics that also mention the university in question. The results are then output from the pipe as an RSS feed, or otherwise. Each item is timestamped with the date of the news article, so it can be displayed on a timeline, or otherwise, as required.

Note that it would be possible to modify the pipe so that each news story it finds is associated with the media release whose content-extracted query terms turned it up. However, making full use of these results would require a client that could consume and display the JSON output of the pipe, which would take a little more time to build... And the point of the innovation competition is that the mashups are quick'n'easy to do, right, and ideally make use of stuff that's already out there?! ;-)
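As a sketch of what that grouped output might look like, here is a Python version that nests the stories found for each release under that release and serialises the result as JSON. The field names and the `search` function are assumptions for illustration, not the pipe's actual output format.

```python
import json


def group_results(releases, search):
    """Nest the news stories found for each release under that release."""
    grouped = []
    for release in releases:
        grouped.append({
            "release": {"title": release["title"], "link": release["link"]},
            "query": release["query"],
            "stories": search(release["query"]),
        })
    return json.dumps(grouped, indent=2)
```

A client could then draw one timeline row per release, with its related stories alongside.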

PS another approach might be to search for news stories or blogposts that reference the media release items linked to in the media release feed. For example:


This pipe loops through each item in the media release feed, and searches for blog posts that have linked to the original story (that is, that have linked to the URL for the particular media release).
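A Python sketch of the loop that builds one backlink search per media release might look like this. It assumes a search service that understands a `link:` operator; Google Blog Search, which the pipe relied on, has since been discontinued, so the endpoint below is a placeholder and `inlink_query_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute whatever backlink-search feed is available.
BLOG_SEARCH_BASE = "http://blogsearch.example/feeds"


def inlink_query_urls(releases):
    """Map each release URL to a search-feed URL for posts linking to it."""
    return {release["link"]:
            BLOG_SEARCH_BASE + "?" + urlencode({"q": "link:" + release["link"]})
            for release in releases}
```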

The "URL inLinks (blogs, raw)" module is one I had built earlier, which uses a Google Blog Search to find posts that mention a particular link. It would probably be more effective to look through each media release to see what story-related websites (if any) it mentions or links to, and then use these as the basis for the blogged link search (for example, I may not be likely to link to the press release, but I may point to the research group web page it describes).
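Pulling the story-related links out of each media release could be sketched with the standard library's `html.parser`. This assumes the release's `description` contains HTML with ordinary `<a href>` links; relative links are resolved against the release URL, and the release's own URL is dropped. `LinkCollector` and `related_links` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href targets of <a> tags in an HTML fragment."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def related_links(release):
    """Return unique links in a release, excluding the release's own URL."""
    parser = LinkCollector(release["link"])
    parser.feed(release["description"])
    parser.close()
    seen, links = set(), []
    for link in parser.links:
        if link != release["link"] and link not in seen:
            seen.add(link)
            links.append(link)
    return links
```

Each returned link could then be fed into the backlink search in place of, or as well as, the release URL.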

Posted by ajh59 at July 15, 2008 01:47 PM

I 100% need to talk to you about this! Things this week are hectic and then I'm on leave for 3 weeks, so, um, how's late August looking for you ;-)

Posted by: stuart at July 15, 2008 08:39 PM

I was thinking that another approach would be to retrieve the full text of news articles, and dump them into a plagiarism checker that had been seeded with the original press releases?

A variant of that is to pull out soundbites from any quotes in the press release, and use those to seed the search?

Surely media tracking agencies offer this sort of service anyway?

Posted by: Tony Hirst at July 15, 2008 10:19 PM
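The two ideas in the comment above could be roughed out in Python as follows. This is a sketch under stated assumptions, not a real plagiarism checker: `soundbites` pulls double-quoted passages out of a press release to use as search seeds, and `reuse_score` uses word 5-gram "shingles" as a simple stand-in technique to estimate what fraction of a release's wording reappears in an article's full text. All names and thresholds are hypothetical.

```python
import re


def soundbites(text, min_words=4):
    """Pull quoted passages (straight or curly double quotes) of min_words+ words."""
    quotes = re.findall(r'["\u201c]([^"\u201d]+)["\u201d]', text)
    return [q.strip() for q in quotes if len(q.split()) >= min_words]


def shingles(text, n=5):
    """Set of word n-grams used for overlap detection."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_score(release_text, article_text, n=5):
    """Fraction of the release's shingles that reappear in the article."""
    release = shingles(release_text, n)
    if not release:
        return 0.0
    return len(release & shingles(article_text, n)) / len(release)
```

A score near 1.0 suggests the article reproduces the release largely verbatim; a low score suggests an independently written story (or none at all).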