Muddy was a tool we developed to automatically index and categorise web content.

It started as an experimental project and was developed into a service used internally by the BBC.

The service was innovative in its use of Wikipedia data: we designed the tool to attempt to link key names and phrases dynamically to the relevant Wikipedia resources.

The core computational problem in this kind of task lies in resolving ambiguous terms like 'Apple' (the fruit or the technology company). We tackled this by again drawing on Wikipedia data, analysing clusters of closely linked pages to help identify the correct meaning from the surrounding content of a page.
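As a rough illustration of the idea, the sketch below scores each candidate meaning of an ambiguous term by how many of its closely linked Wikipedia pages also appear in the page's context, then picks the best match. This is a simplified stand-in, not Muddy's actual implementation: the link graph, the candidate index, the function name `disambiguate` and the plain overlap-count scoring are all hypothetical, and "context" is assumed to mean articles already linked from unambiguous terms elsewhere on the same page.

```python
from collections.abc import Iterable
from typing import Optional

# Hypothetical toy link graph: article title -> titles of pages it is closely linked to.
# A real system would derive this from Wikipedia's link data; these entries are made up.
LINKS: dict[str, set[str]] = {
    "Apple Inc.": {"Steve Jobs", "iPhone", "Mac (computer)", "Cupertino, California"},
    "Apple": {"Fruit", "Orchard", "Malus", "Cider"},
}

# Hypothetical index: ambiguous phrase -> candidate Wikipedia articles it might refer to.
CANDIDATES: dict[str, list[str]] = {
    "Apple": ["Apple", "Apple Inc."],
}


def disambiguate(term: str, context_articles: Iterable[str]) -> Optional[str]:
    """Return the candidate whose linked cluster overlaps most with the page context.

    Returns None if the term is unknown or no candidate shares any links with the context.
    """
    context = set(context_articles)
    best: Optional[str] = None
    best_score = 0
    for candidate in CANDIDATES.get(term, []):
        # Count the candidate's closely linked pages that also occur in the page context.
        score = len(LINKS.get(candidate, set()) & context)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    # A page that also mentions the iPhone and Steve Jobs points to the company.
    print(disambiguate("Apple", ["iPhone", "Steve Jobs"]))  # → Apple Inc.
    # A page about orchards and cider points to the fruit.
    print(disambiguate("Apple", ["Orchard", "Cider"]))  # → Apple
```

Counting raw overlaps is the simplest possible scoring rule; a production system would more likely weight links (for example, by how rare or how mutually linked the pages are) and fall back gracefully when the context gives no signal, which this sketch signals by returning None.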

Some of the ideas and technologies in the project were further developed in-house by the BBC for use in production systems.

Project information

Frankie Roberto, James Boardwell, Rob Lee