Information Retrieval

With the exponential growth in the quantity and complexity of information sources on the internet, information retrieval systems have evolved from a simple concern with the storage and distribution of artefacts to a broader concern with the transfer of meaningful information. Over the last twenty years, much effort has gone into developing approaches that deal effectively with this complexity. This section examines and evaluates established and emerging approaches to information retrieval on the internet, together with their achievements.

What is Information Retrieval?

Information retrieval, as the name implies, is concerned with retrieving relevant information from databases. Its basic aim is to facilitate the user's access to large amounts of (predominantly textual) information. The process of information retrieval involves the following stages:

1. Representing collections of documents - how to represent, identify and process the collection of documents.
2. User-initiated querying - understanding and processing the queries.
3. Retrieval of the appropriate documents - the searching mechanism used to obtain and retrieve the relevant documents.
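
The three stages above can be illustrated with a minimal sketch. It assumes a toy in-memory collection and uses an inverted index with a simple "number of matching query terms" score; the function names (tokenize, build_index, retrieve) and the sample documents are hypothetical, and real systems use far richer representations and ranking functions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Stage 1: represent the collection as an inverted index (term -> set of doc ids)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Stages 2 and 3: process the query, then rank documents by matching query terms."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        # .get avoids adding unseen query terms to the index
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample collection.
documents = {
    "d1": "Information retrieval on the internet",
    "d2": "Multimedia databases for television and radio broadcasts",
    "d3": "Retrieval of text documents from large databases",
}
index = build_index(documents)
print(retrieve(index, "text retrieval databases"))  # ['d3', 'd1', 'd2']
```

Document d3 comes first because it contains all three query terms, while d1 and d2 each contain only one.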

Applications of Information Retrieval

a. Text Information Retrieval

Perhaps one of the most common and best-known applications of information retrieval is the retrieval of text documents from the internet. With its recent growth, the internet is fast becoming the main medium of communication for business and academic information. It is therefore essential to be able to find the right document in this vast ocean of information. This need is, in fact, one of the main driving forces behind the development of information retrieval. To date, many relatively successful systems have been developed. Some examples include:

Isoquest NetOwl

NetOwl is an advanced information retrieval system with automatic indexing and summarization capabilities. The product provides an easy, cost-efficient way for ordinary users to benefit from the kind of text analysis aimed at intelligence analysts.

NetOwl combines computational linguistics with knowledge-based pattern matching methods to analyze natural language and determine the categories of the words in it. By identifying key concepts and relationships, it allows users to quickly find relevant content, eliminate inappropriate material, and get the information they need. An additional feature is that NetOwl can build an electronic "back of the book" type index on a company's own web server, which enables users to spot important information or launch a request for information.
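
To give a feel for what pattern-based categorization of words can look like, here is a small sketch. It is not NetOwl's method or API: the rules, category names and the tag_entities function are assumptions made purely for illustration, using hand-written regular expressions in place of a real linguistic knowledge base.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical hand-written rules: (category, compiled pattern).
RULES = [
    # One or more capitalized words followed by a company-like suffix.
    ("ORGANIZATION", re.compile(r"\b(?:[A-Z][\w&]*\s)+(?:Inc|Corp|Ltd|Institute)\b\.?")),
    # Day, month name, four-digit year.
    ("DATE", re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")),
    # Dollar amount with an optional magnitude word.
    ("MONEY", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?: (?:million|billion))?")),
]


def tag_entities(text):
    """Return (category, matched text) pairs in order of appearance."""
    found = []
    for category, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), category, match.group()))
    return [(category, span) for _, category, span in sorted(found)]


sentence = "Acme Widgets Inc. paid $3.5 million on 4 March 1998."
for category, span in tag_entities(sentence):
    print(category, "->", span)
```

A pair list like this could then feed an index of key concepts, which is the kind of "back of the book" index described above.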


EUROSPIDER

The EUROSPIDER system is an Information Retrieval (IR) system which searches very large and complex data collections for relevant information. It is a commercial version of the IR system SPIDER, developed by the Swiss Federal Institute of Technology. EUROSPIDER can be used in various ways:

1. As a standalone IR system.
2. As an add-on to a World-Wide Web server, which makes a data collection accessible through a private or public network.
3. As an addition to a commercial database (DB) system, to access possibly very dynamic and structured data.

The EUROSPIDER retrieval system provides advanced IR functions such as relevance ranking, feedback searches, linguistic document analysis, and automatic indexing. Document analysis and indexing can optionally include fuzzy term matching to cope with the recognition errors of OCR devices.
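
The idea of fuzzy term matching for OCR errors can be sketched as follows. This is not EUROSPIDER's implementation: it is a minimal illustration that assumes a small known vocabulary and uses the similarity matching in Python's standard difflib module; the fuzzy_lookup function, the vocabulary and the 0.8 cutoff are all hypothetical choices.

```python
import difflib


def fuzzy_lookup(term, vocabulary, cutoff=0.8):
    """Return the vocabulary term closest to `term`, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical index vocabulary and OCR output with typical misreadings
# ("rn" read for "m", "1" read for "l").
vocabulary = ["information", "retrieval", "indexing", "relevance"]
ocr_tokens = ["inforrnation", "retrieva1", "xyzzy"]

normalized = [fuzzy_lookup(token, vocabulary) for token in ocr_tokens]
print(normalized)  # ['information', 'retrieval', None]
```

The cutoff trades recall against false matches: a lower value tolerates more recognition errors but also maps more unrelated strings onto index terms.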

b. Multimedia Information Retrieval

In this era of information overload, the amount of information available to us is so great that it is virtually impossible to deal with it efficiently. One solution to this problem is to set up databases for multimedia data. Hundreds of television and radio broadcasts would then be covered by a database application which keeps track of the information available. This vast amount of information could then be captured and managed efficiently.

However, perhaps the biggest problem of multimedia information retrieval is the provision of content-based retrieval facilities. Unlike the documents handled by full-text retrieval systems, multimedia data cannot easily be described in plain text. Take audio clips, for example: different people would probably describe a particular sound in different ways. Therefore, in addition to textual descriptions, other forms of representation should be used to describe an object. The querying system should also be able to interact with the user so as to refine the query iteratively. A good example of a multimedia information retrieval system is the PArallel Multimedia Information Retrieval (PAMIR) system, which performs context-based image retrieval on a database consisting of 2100 images.
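
The combination of non-textual representations and iterative query refinement can be sketched as follows. This is not how PAMIR works; it is a minimal illustration that assumes each object is described by a small numeric feature vector (here, hypothetical colour proportions), ranks objects by cosine similarity, and refines the query with a simplified Rocchio-style update that uses only positive feedback. All names, feature values and weights are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, features):
    """Order object names by decreasing similarity to the query vector."""
    return sorted(features, key=lambda name: -cosine(query, features[name]))


def refine(query, relevant, features, alpha=1.0, beta=0.75):
    """Move the query toward the mean of the objects the user marked as relevant."""
    if not relevant:
        return list(query)
    centroid = [
        sum(features[name][i] for name in relevant) / len(relevant)
        for i in range(len(query))
    ]
    return [alpha * q + beta * c for q, c in zip(query, centroid)]


# Hypothetical features: proportion of (red, green, blue) in each image.
features = {
    "sunset.jpg": [0.8, 0.3, 0.1],
    "forest.jpg": [0.1, 0.8, 0.2],
    "ocean.jpg": [0.1, 0.3, 0.9],
    "meadow.jpg": [0.3, 0.7, 0.1],
}

query = [0.4, 0.4, 0.2]           # initial, rather vague query
first = rank(query, features)

# The user marks one result as relevant; the query is refined and re-run.
query = refine(query, ["forest.jpg"], features)
second = rank(query, features)

print(first)
print(second)
```

After the feedback step, forest.jpg moves up in the ranking, because the refined query vector lies closer to the features of the image the user marked as relevant.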