Essay On Information Retrieval


CHAPTER 2

IR AND CLIR METHODOLOGIES

Information retrieval (IR) has become a mature technology for identifying relevant information retrieved from different sources, not only in the news domain but also in specialized domains. In this research work, information retrieval is limited to information available on the web. This chapter starts with the retrieval models and the techniques used to improve retrieval; it then reviews approaches to cross-language information retrieval (CLIR); and finally it discusses the information retrieval methods applied to Telugu.

2.1. Information Retrieval

The term “Information Retrieval” was coined by Mooers [6]. After many early studies, such as [7, 8, 9], IR came to maturity in the mid-1990s. In this section, IR refers to information retrieval in which queries and information are presented in the same language.

2.1.1 The Definition

According to [10], “Information Retrieval” refers to the technology of finding information of an unstructured nature (text) that satisfies an information need from within large collections of information available in different sources. The general workflow of information retrieval is illustrated in Figure 2.1 and can be divided into three parts: the first focuses on techniques to prepare information for retrieval; the second presents algorithms used to parse users’ queries and then improve these queries; and the third describes the retrieval engine itself.

The first step is collecting information from multiple sources, such as online documents and databases. Before the information is indexed, several preprocessing steps are required:

In general, information whose frequency is too high or too low is removed at this stage, because they scarc...

... middle of paper ...

...assignment of term weights, on which the query-document relevance measurement depends; iii) the retrieved documents are usually presented in a random order, that is, with no ranking, because the Boolean model does not provide an estimate of the query-document relevance; iv) the size of the subset of documents to be returned is difficult to control; and v) it is difficult or impossible to find a satisfactory middle ground between AND and OR. Salton [Sal86] proposed a compromise by the use of a query formulation that is neither too broad nor too narrow.
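The lack of ranking and the AND/OR trade-off described above can be illustrated with a minimal sketch of Boolean retrieval over an inverted index. The toy collection, the document ids, and the function names (build_index, boolean_and, boolean_or) are assumptions made for illustration; they do not come from the cited works.

```python
from collections import defaultdict

# Hypothetical toy collection; ids and texts are illustrative only.
DOCS = {
    1: "information retrieval on the web",
    2: "cross language information retrieval",
    3: "telugu language retrieval methods",
    4: "boolean model for web search",
}


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return documents containing every query term (tends to be too narrow)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, terms):
    """Return documents containing at least one query term (tends to be too broad)."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result


index = build_index(DOCS)
query = ["web", "retrieval"]

# Both results are plain sets: a document either matches or it does not,
# so there is no score by which the matches could be ordered.
and_hits = boolean_and(index, query)
or_hits = boolean_or(index, query)
print("AND:", and_hits)
print("OR:", or_hits)
```

On this toy collection the AND query matches a single document while the OR query matches all four, which shows why the size of the returned subset is hard to control and why neither operator offers a middle ground.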

Several studies [Sal82, SFW83] have extended the basic Boolean model to add term weighting and output ranking features.

2.1.3.2 The Vector Space Model

The vector space model (VSM) [Sal71] uses a ranking algorithm that tries to rank documents according to the overlap between the query terms and document terms [Boo82].
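As a rough sketch of ranking by query-document term overlap, the example below represents the query and each document as a raw term-frequency vector and scores documents by cosine similarity. The choice of raw term frequency and cosine similarity, the toy collection, and the function names (tf_vector, cosine, rank) are all assumptions made for illustration; the text above does not specify a weighting scheme or similarity measure.

```python
import math
from collections import Counter

# Hypothetical toy collection; ids and texts are illustrative only.
DOCS = {
    1: "information retrieval on the web",
    2: "cross language information retrieval",
    3: "telugu language retrieval methods",
    4: "boolean model for web search",
}


def tf_vector(text):
    """Represent a text as a sparse vector of raw term frequencies."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    q = tf_vector(query)
    scored = [(doc_id, cosine(q, tf_vector(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank("web retrieval", DOCS)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Unlike a Boolean result set, every document receives a score here, so a document matching both query terms is placed ahead of documents matching only one.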
