CHAPTER 2
IR AND CLIR METHODOLOGIES
Information retrieval (IR) has become a mature technology for finding relevant information across different sources, not only in the news domain but also in specialized domains. In this research work, information retrieval is limited to information available on the web. This chapter starts with retrieval models and the techniques used to improve retrieval; it then reviews approaches to cross-language information retrieval (CLIR); finally, it discusses the information retrieval methods applied to Telugu.
2.1 Information Retrieval
The term “Information Retrieval” was coined by Mooers [6]. After many early studies, such as [7, 8, 9], IR came to maturity in the mid-1990s. In this section, IR refers to monolingual information retrieval, in which queries and information are expressed in the same language.
2.1.1 The Definition
According to [10], information retrieval refers to the technology of finding information of an unstructured nature (text) that satisfies an information need from within large collections of information available in different sources. The general workflow of information retrieval is illustrated in Figure 2.1 and can be divided into three parts: the first focuses on techniques that prepare information for retrieval; the second presents algorithms used to parse users’ queries and then improve them; and the third describes the retrieval engine itself.
The first step is collecting information from multiple sources, such as online documents and databases. Before the information is indexed, several preprocessing steps are required:
In general, terms whose frequency is too high or too low are removed at this stage, because they scarc...
... middle of paper ...
...assignment of term weights, on which the query-document relevance measurement depends; iii) the retrieved documents are usually presented in a random order, that is, with no ranking, because the Boolean model does not provide an estimate of the query-document relevance; iv) the size of the subset of documents to be returned is difficult to control; and v) it is difficult or impossible to find a satisfactory middle ground between AND and OR. Salton [Sal86] proposed a compromise by the use of a query formulation that is neither too broad nor too narrow.
Several studies [Sal82, SFW83] have extended the base Boolean model to add term weighting and output ranking features.
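The limitations of the base Boolean model listed above can be seen in a minimal sketch. The toy documents, the whitespace tokenizer, and the function names build_index, boolean_and and boolean_or are illustrative assumptions, not taken from the cited studies. The sketch shows that the result is an unordered set with no relevance estimate, and that AND and OR leave little middle ground between an answer that is too narrow and one that is too broad.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return the documents that contain every query term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, terms):
    """Return the documents that contain at least one query term."""
    postings = [index.get(t, set()) for t in terms]
    return set.union(*postings) if postings else set()


# Hypothetical toy collection, used only for illustration.
docs = {
    1: "telugu news retrieval",
    2: "cross language retrieval",
    3: "telugu cross language news retrieval",
    4: "sports news",
}
index = build_index(docs)
query = ["telugu", "retrieval", "sports"]

and_hits = boolean_and(index, query)  # too narrow: no document has all three terms
or_hits = boolean_or(index, query)    # too broad: every document has at least one term
print(and_hits, or_hits)
```

Both results are plain sets, so the model itself gives no basis for ordering the returned documents; that is the gap the weighted extensions aim to close.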
2.1.3.2 The Vector Space Model
The vector space model (VSM) [Sal71] uses a ranking algorithm that orders documents according to the overlap between query terms and document terms [Boo82].
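Ranking by term overlap can be sketched as follows. This is a minimal illustration under stated assumptions: raw term frequencies are used as vector components and cosine similarity as the overlap measure, which is one common choice and not necessarily the weighting used in [Sal71]. The toy documents and the names tf_vector, cosine and rank are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as a sparse vector of raw term frequencies."""
    return Counter(text.lower().split())  # naive whitespace tokenizer (assumption)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    q = tf_vector(query)
    scored = [(doc_id, cosine(q, tf_vector(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection, used only for illustration.
docs = {
    1: "telugu news retrieval",
    2: "cross language retrieval",
    3: "telugu cross language news retrieval",
    4: "sports news",
}
ranking = rank("telugu news", docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Unlike the Boolean model, every document receives a graded score, so partial matches such as document 4 are still returned but placed below documents that share more terms with the query.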