Techniques in Text Mining
Text mining applies techniques from several areas, such as Information Extraction, Information Retrieval, Natural Language Processing (NLP), Query Processing, Categorization, and Clustering. All of these stages of the text-mining process can be combined into a single workflow.
Figure 1: Techniques in Text Mining
Information Extraction
Information Extraction is the process of extracting information from unstructured documents. Its main objective is to find specific data or information in natural-language text. The extracted information is stored in database-like patterns and made available for further use. In this way, it converts unstructured textual documents into a more structured form.
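As a minimal sketch of this idea, the snippet below pulls email addresses and dates out of free text and stores them as a database-like record. The patterns and the sample sentence are illustrative assumptions, not taken from any particular extraction system.

```python
import re

# Illustrative sample text (an assumption for this sketch)
TEXT = "Contact alice@example.com before 2024-05-01 or bob@example.org by 2024-06-15."

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # simplified email pattern
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")          # ISO-style dates

def extract(text):
    """Convert unstructured text into a structured, database-like record."""
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }

record = extract(TEXT)
print(record)
```

Real extraction systems use far richer linguistic patterns, but the output shape is the same: structured fields ready to be stored and queried.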
The best-known application of an Information Retrieval system is the search engine, such as Google, which identifies the documents on the World Wide Web that are relevant to a user's query or a given set of words. Information Retrieval also refers to the automatic retrieval of documents from a document collection; it deals with crawling, indexing, and retrieving documents. Information Retrieval systems are used in digital libraries, online document systems, and search engines, and they cover the entire range of information processing, from data retrieval to knowledge retrieval.
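The indexing-and-retrieval step described above can be sketched with a toy inverted index: map each term to the set of documents containing it, then answer a query by intersecting those sets. The three-document collection is a made-up example.

```python
from collections import defaultdict

# A made-up document collection (illustrative assumption)
docs = {
    1: "text mining extracts knowledge from documents",
    2: "search engines retrieve relevant web documents",
    3: "data mining finds patterns in databases",
}

# Indexing: map every term to the set of documents that contain it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every query term."""
    sets = [index[t] for t in query.split()]
    return sorted(set.intersection(*sets)) if sets else []

print(search("mining documents"))  # documents that mention both terms
```

Production search engines add ranking, stemming, and crawling on top, but the inverted index remains the core data structure.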
Figure 3: Information Retrieval
Natural Language Processing (NLP) NLP is concerned with the interactions between computers and human (natural) languages and is closely related to the area of human-computer interaction. NLP is a component of Artificial Intelligence (AI): it is used to analyze human languages so that computers can understand natural language as humans do. Most approaches to NLP are based on machine learning, a type of artificial intelligence that examines patterns in data to improve a program's own understanding. The role of NLP in text mining is to provide the systems in the information extraction phase with the linguistic data they need to perform their task.
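A small sketch of the linguistic preprocessing NLP typically contributes to a text-mining pipeline: tokenization, lowercasing, and stop-word removal. The stop-word list here is a tiny illustrative subset, not a complete one.

```python
import re
from collections import Counter

# A small illustrative stop-word list (assumption; real lists are longer)
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}

def preprocess(text):
    """Tokenize, lowercase, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

sentence = "The role of NLP is to provide linguistic data to the systems."
tokens = preprocess(sentence)
print(Counter(tokens).most_common(3))  # most frequent content words
```

Downstream extraction and classification components consume these cleaned tokens rather than raw text.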
Text mining is a developing technology for analyzing large amounts of unstructured documents in order to extract interesting patterns or knowledge. There are several domain-specific applications of text mining; a number of them are described here:
Security Application Many text-mining packages are marketed for security applications, particularly the observation and analysis of online plain-text sources such as web news and blogs for national-security purposes. This area is also concerned with the study of text encryption and decryption.
Biomedical Application Text mining is used in the biomedical domain to identify and classify technical terms in biological science and map them to the corresponding concepts.
Company Resource Planning Mining a company's reports and correspondence for activities, so that its resource status and problems can be handled properly and future actions can be planned.
Market Analysis With the help of numerous text-mining techniques, market analysis examines the competitors within a market; it can also be used to monitor customer opinions and to search for new potential customers.
Customer Relationship
With the multitude of information shared on social-media websites such as Twitter and Facebook, and discussions about company offerings on blogs, forums, and posts, companies want to extract every bit of available information to analyze consumer sentiment about their offerings. Companies from different domains are trying to use customer-sentiment information to gain a competitive edge in business. Below are some common industries that use sentiment analysis, along with examples of companies within each domain that currently use it.
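A minimal lexicon-based sentiment scorer of the kind used as a first pass over social-media text is sketched below. The word lists are tiny illustrative assumptions; production lexicons contain thousands of scored terms.

```python
# Tiny illustrative sentiment lexicons (assumptions for this sketch)
POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "slow"}

def sentiment(text):
    """Classify text by counting positive vs. negative lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is excellent"))  # positive
```

Commercial tools layer negation handling, intensifiers, and machine-learned models on top of this basic lexicon idea.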
IBM offers a tool, Social Media Analytics, for discovering customer needs and sentiment in order to gain a better understanding of the market. It also helps identify new market trends and patterns, which is advantageous to a firm's product-development team. The tool also helps in targeting prospective customers by:
There is a debate between the benefits of web-data mining and its potential informational-privacy issues. There is a large amount of valuable data on the web, and it can be retrieved easily using a search engine. When web-data-mining techniques are applied to these data, a large number of benefits can be obtained, and the techniques are appealing to business companies for several reasons [1]. For example, if a company wants to expand its bu...
Imagine asking your computer to do something in the same way you would ask a friend, without having to memorize special commands that only the computer understands. For computer scientists this has been an ambitious goal, one that could further simplify computing. Artificial Intelligence (AI), a system that can mimic human intelligence by performing tasks that usually only a human can do, usually relies on some form of natural language processing. Natural language processing, a sub-field of computer science and artificial intelligence, concerns the successful interaction between a computer and a human. Currently one of the best examples of AI is IBM's Watson, a machine that gained popularity after appearing on the show Jeopardy!
Information Retrieval (IR) represents, retrieves from storage, and organizes information. The information should be easy to access; users are more interested in information they can access easily. The information retrieval process covers the skills of searching for documents, for information within documents, and for metadata about documents, as well as searching relational databases and the World Wide Web. According to Shing Ping Tucker (2008), e-commerce is a rapidly growing segment of the internet.
Information Retrieval is, simply put, a field concerned with organizing information. In other terms, IR emphasizes the range of different materials that need to be searched. Other researchers characterize IR by the contrast between the strong structure and typing of a database system and the lack of structure in the objects typically searched in IR. In practice, information retrieval systems have to deal with incomplete or under-specified information in the form of the queries issued by users. IR uses techniques for storing, recovering, and often disseminating recorded data, especially through the use of computerized systems.
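One standard way IR systems cope with under-specified queries is to rank documents by TF-IDF overlap rather than require every term to match. A minimal sketch, using a made-up three-document collection and smoothed IDF weights:

```python
import math
from collections import Counter

# Made-up document collection (illustrative assumption)
docs = [
    "information retrieval organizes stored information",
    "databases use strong structure and typing",
    "retrieval systems handle incomplete user queries",
]
tokenized = [d.split() for d in docs]
N = len(docs)

def idf(term):
    """Smoothed inverse document frequency: rarer terms weigh more."""
    df = sum(term in doc for doc in tokenized)
    return math.log((N + 1) / (df + 1)) + 1

def score(query, doc):
    """Sum TF * IDF over the query terms present in the document."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t) for t in query.split())

query = "incomplete retrieval queries"
ranked = sorted(range(N), key=lambda i: score(query, tokenized[i]), reverse=True)
print(ranked)  # document indices, best match first
```

A partial match still earns a score, so the user gets a ranked list even when no document satisfies the whole query.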
Neuro-Linguistic Programming (NLP) was developed in the 1970s by the linguist John Grinder and the mathematician Richard Bandler. It is a therapy that deals with one's perceptions of the world as shaped by experiences, beliefs, values, assumptions, and sensory systems. NLP was developed by studying and modeling the patterns of human internal and external behavior. According to the NLP Comprehensive website, "NLP investigates the inner functions of the human mind: how we think, how we develop our desires, goals and fears and how we motivate ourselves, make connections, and give meaning to our experiences" (NLP Comprehensive, 2013). NLP entails a collection of psychological practices that aim to improve people's lives. Mainly, it is a therapy that motivates the conscious mind by acting upon the unconscious mind; the experience is subjective to the person.
Thuraisingham, Bhavani (2003). Web Data Mining and Applications in Business Intelligence and Counter-Terrorism. Taylor & Francis. http://www.myilibrary.com?id=6372
... applied to different domain data sets and sub-level data sets. When the data sets were classified with Maximum Entropy, Support Vector Machines, and multinomial naive Bayes, I obtained 60-70% accuracy. The same methods applied to unigram features achieved an accuracy of 65-75%. Applying the same data to the proposed lexicon-based Semantic Orientation Analysis algorithm, we obtained a better accuracy of 85%. A subjective Feature Relation Networks chi-square model using n-grams and POS tagging with linguistic rules performed with the highest accuracy, 80% to 93%, significantly better than traditional naive Bayes with a unigram model. After applying the proposed model to the different sets, the results were validated with test data, showing that our methods are more accurate than the others.
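For readers unfamiliar with the multinomial naive Bayes baseline these figures are compared against, here is a minimal hand-rolled version over unigram features with Laplace smoothing. The four training sentences are a tiny illustrative sample, not the data set described above.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative training sample (assumption for this sketch)
train = [
    ("great product works well", "pos"),
    ("love the fast delivery", "pos"),
    ("terrible quality very bad", "neg"),
    ("poor support bad experience", "neg"),
]

# Training: count class frequencies and per-class word frequencies
class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text):
    """Pick the class with the highest log-posterior over unigrams."""
    best, best_lp = None, -math.inf
    for label in class_counts:
        lp = math.log(class_counts[label] / len(train))  # log prior
        total = sum(word_counts[label].values())
        for w in text.split():
            # Laplace smoothing so unseen words do not zero the probability
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(predict("bad quality product"))
```

The stronger lexicon-based and feature-network models mentioned above improve on exactly this kind of unigram baseline.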
A Database Management System (DBMS) is software that enables users to define, create, maintain, and control access to a database; it interacts with the user's application programs and with the database itself. An information retrieval system, meanwhile, is a system whose activity is obtaining information, and obtaining that information requires drawing on its sources.
Linguistics and Computer Science are the main components of Computational Linguistics (CL). According to Bolshakov & Gelbukh (2004), CL can be regarded as a synonym of NLP. CL aims to construct computer programs that are able to process (recognize and synthesize) text and speech in natural languages. This enables scientists to create several applications in this field, such as machine translation, spelling and grammar checkers, information retrieval, speech recognition and speech synthesis, topical summarization, extraction of factual data from texts, and natural-language interfaces.
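One of the applications listed above, the spell checker, can be sketched in a few lines: suggest the dictionary word with the smallest Levenshtein (edit) distance to the misspelled input. The four-word dictionary is an illustrative assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # min of deletion, insertion, and substitution/match costs
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

# Tiny illustrative dictionary (assumption for this sketch)
DICTIONARY = ["language", "linguistics", "translation", "retrieval"]

def suggest(word):
    """Return the dictionary word closest to the (possibly misspelled) input."""
    return min(DICTIONARY, key=lambda w: edit_distance(word, w))

print(suggest("langage"))  # language
```

Real spell checkers add frequency models and keyboard-adjacency costs, but edit distance is the core measure they rank candidates by.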
The other part of computational linguistics is applied computational linguistics, which focuses on the practical outcome of modeling human language use. The methods, techniques, tools, and applications in this area are often subsumed under the term language engineering (or human language technology). Although current computational-linguistic systems are far from achieving human communicative ability, they have numerous applications. The goal is eventually to have a computer program with the same communication skills as a human being; once this is achieved, it will open doors never thought possible in computing. After all, the major problem in computing today is communication with the computer: today's computers do not really understand our language, computer languages are very difficult to learn, and they do not correspond to the structure of human thought.