CHAPTER 1 INTRODUCTION
1.1 Data Mining (Knowledge Discovery Process)
Data Mining (DM), or Knowledge Discovery, is the extraction of implicit, previously unknown, and useful information, including hidden trends, from data. DM research has adopted many techniques from research areas such as artificial intelligence, statistics, and machine learning.
Stages in Data Mining:
1. Selection of data: selecting data to be analysed.
2. Preprocessing: the data are brought into a consistent, common format. This is the data cleaning stage, where unnecessary information that may slow down queries is removed and discarded, and missing values are treated. Because the data are collected from several sources, the formats may be inconsistent, so the data are reconfigured into a proper format; e.g. information about sex may be recorded as f or m in one source and as 1 or 0 in another.
3. Transformation of data: the data are transformed into forms appropriate for analysis, e.g. by normalization. Operations such as summarization or aggregation can also be performed.
4. Data mining: this stage is concerned with the extraction of patterns from the data.
5. Interpretation of patterns for decision
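The preprocessing and transformation stages above can be illustrated with a minimal Python sketch. It is not taken from the source: the record layout, the helper names `harmonize_sex` and `min_max_normalize`, and especially the choice of which digit corresponds to which letter are assumptions made only for illustration.

```python
# Hypothetical mapping: the text only says sex may appear as f/m or as 1/0;
# which digit stands for which letter is an assumption made for this sketch.
SEX_CODES = {"f": "f", "m": "m", "1": "f", "0": "m"}


def harmonize_sex(value):
    """Return 'f' or 'm', or None when the value is missing or unrecognized."""
    if value is None:
        return None
    return SEX_CODES.get(str(value).strip().lower())


def min_max_normalize(values):
    """Rescale numbers to the range [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Invented records that mimic data merged from two differently coded sources.
records = [
    {"sex": "F", "age": 20},
    {"sex": 0, "age": 40},
    {"sex": "m", "age": 30},
    {"sex": 1, "age": 60},
]

cleaned_sex = [harmonize_sex(r["sex"]) for r in records]
scaled_age = min_max_normalize([r["age"] for r in records])
print(cleaned_sex)
print(scaled_age)
```

Min-max scaling is only one possible form of normalization; other choices, such as standardizing to zero mean and unit variance, would fit the transformation stage equally well.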
We can find and measure the strength of a relationship between two variables from the data, e.g. through their covariance. Regression analysis discovers relationships between dependent and independent variables and is used for prediction. Discriminant analysis is a classifier-based approach that categorizes the data based on the combination of features/attributes that maximally separates the classes. Clustering is an unsupervised classification of observations, data points, data items, or feature/attribute vectors into groups or clusters. We can group the data points into clusters or classes so that objects within a cluster are more similar to one another and very dissimilar to objects in different
The K-Means algorithm performs cluster analysis by dividing the data points into k clusters, assigning each point to a cluster based on feature similarity.
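A minimal Python sketch of this grouping follows. It assumes Euclidean distance as the similarity measure and seeds the centroids with the first k points so that the run is deterministic; both choices, along with the function name `k_means` and the sample points, are assumptions for illustration rather than details given in the text.

```python
import math


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption: the first k points seed the centroids (deterministic start).
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Invented 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

On this sample the three points near (1, 1) receive one label and the three points near (9, 9) receive the other. In practice the result depends on the initial centroids, so implementations commonly use random or k-means++ seeding with several restarts.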
Furthermore, the methods applied convey “the techniques or procedures used to gather and analyze data that is
After this, the data are analysed to separate the subjective data from the objective data,
In this stage, we examine the historical data to learn what happened in the past. Descriptive analysis makes the data more understandable and useful.
ACTDIK002 Foundation to Y2, Digital Technologies and Understanding. Recognize and explore patterns in data and represent data as diagrams, symbols and drawings (ACARA 2016 n.p.).
McGonigle and Mastrian (2013) define data mining as a process of using software to sort through data in order to discover patterns and establish relationships. They also state that this process may help uncover previously unidentified relationships among the data in a database. Data mining is very important to healthcare organizations. It can help determine treatment effectiveness, identify problems, decrease costs for the organization, and even detect possible fraudulent activity. Data mining is used not only in healthcare but in other businesses as well. Although data mining is a great asset to healthcare, an informatics nurse has to be very careful due to the lack of a standardized
Establishment of novel data mining methods in interpreting data 4. In short, provide a precise pattern on how to extract information from a data set and transform it into an understandable
... different layers such as ETL stage, SIF, BDW and how data is processed to generate reports according to the requirement. The processing of information from raw data to different processing stages culminating in coherent information is fascinating.
Data mining has emerged as an important method for discovering useful information, hidden patterns, or rules in different types of datasets. Association rule mining, one of the dominant data mining techniques, is a process for finding associations or relations between data items or attributes in large datasets. It is among the most popular techniques and an important research issue in data mining and knowledge discovery, serving purposes such as data analysis, decision support, and the discovery of patterns or correlations in different types of datasets. It has proven to be a successful technique for extracting useful information from large datasets. Various algorithms and models have been developed, many of which have been applied in domains that include telecommunication networks, market analysis, risk management, inventory control, and many others.
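As a rough illustration of how an association between items can be quantified, the Python sketch below computes support and confidence, the two measures commonly used in association rule mining. The passage itself does not name these measures, and the market-basket transactions and the function names `support` and `confidence` are invented for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical market-basket data, invented for this sketch.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

Here bread and milk occur together in two of the four transactions, and two of the three transactions containing bread also contain milk. Practical algorithms avoid scoring every candidate itemset this way and instead prune the search, for example by discarding itemsets whose support falls below a chosen threshold.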
There are various definitions of data mining. The authors in [1] define data mining as “the process of extracting previously unknown information from (usually large quantities of) data, which can, in the right context, lead to knowledge”. Data mining is widely used in areas such as business analysis, bioinformatics analysis, and medical analysis, and its techniques bring many benefits. Companies can use data mining tools to search for potential customers and increase their profits; medical diagnosis can use data mining to predict potential disease. Although the term “data mining” itself is neutral and has no ethical implications, it is often related to the analysis of information associated with individuals. “The ethical dilemmas arise when data mining is executed over the data of an individual” [2]. For example, mining a user’s data and classifying the user into some group may raise a variety of ethical issues. In this paper, we deal with two kinds of ethical issues caused by data mining techniques: informational privacy issues in web-data mining and database security issues in data mining. We also look at these ethical issues at a societal level and a global level.
Data is collected and the patterns are recognized, in order to understand the physical properties, and further to visualize the data as
The dynamics of our society bring many challenges and opportunities to the business world. Within the last decade, hundreds of jobs have emerged, particularly in the technology sector, to help companies keep up with the ever-changing world and compete on a larger scale. Two key job markets, and the basis of this research paper, are business intelligence (BI) and data mining (DM). These two fields play a very important role in small to large companies and are becoming increasingly sought-after functions within the back offices of the workplace. This paper will explore what BI and DM really mean, how they are used, and what we can expect in the future as workers and learners in the technology and business fields.
Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of data mining. Cambridge, Mass.: MIT Press.
The data mining process will use a mapping function developed with a decision tree and a neural network. It requires a web server and a database server, built on an operational database, to record the browsing routes of the users. Data mining will then be used to identify the users’ information and classify the users into different classes using the decision tree.
The two terms clustering and classification are used extensively throughout this thesis. The question that arises at this point is: are they synonymous or is there a distinction...