Data Mining Assignment # 01
Aisha Akbar (BSCS 2011)
Q1: What are the challenges of data mining?
Challenges Of Data Mining:
1. Handling of different types of data.
2. Efficiency and scalability of data mining algorithms.
3. Usefulness, certainty, and expressiveness of data mining results.
4. Expression of various kinds of data mining requests and results.
5. Interactive mining of knowledge at multiple abstraction levels.
6. Mining information from different sources of data.
7. Protection of privacy and data security.
Q2: What is the use of mined information?
Uses Of Mined Information:
Data mining is also called knowledge discovery in databases. In computer science, it is the process of discovering knowledge and relationships in large amounts of data. The field combines tools from statistics and artificial intelligence with database management to analyze large collections of data, known as data sets.
Data mining is used in business (banking, insurance), government security, scientific research, etc.
Q3: How can we classify data miners?
Classification Of Data Mining Techniques:
DHP also reduces the size of the database, not only by trimming the items in each transaction but also by pruning the number of transactions in the database.
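The trimming and pruning idea can be illustrated with a small Python sketch. This is a simplified, first-pass version only and not the full DHP algorithm (it leaves out the hash table that DHP uses to filter candidate itemsets). The function name `reduce_database`, the threshold, and the sample transactions are hypothetical and chosen for illustration.

```python
from collections import Counter

def reduce_database(transactions, min_support_count):
    """Simplified database reduction: trim infrequent items, prune short transactions."""
    # Count in how many transactions each item occurs.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in item_counts.items() if c >= min_support_count}

    reduced = []
    for t in transactions:
        # Trimming: keep only items that are frequent on their own.
        trimmed = sorted(set(t) & frequent)
        # Pruning: a transaction with fewer than 2 items cannot support any 2-itemset.
        if len(trimmed) >= 2:
            reduced.append(trimmed)
    return reduced

# Hypothetical sample data.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["cola"],
]
reduced = reduce_database(transactions, min_support_count=3)
for t in reduced:
    print(t)
```

In this sample, the infrequent items are trimmed from the transactions that contain them, and the last transaction is pruned entirely, so later passes scan a smaller database.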
Q7: What is an interesting association rule?
An Interesting Association Rule:
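Not all of the discovered strong association rules are interesting enough to present. A strong rule (one that meets the minimum support and minimum confidence thresholds) is interesting only if it tells the user something useful or unexpected, for example when its confidence is clearly higher than the overall frequency of its consequent.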
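One common way to judge whether a strong rule is actually interesting is to compare its confidence with the baseline frequency of its consequent, a ratio usually called lift. The Python sketch below computes support, confidence, and lift for a single rule. The function name `rule_metrics` and the transaction data are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_ac = sum(1 for t in transactions if (a | c) <= set(t))

    support = count_ac / n                                  # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0     # P(C | A)
    lift = confidence / (count_c / n) if count_c else 0.0   # P(C | A) / P(C)
    return support, confidence, lift

# Hypothetical data: 20 transactions.
transactions = (
    [["games", "videos"]] * 8
    + [["games"]] * 4
    + [["videos"]] * 7
    + [[]] * 1
)
support, confidence, lift = rule_metrics(transactions, ["games"], ["videos"])
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Here the rule games => videos has 40% support and about 67% confidence, so it would pass typical thresholds, yet its lift is below 1 because videos already appear in 75% of all transactions. Such a rule is strong but not interesting.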
Q8: How can sampling produce effective and efficient results?
Ans: Several applications require mining transaction data to capture customers' behavior. In such applications, the efficiency of data mining can be a more important factor than the exact accuracy of the results. Because databases now grow very quickly, mining a sample instead of the whole database can be an effective approach.
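The trade-off between efficiency and accuracy can be sketched in Python by estimating the support of an itemset from a random sample instead of scanning every transaction. The function name `estimate_support`, the synthetic data, and the sample size are assumptions made for illustration.

```python
import random

def estimate_support(transactions, itemset, sample_size, seed=0):
    """Estimate the support of an itemset from a simple random sample."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    sample = rng.sample(transactions, sample_size)  # sample without replacement
    items = set(itemset)
    hits = sum(1 for t in sample if items <= set(t))
    return hits / sample_size

# Synthetic database: 3 out of every 10 transactions contain both "a" and "b".
transactions = [["a", "b"] if i % 10 < 3 else ["a"] for i in range(100_000)]

# Exact support requires a full scan of the database.
exact = sum(1 for t in transactions if {"a", "b"} <= set(t)) / len(transactions)

# The estimate reads only 2,000 of the 100,000 transactions.
estimate = estimate_support(transactions, ["a", "b"], sample_size=2_000)
print(f"exact={exact:.3f} estimate={estimate:.3f}")
```

The estimate is not exact, but it is usually close to the true support while reading only a small fraction of the data. A larger sample reduces the error at the cost of more work.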
Q9: Briefly describe offline and online data analysis techniques.
Ans:
Q10: Define supervised and unsupervised learning in detail.
Supervised And Unsupervised Learning:
In supervised learning, the model defines one set of observations, called
The K-Means algorithm is used for cluster analysis: it divides the data points into k clusters, grouping them on the basis of feature similarity.
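A minimal K-Means sketch in plain Python is shown below. It assumes squared Euclidean distance as the similarity measure and picks the initial centroids at random from the data. The function name `k_means`, its parameters, and the sample points are illustrative assumptions.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Group points into k clusters by repeatedly assigning and re-averaging."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:     # stop once the centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The result of K-Means can depend on the initial centroids, so in practice the algorithm is often run several times with different starting points.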
McGonigle and Mastrian (2013) define data mining as a process of using software to sort through data in order to discover patterns and establish relationships. They also state that this process may help to uncover previously unidentified relationships among the data in a database. Data mining is very important to healthcare organizations. It can help to determine treatment effectiveness, identify problems, decrease costs for the organization, and even detect possible fraudulent activity. Data mining is used not only in healthcare but in other businesses as well. Although data mining is a great asset to healthcare, an informatics nurse has to be very careful due to the lack of a standardized
Big Data is a term used to refer to extremely large and complex data sets that have grown beyond the ability of traditional data processing tools to manage and analyse them. However, Big Data contains a lot of valuable information which, if extracted successfully, can greatly help business and scientific research, predict upcoming epidemics, and even determine traffic conditions in real time. Therefore, these data must be collected, organized, stored, searched, and shared in a different way than usual. This article introduces Big Data, the methods people use to exploit it, and how it helps our lives.
Data mining is a technique for interpreting data from different perspectives and summarizing it into useful information. Technically, data mining is a process of identifying relations or patterns in databases in order to predict the likelihood of future events. According to Eliason et al, there are three systems that a healthcare organization needs in order to implement data mining: the analytics system, the content system, and the deployment system. The analytics system is used to collect all data, such as patients' clinical data, financial data, satisfaction data, and other data. The content system is used to store all medical evidence data. The deployment system is used to create a new organizational structure. Data mining consists of several elements: first, extract, transform, and load transaction data onto the data warehouse system; second, store and manage the data in a multidimensional system; third, provide data access to information technology professionals; fourth, analyze the data with application software; and lastly, present the data in graph or table format.
Data mining has emerged as an important method for discovering useful information, hidden patterns, or rules in different types of datasets. Association rule mining is one of the dominant data mining technologies. It is a process for finding associations or relations between data items or attributes in large datasets. It is one of the most popular techniques and an important research issue in the area of data mining and knowledge discovery, serving many different purposes such as data analysis, decision support, and the discovery of patterns or correlations in different types of datasets. Association rule mining has proven to be a successful technique for extracting useful information from large datasets. Various algorithms and models have been developed, many of which have been applied in application domains that include telecommunication networks, market analysis, risk management, inventory control, and many others.
There are various definitions of what data mining is. The authors in [1] define data mining as “the process of extracting previously unknown information from (usually large quantities of) data, which can, in the right context, lead to knowledge”. Data mining is widely used in areas such as business analysis, bioinformatics analysis, medical analysis, etc. Data mining techniques bring us many benefits. Businesses can use data mining tools to search for potential customers and increase their profits; medical diagnosis can use data mining to predict potential diseases. Although the term “data mining” itself is neutral and has no ethical implications, it is often related to the analysis of information associated with individuals. “The ethical dilemmas arise when data mining is executed over the data of an individual” [2]. For example, mining a user’s data and classifying the user into some group may result in a variety of ethical issues. In this paper, we deal with two kinds of ethical issues caused by data mining techniques: informational privacy issues in web-data mining and database security issues in data mining. We also look at these ethical issues at a societal level and a global level.
6- Development of new tools for all the new fields of study, and development of programs for data mining and the analysis of huge databases.
A database is a structured collection of data. Data refers to the characteristics of people, things, and events. Oracle stores each data item in its own field. For example, a person's first name, date of birth, and their postal code are each stored in separate fields. The name of a field usually reflects...
The dynamics of our society bring many challenges and opportunities to the business world. Within the last decade, hundreds of jobs have emerged, particularly in the technology sector, to help companies keep up with the ever-changing world and compete on a larger and better scale than their rivals. Two key job markets, and the basis of this research paper, are business intelligence (BI) and data mining (DM). These two fields play a very important role in small to large companies and are becoming increasingly sought-after sectors within the back offices of the workplace. This paper will explore what BI and DM really mean, how they are used, and what we can expect as workers and learners in the technology and business fields in the future.
Moreover, e-commerce is widely used nowadays. Therefore, such data should be kept secure in databases, and the privacy of the data should be maintained.
Big data is a concept that has been misunderstood; therefore, I am writing this paper with the intention of thoroughly discussing this technological concept and all its dimensions, with regard to what constitutes big data and how the term came about. The rapid innovations in Information Technology have brought about the realisation of big data. The concept of big data is complex and has different connotations, but I intend to clarify its functions. Big data refers to a collection of large and complex amounts of data that are extremely difficult to notate or even process with most on-hand devices and database technologies.
The data mining process will use a mapping function that involves a decision tree and a neural network. It needs a web server and a database server to be set up with an operational database that records the browsing routes of the users. Data mining will then be used to identify the users' information and classify the users into different classes using the decision tree.
Humans can expand their knowledge to adapt to a changing environment. To do that, they must “learn”. Learning can be simply defined as the acquisition of knowledge or skills through study, experience, or being taught. Although learning is an easy task for most people, acquiring new knowledge or skills from data is hard and complicated for machines. Moreover, the intelligence level of a machine is directly related to its learning capability. The study of machine learning tries to deal with this complicated task. In other words, machine learning is the branch of artificial intelligence that tries to answer this question: how can we make computers learn?
Adopting big data can also save the banking industry a lot of embarrassment as the number of customers increases, which in turn requires banks to improve their performance. As stated earlier, banks are entrusted with a lot of information, and this information must be kept safe and be readily accessible in a timely fashion. A normal small database will not be enough to perform this operation, and if banks don’t embrace big data they might start to experience failures in their systems.