Recent Trends in Document Clustering with Evolutionary-Based Algorithms


Document clustering is the process of organizing an electronic corpus of documents into subgroups that share similar text features. A number of statistical algorithms have previously been applied to cluster data, including text documents. More recent work tries to improve clustering performance with optimization-based algorithms such as evolutionary algorithms. As a result, document clustering with evolutionary algorithms has become an emerging topic that has gained attention in recent years. This paper presents an up-to-date review devoted entirely to evolutionary algorithms designed for document clustering. It first provides a comprehensive inspection of the document clustering model, revealing its various components and related concepts. It then presents and analyzes the principal research work on this topic. Finally, it brings together and classifies the various objective functions found in the collected research papers. The paper ends by addressing some important issues and challenges that can be the subject of future work.

The objective function (or fitness function) is the measure that evaluates the optimality of the solutions an evolutionary algorithm generates in the search space. In the clustering domain, the fitness function measures the adequacy of the partitioning. Accordingly, it needs to be formulated carefully, taking into consideration that clustering is an unsupervised process.
Different objective functions generate different solutions even from the same evolutionary algorithm. Note also that the fitness can be either a minimization or a maximization function. Moreover, the algorithm can be formulated with one or with multiple objective functions. To sum up, "choosing optimizati...
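
As an illustration of the point that the same partitioning can be scored by a maximization or a minimization fitness, the following is a minimal sketch and not a formulation taken from the reviewed papers. It assumes documents are already represented as equal-length term-weight vectors and that a candidate solution is a list of cluster labels; the names cosine, centroid, cohesion_fitness and dispersion_fitness are hypothetical and chosen only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cohesion_fitness(docs, assignment, k):
    """Maximization fitness (assumed form): mean cosine similarity
    between each document and the centroid of its assigned cluster."""
    clusters = [[] for _ in range(k)]
    for doc, label in zip(docs, assignment):
        clusters[label].append(doc)
    total = 0.0
    for members in clusters:
        if not members:
            continue  # empty clusters contribute nothing
        center = centroid(members)
        total += sum(cosine(doc, center) for doc in members)
    return total / len(docs)


def dispersion_fitness(docs, assignment, k):
    """Minimization fitness (assumed form): one minus the cohesion."""
    return 1.0 - cohesion_fitness(docs, assignment, k)


# Toy corpus: two documents about one topic, two about another.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
good = [0, 0, 1, 1]  # groups similar documents together
bad = [0, 1, 0, 1]   # mixes the two topics

print(cohesion_fitness(docs, good, 2), cohesion_fitness(docs, bad, 2))
print(dispersion_fitness(docs, good, 2), dispersion_fitness(docs, bad, 2))
```

Under these assumptions, an evolutionary algorithm that maximizes cohesion_fitness and one that minimizes dispersion_fitness rank candidate partitionings identically; a genuinely different objective, such as one that also rewards separation between clusters, could rank them differently.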

... middle of paper ...

...traction. 1999.
76. Turney, P.D., Learning algorithms for keyphrase extraction. Information Retrieval, 2000. 2(4): p. 303-336.
77. Wu, J.-l. and A.M. Agogino, Automating Keyphrase Extraction with Multi-Objective Genetic Algorithms. Proceedings of the Hawaii International Conference on System Science, HICSS 2003, 2003.
78. Sathya, A.S. and B.P. Simon, A document retrieval system with combination terms using genetic algorithm. International Journal of Computer and Electrical Engineering, 2010. 2(1): p. 1-6.
79. Dorfer, V., et al. Optimization of keyword grouping in biomedical information retrieval using evolutionary algorithms. 2010.
80. Dorfer, V., et al., On the performance of evolutionary algorithms in biomedical keyword clustering, in Proceedings of the 13th annual conference companion on Genetic and evolutionary computation. 2011, ACM: Dublin, Ireland. p. 511-518.
