Essay On Probabilistic Topic Models


INVESTIGATING TASK PERFORMANCE OF PROBABILISTIC TOPIC MODELS: AN EMPIRICAL STUDY OF PLSA AND LDA

Introduction and Problem Statement:
This paper deals with the task performance of PLSA (Probabilistic Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation). A lot of work has reported promising performance of topic models, but none of it has systematically investigated their task performance. As a result, some critical questions that may affect the performance of all applications of topic models remain largely unanswered, particularly:
• How should one choose between competing models?
• How do multiple local maxima affect task performance?
• How should the parameters of topic models be set?
In this paper the authors address these questions through a systematic investigation of two representative probabilistic topic models, PLSA and LDA, on three representative text mining tasks: document clustering, text categorization, and ad-hoc retrieval.

Important Terms:
Probabilistic Topic Models: The basic idea behind probabilistic topic models is that documents are mixtures of topics, where each topic is represented by a multinomial distribution over words. $\phi_w^{(j)} = P(w \mid z=j)$ denotes the multinomial distribution over words for topic j, and $\theta_j^{(d)} = P(z=j \mid d)$ denotes the multinomial distribution over topics for document d. The parameters ϕ and θ indicate, respectively, which words are important for which topic and which topics are important for a particular document.

Probabilistic Latent Semantic Analysis (PLSA): PLSA was introduced by Hofmann. A document d is regarded as a sample of the following mixture model, i.e., a probability distribution over words w for a given document d, where k is the number of topics:
$$P(w \mid d) = \sum_{j=1}^{k} \phi_w^{(j)} \, \theta_j^{(d)}$$
The word-topic distributions ϕ an... ... middle of paper ... ...been answered. The authors address these problems in the current paper, an empirical study of PLSA and LDA.
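To make the PLSA mixture model and the local-maxima question more concrete, here is a minimal pure-Python sketch of PLSA fitted with the standard EM algorithm. It is not the authors' implementation: the function names (normalize, plsa_em), the dense document-term count input, the fixed iteration count, and the toy corpus are all hypothetical, and the sketch assumes every document contains at least one word. The E-step computes P(z=j | d, w) ∝ θ_j^(d) ϕ_w^(j), the M-step re-estimates ϕ and θ from expected counts, and the returned log-likelihood uses the mixture P(w | d) = Σ_j ϕ_w^(j) θ_j^(d).

```python
import math
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa_em(counts, k, iters=50, seed=0):
    """Fit PLSA by EM (illustrative sketch, hypothetical interface).

    counts[d][w] is the count of word w in document d; every document is
    assumed to be non-empty. Returns theta[d][j] = P(z=j | d),
    phi[j][w] = P(w | z=j), and the corpus log-likelihood.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random initialisation: different seeds can converge to different local maxima.
    theta = [normalize([rng.random() for _ in range(k)]) for _ in range(n_docs)]
    phi = [normalize([rng.random() for _ in range(n_words)]) for _ in range(k)]

    for _ in range(iters):
        phi_acc = [[0.0] * n_words for _ in range(k)]
        theta_acc = [[0.0] * k for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                c = counts[d][w]
                if c == 0:
                    continue  # zero counts contribute nothing to expected counts
                # E-step: P(z=j | d, w) is proportional to P(z=j | d) * P(w | z=j).
                post = normalize([theta[d][j] * phi[j][w] for j in range(k)])
                # Accumulate expected counts n(d, w) * P(z=j | d, w).
                for j in range(k):
                    phi_acc[j][w] += c * post[j]
                    theta_acc[d][j] += c * post[j]
        # M-step: renormalise the expected counts into distributions.
        phi = [normalize(row) for row in phi_acc]
        theta = [normalize(row) for row in theta_acc]

    loglik = 0.0
    for d in range(n_docs):
        for w in range(n_words):
            if counts[d][w]:
                # Mixture model: P(w | d) = sum_j P(w | z=j) * P(z=j | d).
                p = sum(phi[j][w] * theta[d][j] for j in range(k))
                loglik += counts[d][w] * math.log(p)
    return theta, phi, loglik


if __name__ == "__main__":
    # Hypothetical toy corpus: rows are documents, columns a 4-word vocabulary.
    docs = [[4, 3, 0, 0], [3, 4, 1, 0], [0, 0, 4, 3], [0, 1, 3, 4]]
    for seed in range(3):
        _, _, ll = plsa_em(docs, k=2, iters=50, seed=seed)
        print(f"seed={seed} log-likelihood={ll:.4f}")
```

Because EM only guarantees a local maximum, runs with different seeds may end at different log-likelihoods. Comparing several runs (or keeping the best one) is one simple way to probe the multiple-local-maxima question raised above; the sketch makes no claim about how the authors themselves handled it.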
Chang et al. (2009) conduct user studies to quantitatively compare the semantic meaning of topics inferred by PLSA and LDA. Their focus is on quantifying the interpretability of topics through human judgments. By contrast, the authors of the current paper study the task performance of topic models in three standard text mining applications, which can be measured objectively with standard metrics. This work is therefore complementary to theirs.

Previous Work:
As stated above, a lot of work has reported promising performance of topic models, such as the text categorization results in the original LDA paper (Blei et al. 2003). Work by Wei and Croft (2006) shows that LDA can improve state-of-the-art information retrieval within the language modeling framework.
