Saturday, August 3, 2019
Essay --
Chapter Four: Related Work

There are several works and studies on Arabic text categorization. Each one examines the problem from certain angles and leaves others aside, depending on the type of study. In [68], Arabic text was classified without morphological analysis, and the results were reported to be very robust and reliable. In [71], a comparative study of N-Gram classification used two measures, the Manhattan measure and Dice's measure. Experiments on four categories showed that N-Gram with Dice's measure performed better than N-Gram with the Manhattan measure. In [83], "Text Classification from Labeled and Unlabeled Documents using EM," the authors proposed an algorithm that combines expectation-maximization (EM) with the naive Bayes classifier to learn from both labeled and unlabeled documents. First, a classifier is trained on the labeled documents and used to probabilistically label the unlabeled documents. Then a new classifier is trained using the labels of all the documents, and the process is repeated until convergence.

Many research efforts have addressed the problem of Arabic text classification. In this section we mention the main algorithms used in these studies: Decision Tree [36], KNN [37,38,39,40], NB [17,41,42], N-Gram frequency [5,45], Rocchio [4], SVM [19,21,43], and distance-based classifiers [46,47,48].

• Syiam et al. [40] presented an intelligent Arabic text categorization system. It used the KNN and Rocchio profile-based [50] classifiers to classify a set of Arabic text documents collected from three Egyptian newspapers, Al Ahram, Al Gomhoria, and Al Akhbar, between August 1998 and September 2004. The corpus contains 1132 documents with 39468 words and covers six topics. Three approaches were adopted as pre... ... Agency website. The corpus contains 1562 documents of different lengths belonging to six categories. The documents were normalized and preprocessed by removing digits, foreign words, punctuation marks, and stop-words.
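The preprocessing step described above (normalization, then removal of digits, foreign words, punctuation marks, and stop-words) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the procedure used in the cited study. The source does not say which normalization rules were applied, so the alef, alef maqsura, taa marbuta, and diacritic rules below are common Arabic IR conventions assumed for illustration. The stop-word list is a tiny hypothetical sample, and a "foreign word" is assumed to be any token containing Latin letters.

```python
import re

# Hypothetical sample stop-word list (illustration only; a real system would
# use a complete Arabic stop-word list).
RAW_STOP_WORDS = ["في", "من", "على", "إلى", "عن", "مع", "هذا", "التي", "الذي"]

# Assumed normalization rules (common Arabic IR conventions, not taken from the source).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")    # tashkeel marks and tatweel
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
_DIGITS = re.compile("[0-9\u0660-\u0669]")           # ASCII and Arabic-Indic digits
_LATIN = re.compile("[A-Za-z]")                      # assumption: Latin letters mark a foreign word
_NOT_ARABIC_LETTER = re.compile("[^\u0621-\u064A]")  # punctuation and any other non-letter


def normalize(text: str) -> str:
    """Strip diacritics and unify common letter variants."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map to bare alef
    # alef maqsura -> yaa, taa marbuta -> haa
    return text.replace("\u0649", "\u064A").replace("\u0629", "\u0647")


# Normalize the stop words too, so they match normalized tokens.
STOP_WORDS = {normalize(w) for w in RAW_STOP_WORDS}


def preprocess(text: str) -> list[str]:
    """Split on whitespace, then drop foreign words, digits, punctuation, and stop-words."""
    tokens = []
    for token in text.split():
        if _LATIN.search(token):
            continue  # treated as a foreign word
        token = _DIGITS.sub("", token)
        token = normalize(token)
        token = _NOT_ARABIC_LETTER.sub("", token)  # removes Latin and Arabic punctuation
        if token and token not in STOP_WORDS:
            tokens.append(token)
    return tokens
```

For example, `preprocess("ذهب الطالب إلى المدرسة في 2019 مع Google، ثم عاد!")` returns `['ذهب', 'الطالب', 'المدرسه', 'ثم', 'عاد']`. The stop words are normalized with the same function as the tokens, so variant spellings such as "إلى" and "الى" are both removed.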
The chi-square method was used for feature selection, with the number of selected words ranging from 10 to 1000. The corpus was split so that 70% of the documents were used to train the classifier and the remaining 30% were used for testing. Three evaluation measures (precision, recall, and F-measure) were used to evaluate the performance of the NB classifier. The results showed that the NB classifier works well as the number of selected words grows. It reached its peak precision and F-measure when 800 words were selected, while recall peaked at 700 words.
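The chi-square feature selection and the three evaluation measures can be sketched as follows. This is an illustrative sketch, not the cited authors' code. The source does not say how per-category chi-square scores are combined into a single ranking, so taking the maximum over categories is an assumption. The function names `chi_square`, `select_features`, and `precision_recall_f1` are hypothetical. The 70/30 split and the NB classifier itself are not shown.

```python
from collections import Counter


def chi_square(A: int, B: int, C: int, D: int) -> float:
    """Chi-square statistic for one term/category pair from a 2x2 table.

    A: documents in the category that contain the term
    B: documents outside the category that contain the term
    C: documents in the category without the term
    D: documents outside the category without the term
    """
    n = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    if denom == 0:
        return 0.0
    return n * (A * D - C * B) ** 2 / denom


def select_features(docs: list[list[str]], labels: list[str], k: int) -> list[str]:
    """Keep the k terms with the highest chi-square score.

    Assumption: a term's score is its maximum chi-square over all categories.
    Ties are broken alphabetically so the result is deterministic.
    """
    doc_sets = [set(d) for d in docs]
    n_docs = len(docs)
    df = Counter(t for s in doc_sets for t in s)    # documents containing each term
    cat_size = Counter(labels)                      # documents per category
    df_in_cat = Counter((t, y) for s, y in zip(doc_sets, labels) for t in s)

    scores = {}
    for term, term_df in df.items():
        best = 0.0
        for c in cat_size:
            A = df_in_cat[(term, c)]
            B = term_df - A
            C = cat_size[c] - A
            D = n_docs - A - B - C
            best = max(best, chi_square(A, B, C, D))
        scores[term] = best
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def precision_recall_f1(y_true: list[str], y_pred: list[str], positive: str):
    """Precision, recall, and F-measure for one category treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Varying `k` over values such as 10 to 1000 would produce the range of vocabulary sizes the study compared. Here the measures are computed per category. The source does not state how they were averaged across the six categories.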