Publication year: [2017]
Paper information |
|
Paper title (Korean) |
[Vol.12, No.1] Paragraph-based K-Means Clustering by using Meaning-based Paragraph Division |
|
Authors |
Sa-Joon Park, Jae-Ho Kim |
|
Abstract |
As the number of electronic documents grows explosively, retrieving information from them quickly and accurately becomes increasingly difficult. To address this problem, documents are clustered in various ways, most commonly with the K-Means algorithm. K-Means can cluster large numbers of documents quickly and easily, but it does not consider the meaning of the documents when clustering. In this research, we propose a document clustering technique based on meaning-based paragraphs. The proposed technique divides each document in a document set into meaning-based paragraphs by measuring the similarity between sentences, chooses from each document the representative paragraph with the maximum coherence value, and then runs the K-Means algorithm on these representative paragraphs. Unlike existing methods, we propose a novel similarity function between two adjacent sentences that uses WordNet as an ontology to calculate the similarity between words. We also introduce a method for calculating the coherence of a meaning-based paragraph by normalizing the sum of the tf-idf values of the words in the paragraph. We conducted experiments to evaluate the performance of the proposed technique on the Reuters-21578 document set. The experimental results showed that document clustering based on meaning-based paragraphs improves the precision and recall of document retrieval. |
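The pipeline described in the abstract (split a document where adjacent sentences stop being similar, score each paragraph by a normalized tf-idf sum, keep the highest-scoring paragraph) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the formulas, so several parts are assumptions: a plain Jaccard word overlap stands in for the WordNet-based sentence similarity, the tf-idf sum is normalized by the paragraph's token count, and the `threshold` value and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentence_similarity(a, b):
    """Jaccard overlap of the two sentences' word sets.

    ASSUMPTION: a simple stand-in for the paper's WordNet-based
    similarity function, whose definition is not given in the abstract.
    """
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def split_paragraphs(sentences, threshold=0.1):
    """Start a new paragraph wherever adjacent sentences are dissimilar.

    ASSUMPTION: the threshold is illustrative, not taken from the paper.
    """
    if not sentences:
        return []
    paragraphs = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if sentence_similarity(prev, cur) < threshold:
            paragraphs.append([cur])
        else:
            paragraphs[-1].append(cur)
    return paragraphs


def idf_table(documents):
    """Compute idf = log(N / df) over documents given as sentence lists."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(" ".join(doc))))
    return {word: math.log(n / count) for word, count in df.items()}


def coherence(paragraph, idf):
    """Sum of tf-idf over the paragraph's words, divided by token count.

    ASSUMPTION: dividing by the number of tokens is one plausible
    normalization; the paper may normalize differently.
    """
    tokens = tokenize(" ".join(paragraph))
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    total = sum(count * idf.get(word, 0.0) for word, count in tf.items())
    return total / len(tokens)


def representative_paragraph(sentences, idf, threshold=0.1):
    """Return the paragraph of one document with the highest coherence."""
    paragraphs = split_paragraphs(sentences, threshold)
    return max(paragraphs, key=lambda p: coherence(p, idf))
```

In use, `idf_table` would be built once over the whole document set, and `representative_paragraph` would then be called per document. The selected paragraphs, rather than the full documents, would be turned into term vectors and passed to any standard K-Means implementation.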
|