WEB NEWS DOCUMENTS CLUSTERING IN INDONESIAN LANGUAGE USING SINGULAR VALUE DECOMPOSITION-PRINCIPAL COMPONENT ANALYSIS (SVDPCA) AND ANT ALGORITHMS

  • Arif Fadllullah, Department of Informatics Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember
  • Dasrit Debora Kamudi, Department of Informatics Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember; Politeknik Negeri Nusa Utara
  • Muhamad Nasir, Department of Informatics Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember; Politeknik Negeri Bengkalis
  • Agus Zainal Arifin, Department of Informatics Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember
  • Diana Purwitasari, Department of Informatics Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember
Keywords: web news documents clustering, principal component analysis, singular value decomposition, dimension reduction, ant algorithms

Abstract

Ant-based document clustering is a clustering method that measures the similarity of text documents based on the shortest path between nodes (trial phase) and then determines the optimal clusters from the resulting sequence of document similarities (dividing phase). The trial phase of Ant algorithms takes a very long time to process the document vectors because the Document-Term Matrix (DTM) is high-dimensional. In this paper, we propose a document clustering method that optimizes dimension reduction using Singular Value Decomposition-Principal Component Analysis (SVDPCA) and Ant algorithms. SVDPCA reduces the dimensions of the DTM by converting the term frequencies (freq-term) of the conventional DTM into the principal component scores (score-pc) of a Document-PC Matrix (DPCM). Ant algorithms then cluster the documents using the vector space model built on the reduced DPCM. Experimental results on 506 news documents in Indonesian demonstrate that the proposed method reduces the number of dimensions by up to 99.7%. It speeds up the execution time of the trial phase while maintaining clustering quality: the best F-measure achieved in the experiments was 0.88 (88%).
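
The conversion from a DTM to a DPCM can be illustrated with a minimal sketch of PCA computed through SVD. This is not the authors' implementation: the function name `svd_pca_scores`, the `variance_kept` threshold used to choose the number of components, and the toy 4-document matrix are all assumptions made for illustration, and the abstract does not state how the number of principal components was selected.

```python
import numpy as np


def svd_pca_scores(dtm, variance_kept=0.9):
    """Return principal component scores for the rows (documents) of a DTM.

    Hypothetical helper: `variance_kept` is an assumed selection rule,
    not one taken from the paper.
    """
    dtm = np.asarray(dtm, dtype=float)
    # PCA works on column-centered data (each term has zero mean).
    centered = dtm - dtm.mean(axis=0)
    # Thin SVD: centered = U @ diag(s) @ Vt
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    total = variance.sum()
    if total == 0.0:
        # All documents are identical; one zero-valued component suffices.
        return np.zeros((dtm.shape[0], 1))
    cumulative = np.cumsum(variance) / total
    # Smallest k whose leading components reach the requested variance share.
    k = min(int(np.searchsorted(cumulative, variance_kept)) + 1, len(s))
    # Scores are the documents' coordinates on the first k components.
    return u[:, :k] * s[:k]


# Toy DTM (assumed data): 4 documents x 6 terms, two rough topics.
dtm = np.array([
    [3, 2, 0, 0, 1, 0],
    [2, 3, 0, 0, 0, 1],
    [0, 0, 3, 2, 1, 0],
    [0, 0, 2, 3, 0, 1],
])

scores = svd_pca_scores(dtm)
print("DTM shape: ", dtm.shape)
print("DPCM shape:", scores.shape)
```

Each row of `scores` is a document vector with far fewer columns than the original DTM, so any similarity computation over document pairs, such as the one in the trial phase, touches fewer values per pair. How much is saved on a real corpus depends on the data and on the component-selection rule.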

Published
2016-02-15
How to Cite
Fadllullah, A., Kamudi, D. D., Nasir, M., Arifin, A. Z., & Purwitasari, D. (2016). WEB NEWS DOCUMENTS CLUSTERING IN INDONESIAN LANGUAGE USING SINGULAR VALUE DECOMPOSITION-PRINCIPAL COMPONENT ANALYSIS (SVDPCA) AND ANT ALGORITHMS. Jurnal Ilmu Komputer Dan Informasi, 9(1), 17-25. https://doi.org/10.21609/jiki.v9i1.362