Grouping Book Titles Using the K-Nearest Neighbor (K-NN) Algorithm and Term Frequency-Inverse Document Frequency (TF-IDF)

Fahrur Rozi, Farid Sukmana, Muhammad Nabil Adani

Abstract


The Universitas Bhinneka PGRI Library holds many collections in both printed and digital form, and these collections will keep growing over time. As the book collection becomes larger and more diverse, grouping the existing titles becomes increasingly difficult. This study applies data mining with the K-Nearest Neighbors (K-NN) algorithm, combined with TF-IDF for term-frequency weighting. The work proceeds in five stages: (1) text preprocessing by tokenization, case folding, stopword removal, and stemming; (2) word weighting using the TF-IDF method; (3) modeling the k value from a minimum of 1 to a maximum of 30; (4) classification of the data using the optimal k value found in the k-value modeling; and (5) discussion of the classification results. Data were collected through literature study and datasets. The resulting classification system is expected to provide useful information for users. The study also aims to implement the K-NN method in combination with TF-IDF and to measure the accuracy of the resulting classification system. The highest accuracy obtained for book-title classification was 66.67% and the lowest was 60%, with an average accuracy of 63.33%.
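The stages listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stopword list, the toy titles and labels, the smoothed IDF formula, cosine similarity as the neighbor measure, and leave-one-out evaluation are all assumptions made here for illustration. Stemming is omitted because the abstract does not name a stemmer.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the study's actual list is not given.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "for", "in", "with"}

# Hypothetical toy data, not taken from the study's dataset.
TITLES = [
    "Introduction to Database Systems",
    "Database Design and SQL",
    "Advanced SQL Database Tuning",
    "Principles of Microeconomics",
    "Macroeconomics and Monetary Policy",
    "Economics of Monetary Markets",
]
LABELS = ["computing"] * 3 + ["economics"] * 3


def preprocess(title):
    """Stage 1: case folding, tokenization, stopword removal (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(docs):
    """Stage 2a: smoothed IDF per term, log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Stage 2b: sparse TF-IDF vector; terms unseen in fitting are ignored."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(query, train_vecs, train_labels, k):
    """Stage 4: majority vote among the k most similar training vectors."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(titles, labels, k):
    """Leave-one-out accuracy for a given k (assumed evaluation scheme)."""
    docs = [preprocess(t) for t in titles]
    hits = 0
    for i in range(len(docs)):
        train_docs = docs[:i] + docs[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        idf = fit_idf(train_docs)
        train_vecs = [tfidf(d, idf) for d in train_docs]
        pred = knn_predict(tfidf(docs[i], idf), train_vecs, train_labels, k)
        hits += pred == labels[i]
    return hits / len(docs)


def best_k(titles, labels, k_min=1, k_max=30):
    """Stage 3: sweep k from k_min to k_max, capped by the training size."""
    k_max = min(k_max, len(titles) - 1)
    scores = {k: loo_accuracy(titles, labels, k)
              for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    k, scores = best_k(TITLES, LABELS)
    print("best k:", k, "accuracy per k:", scores)
```

On a real catalogue the sweep would cover the full range of 1 to 30. Here it is capped by the six toy titles, so the printed accuracies say nothing about the figures reported in the study.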

Keywords: Data Mining, K-Nearest Neighbor (K-NN), TF-IDF






DOI: http://dx.doi.org/10.51213/jimp.v6i3.346

Copyright (c) 2021 J I M P - Jurnal Informatika Merdeka Pasuruan