Classification of Industrial Relations Dispute Court Verdict Document with XGBoost and Bidirectional LSTM

Galih Wicaksono - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Ulfah Nur Oktaviana - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Said Noor Prasetyo - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Tiara Intana Sari - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Nur Putri Hidayah - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Nur Rohim Yunus - Universitas Islam Negeri Syarif Hidayatullah Jakarta, Jakarta, Indonesia
Solahudin Al-Fatih - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia


DOI: http://dx.doi.org/10.30630/joiv.7.3-2.2373

Abstract


Industrial relations disputes (Perselisihan Hubungan Industrial, PHI) are important to examine because they reflect the unequal bargaining positions of workers and corporations. PHI documents are numerous, so they need to be classified and distinguished from decisions in other types of civil cases. PHI decision documents can be accessed openly from a special directory of civil courts. These rulings resemble other decisions, such as those on consumer protection or bankruptcy. This study used 450 documents consisting of 255 PHI court decisions and 255 non-PHI court decisions. This study uses the case section of each decision as the text to be classified. We use several feature extraction techniques and three methods: Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Bidirectional Long Short-Term Memory (Bi-LSTM). For the SVM and XGBoost classifiers, we use term frequency-inverse document frequency (TF-IDF) features. The Bi-LSTM classifier uses Indonesian Wikipedia GloVe word embeddings with a dimension of 50. Across the experiments, Bi-LSTM with GloVe gave the best classification results, reaching 100% accuracy without overfitting. XGBoost with parameters optimized by random search ranked second, and SVM gave the lowest accuracy. Accurate classification of this kind can improve the availability and quality of open legal knowledge for society and for future research.
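
As an illustration of the TF-IDF features mentioned above, the following is a minimal sketch in plain Python. It is not the study's implementation: the whitespace tokenizer, the smoothed IDF variant, the function name `tfidf`, and the two toy Indonesian snippets are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = relative term frequency * smoothed inverse document frequency.
    The smoothing (add one to both counts, then add one to the log) is an
    assumed, commonly used variant, not necessarily the one used in the study.
    """
    # Assumption: lowercase and split on whitespace as a stand-in tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy snippets, not taken from the study's corpus.
docs = [
    "pekerja menggugat perusahaan atas pemutusan hubungan kerja",
    "konsumen menggugat perusahaan atas barang cacat",
]
vectors = tfidf(docs)
```

In this sketch, a term that appears in only one snippet (such as "pekerja") receives a higher weight than a term shared by both (such as "perusahaan"), which is the property that lets a classifier such as SVM or XGBoost separate document types by their distinctive vocabulary.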

Keywords


classification of court documents, Bidirectional LSTM, Extreme Gradient Boosting, Industrial Relations Disputes

