Classification of Industrial Relations Dispute Court Verdict Document with XGBoost and Bidirectional LSTM

Galih Wasis Wicaksono; Ulfah Nur Oktaviana; Said Noor Prasetyo; Tiara Intana Sari; Nur Putri Hidayah; Nur Rohim Yunus; Solahudin Al-Fatih

doi:10.30630/joiv.7.3-2.2373

Galih Wicaksono - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Ulfah Nur Oktaviana - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Said Noor Prasetyo - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Tiara Intana Sari - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Nur Putri Hidayah - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia
Nur Rohim Yunus - Universitas Islam Negeri Syarif Hidayatullah Jakarta, Jakarta, Indonesia
Solahudin Al-Fatih - Universitas Muhammadiyah Malang, Malang, Jawa Timur, Indonesia

Abstract

Industrial relations disputes (Perselisihan Hubungan Industrial, PHI) are important to examine because they reflect the unequal bargaining positions of workers and corporations. Because PHI documents are numerous, they need to be classified and distinguished from decisions in other types of civil cases. PHI decision documents can be accessed openly from a special directory of civil courts. These rulings resemble other decisions, such as those on consumer protection or bankruptcy. This study used 450 documents consisting of 255 PHI court decisions and 255 non-PHI court decisions. The case section of each document is the part that is classified. We use several feature extraction techniques and three methods: Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Bidirectional Long Short-Term Memory (Bi-LSTM). For the SVM and XGBoost classifiers, we use term frequency-inverse document frequency (TF-IDF). The Bi-LSTM classifier uses Indonesian Wikipedia GloVe word embeddings with a dimension of 50. Across the experiments, the best classification results were obtained with Bi-LSTM and GloVe, which reached 100% accuracy without overfitting. XGBoost with parameters optimized by random search gave the second-best result, while SVM gave the lowest accuracy. Accurate classification of this kind can improve the availability and quality of open legal knowledge for society and for future research.
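
As a rough illustration of the TF-IDF weighting that the abstract names as the feature extraction for SVM and XGBoost, the sketch below computes TF-IDF vectors in plain Python. It is not the study's implementation: the whitespace tokenization, the idf variant log(N/df) + 1, the function name `tfidf`, and the toy Indonesian snippets are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf * (log(N / df) + 1), where tf is the relative
    term frequency in the document, N the number of documents, and df
    the number of documents containing the term.
    """
    # Assumed preprocessing: lowercase and split on whitespace only.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Toy, made-up snippets standing in for verdict text (not the study's data).
docs = [
    "pemutusan hubungan kerja pekerja perusahaan",
    "sengketa konsumen perusahaan ganti rugi",
    "perselisihan hak pekerja upah perusahaan",
]
vectors = tfidf(docs)
```

In this toy corpus, a term that occurs in every document (such as "perusahaan") receives a lower weight than a term confined to one document, which is the property that lets a downstream classifier focus on case-specific vocabulary.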

Keywords

classification of court documents, Bidirectional LSTM, Extreme Gradient Boosting, Industrial Relations Disputes
