Multi-Head Attention in Residual Networks to Improve Coral Reef Structure Classification

Eka Nuranti - Institut Teknologi Bacharuddin Jusuf Habibie, Jl. Balaikota, Parepare, 91122, Indonesia
Naili Intizhami - Institut Teknologi Bacharuddin Jusuf Habibie, Jl. Balaikota, Parepare, 91122, Indonesia
Muhammad Tassakka - Polytechnic of Marine and Fisheries Bone, Jl. Sungai Musi, Bone, 92719, Indonesia
Intan Areni - Universitas Hasanuddin, Jl. Poros Malino, Gowa, 92171, Indonesia
Osama Al Ghozy - Institut Teknologi Bacharuddin Jusuf Habibie, Jl. Balaikota, Parepare, 91122, Indonesia
Muhammad Jefri - Institut Teknologi Bacharuddin Jusuf Habibie, Jl. Balaikota, Parepare, 91122, Indonesia


Residual Networks (ResNet) mark a crucial advance in convolutional neural network design, mitigating problems such as vanishing gradients and improving pattern detection across a range of image classification tasks. This study introduces MHA-ResNet50, a novel adaptation of the ResNet50 architecture that integrates a multi-head attention (MHA) mechanism for discerning coral reef structures in images. The input of each stage is modified to form an MHA block augmented with separable convolution; following multiscale-gate principles, this block is inserted at several stages of the ResNet50 identity blocks before the features pass through the fully connected layers. We also apply Stratified K-fold cross-validation so that each fold preserves the class proportions of the dataset. We evaluated MHA-ResNet50 under several MHA-block placement scenarios and observed consistent improvements in the accuracy of coral reef structure prediction. The best configuration, with four attention blocks (MHA-ResNet50-4), reached an accuracy of 85.23% on a dataset of only 409 coral structure images, showing that the model adapts well to small datasets while delivering strong performance. In summary, the proposed model enhances ResNet50 with multi-head attention, separable convolution, and multiscale-gate principles, and substantially advances accurate prediction of coral reef structures under limited data. Future work will dig deeper into the model design and use larger datasets with more classes to strengthen generalization.
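As a rough illustration of two ingredients the abstract names, the sketch below implements multi-head self-attention over a flattened feature map and a stratified fold assignment in plain NumPy. All shapes, weight initializations, and function names here are illustrative assumptions for exposition, not the paper's actual MHA-ResNet50 implementation (which additionally uses separable convolution and multiscale gates inside the ResNet50 identity blocks).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Multi-head self-attention over a flattened feature map.

    x: (seq_len, d_model) — e.g. an H*W x C feature map from a ResNet stage.
    Projection weights are drawn randomly here to stand in for learned parameters.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.02
                      for _ in range(4))

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)                        # attention weights
    out = attn @ v                                         # (heads, seq, d_head)
    # Merge heads back: (seq_len, d_model), then output projection.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

def stratified_folds(labels, k, rng):
    """Assign each sample a fold index so every fold keeps ~equal class proportions."""
    folds = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        folds[idx] = np.arange(len(idx)) % k  # deal class members round-robin
    return folds
```

In practice the attention output would be gated back onto the residual stream of each identity block; the round-robin fold assignment mirrors what `sklearn.model_selection.StratifiedKFold` does for the evaluation protocol described in the abstract.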


Residual Networks; Convolutional Neural Network; Attention Mechanism; Coral Reef Classification

