Text-Based Content Analysis on Social Media Using Topic Modelling to Support Digital Marketing

Gandhi Surya Buana, Raras Tyasnurita, Nindita Cahya Puspita, Retno Aulia Vinarti, and Faizal Mahananto
Sepuluh Nopember Institute of Technology, Jl. Teknik Kimia, Surabaya 60111, Indonesia

DOI: http://dx.doi.org/10.62527/joiv.8.1.1636


This study aims to build a Social Media Analytics (SMA) tool that helps digital marketers and content creators find topics for text-based Instagram content and supports digital marketing strategy, because no existing SMA tool provides topic discovery for text-based Instagram content. The tool requires content text, caption text, likes, comments, upload time, and content category, all collected with Instascrapper. The study applies topic modelling with Latent Dirichlet Allocation (LDA) to find the dominant topic in the content, and uses Optical Character Recognition (OCR) to extract text from the images of text-based Instagram posts. The resulting SMA tool was tested with three expert users: 93% of test participants could use the tool to find topic references, and 85% could still use it even though they found it difficult. Because these results show that the tool still needs development, further research could focus on improving the user experience, particularly ease of use, to increase user acceptance. Future SMA tools could also target users such as data analysts, business intelligence analysts, or other roles within a company to support decision-making in the marketing department.
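
The abstract names LDA as the technique for finding dominant topics. The sketch below is a minimal, pure-Python illustration of that idea, not the authors' implementation: the abstract does not say which LDA library or inference method the tool uses, so this example fits LDA with collapsed Gibbs sampling. It assumes the OCR step (not shown) has already extracted the image text, which is joined with the caption into one document per post. The function names (`tokenize`, `lda_gibbs`, `doc_topic_distribution`, `dominant_topic`, `top_words`), the hyperparameters `alpha` and `beta`, the iteration count, the stopword list, and the sample posts are all hypothetical.

```python
import random

# Characters stripped from the ends of each whitespace-separated token.
PUNCTUATION = ".,!?:;\"'()[]#@"


def tokenize(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(PUNCTUATION)
        if word and word not in stopwords:
            tokens.append(word)
    return tokens


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to token lists with collapsed Gibbs sampling.

    Returns (ndk, nkw, vocab): per-document topic counts, per-topic word
    counts, and the sorted vocabulary that indexes the columns of nkw.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    num_words = len(vocab)
    ndk = [[0] * num_topics for _ in docs]              # document-topic counts
    nkw = [[0] * num_words for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                               # tokens per topic

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + num_words * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def doc_topic_distribution(ndk_row, alpha=0.1):
    """Smoothed estimate of one document's topic proportions (theta)."""
    total = sum(ndk_row) + len(ndk_row) * alpha
    return [(count + alpha) / total for count in ndk_row]


def dominant_topic(ndk_row, alpha=0.1):
    """Index of the topic with the largest share of the document."""
    theta = doc_topic_distribution(ndk_row, alpha)
    return max(range(len(theta)), key=theta.__getitem__)


def top_words(nkw_row, vocab, n=5):
    """The n words most often assigned to one topic."""
    ranked = sorted(range(len(vocab)), key=lambda i: -nkw_row[i])
    return [vocab[i] for i in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical posts: OCR-extracted image text joined with the caption.
    posts = [
        "Learn Python for data analysis! #python #data",
        "Data analyst career tips: master SQL and Python",
        "UI/UX design portfolio tips for beginners",
        "Prototype design in Figma: a basic UI guide",
    ]
    stopwords = {"for", "and", "in", "a", "tips"}
    docs = [tokenize(p, stopwords) for p in posts]
    ndk, nkw, vocab = lda_gibbs(docs, num_topics=2, iterations=300, seed=42)
    for k in range(2):
        print(f"topic {k}:", top_words(nkw[k], vocab))
    for d, post in enumerate(posts):
        print(dominant_topic(ndk[d]), post)
```

Each post's dominant topic is the argmax of its smoothed topic proportions, and ranking a topic's word counts gives the top words a marketer would read as that topic's label. In practice a library implementation such as gensim's `LdaModel` would usually replace the hand-written sampler; it is written out here only to make the per-document topic counts explicit.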


Keywords: Digital marketing; Instagram; latent Dirichlet allocation; optical character recognition; social media analytics; topic modelling
