A Review on Big Data Stream Processing Applications: Contributions, Benefits, and Limitations

Shaimaa Safaa Ahmed Alwaisi - Ministry of Water Resources, Planning and Follow-up Directorate, Baghdad, Iraq
Maan Nawaf Abbood - Imam Al-Adham University College, Baghdad, Iraq
Luma Fayeq Jalil - Computer Science Department, AL Rasheed University College, Baghdad, Iraq
Shahreen Kasim - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), Malaysia
Mohd Farhan Mohd Fudzee - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), Malaysia
Ronal Hadi - Department of Information Technology, Politeknik Negeri Padang, West Sumatera, Indonesia
Mohd Arfian Ismail - Faculty of Computing, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Malaysia


DOI: http://dx.doi.org/10.30630/joiv.5.4.737

Abstract


The amount of data in the world keeps growing rapidly. In the era of big data, efficient processing and analysis of data with machine learning algorithms is essential, especially when the data arrive as streams. Big data has undoubtedly become an important source of information and knowledge for the decision-making process. Nevertheless, dealing with this kind of data poses great difficulties, and several techniques have therefore been used to analyze data in the form of streams. Many of the techniques proposed and studied so far handle big data and support decisions through off-line batch analysis; today, however, we need to make constructive decisions based on online analysis of streaming data. In recent years, many researchers have proposed different frameworks for processing big data streams. In this work, we explore and present in detail some recent achievements in big data streaming in terms of their contributions, benefits, and limitations, as well as some recent platforms suitable for big data streaming analytics. Moreover, we highlight several issues that arise in big data stream processing. It is hoped that this study will assist researchers in choosing the most suitable framework for their big data streaming projects.
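To make the contrast between off-line batch analysis and online streaming analysis concrete, here is a minimal Python sketch. It is an illustration only, not a method taken from any of the surveyed frameworks. It assumes a simple numeric stream and uses a running mean and variance as the analytic task. The names `batch_mean`, `StreamingStats`, and `process_stream` are hypothetical. The batch version must hold the whole data set before it can answer. The streaming version updates its result in constant memory as each record arrives, using Welford's online algorithm.

```python
from typing import Iterable, Iterator, List, Tuple


def batch_mean(data: List[float]) -> float:
    """Off-line batch analysis: every record must be collected before computing."""
    if not data:
        raise ValueError("batch_mean() needs at least one record")
    return sum(data) / len(data)


class StreamingStats:
    """Online analysis: mean and variance updated one record at a time
    (Welford's algorithm), using constant memory regardless of stream length."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until the first record arrives.
        return self._m2 / self.n if self.n else 0.0


def process_stream(records: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (records seen, current mean) after every arriving record,
    so a decision can be made without waiting for the stream to end."""
    stats = StreamingStats()
    for x in records:
        stats.update(x)
        yield stats.n, stats.mean


if __name__ == "__main__":
    readings = [4.0, 8.0, 6.0, 2.0]
    for n, mean in process_stream(readings):
        print(n, mean)  # 1 4.0 / 2 6.0 / 3 6.0 / 4 5.0
    print(batch_mean(readings))  # 5.0: same answer, but only once all data is in
```

Distributed stream processing platforms apply the same incremental idea at scale, typically per key or per time window. This sketch deliberately uses no platform API, so it runs with the standard library alone.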

Keywords


Big data; machine learning; Spark; Kafka; data streaming.
