TY - JOUR
T1 - URL filtering using big data analytics in 5G networks
AU - Khan, Nasir Ali
AU - Khan, Abid
AU - Ahmad, Mansoor
AU - Shah, Munam Ali
AU - Jeon, Gwanggil
N1 - Funding Information:
Mansoor Ahmed is a research fellow at Innovative Value Institute, Maynooth University, Ireland, under a Marie Sklodowska-Curie Actions (MSCA)/EU-funded research project. He received his PhD from Vienna University of Technology and held postdoctoral fellowships at Indiana University, USA, and UCD, Ireland, in 2011 and 2017. His research interests include Semantic Web technologies, information security and privacy.
Publisher Copyright:
© 2021 Elsevier Ltd
PY - 2021/10
Y1 - 2021/10
N2 - Future-generation networking technologies such as 5G and 6G will provide tremendous performance, network capacity, quality of service and connectivity. Therefore, the convergence of these technologies with big data analytics in today's smart ecosystem will provide tremendous opportunities. Existing URL filtering techniques do not filter in real time and lack fault tolerance and scalability. We have addressed these issues and developed a real-time, fault-tolerant and scalable machine-learning-based binary classification model, which handles streams of URL traffic and classifies them as obscene or clean material in real time. We used only URL-based features for classification and still achieved a good accuracy of 93% on the logistic regression classifier and 88%. Our model can filter 2 million URLs in 55 seconds. The proposed model achieved precision, recall and F1-score values of 0.92, 0.95 and 0.93, respectively.
AB - Future-generation networking technologies such as 5G and 6G will provide tremendous performance, network capacity, quality of service and connectivity. Therefore, the convergence of these technologies with big data analytics in today's smart ecosystem will provide tremendous opportunities. Existing URL filtering techniques do not filter in real time and lack fault tolerance and scalability. We have addressed these issues and developed a real-time, fault-tolerant and scalable machine-learning-based binary classification model, which handles streams of URL traffic and classifies them as obscene or clean material in real time. We used only URL-based features for classification and still achieved a good accuracy of 93% on the logistic regression classifier and 88%. Our model can filter 2 million URLs in 55 seconds. The proposed model achieved precision, recall and F1-score values of 0.92, 0.95 and 0.93, respectively.
KW - Big data analytics
KW - Logistic regression
KW - Machine learning
KW - URL filtering
UR - http://www.scopus.com/inward/record.url?scp=85113273761&partnerID=8YFLogxK
U2 - 10.1016/j.compeleceng.2021.107379
DO - 10.1016/j.compeleceng.2021.107379
M3 - Article
AN - SCOPUS:85113273761
SN - 0045-7906
VL - 95
JO - Computers and Electrical Engineering
JF - Computers and Electrical Engineering
M1 - 107379
ER -