Original Article

AI-Driven Phishing Detection Using Natural Language Processing and Machine Learning

Sarveena S¹, Dhanusiya R², A. Raja³

¹ ² Department of Computer Science and Engineering (Cyber Security), United Institute of Technology, Coimbatore, Tamil Nadu, India. ³ Head of the Department, Department of Computer Science and Engineering (Cyber Security), United Institute of Technology, Coimbatore, Tamil Nadu, India.

Published Online: May-June 2026

Pages: 120-128

Abstract

Phishing attacks are among the most persistent and damaging cybersecurity threats, exploiting human cognitive vulnerabilities to obtain sensitive information such as login credentials, financial account data, and personal identity details. Conventional rule-based and blacklist-driven detection systems adapt poorly to rapidly evolving phishing techniques, resulting in elevated false-positive rates, missed detections, and reliance on labour-intensive manual maintenance. This paper presents an AI-driven phishing detection framework that integrates Natural Language Processing (NLP) and Machine Learning (ML) to improve both detection accuracy and operational robustness. The proposed system combines multi-stage text preprocessing, hybrid feature extraction using Term Frequency–Inverse Document Frequency (TF-IDF) vectorisation and pre-trained word embeddings including Word2Vec and GloVe, and a comparative evaluation of supervised classification models including Logistic Regression, Support Vector Machines (SVM), Random Forest, and Long Short-Term Memory (LSTM) deep learning networks. On a combined dataset of 129,382 labelled email samples, the proposed hybrid NLP-ML model outperforms both traditional rule-based approaches and single-method ML baselines, with the LSTM classifier achieving 96.7% accuracy, 96.3% precision, 96.0% recall, and an F1-score of 96.1%. The principal contributions of this work are a comparative analysis of six machine learning architectures, a scalable and modular detection pipeline suitable for real-time deployment, a feature importance analysis identifying key discriminative attributes, and actionable insights for operational phishing detection systems.
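As a rough illustration of the TF-IDF feature extraction step and of how the reported F1-score follows from precision and recall, a minimal pure-Python sketch is given below. It is not the authors' implementation: the tokeniser, the smoothed IDF formula, the three example emails, and the function names `tokenize`, `tfidf_vectors`, and `f1_score` are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens.

    A simple stand-in (assumption) for the paper's multi-stage preprocessing.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (sorted vocabulary, one sparse {term: weight} dict per document)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        # Term frequency (relative count) times inverse document frequency.
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return sorted(idf), vectors


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical example emails, not drawn from the paper's dataset.
emails = [
    "Verify your account now to avoid suspension",
    "Meeting notes attached for your review",
    "Urgent: confirm your password and account details",
]
vocab, vectors = tfidf_vectors(emails)

# F1 computed from the precision and recall reported in the abstract (in %).
reported_f1 = f1_score(96.3, 96.0)
```

In this sketch, a term that appears in every email (such as "your") receives the lowest weight, while rarer terms (such as "verify") are weighted more heavily, which is the property that makes TF-IDF useful as classifier input. The F1 value computed from the reported precision and recall rounds to 96.1, consistent with the figure stated in the abstract.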
