Using Supervised Machine Learning Models and Natural Language Processing for Identification of Fake News
DOI: https://doi.org/10.69511/ijdsaa.v5i5.209
Keywords: Social Media, Fake News, Machine Learning, Feature Selection, Feature Extraction
Abstract
Social media has gained popularity over the last decade because it is easy to access and provides people with a large amount of information. Within seconds, users can access information from social media on politics, lifestyle, science, money, and other fields. However, data obtained from social media platforms are a mixture of fake and real news. Fake news is intended to deceive people and change their attitudes and beliefs. Machine learning algorithms have proven successful at distinguishing real news from fake news. Nonetheless, applying machine learning models in this context is challenging because of limitations in the dataset's type, balance, or skewness. Hence, data pre-processing is essential before machine learning models are applied. This work therefore evaluated supervised machine learning models combined with different data pre-processing approaches for classifying fake news obtained from social media platforms. Different pre-processing techniques for feature extraction and feature selection were applied alongside four machine learning models: logistic regression, decision trees, random forest, and extreme gradient boosting (XGBoost). The findings showed that random forest and extreme gradient boosting with bi-gram feature extraction and chi-squared feature selection performed best. Future work involves using the proposed model to detect fake news in different contexts and languages.
License
Copyright (c) 2023 Sidarth Mohan, Jolnar Assi, Ammar H Mohammed

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
International Journal of Data Science and Advanced Analytics (IJDSAA) is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. This license allows users to copy, distribute, transmit, and adapt an article as long as the author is attributed and the article is not used for commercial purposes.
The author(s) confirm(s) that:
- The manuscript has not been previously published and is not under consideration by another journal (or an explanation has been provided in Comments to the Editor).
- Permission for reproduction was obtained for any published materials used in the manuscript (if any).