Predicting Credit Card Fraud on a Imbalanced Data

  • Wei Wen Soh, Asia Pacific University of Technology & Innovation, Kuala Lumpur, Malaysia
  • Rika Mohd Yusuf, Asia Pacific University of Technology & Innovation, Kuala Lumpur, Malaysia
Keywords: Credit Card Fraud, Data Mining, Prediction Model, Imbalanced Data, PCA


Credit card fraud is increasing considerably with the development of modern technology and global communication networks. Fraudsters continuously devise new tactics to challenge existing technology and systems, costing both providers and consumers a great deal of money. Quick and accurate detection models have therefore become essential for companies and credit card providers seeking to reduce financial losses and loss of customer trust. However, published literature on credit card fraud detection techniques is scarce, because labeled credit card transaction datasets are rarely available to researchers. Such datasets are high-dimensional, meaning they contain many variables: card details, transaction amount, location, time, and personal details of the cardholders, all of which are anonymized. This study therefore uses a real-world dataset of European credit card transactions to which a PCA transformation has been applied. A common problem in this kind of research is that the data tend to be imbalanced: fraudulent transactions are far rarer than legitimate ones, which biases models toward the majority class and makes overall accuracy a misleading measure of predictive performance. In this study, the dataset was preprocessed with an oversampling technique using SAS Sample, and several data mining techniques were trained on it: Random Forest, KNN, Decision Tree, and Logistic Regression. After several trials, we found that logistic regression performed best among these techniques.
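The imbalance problem and the oversampling remedy described above can be illustrated with a small sketch. It is not the authors' pipeline: the paper used SAS Sample, whereas this example assumes plain random oversampling with replacement (duplicating minority-class rows), and the toy labels and the helper names `random_oversample`, `accuracy`, and `recall` are hypothetical, chosen only for illustration. It first shows why accuracy misleads on imbalanced data, then balances the classes.

```python
import random
from collections import Counter


def random_oversample(X, y, minority_label=1, seed=0):
    """Hypothetical helper: duplicate randomly chosen minority-class rows
    (sampling with replacement) until both classes have equal counts."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    if not minority:
        raise ValueError("no minority-class rows to oversample")
    # Draw extra minority rows until the class sizes match.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    rng.shuffle(combined)
    return [x for x, _ in combined], [t for _, t in combined]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives (fraud cases) that were detected."""
    hits = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b == positive)
    total = sum(1 for a in y_true if a == positive)
    return hits / total


# Hypothetical toy data: 998 legitimate (0) and 2 fraudulent (1) transactions.
y = [0] * 998 + [1] * 2
X = [[float(i)] for i in range(len(y))]

# A trivial "model" that always predicts "legitimate" scores high accuracy
# yet detects none of the fraud.
always_legit = [0] * len(y)
print(accuracy(y, always_legit))  # 0.998
print(recall(y, always_legit))    # 0.0

# After oversampling, both classes have the same number of rows.
X_bal, y_bal = random_oversample(X, y)
counts = Counter(y_bal)
print(counts[0], counts[1])       # 998 998
```

In practice, oversampling is normally applied only to the training split, so that duplicated fraud rows do not leak into the evaluation data and inflate the reported scores.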

How to Cite
Soh, W. W., & Yusuf, R. (2019). Predicting Credit Card Fraud on a Imbalanced Data. International Journal of Data Science and Advanced Analytics (ISSN 2563-4429), 1(1), 12-17. Retrieved from