International Journal of Data Science and Advanced Analytics <p><a href="" rel="license"><img style="border-width: 0;" src="" alt="Creative Commons License"></a><br>International Journal of Data Science and Advanced Analytics (IJDSAA) is licensed under a <a href="" rel="license">Creative Commons Attribution-NonCommercial 4.0 International License</a>. This license allows users to copy, distribute, and transmit an article, and to adapt it, provided that the author is attributed and the article is not used for commercial purposes.</p> <p>The author(s) confirm that:</p> <ul> <li class="show">The submitted manuscript has not been previously published, nor is it under consideration by another journal (or an explanation has been provided in Comments to the Editor).</li> <li class="show">Permission has been obtained to reproduce any previously published material used in the manuscript.</li> </ul> (Manoj Jayabalan) (Deepi) Sat, 09 Feb 2019 17:27:43 +0000
Predicting Credit Card Fraud on Imbalanced Data <p>Credit card fraud is increasing considerably with the development of modern technology and the global superhighways of communication. Fraudsters continuously devise new tactics that challenge existing technology and systems, costing both providers and consumers a great deal of money. Quick and accurate models have therefore become essential for companies and credit card providers to reduce their financial losses and the loss of customer trust. However, published literature on credit card fraud detection techniques is scarce, owing to the lack of labeled credit card transaction datasets available to researchers. High-dimensional data are data with many variables. Such a dataset consists of the credit card details, transaction amount, location, time, and anonymized personal details of the cardholders. In this study, therefore, a real-world dataset (European Credit card) to which a PCA transformation has been applied is used.
A common problem in this kind of research is that the data tend to be imbalanced. Imbalanced data often bias a model toward the majority class, which makes the reported prediction accuracy misleading. In this study, the dataset was pre-processed with an oversampling technique called SAS Sample, and models were trained with several data mining techniques: Random Forest, KNN, Decision Tree, and Logistic Regression. After several trials, we found that Logistic Regression performed best.</p> Wei Wen Soh, Rika Mohd Yusuf (Author) Sat, 20 Apr 2019 00:00:00 +0000
A Comparison of Data Mining Algorithms for Liver Disease Prediction on Imbalanced Data <p>The liver is one of the most important organs in the human body, but owing to unhealthy lifestyles and excessive alcohol intake, liver disease has been increasing at an alarming rate globally, which calls for immediate attention to predicting the disease before it is too late. However, medical data are often imbalanced and complex. The aim of this project is therefore to investigate data mining algorithms for predicting liver disease on imbalanced data through random sampling. Results are compared and analysed based on accuracy and ROC index. K-Nearest Neighbour (k-NN) outperforms the other algorithms, namely Logistic Regression, AutoNeural and Random Forest, with an accuracy of 99.794%. In conclusion, the model proposed in this research performs better than those of past studies conducted on the Andhra Pradesh liver disease dataset.</p> Ain Najwa Arbain, B. Yushalinie Pillay Balakrishnan (Author) Sat, 09 Feb 2019 00:00:00 +0000
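Both abstracts above argue that class imbalance makes plain accuracy misleading, and both address it by resampling. The following is a minimal Python sketch of that reasoning under stated assumptions. It is not the authors' SAS workflow.

The sketch uses a hypothetical set of 990 legitimate and 10 fraudulent transactions. On that data, a classifier that always predicts "legitimate" scores 99% accuracy while catching no fraud, and its area under the ROC curve is 0.5.

- `random_oversample` is a hypothetical helper illustrating plain random oversampling with replacement. It is not the implementation of the SAS Sample step.
- `roc_auc` assumes that the "ROC index" mentioned above is the area under the ROC curve.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    hits = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in hits) / len(hits) if hits else 0.0


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via the pairwise (Mann-Whitney) form:
    the probability that a random positive outscores a random negative,
    with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class (sampling with replacement) until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


# Hypothetical toy data: 990 legitimate (0) and 10 fraudulent (1) rows.
X = [[float(i)] for i in range(1000)]
y = [0] * 990 + [1] * 10

# A trivial "model" that always predicts the majority class.
majority = [0] * len(y)
print(f"accuracy={accuracy(y, majority):.3f}, recall={recall(y, majority):.3f}")
# accuracy=0.990, recall=0.000
print(f"auc={roc_auc(y, majority):.2f}")
# auc=0.50

# After oversampling, both classes have 990 rows.
X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))
# [(0, 990), (1, 990)]
```

Random oversampling only duplicates existing minority rows. It is therefore usually applied to the training split alone. Otherwise, copies of the same row can appear in both training and test data and inflate the reported scores.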