Cancer Type Driver Classification Accuracy Using Spark ML Technology

Authors

  • Daniel Mago Vistro, Asia Pacific University
  • Muhammad Shoaib Farooq, School of System and Technology, University of Management and Technology, Pakistan
  • Attique Ur Rehman, School of System and Technology, University of Management and Technology, Pakistan
  • Hafiz Abdullah Tanveer, School of System and Technology, University of Management and Technology, Pakistan

DOI:

https://doi.org/10.69511/ijdsaa.v3i3.84

Keywords:

Tumor; genes; CAMUR; Hadoop; Apache Spark; BIGBIOCL; MLlib; Thyroid cancer

Abstract

This paper analyzes genes extracted from the body that may act as tumor drivers, leading to cancers of different types such as breast cancer. The work is motivated by BIGBIOCL. Classifier with Alternative and Multiple Rule-based models (CAMUR) is the core algorithm applied here to dissect large datasets. To achieve this goal, Apache Spark and MLlib are used on a Hadoop stack in local mode. The experiments were performed using decision tree and random forest classifiers separately. On the data used, random forest showed better results in terms of F-measure and efficiency. To extract genes and other pertinent models, features are deleted iteratively using a modified version of the previously proposed CAMUR algorithm. Finally, the extracted results are provided to biologists so that they can analyze whether the extracted genes are related to, or may be drivers of, cancer.
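
The iterative feature deletion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.9 F-measure threshold, the iteration cap, and the toy gene scores are all assumptions made for illustration, and a plain Python callback stands in for the Spark MLlib classifier. The idea shown is to train a model, record the features it relied on, delete them, and retrain on the rest until the F-measure falls below a threshold.

```python
from typing import Callable, Iterable, List, Set, Tuple


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical callback type: given the remaining features, train a model and
# return (its F-measure, the subset of features the model actually used).
FitFn = Callable[[Set[str]], Tuple[float, Set[str]]]


def iterative_feature_elimination(
    features: Iterable[str],
    fit: FitFn,
    min_f_measure: float = 0.9,  # assumed threshold, not taken from the paper
    max_iterations: int = 10,    # assumed cap, not taken from the paper
) -> List[Set[str]]:
    """Collect one feature set per round until the model quality degrades."""
    remaining = set(features)
    rounds: List[Set[str]] = []
    for _ in range(max_iterations):
        if not remaining:
            break
        score, used = fit(remaining)
        used = used & remaining  # ignore anything outside the current pool
        if score < min_f_measure or not used:
            break
        rounds.append(used)
        remaining -= used  # delete the used features before retraining
    return rounds


def toy_fit(remaining: Set[str]) -> Tuple[float, Set[str]]:
    """Stand-in classifier: 'uses' the strongest remaining gene (made-up scores)."""
    strength = {"GENE_A": 0.98, "GENE_B": 0.95, "GENE_C": 0.60}
    best = max(remaining, key=lambda gene: strength[gene])
    return strength[best], {best}


rounds = iterative_feature_elimination({"GENE_A", "GENE_B", "GENE_C"}, toy_fit)
print(rounds)
```

In a Spark setting, the `fit` callback could plausibly wrap a random forest trained with MLlib and pick the deleted features from its feature importances, but that wiring is an assumption and is left out here so the sketch runs without a cluster.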

Published

2021-06-04

How to Cite

Vistro, D. M., Farooq, M. S., Rehman, A. U., & Tanveer, H. A. (2021). Cancer Type Driver Classification Accuracy Using Spark ML Technology. International Journal of Data Science and Advanced Analytics, 3(1), 54–59. https://doi.org/10.69511/ijdsaa.v3i3.84

Section

Articles
