AROWOLO, MICHEAL OLAOLU (2021) HYBRID DIMENSIONALITY REDUCTION MODEL FOR CLASSIFICATION OF RIBONUCLEIC ACID SEQUENCING MALARIA VECTOR DATASET. Other thesis, Landmark University, Omu Aran, Kwara State.
Abstract
Malaria is a life-threatening disease caused by the Plasmodium falciparum parasite and spread to people by infected mosquitoes. Gene expression data analysis is an essential procedure that reveals critical genes responsible for the biological processes involved in the infection and treatment of malaria in humans. Ribonucleic Acid Sequencing (RNA-Seq) is the technology that generates profiles of transcriptional data. These data are fundamental to a variety of scientific and clinical research and applications. In their raw form, however, RNA-Seq data are blighted by noise, redundancy and other limitations associated with high dimensional data. This "curse of dimensionality" makes the classification of genes challenging and computationally expensive. Numerous approaches have been proposed to address the problem; for instance, several dimensionality reduction, clustering and classification techniques have been suggested for analyzing RNA-Seq data. While these techniques detect interesting features in high dimensional data effectively, it remains difficult to identify the relevant features of genes because of inherent orthogonality constraints, which drive the reductions toward maximizing variance and make hidden correlations difficult to detect. Essential information hidden in higher dimensions has been ignored, resulting in some data loss and inadequate classification output. The aim of this study is to overcome the limitations related to high dimensional data by introducing an optimized hybrid dimensionality reduction approach that better uncovers relevant features and thereby enhances classification accuracy. The study evaluated two hybrid dimensionality reduction techniques on the Anopheles gambiae dataset: an Optimized Genetic Algorithm (GA-O) combined with Principal Component Analysis (PCA), denoted GA-O+PCA, and GA-O combined with Independent Component Analysis (ICA), denoted GA-O+ICA. The low-dimensional data generated were then classified using the Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Decision Tree (DT) and Ensemble classifiers. Experimental results showed that GA-O+ICA with the Ensemble classifier outperformed the other techniques, with 93% accuracy. To validate the performance of the proposed work, the other combinations were also evaluated and yielded classification accuracies of 91.7% for GA-O+ICA+SVM, 90% for GA-O+ICA+KNN, and 80% for GA-O+PCA+DT. This technique outperformed many existing methods and is thus very useful in significantly improving the performance of classification techniques. This study develops an enhanced approach in terms of computation; the obtained results are easily interpreted and can be used for the classification of other procedures and ailments.
Keywords: RNA-Seq; Genetic Algorithm Optimization; Principal Component Analysis; Independent Component Analysis; Mosquito Anopheles; Machine Learning; Prediction; Support Vector Machine; Decision Tree; K-Nearest Neighbor; Ensemble.
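As a rough illustration of the first stage of the pipeline described in the abstract, the sketch below runs a plain genetic algorithm that searches for a subset of features. It is not the thesis's GA-O: the abstract does not give the optimization details, so the operators (truncation selection, one-point crossover, bit-flip mutation), the nearest-centroid fitness, the synthetic data, and every function and parameter name here are assumptions chosen for illustration. The PCA or ICA projection and the SVM, K-NN, Decision Tree and Ensemble classifiers that follow in the study are omitted.

```python
import random

def make_data(rng, n_samples=60, n_features=30, n_informative=3):
    """Synthetic stand-in for an expression matrix (NOT the Anopheles gambiae
    data): only the first n_informative features differ between classes."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier that only sees the
    features whose mask bit is 1. Returns 0.0 for an empty selection."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dist = {lab: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for lab, cen in centroids.items()}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(X)

def fitness(X, y, mask, penalty=0.005):
    """Assumed fitness: accuracy minus a small cost per selected feature."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def ga_select(X, y, rng, pop_size=20, generations=25, mutation_rate=0.05):
    """Evolve binary feature masks; the fitter half survives each generation."""
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]             # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))

rng = random.Random(0)
X, y = make_data(rng)
best_mask = ga_select(X, y, rng)
selected = [j for j, keep in enumerate(best_mask) if keep]
print("selected features:", selected)
print("training accuracy:", centroid_accuracy(X, y, best_mask))
```

In a fuller version, the columns in `selected` would be passed to a PCA or ICA step and the resulting components to a classifier, with accuracy measured on held-out data rather than on the training set as this simplified fitness does.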
Item Type: | Thesis (Other) |
---|---|
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
Divisions: | Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science |
Depositing User: | Mr DIGITAL CONTENT CREATOR LMU |
Date Deposited: | 31 May 2024 11:52 |
Last Modified: | 31 May 2024 11:52 |
URI: | https://eprints.lmu.edu.ng/id/eprint/5568 |