Leveraging Feature Exploitation to Automate Practical Machine Learning with Text, Image and Tabular Data

dc.contributor.advisorFar, Behrouz Homayoun
dc.contributor.advisorMohammed, Emad A.
dc.contributor.authorSharifi, Fatemeh
dc.contributor.committeememberMoshirpour, Mohammad
dc.contributor.committeememberMurari, Kartikeya
dc.date2021-06
dc.date.accessioned2021-02-24T15:11:41Z
dc.date.available2021-02-24T15:11:41Z
dc.date.issued2021-02-18
dc.description.abstractThe amount of data generated in tabular, text, and image form is growing rapidly. Machine Learning (ML) is a powerful paradigm for supporting knowledge discovery, turning generated data into knowledge that is useful in decision-making. Methods for finding the important features in different applications are therefore essential. To this end, this dissertation investigates four distinct problems that apply ML to prediction tasks on various types of data, including text, images, and tables. The first two problems, which concentrate on tabular data, share the overarching goal of improving Health-Related Quality of Life (HRQoL) measures used in the treatment and care of prostate cancer patients. Specifically, I first propose a cluster-based method to exploit the features that are most important for the desired output. In the second problem, my objective is to identify the minimal set of important features that can predict 1-year follow-up HRQoL while adding interpretability to the proposed model. Using information from 5093 patients with 1500 measures, the results support the use of the proposed ML technique as an essential tool for identifying predictive features and interpreting the findings. The third study uses Natural Language Processing (NLP) to propose a test case failure prediction approach for manual testing that can be used as a specification-based heuristic for test selection, prioritization, and reduction. I show that a simple linear regression model using the extracted NLP-based feature together with a typical history-based feature can accurately predict test case failures in new releases. A comparison of several proposed approaches on 41 releases of Mozilla Firefox over three projects shows that the NLP-based feature can improve the prediction models. The last study focuses on image analysis for velocity picking in seismic data. Velocity analysis is a time-consuming task that is mostly performed manually. I develop a novel data-driven ensembling strategy for combining geophysical models with a Convolutional Neural Network (CNN), which uses spatiotemporally varying image data for training and prediction. Extensive experiments on nine field datasets show better performance than the current state-of-the-art method.en_US
dc.identifier.citationSharifi, F. (2021). Leveraging Feature Exploitation to Automate Practical Machine Learning with Text, Image and Tabular Data (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.doihttp://dx.doi.org/10.11575/PRISM/38651
dc.identifier.urihttp://hdl.handle.net/1880/113112
dc.publisher.facultySchulich School of Engineering
dc.publisher.institutionUniversity of Calgaryen
dc.rightsUniversity of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.en_US
dc.subject.classificationComputer Scienceen_US
dc.titleLeveraging Feature Exploitation to Automate Practical Machine Learning with Text, Image and Tabular Dataen_US
dc.typedoctoral thesisen_US
thesis.degree.disciplineEngineering – Electrical & Computeren_US
thesis.degree.grantorUniversity of Calgaryen_US
thesis.degree.nameDoctor of Philosophy (PhD)en_US
ucalgary.item.requestcopyfalse

Files

Original bundle

Name:
ucalgary_2021_sharifi_fatemeh.pdf
Size:
1.26 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
2.62 KB
Format:
Item-specific license agreed upon to submission