What Goes Where In Calgary? A Garbage Classification System Based on Images and Natural Language

dc.contributor.advisor: Souza, Roberto
dc.contributor.author: Cazarin Filho, Jose Carlos
dc.contributor.committeemember: Drew, Steve
dc.contributor.committeemember: Messier, Geoffrey
dc.contributor.committeemember: Souza, Roberto
dc.contributor.committeemember: Abdel Latif, Ahmad
dc.date: 2026-06
dc.date.accessioned: 2025-11-03T16:02:20Z
dc.date.issued: 2025-10-28
dc.description.abstract: Disposing of garbage in the correct bin is important because it maximizes recycling and benefits the environment. However, this task is challenging for individuals who lack the knowledge or training to dispose of garbage properly. Artificial Intelligence methods, deep learning in particular, can be leveraged in this task. Most current deep learning systems assume that all the necessary information for garbage classification is contained in images. We hypothesize that natural language descriptions of the objects, provided by the individual trying to dispose of the piece of garbage, can add contextual information that may not be present in the image, and vice versa. By combining these two sources of information, images and text, it should be possible to achieve better garbage classification results than with image-only or text-only information. This thesis proposes (1) a novel public benchmark dataset, which includes 20,000 images of garbage with corresponding text descriptions and class labels; and (2) a multimodal garbage classification model based on what we call "Reverse Cross Attention" (RCA), which explores the complementarity of information between image and text. Our proposed model achieved improved results compared to unimodal models based solely on images or text and to state-of-the-art multimodal models. Our work demonstrates that, when combining text and image information using the RCA mechanism, the proposed model outperforms the best unimodal results by an average of 2% across all metrics.
dc.identifier.citation: Cazarin Filho, J. C. (2025). What goes where in Calgary? A garbage classification system based on images and natural language (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri: https://hdl.handle.net/1880/123137
dc.identifier.uri: https://dx.doi.org/10.11575/PRISM/50681
dc.language.iso: en
dc.publisher.faculty: Schulich School of Engineering
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Attention mechanisms
dc.subject: garbage classification
dc.subject: vision-language model
dc.subject.classification: Artificial Intelligence
dc.subject.classification: Engineering
dc.subject.classification: Computer Science
dc.title: What Goes Where In Calgary? A Garbage Classification System Based on Images and Natural Language
dc.type: master thesis
thesis.degree.discipline: Engineering – Electrical & Computer
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.thesis.accesssetbystudent: I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.

Files

Original bundle

Name: ucalgary_2025_cazarin_jose.pdf
Size: 52.55 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission