Dikwatta U, Fernando TGI & Ariyaratne MKA (2024). Exploring mechanisms for detecting violent content in Sinhala image posts: Rationale with unsupervised vs supervised techniques. International Journal of Research in Computing (IJRC), ISSN 2820-2147 (online issues), ISSN 2820-2139 (print issues). General Sir John Kotelawala Defence University, Kandawala Road, Rathmalana, Sri Lanka.
Abstract:
This research explores different machine learning approaches to classifying Sinhala image posts. Image posts on social media are a powerful means of conveying information directly to people, and they contain both visuals and text. Research on English-language content is common in this area, but only a handful of studies address other languages. The target language of this study is Sinhala, a low-resource language. Unsupervised algorithms were used to classify the image posts, and supervised algorithms were used to classify text manually extracted from the image posts. In both cases, the classification decides whether a post is violent or nonviolent. The trained supervised models were then examined with interpretability models to identify the words that drive the violent or nonviolent decision. The findings reveal that supervised algorithms perform better than unsupervised algorithms in classifying image posts. However, improved results can be obtained by increasing the size and variety of the dataset.