Eranga Jayalatharachchi, Asanka Wasala, Ruvan Weerasinghe
Natural language processing research and applications for Sinhala, the majority language of Sri Lanka, are still in their infancy. Spell checking is an important application that has received inadequate attention. A major obstacle to implementing a Sinhala spell checker is the scarcity of resources such as morphological analyzers, tagged corpora, and comprehensive lexica. Because Sinhala morphology is rich, an entirely rule-based approach is inadequate; data-driven approaches are an interesting alternative. This research attempts to improve the quality of Subasa, an existing n-gram-based, data-driven spell checker, using minimum edit distance techniques, and to make the system freely available online. Our empirical results show that the proposed design improvements increased spell checking coverage. We also compare the performance of this system with others in the literature.
International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka, Dec 13-14, 2012.
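
As an illustration of the minimum edit distance technique named in the abstract, the following is a minimal sketch of how candidate corrections could be ranked by Levenshtein distance. It is not the Subasa implementation, whose details are not given here. The function names `edit_distance` and `suggest`, the `max_distance` and `limit` parameters, and the toy Latin-script lexicon are all hypothetical, and comparing strings by Unicode code point is an assumption (Sinhala text may be better compared at the level of grapheme clusters).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over Unicode code points.

    Insertions, deletions, and substitutions each cost 1.
    Only two rows of the dynamic-programming table are kept.
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2, limit=5):
    """Return up to `limit` lexicon entries within `max_distance` edits.

    Hypothetical ranking: closest first, ties broken alphabetically.
    """
    scored = ((edit_distance(word, w), w) for w in lexicon)
    near = sorted((d, w) for d, w in scored if d <= max_distance)
    return [w for _, w in near[:limit]]


if __name__ == "__main__":
    # Toy lexicon for illustration only, not data from the paper.
    toy_lexicon = ["spelling", "spewing", "selling", "apple"]
    print(suggest("speling", toy_lexicon))
```

A real system would typically avoid scanning the whole lexicon for every misspelled word, and could combine the distance with n-gram statistics when ordering suggestions; this sketch leaves both out.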