Inexact Matching of Proper Names in Sinhala

S. C. Fernando and Gihan Dias

ABSTRACT

With the growing use of information systems in Sri Lanka, the storage, search and retrieval of information in local languages are becoming essential, especially in public sector organisations. The majority of this information includes personal data containing proper names, such as the names of people and places.

Proper names can be spelled in different ways by different people based on their personal preference. For example, the name විශාඛා can be spelled in several ways such as විසාකා, විසාඛා or විශාකා in Sinhala. This problem becomes more prominent when a name from one language (e.g. Tamil) is written using another language (e.g. Sinhala).

Consequently, a search of an information store for a proper name will fail to find a match when the query uses a different spelling from the one stored. In this paper, we present a solution to this problem for the Sinhala language. We have developed a rule-based search application which, given a Sinhala input string, searches an information store and retrieves matching results even when they are stored under variant spellings.
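
As an illustration only, the idea of rule-based matching can be sketched as follows. The two substitution rules are assumptions inferred from the example spellings of විශාඛා given above; they are not the rule set of the application described in this paper. The names EQUIVALENT, normalise and search, and the sample store, are hypothetical.

```python
# Minimal sketch of rule-based inexact matching for Sinhala names.
# ASSUMPTION: the two rules below are inferred from the example
# spellings in the text; they are not the paper's actual rules.

# Map letters that are often interchanged to one canonical letter.
EQUIVALENT = str.maketrans({
    "\u0dc1": "\u0dc3",  # ශ -> ස
    "\u0d9b": "\u0d9a",  # ඛ -> ක
})


def normalise(name: str) -> str:
    """Rewrite a name into its canonical spelling under the rules."""
    return name.translate(EQUIVALENT)


def search(query: str, store: list[str]) -> list[str]:
    """Return every stored name whose canonical form equals the query's."""
    key = normalise(query)
    return [name for name in store if normalise(name) == key]


# Hypothetical store: four spellings of one name plus an unrelated name.
store = ["විශාඛා", "විසාකා", "විසාඛා", "විශාකා", "කමල්"]
matches = search("විසාකා", store)
```

Here a query using any one of the four spellings retrieves all four stored variants, while the unrelated name is left out. Comparing canonical forms rather than raw strings keeps the lookup an exact match, so the store could also be indexed by the canonical form.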