Searching Vocalized/Unvocalized Arabic Texts Using an Improved Coding Schema
Atoum, Jalal Omer
Searching for a pattern in Arabic text is complicated by the vocalization characters (diacritics) attached to the letters of Arabic words. Existing search algorithms handle this poorly: they either fail to find all partial matches of a pattern, or they slow down when they are simply modified to ignore the vocalization characters. This paper presents a new coding schema for Arabic vocalization characters that simplifies and speeds up searching for vocalized and unvocalized patterns in any Arabic text, whether vocalized or unvocalized. The schema repositions the vocalization characters at the end of each word. We present the coding and decoding algorithms that support the schema, explain how the Boyer-Moore algorithm is modified to take advantage of it, and give a complexity analysis.
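The idea of moving vocalization characters to the end of each word can be illustrated with a minimal Python sketch. This is not the paper's coding schema, whose exact layout the abstract does not give. It assumes that the vocalization characters are the Unicode marks U+064B through U+0652, and that each repositioned mark is stored with the index of the letter it followed so that the word can be decoded. The names `encode_word`, `decode_word`, and `contains_unvocalized` are hypothetical, and the search uses plain substring matching rather than the modified Boyer-Moore algorithm.

```python
# Assumption: vocalization characters are the Arabic marks U+064B..U+0652
# (tanween forms, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def encode_word(word):
    """Split a word into its bare letters and a trailing list of marks.

    Each mark is stored as (index of the letter it followed, mark), which is
    a hypothetical layout chosen only so that decoding is possible.
    """
    letters = []
    marks = []
    for ch in word:
        if ch in DIACRITICS:
            marks.append((len(letters) - 1, ch))
        else:
            letters.append(ch)
    return "".join(letters), marks


def decode_word(letters, marks):
    """Rebuild the original word from the output of encode_word."""
    by_position = {}
    for position, mark in marks:
        by_position.setdefault(position, []).append(mark)
    # Position -1 holds any marks that appeared before the first letter.
    out = list(by_position.get(-1, []))
    for index, ch in enumerate(letters):
        out.append(ch)
        out.extend(by_position.get(index, []))
    return "".join(out)


def contains_unvocalized(text, pattern):
    """Check whether the bare letters of pattern occur inside any word of text.

    Both sides are compared with their vocalization characters set aside,
    so the match does not depend on whether the text is vocalized.
    """
    bare_pattern, _ = encode_word(pattern)
    return any(bare_pattern in encode_word(word)[0] for word in text.split())


# The word "kataba" with a fatha after each of its three letters.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
letters, marks = encode_word(vocalized)
restored = decode_word(letters, marks)
```

Because the letters of each encoded word are contiguous, a matcher can compare the pattern against them without skipping over interleaved marks, which is the property the abstract attributes to the schema.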