
    A syntactically-based preprocessor for a limited experimental Arabic document retrieval system

    Date
    1988
    Type
    Thesis
    Author
    Ibrahim, Farid
    Abstract
    This thesis describes and discusses an experimental document retrieval system for Arabic texts that uses linguistic methods of analysis. Arabic presents difficulties for the efficient retrieval of information because it is an agglutinative language, which renders the stop list method (as commonly used for English texts) nearly useless. The system has two stages: the creation of the retrieval lexicon and the search program. The latter is done using limited on-line searching that allows for partial matching. The former has four stages. Texts in the form of abstracts are processed by morphological analysis, syntactic analysis, term extraction and term manipulation modules. Each stage produces a new representation of the source text. The morphological analyser attempts to recognise any prefixes and/or suffixes attached to the words in the corpus being processed. It also assigns grammatical labels specifying the part of speech using a contextual analysis of individual words (assuming that the inflectional features of a word are indicative of its syntactic role). An augmented transition network grammar and parser have been built for this purpose. The same parser has been developed and used in the second stage, which is syntactic analysis. It takes as its input the representation of the text created by the morphological analysis, and uses a separate grammar file defined as a recursive transition network. The aim of syntactic analysis is to define the relations between the different constituents of the individual sentences being processed. The information added by the morphological and syntactic analysers is used in the term extraction module. This module uses a traversal algorithm to negotiate the structure built by syntax, utilising a set of rules, kept in a file, that specify the types of construct to be selected. The manipulative module generates new entries for each term selected by rotating its elements. The system has been implemented using the Hull V-mode Pascal compiler available on the L.U.T. Prime System. It has been tested using 40 abstracts selected from conference proceedings in the field of computer applications. The results obtained were encouraging, particularly in the identification of affixes (a success rate of 89% for suffixes and 85% for prefixes) and in the identification of syntactic categories (a success rate of 98% for nouns and 79% for verbs). These figures have strongly influenced the identification of the syntagmatic relations underlying the word order in the sentence. The research concludes that although successful results have been obtained without the aid of a pre-constructed Arabic lexicon, many errors can be avoided by the inclusion of small dictionaries. This research names these particularities.
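The abstract says the manipulative module creates new lexicon entries by rotating the elements of each selected term. As a rough illustration only, the sketch below reads "rotating" as cyclic rotation of a term's words, so that a multi-word term can be reached from any of its words. The function name `rotate_term`, the whitespace tokenisation, and the English sample term are assumptions made for this example; the thesis itself was implemented in Pascal and its exact rotation rules are not given here.

```python
def rotate_term(term):
    """Return every cyclic rotation of a multi-word term, original order first.

    Hypothetical helper: splits on whitespace and treats each word as one
    element of the term. This is an assumed reading of "rotating its elements".
    """
    words = term.split()
    # Rotation i moves the first i words to the end of the term.
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


if __name__ == "__main__":
    for entry in rotate_term("document retrieval system"):
        print(entry)
```

Run as a script, this prints one candidate lexicon entry per rotation. A single-word term yields only itself, so rotation adds entries only for multi-word terms.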
    URI
    https://dspace.adu.ac.ae/handle/1/4000
    Citation
    Ibrahim, F. (1988). A syntactically-based preprocessor for a limited experimental Arabic document retrieval system (Doctoral dissertation, Loughborough University).
    Collections
    • Education

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
