This paper considers two data-driven methods for anaphora resolution in Kazakh texts. Both methods rely on machine learning over annotated corpora and use no information other than linguistic features. The first method uses a Support Vector Machine (SVM) as its learning and classification algorithm; the second uses a decision tree inducer. We evaluate the performance of the methods with several feature sets and corpora. The feature sets include morphological, syntactic, and semantic features. We also evaluate how semantic features, namely semantic roles, affect the performance of anaphora resolution in Kazakh. Experiments showed that the precision of the SVM is higher than that of the decision tree on the experimental data in almost all cases, and that semantic features improve the performance of both methods. We also calculated the optimal distance between the anaphor and the hypothetical antecedent and used it in our methods.
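As an illustration of the setup the abstract describes, the following is a minimal sketch of how anaphor-antecedent candidate pairs might be generated within a distance limit and turned into feature vectors for a classifier. It is not the authors' implementation: the `Mention` fields, the feature names, the example mentions, and the choice to measure distance in sentences are all assumptions made for illustration, and the abstract does not state the unit or value of the optimal distance.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Mention:
    """A hypothetical annotated mention; field names are illustrative only."""
    text: str
    sentence: int          # index of the sentence containing the mention
    number: str            # morphological feature, e.g. "sg" or "pl"
    syn_role: str          # syntactic feature, e.g. "subj" or "obj"
    sem_role: str          # semantic role, e.g. "agent" or "patient"
    is_pronoun: bool = False


def pair_features(anaphor: Mention, candidate: Mention) -> Dict[str, int]:
    """Build a small feature vector for one (candidate, anaphor) pair."""
    return {
        "sentence_distance": anaphor.sentence - candidate.sentence,
        "number_agree": int(anaphor.number == candidate.number),
        "same_syn_role": int(anaphor.syn_role == candidate.syn_role),
        "same_sem_role": int(anaphor.sem_role == candidate.sem_role),
    }


def candidate_pairs(
    mentions: List[Mention], max_distance: int
) -> List[Tuple[Mention, Mention, Dict[str, int]]]:
    """Pair each pronoun with preceding mentions at most max_distance sentences back.

    Assumes `mentions` is ordered by position in the text.
    """
    pairs = []
    for i, anaphor in enumerate(mentions):
        if not anaphor.is_pronoun:
            continue
        for candidate in mentions[:i]:
            if anaphor.sentence - candidate.sentence <= max_distance:
                pairs.append((candidate, anaphor, pair_features(anaphor, candidate)))
    return pairs


# Made-up example data: three noun mentions followed by the pronoun "ол".
MENTIONS = [
    Mention("Айгүл", 0, "sg", "subj", "agent"),
    Mention("Асан", 3, "sg", "subj", "agent"),
    Mention("кітап", 3, "sg", "obj", "patient"),
    Mention("ол", 4, "sg", "subj", "agent", is_pronoun=True),
]

if __name__ == "__main__":
    for candidate, anaphor, features in candidate_pairs(MENTIONS, max_distance=2):
        print(candidate.text, "<-", anaphor.text, features)
```

Each feature dictionary produced this way could then be vectorized and passed to a classifier such as an SVM or a decision tree, which would label the pair as anaphoric or not. The distance limit only restricts which candidates are considered.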
Year of release: 2022
Number of the journal: 2(86)
Heading: Technical sciences and technologies