Figure 4 - uploaded by Amber Baig
dislocated relation example

Source publication
Article
Full-text available
This paper describes the process of creating a dependency treebank for tweets in Urdu, a morphologically rich and less-resourced language. The 500-tweet Urdu treebank is created by manually annotating it with lemmas, POS tags, and morphological and syntactic relations using the Universal Dependencies annotation scheme, adapted to the p...

Context in source publication

Context 1
... In UNTDT, retweets are annotated with the dislocated relation; an example is shown in Figure 4. Since the vocative relation marks an entity addressed directly in dialogue, tweet mentions and tweet replies are also treated as vocatives in UNTDT. ...
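The treatment described above can be illustrated with a hypothetical CoNLL-U fragment; the tweet text, tokens, and indices below are invented for illustration and are not taken from UNTDT:

```
# hypothetical example (not from UNTDT)
# text = RT @friend : bahut khoob
1	RT	RT	SYM	_	_	5	dislocated	_	_
2	@friend	@friend	PROPN	_	_	5	vocative	_	_
3	:	:	PUNCT	_	_	5	punct	_	_
4	bahut	bahut	ADV	_	_	5	advmod	_	_
5	khoob	khoob	ADJ	_	_	0	root	_	_
```

The retweet marker attaches to the root with `dislocated`, and the mention attaches with `vocative`, following the scheme described in the excerpt.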

Similar publications

Article
Full-text available
"Web 2.0" captures a combination of innovations on the World Wide Web. The paper looks at the characteristics, features, and trends of Web 2.0. It also touches upon how it differs from Web 1.0, Web 3.0, and Web 4.0. The core competencies required for Web 2.0 companies and Web 2.0 design patterns are discussed, along with recent Web 2.0 Social...

Citations

... UNTDT (Urdu Noisy Text Dependency Treebank) [4], a manually annotated dependency treebank of 500 Urdu tweets, is used as the gold-standard corpus for parser training in this study. The treebank is annotated at the morphological and syntactic levels by adapting the Universal Dependencies [15] framework to the particularities of social media text. ...
... The treebank is annotated at the morphological and syntactic levels by adapting the Universal Dependencies [15] framework to the particularities of social media text. Refer to [4] for a full review of the treebank. The treebank was validated using 10-fold cross-validation. ...
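The 10-fold cross-validation mentioned above partitions the treebank so that every tree serves exactly once as test data. A minimal sketch over sentence indices (fold construction by striding is one simple choice; the authors' exact splitting is not specified here):

```python
def ten_fold_splits(n_sentences, k=10):
    """Yield (train, test) index lists for k-fold cross-validation
    over a treebank of n_sentences trees; each fold is the test set
    exactly once."""
    idx = list(range(n_sentences))
    folds = [idx[i::k] for i in range(k)]  # k disjoint strided folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# For a 500-tweet treebank: 10 splits of 450 training / 50 test trees.
splits = list(ten_fold_splits(500))
```

Parser scores are then averaged over the ten test folds.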
... At the end of this experiment, there are 800 gold-standard tweets (20,951 tokens) in the treebank. The reported accuracy of the baseline model is an LA of 69.8%, a UAS of 74%, and an LAS of 62.9% [4], whereas the final supervised bootstrapping model obtains an LA of 72.1%, a UAS of 75.7%, and an LAS of 64.9%. Overall, the results show that adding automatically parsed, manually corrected training data to the baseline model is beneficial. ...
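The three scores quoted throughout these excerpts are standard dependency-parsing metrics, computable from gold and predicted (head, label) pairs per token. A minimal sketch (the example trees are invented for illustration):

```python
def attachment_scores(gold, pred):
    """gold, pred: per-token lists of (head_index, relation_label).
    UAS: fraction of tokens with the correct head.
    LA:  fraction with the correct relation label.
    LAS: fraction with both head and label correct."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return la, uas, las

# Three-token sentence: one label error (obj vs. obl), heads all correct.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl")]
la, uas, las = attachment_scores(gold, pred)
# uas == 1.0; la == las == 2/3
```

Since LAS requires both the head and the label to be right, it is always the lowest of the three, which matches the orderings reported above.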
Article
Full-text available
This paper describes how bootstrapping was used to extend the development of the Urdu Noisy Text Dependency Treebank. To overcome the bottleneck of manually annotating a corpus for the new domain of user-generated text, MaltParser, an open-source, data-driven dependency parser, is used to bootstrap the treebank in a semi-automatic manner after being trained on the 500-tweet Urdu Noisy Text Dependency Treebank. In total, four bootstrapping iterations were performed. At the end of each iteration, 300 Urdu tweets were automatically tagged, and the performance of the parser model was evaluated against the development set. Of the 300 pre-tagged tweets, 75 were randomly selected for manual correction and then added to the training set for parser retraining. Finally, at the end of the last iteration, parser performance was evaluated against the test set. The final supervised bootstrapping model obtains an LA of 72.1%, a UAS of 75.7%, and an LAS of 64.9%, a significant improvement over the baseline scores of 69.8% LA, 74% UAS, and 62.9% LAS.
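The bootstrapping procedure in the abstract can be sketched as a loop; `train` and `parse` below are placeholders standing in for MaltParser training and parsing, and manual correction is simulated as identity, so this is an assumption-laden sketch of the workflow rather than the authors' implementation:

```python
import random

def bootstrap(gold_tweets, unlabeled, train, parse,
              iterations=4, batch=300, keep=75, seed=0):
    """Supervised bootstrapping sketch: in each iteration, pre-tag a
    batch of unlabeled tweets with the current model, sample a subset
    for (simulated) manual correction, and add it to the training set."""
    rng = random.Random(seed)
    training = list(gold_tweets)
    for _ in range(iterations):
        model = train(training)
        pool, unlabeled = unlabeled[:batch], unlabeled[batch:]
        tagged = [parse(model, tweet) for tweet in pool]  # pre-tag 300
        sample = rng.sample(tagged, keep)  # pick 75 for correction
        training.extend(sample)  # corrected trees join the training set
    return train(training), training
```

Starting from 500 gold tweets, four iterations of 75 corrected tweets each yield the 800-tweet treebank mentioned in the excerpt above.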
... This permits a parser to learn from its own annotations or from those of other parsers. This paper reports on experiments with self-training and co-training using state-of-the-art statistical dependency parsers, MaltParser and Parsito, in an attempt to create a silver-standard dependency treebank for Urdu tweets from a relatively small gold-standard treebank, the Urdu Noisy Text Dependency Treebank (UNTDT) [3]. Urdu is widely spoken in South Asia, but it is still regarded as a language with limited resources in terms of language technology [18]. ...
... The gold-standard corpus for parser training in this study is UNTDT (Urdu Noisy Text Dependency Treebank) [3], a manually annotated dependency treebank of 500 Urdu tweets. By applying the Universal Dependencies (UD) framework [15] to the particularities of social media text, the treebank is annotated at the morphological and syntactic levels. ...
... By applying the Universal Dependencies (UD) framework [15] to the particularities of social media text, the treebank is annotated at the morphological and syntactic levels. For a complete review of the treebank, see [3]. Ten-fold cross-validation was used to validate this treebank. ...
Article
A manually annotated corpus is a prerequisite for several natural language processing applications, including parsing. Nevertheless, an annotated corpus is not always available for resource-poor languages, especially when the domain under consideration is noisy user-generated data found on social media platforms such as Twitter. To overcome this shortage of hand-annotated corpora, researchers have turned their attention to semi-automatic corpus annotation methods. This paper describes experiments with semi-automatic methods, self-training and co-training, in an attempt to create a silver-standard dependency treebank of Urdu tweets. Six iterations of each approach were performed under the same experimental conditions using MaltParser and Parsito, both statistical data-driven parsers. For the self-training experiments, the best-performing MaltParser model was trained on 1,250 Urdu tweets, with an accuracy of 70.2% LA, 74.4% UAS, and 63% LAS, while the best-performing Parsito model was also trained on 1,250 Urdu tweets, with an accuracy of 70.8% LA, 74.8% UAS, and 63.4% LAS. For the co-training experiments, the best-performing MaltParser model was trained on 1,500 Urdu tweets, with an accuracy of 70.5% LA, 74.4% UAS, and 63.2% LAS; the best-performing Parsito model was also trained on 1,500 Urdu tweets, with an accuracy of 70.5% LA, 74.3% UAS, and 63% LAS. Although there was little difference between the results of the two approaches, co-training results were slightly better for both parsers, and co-training was therefore used to generate a silver-standard dependency treebank of 4,500 Urdu tweets.
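One selection step of co-training with two parsers can be sketched as follows. Promoting sentences on which both parsers agree is one common selection criterion; the papers above may use a different one, and `parser_a`/`parser_b` are placeholders, not MaltParser or Parsito APIs:

```python
def cotrain_step(parser_a, parser_b, unlabeled):
    """One co-training selection step: both parsers label the pool,
    and sentences where their trees agree are promoted to silver
    training data; the rest stay in the unlabeled pool."""
    agreed, rest = [], []
    for sent in unlabeled:
        tree_a, tree_b = parser_a(sent), parser_b(sent)
        (agreed if tree_a == tree_b else rest).append((sent, tree_a))
    return agreed, [sent for sent, _ in rest]
```

Iterating such steps while retraining both parsers on the growing silver data is what lets the co-trained models improve over the 500-tweet gold baseline.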