Figure 4 - uploaded by Amber Baig
dislocated relation example

Source publication
Article
Full-text available
This paper describes the process of creating a dependency treebank for tweets in Urdu, a morphologically rich and less-resourced language. The 500-tweet Urdu treebank is created by manually annotating it with lemmas, POS tags, and morphological and syntactic relations using the Universal Dependencies annotation scheme, adapted to the p...

Context in source publication

Context 1
... In UNTDT, retweets are annotated with the dislocated relation; an example is shown in Figure 4. Since the vocative relation marks an entity addressed directly in dialogue, tweet mentions and tweet replies are also treated as vocatives in UNTDT. ...
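The treatment described above can be illustrated with a hypothetical CoNLL-U fragment; the tweet text, tokens, and indices below are invented for illustration and are not taken from UNTDT:

```
# hypothetical example (not from UNTDT)
# text = RT @friend : bahut khoob
1	RT	RT	SYM	_	_	5	dislocated	_	_
2	@friend	@friend	PROPN	_	_	5	vocative	_	_
3	:	:	PUNCT	_	_	5	punct	_	_
4	bahut	bahut	ADV	_	_	5	advmod	_	_
5	khoob	khoob	ADJ	_	_	0	root	_	_
```

The retweet marker attaches to the root with `dislocated`, and the mention attaches with `vocative`, following the scheme described in the excerpt.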

Similar publications

Article
Full-text available
"Web 2.0" captures a combination of innovations on the World Wide Web. The paper looks at the characteristics, features, and trends of Web 2.0. It also touches upon how it differs from Web 1.0, Web 3.0, and Web 4.0. The core competencies required for Web 2.0 companies and Web 2.0 design patterns are discussed, along with recent Web 2.0 Social...

Citations

... UNTDT (Urdu Noisy Text Dependency Treebank) [4], a manually annotated dependency treebank of 500 Urdu tweets, is used as the gold-standard corpus for parser training in this study. The treebank is annotated at the morphological and syntactic levels by adapting the Universal Dependencies [15] framework to the particularities of social media text. ...
... The treebank is annotated at the morphological and syntactic levels by adapting the Universal Dependencies [15] framework to the particularities of social media text. Refer to [4] for a full review of the treebank. The treebank was validated using 10-fold cross-validation. ...
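The 10-fold cross-validation mentioned above partitions the treebank so that every tree serves exactly once as test data. A minimal sketch over sentence indices (fold construction by striding is one simple choice; the authors' exact splitting is not specified here):

```python
def ten_fold_splits(n_sentences, k=10):
    """Yield (train, test) index lists for k-fold cross-validation
    over a treebank of n_sentences trees; each fold is the test set
    exactly once."""
    idx = list(range(n_sentences))
    folds = [idx[i::k] for i in range(k)]  # k disjoint strided folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# For a 500-tweet treebank: 10 splits of 450 training / 50 test trees.
splits = list(ten_fold_splits(500))
```

Parser scores are then averaged over the ten test folds.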
... At the end of this experiment, there are 800 gold-standard tweets (20,951 tokens) in the treebank. The reported accuracy of the baseline model is an LA of 69.8%, a UAS of 74%, and an LAS of 62.9% [4], whereas the final supervised bootstrapping model obtains an LA of 72.1%, a UAS of 75.7%, and an LAS of 64.9%. Overall, the results show that adding automatically parsed, manually corrected training data to the baseline model is beneficial. ...
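The three scores quoted throughout these excerpts are standard dependency-parsing metrics, computable from gold and predicted (head, label) pairs per token. A minimal sketch (the example trees are invented for illustration):

```python
def attachment_scores(gold, pred):
    """gold, pred: per-token lists of (head_index, relation_label).
    UAS: fraction of tokens with the correct head.
    LA:  fraction with the correct relation label.
    LAS: fraction with both head and label correct."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return la, uas, las

# Three-token sentence: one label error (obj vs. obl), heads all correct.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl")]
la, uas, las = attachment_scores(gold, pred)
# uas == 1.0; la == las == 2/3
```

Since LAS requires both the head and the label to be right, it is always the lowest of the three, which matches the orderings reported above.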
Article
Full-text available
This paper describes how bootstrapping was used to extend the development of the Urdu Noisy Text Dependency Treebank. To overcome the bottleneck of manually annotating a corpus for the new domain of user-generated text, MaltParser, an open-source, data-driven dependency parser, is used to bootstrap the treebank in a semi-automatic manner after being trained on the 500-tweet Urdu Noisy Text Dependency Treebank. In total, four bootstrapping iterations were performed. At the end of each iteration, 300 Urdu tweets were automatically tagged, and the performance of the parser model was evaluated against the development set. Of the 300 pre-tagged tweets, 75 were randomly selected for manual correction and then added to the training set for parser retraining. Finally, at the end of the last iteration, parser performance was evaluated against the test set. The final supervised bootstrapping model obtains an LA of 72.1%, a UAS of 75.7%, and an LAS of 64.9%, a significant improvement over the baseline scores of 69.8% LA, 74% UAS, and 62.9% LAS.
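The bootstrapping procedure in the abstract can be sketched as a loop; `train` and `parse` below are placeholders standing in for MaltParser training and parsing, and manual correction is simulated as identity, so this is an assumption-laden sketch of the workflow rather than the authors' implementation:

```python
import random

def bootstrap(gold_tweets, unlabeled, train, parse,
              iterations=4, batch=300, keep=75, seed=0):
    """Supervised bootstrapping sketch: in each iteration, pre-tag a
    batch of unlabeled tweets with the current model, sample a subset
    for (simulated) manual correction, and add it to the training set."""
    rng = random.Random(seed)
    training = list(gold_tweets)
    for _ in range(iterations):
        model = train(training)
        pool, unlabeled = unlabeled[:batch], unlabeled[batch:]
        tagged = [parse(model, tweet) for tweet in pool]  # pre-tag 300
        sample = rng.sample(tagged, keep)  # pick 75 for correction
        training.extend(sample)  # corrected trees join the training set
    return train(training), training
```

Starting from 500 gold tweets, four iterations of 75 corrected tweets each yield the 800-tweet treebank mentioned in the excerpt above.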
... This permits a parser to learn from its own annotations or from those of other parsers. This paper reports on experiments with self-training and co-training using state-of-the-art statistical dependency parsers, MaltParser and Parsito, in an attempt to create a silver-standard dependency treebank for Urdu tweets from a relatively small gold-standard treebank, the Urdu Noisy Text Dependency Treebank (UNTDT) [3]. Urdu is widely spoken in South Asia, but it is still regarded as a language with limited resources in terms of language technology [18]. ...
... The gold-standard corpus for parser training in this study is UNTDT (Urdu Noisy Text Dependency Treebank) [3], a manually annotated dependency treebank of 500 Urdu tweets. By applying the Universal Dependencies (UD) framework [15] to the particularities of social media text, the treebank is annotated at the morphological and syntactic levels. ...
... By applying the Universal Dependencies (UD) framework [15] to the particularities of social media text, the treebank is annotated at the morphological and syntactic levels. For a complete review of the treebank, see [3]. Ten-fold cross-validation was used to validate this treebank. ...
Article
A manually annotated corpus is a prerequisite for several natural language processing applications, including parsing. Nevertheless, an annotated corpus is not always available for resource-poor languages, especially when the domain under consideration is noisy user-generated data found on social media platforms such as Twitter. To overcome this shortage of hand-annotated corpora, researchers have turned their attention to semi-automatic corpus annotation methods. This paper describes experiments with semi-automatic methods, self-training and co-training, in an attempt to create a silver-standard dependency treebank of Urdu tweets. Six iterations of each approach were performed under the same experimental conditions using MaltParser and Parsito, both statistical data-driven parsers. For the self-training experiments, the best-performing MaltParser model was trained on 1,250 Urdu tweets, with an accuracy of 70.2% LA, 74.4% UAS, and 63% LAS, while the best-performing Parsito model was also trained on 1,250 Urdu tweets, with an accuracy of 70.8% LA, 74.8% UAS, and 63.4% LAS. For the co-training experiments, the best-performing MaltParser model was trained on 1,500 Urdu tweets, with an accuracy of 70.5% LA, 74.4% UAS, and 63.2% LAS; the best-performing Parsito model was also trained on 1,500 Urdu tweets, with an accuracy of 70.5% LA, 74.3% UAS, and 63% LAS. Although there was little difference between the results of the two approaches, co-training results were slightly better for both parsers, and co-training was therefore used to generate a silver-standard dependency treebank of 4,500 Urdu tweets.
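One selection step of co-training with two parsers can be sketched as follows. Promoting sentences on which both parsers agree is one common selection criterion; the papers above may use a different one, and `parser_a`/`parser_b` are placeholders, not MaltParser or Parsito APIs:

```python
def cotrain_step(parser_a, parser_b, unlabeled):
    """One co-training selection step: both parsers label the pool,
    and sentences where their trees agree are promoted to silver
    training data; the rest stay in the unlabeled pool."""
    agreed, rest = [], []
    for sent in unlabeled:
        tree_a, tree_b = parser_a(sent), parser_b(sent)
        (agreed if tree_a == tree_b else rest).append((sent, tree_a))
    return agreed, [sent for sent, _ in rest]
```

Iterating such steps while retraining both parsers on the growing silver data is what lets the co-trained models improve over the 500-tweet gold baseline.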