Fig. 3. A parse tree from Treebank and a parse tree predicted by an induced grammar.

Source publication
Conference Paper
This paper presents a new grammar induction algorithm for probabilistic context-free grammars (PCFGs). There is an approach to PCFG induction that is based on parameter estimation. Following this approach, we apply variational Bayes to PCFGs. Variational Bayes (VB) is an approximation of Bayesian learning. It has been empirically shown th...
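To make the setting concrete, here is a standard way of writing the Bayesian PCFG that VB targets (a sketch in generic notation, not a reproduction of the paper's derivation): each nonterminal A receives a Dirichlet prior over its rule probabilities, and the intractable posterior over parameters and parse trees is approximated by a factorized (mean-field) distribution.

    \theta_A \sim \mathrm{Dir}(\alpha_A) \quad \text{for each nonterminal } A, \qquad
    p(\theta, t \mid w) \;\approx\; q(\theta)\, q(t)

The two factors are then updated in alternation, much as the inside-outside algorithm alternates expectation and maximization steps.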

Context in source publication

Context 1
... 0-CB is the ratio of sentences whose brackets are completely consistent with the correct brackets. Fig. 4 lists a subset of the induced grammar used in Fig. 3. In this example, non-terminal 15 derives "DT JJ * NN", where DT is a determiner, JJ an adjective and NN a noun. ...
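Purely to illustrate what such a pattern means (the actual induced rules appear in the paper's Fig. 4 and are not reproduced here; the nonterminal names below are hypothetical), a nonterminal deriving "DT JJ * NN" typically does so through a small recursive rule set of roughly this shape:

    X15 -> DT Y
    Y   -> JJ Y
    Y   -> NN

Here Y generates zero or more JJ symbols followed by NN, so X15 yields determiner-adjective*-noun sequences.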

Similar publications

Conference Paper
This paper presents a new approach to syntactic disambiguation based on lexicalized grammars. While existing disambiguation models decompose the probability of a parsing result into the probabilities of primitive dependencies between two words, our model selects the most ...

Citations

... This method is based on [19,20], where the grammar is induced incrementally. During each iteration of the algorithm, a new nonterminal symbol X_j is created from another nonterminal symbol X_i through a split operation. ...
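As a rough sketch of what such a split step can look like (illustrative code under assumed data structures, not the algorithm of [19,20]; `split_nonterminal` and the grammar representation are hypothetical), the new symbol inherits the old symbol's rules with slightly perturbed weights so that later re-estimation can pull the two symbols apart:

    import random

    def split_nonterminal(grammar, x_i, x_j, noise=0.01):
        """grammar: dict mapping a nonterminal to a list of (rhs_tuple, weight) rules.
        Clone x_i's rules for the new symbol x_j, with small weight perturbations."""
        def jitter(w):
            return w * (1.0 + random.uniform(-noise, noise))
        grammar[x_j] = [(rhs, jitter(w)) for rhs, w in grammar[x_i]]
        # Wherever x_i appears on a right-hand side, add a variant that uses x_j instead.
        for lhs in list(grammar):
            extra = [(tuple(x_j if s == x_i else s for s in rhs), jitter(w))
                     for rhs, w in grammar[lhs] if x_i in rhs]
            grammar[lhs].extend(extra)
        return grammar

After a split, the weights would normally be re-estimated (for example with inside-outside) before deciding whether to keep the new nonterminal.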
Article
A split-based method for weighted context-free grammar (WCFG) induction was formalised and verified on a comprehensive set of context-free languages. The WCFG is learned using a novel grammatical inference method. The proposed method learns the WCFG from both positive and negative samples, whereas the rule weights are estimated using a novel Inside-Outside Contrastive Estimation algorithm. The results showed that our approach outperforms other state-of-the-art methods in terms of F1 score.
... Such factors include tokenization, lemmatization, stemming, morphological analysis, syntax analysis, keyword selection methods, machine learning algorithms, model selection methods and parameter optimization techniques. These NLP methods can be logical or rule-based, such as inductive logic programming (Hossny et al. 2008, 2009), probabilistic, such as Bayesian classifiers and decision trees (Chien and Wu 2007; Kurihara and Sato 2006), or based on deep learning and high-performance computing (Azzam et al. 2017). This study focuses solely on the keyword selection method, as studying the combination of all factors would lead to a huge number of possibilities that are difficult to cover. ...
Article
Selecting keywords from Twitter as features to identify events is challenging due to language informality such as acronyms, misspelled words, synonyms, transliteration and ambiguous terms. In this paper, we compare and identify the best methods for selecting keywords as features for classification purposes. Specifically, we study the aspects affecting keywords as features to identify civil unrest and protests. These aspects include the word count, the word forms such as n-grams, skip-grams and bags-of-words, as well as the data association methods, including correlation and similarity techniques. To test the impact of these factors, we developed a framework that analyzed 641 days of tweets and extracted the words highly associated with event days over the same time frame. Then, we used the extracted words as features to classify any single day as either an event day or a non-event day in a specific location. In this framework, we used the same pipeline of data cleaning, preprocessing, feature selection, model learning and event classification for all combinations of keyword selection criteria. We used a Naive Bayes classifier to learn the selected features and accordingly predict the event days. The classification was tested using multiple metrics, such as accuracy, precision, recall, F-score and AUC. This study concluded that the best word form is bag-of-words (average AUC 0.72), the best word count is two (average AUC 0.74), the best feature selection method is Spearman's correlation (average AUC 0.89), and the best classifier for event detection is the Naive Bayes classifier.
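For orientation only, a minimal sketch of the kind of pipeline the abstract describes (bag-of-words day vectors, Spearman-correlation keyword selection, Naive Bayes classification); the data, names and parameter values below are placeholders, not the authors' implementation:

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.metrics import roc_auc_score

    def select_keywords(X, y, k=50):
        """Rank terms by absolute Spearman correlation between daily counts and
        the binary event label, and keep the top k column indices."""
        scores = np.array([abs(spearmanr(X[:, j], y)[0]) for j in range(X.shape[1])])
        return np.argsort(np.nan_to_num(scores))[::-1][:k]

    # Toy stand-in data: 641 days x 500 candidate terms, binary event labels.
    rng = np.random.default_rng(0)
    X = rng.poisson(1.0, size=(641, 500))
    y = rng.integers(0, 2, size=641)

    cols = select_keywords(X, y, k=50)
    features = (X[:, cols] > 0).astype(int)          # presence/absence of selected keywords
    clf = BernoulliNB().fit(features, y)
    auc = roc_auc_score(y, clf.predict_proba(features)[:, 1])
    print(f"illustrative in-sample AUC: {auc:.2f}")  # a real evaluation would hold out days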
... This method is strongly inspired by the works of [13,19]. In this approach, the grammar is induced incrementally. ...
Conference Paper
Protein sequence motifs are conserved amino acid patterns of biological significance. They are vital for annotating structural and functional features of proteins. Yet, the computational methods commonly used for defining sequence motifs are typically simplified linear representations neglecting the higher-order structure of the motif. The purpose of this work is to create models of sequence motifs that take into account the internal structure of the modeled fragments. The ultimate goal is to provide the community with accurate and concise models of diverse collections of remotely related amino acid sequences that share structural features. The internal structure of amino acid sequences is modeled using a novel algorithm for unsupervised learning of weighted context-free grammars (WCFG). The proposed method learns the WCFG from both positive and negative samples, whereas the rule weights are estimated using a novel Inside-Outside Contrastive Estimation algorithm. In comparison to existing approaches to learning CFGs, the new method generates more concise descriptors and provides good control of the trade-off between grammar size and specificity. The method is applied to the nicotinamide adenine dinucleotide phosphate binding site motif.
... More successful approaches to grammar induction have thus resorted to carefully-crafted auxiliary objectives (Klein and Manning, 2002), priors or non-parametric models (Kurihara and Sato, 2006; Johnson et al., 2007; Liang et al., 2007; Wang and Blunsom, 2013), and manually-engineered features (Huang et al., 2012; Golland et al., 2012) to encourage the desired structures to emerge. (Code: https://github.com/harvardnlp/compound-pcfg) ...
... In contrast to the usual Bayesian treatment of PCFGs which places priors on global rule probabilities (Kurihara and Sato, 2006; Johnson et al., 2007; Wang and Blunsom, 2013), the compound PCFG assumes a prior on local, sentence-level rule probabilities. It is therefore closely related to the Bayesian grammars studied by , who also sample local rule probabilities from a logistic normal prior for training dependency models with valence (DMV) (Klein and Manning, 2004). ...
Preprint
We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations which learn a single stochastic grammar, our context-free rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable, and the latent trees are marginalized with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods for grammar induction from words with neural language models.
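In outline, the generative story the abstract describes can be written as follows (notation chosen here for illustration, not taken from the paper): each sentence draws its own latent vector, which modulates the rule probabilities, and learning maximizes a collapsed ELBO in which the latent trees are summed out by the inside algorithm.

    z \sim \mathcal{N}(0, I), \qquad
    \pi_z = f_\theta(z) \ \ (\text{sentence-specific rule probabilities}), \qquad
    t \sim \mathrm{PCFG}(\pi_z), \qquad w = \mathrm{yield}(t)

    \mathcal{L}(\theta, \phi) =
    \mathbb{E}_{q_\phi(z \mid w)}\Big[\log \textstyle\sum_{t} p_\theta(t, w \mid z)\Big]
    - \mathrm{KL}\big(q_\phi(z \mid w) \,\|\, p(z)\big)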
... In natural language processing, variational inference has been used for solving problems such as parsing (Liang et al., 2007, 2009), grammar induction (Kurihara and Sato, 2006; Naseem et al., 2010; Cohen and Smith, 2010), models of streaming text (Yogatama et al., 2014), topic modeling (Blei et al., 2003), and hidden Markov models and part-of-speech tagging (Wang and Blunsom, 2013). In speech recognition, variational inference has been used to fit complex coupled hidden Markov models (Reyes-Gomez et al., 2004) and switching dynamic systems (Deng, 2004). ...
Article
One of the core problems of modern statistics is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about the posterior. In this paper, we review variational inference (VI), a method from machine learning that approximates probability distributions through optimization. VI has been used in myriad applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of distributions and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this widely-used class of algorithms.
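The core objective the review refers to can be stated in one line: pick the member of the approximating family Q that is closest in KL divergence to the posterior, which is equivalent to maximizing the evidence lower bound (ELBO), since log p(x) = ELBO(q) + KL(q || p(z|x)).

    q^{*} = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)
    \quad\Longleftrightarrow\quad
    q^{*} = \arg\max_{q \in \mathcal{Q}} \;
    \underbrace{\mathbb{E}_{q}[\log p(x, z)] - \mathbb{E}_{q}[\log q(z)]}_{\mathrm{ELBO}(q)}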
... where ψ is the digamma function, and α_{A→φ} is the prior of the rule A → φ (Kurihara and Sato, 2006). It has previously been used for ITG learning (Zhang et al., 2008; Saers and Wu, 2013), but only with uniform priors. ...
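For context, digamma-based weights of this kind usually take the familiar Dirichlet-multinomial variational form shown below (written in the excerpt's notation as a sketch; the cited papers' exact update may differ in detail), where c(A → φ) counts how often the rule is used in a parse:

    \tilde{\pi}_{A \to \phi} =
    \exp\!\Big( \psi\big(\alpha_{A \to \phi} + \mathbb{E}_q[c(A \to \phi)]\big)
    - \psi\big(\textstyle\sum_{\phi'} \big(\alpha_{A \to \phi'} + \mathbb{E}_q[c(A \to \phi')]\big)\big) \Big)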
... We approximate the parsing model with the Variational Bayesian EM Algorithm [31], [32]. We follow the approach of Kurihara and Sato's [33] variational version of the Inside-Outside Algorithm for approximating the model parameters π in Eq. (4), because it was shown to be less prone to overfitting the data than the standard Inside-Outside algorithm. Let us summarize the variational Inside-Outside Algorithm as follows. ...
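Since the excerpt's own summary is truncated above, the following is only a compact sketch of the alternation it refers to; `expected_counts` is a placeholder standing in for the inside-outside chart computation over the corpus:

    import numpy as np
    from scipy.special import digamma

    def vb_inside_outside(expected_counts, alpha, n_iters=20):
        """alpha: dict nonterminal -> array of Dirichlet hyperparameters over its rules.
        expected_counts(pi): expected rule counts under the current weights pi."""
        pi = {A: np.full(len(a), 1.0 / len(a)) for A, a in alpha.items()}  # initial weights
        for _ in range(n_iters):
            counts = expected_counts(pi)          # E-step: inside-outside under current weights
            for A, a in alpha.items():            # M-step: digamma-based weight update
                post = a + counts[A]
                pi[A] = np.exp(digamma(post) - digamma(post.sum()))
        return pi

Note that the resulting weights are sub-normalized (they need not sum to one per nonterminal), which is a known property of this kind of variational update rather than a bug.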
Article
Developing a practical and accurate statistical parser for low-resourced languages is a hard problem, because it requires large-scale treebanks, which are expensive and labor-intensive to build from scratch. Unsupervised grammar induction theoretically offers a way to overcome this hurdle by learning hidden syntactic structures from raw text automatically. The accuracy of grammar induction, however, is still impractically low, because frequent collocations of non-linguistically associable units are commonly found, resulting in dependency attachment errors. We introduce a novel approach to building a statistical parser for low-resourced languages by using language parameters as a guide for grammar induction. The intuition of this paper is that most dependency attachment errors involve frequently used word orders, which can be captured by a small prescribed set of linguistic constraints, while the rest of the language can be learned statistically by grammar induction. We then show that covering the most frequent grammar rules via our language parameters has a strong impact on parsing accuracy in 12 languages.
... In this paper we study the ability of three models to predict reading difficulty as measured by either eye-fixation or reading times: the full-parsing model, implemented by Dirichlet-multinomial probabilistic context-free grammars (DMPCFG) (Kurihara and Sato, 2006; Johnson et al., 2007), the full-listing model, implemented by maximum a posteriori adaptor grammars (MAG) (Johnson et al., 2006), and the inference-based model, implemented by fragment grammars (FG) (O'Donnell, 2015). All three models start with the same underlying base system, a context-free grammar (CFG) specifying the space of possible syntactic derivations, and the same training data, a corpus of syntactic trees. ...
... In a Bayesian PCFG one puts Dirichlet priors Dir(α) on the rule probability vector θ, such that there is one Dirichlet parameter α_{A→α} for each rule A → α ∈ R. There are Markov Chain Monte Carlo (MCMC) and Variational Bayes procedures for estimating the posterior distribution over rule probabilities θ and parse trees given data consisting of terminal strings alone (Kurihara and Sato, 2006; Johnson et al., 2007a). ...
Conference Paper
Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar-based Bayesian word segmentation model to allow it to learn sequences of monosyllabic "function words" at the beginnings and endings of collocations of (possibly multi-syllabic) words. This modification improves unsupervised word segmentation on the standard Bernstein-Ratner (1987) corpus of child-directed English by more than 4% token f-score compared to an otherwise identical model that does not special-case "function words", setting a new state of the art of 92.4% token f-score. Our function word model assumes that function words appear at the left periphery, and while this is true of languages such as English, it is not true universally. We show that a learner can use Bayesian model selection to determine the location of function words in their language, even though the input to the model consists only of unsegmented sequences of phones. Thus our computational models support the hypothesis that function words play a special role in word learning.