Figure 5 - uploaded by Matt Selway
Figure 5.11: Definition of “base” from The Free Dictionary (Farlex 2010)

Source publication
Thesis
The majority of current Natural Language Processing (NLP) methods and research are based on statistical analysis and machine learning. While these methods are quite effective, they are not perfect, and the search continues for better and more accurate methods for NLP. The work of Kleiner et al. (Kleiner et al. 2009) takes a very different approach. T...

Context in source publication

Context 1
... ability of The Free Dictionary to link words to their base word has turned out to be a blessing as well as a curse, as definitions of the base words are being added to derived words to which they do not belong. For example, the categories for the word “based” include noun and determiner when the only category that should apply is transitive verb. In this case, both of the additional categories are taken from the word “base” (as can be seen from its definition in Figure 5.11). If the parser were updated to correctly identify the categories associated with a particular form of a word, these superfluous usages could be excluded. This would remove not only a number of additional categories but also some of the words with additional categories entirely.

Most of the additional categories for the SBVR word list stem from the correct category being more specific than the category that was retrieved from The Free Dictionary. For example, “each” and “exactly” were associated with the category Determiner, when they needed to be associated with the categories quantified unvalued determiner and quantified valued determiner, respectively. Other additional categories in the SBVR word list were caused by the issue with the duplication of verb categories and a couple of cases with legitimate multiple uses. While it appears that the additional categories can be almost eliminated in the SBVR word list using the methods discussed in Section 5.2, the small sample size is likely a factor in overstating the improvements that can be made. The evaluation of a larger sample of SBVR text should be an important aspect of future work with the Lexicon Builder.

Unfortunately, most of the words with additional categories have legitimate multiple uses. Although some small improvements can be made, the majority of the additional categories cannot be eliminated by improving the parsing of The Free Dictionary’s web pages.
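The form-specific filtering proposed for “based” and “base” can be illustrated with a minimal Python sketch. It is not the Lexicon Builder's implementation: the `Section` type, the function names, and the sample data are all hypothetical, and the sketch simply assumes that each parsed dictionary section can be tagged with the word forms it applies to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One parsed dictionary section (hypothetical structure, not the thesis's)."""
    category: str      # syntactic category listed for the section
    forms: frozenset   # word forms the section is assumed to apply to


def all_categories(sections):
    """Current behaviour described in the text: every category on the page is kept."""
    return sorted({s.category for s in sections})


def categories_for_form(word, sections):
    """Proposed behaviour: keep only categories whose section covers this form."""
    return sorted({s.category for s in sections if word in s.forms})


# Illustrative data mirroring the example in the text: a lookup of "based"
# also returns the noun and determiner sections that belong to "base".
sections = [
    Section("noun", frozenset({"base"})),
    Section("determiner", frozenset({"base"})),
    Section("transitive verb", frozenset({"base", "based"})),
]

unfiltered = all_categories(sections)
filtered = categories_for_form("based", sections)
```

With this sample data, `unfiltered` holds all three categories, while `filtered` keeps only transitive verb. Words with legitimate multiple uses would still keep several categories under this filter, which matches the limitation noted above.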
As mentioned in Section 5.3.2, it may be possible to use the fact that a large number of words have only one possible category in order to cull some of the superfluous categories. It may not be possible to reduce all of the words to a single category, but it may be possible to reduce the amount of searching required in the Configuration by eliminating some possibilities immediately. This is something else to look at in future ...
