Fig 1 - uploaded by Xuri Tang
Successive View of Semantic Change 

Source publication
Conference Paper
The prevalence of creativity in the emergent online media language calls for a more effective computational approach to semantic change. This paper advocates the successive view of semantic change and proposes a successive framework for automatic semantic change detection. The framework measures the Word Status of a word in a time unit with entropy, form...

Context in source publication

Context 1
... on this understanding possess two distinctive features compared with the juxtaposition view of semantic change. The first lies in the type of data used for investigation. The successive view considers linguistic data from successive stages over a historical period. Thus, to account for the semantic change of words like toumin, a successive sequence of data (depicted in Figure 1) collected from corpora ranging from 1951 to 2001 should be investigated. The juxtaposition view, by contrast, may focus on just two or more individual years, which may or may not be adjacent. In this respect the advantage of the successive view is clear: it gives a detailed and faithful record of the phenomenon. The second distinctive feature is more important. As discussed in [15], the successive view of semantic change can reveal the intimate inter-connection, by spatial, causal and other relations, between the successive members, which gives it the power to portray the pattern of change for a word. The juxtaposition view cannot do this: it omits many of the details in between, so its ability to detect change patterns is weak. Studies like [9, 6, 8, 7] opt for the successive view of change, but they use the successive data only for human analysis or evaluation, not for automatic detection. For example, [7] adopts methods from Information Visualization and Visual Analytics to visualize the contexts in which the words occur, so as to guide researchers in generating new hypotheses about the development of semantic change; [6] uses correlations between frequency change and some ranking to look for a trend of change; [8] plots the rise of semantic density against the percentage of use as measured by human beings to show the feasibility of a semantic-density-based approach. This use of human analysis is illustrated in Figure 2.
The principle of uniformitarianism [16, 17] allows for confident prediction that the word is undergoing a change, but it does not tell the type of change the word is undergoing, nor does it relate the type of change to the change in the word's inner semantic structure. This paper argues that the successive view of semantic change can be further exploited to infer the inner structural change of the word and to predict its future tendency by examining the process in detail. Based on the successive view of semantic change, this section constructs a framework that characterizes the change pattern and relates the type of change to the change in inner structure. The framework, depicted in Figure 3, is word oriented. It takes three inputs: the word to be studied, the historical time span under investigation, and the corpora. First, the corpora are divided successively into a series of time-unit corpora, so that every two adjoining time units are in temporal order. For each time-unit corpus, the concept of Word Status is proposed to denote the state of affairs of the word's inner semantic structure in that time unit, and it is measured quantitatively with sense entropy. The Word Statuses over all time-unit corpora are then obtained and formed into a time series. Change Pattern Detection is then performed over the time series to obtain parameters that characterize the trend of change, and these parameters are used for categorization. Because Word Status reflects the word's inner sense structure, and the word's inner sense structure is associated with the word's denotation, the obtained trend of change is necessarily denotation-related, which is useful in predicting how the word is used. The components of Time-Unit Based Word Status Measurement and Word Change Pattern Detection are explained in this section. Denotation-Related Word Change Categorization is illustrated in section 4.
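The measurement step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes that sense entropy is the Shannon entropy (base 2) of the relative frequencies of a word's senses within one time unit, and that each occurrence of the target word has already been assigned a sense label. The function names `sense_entropy` and `word_status_series`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def sense_entropy(sense_labels):
    """Shannon entropy (in bits) of the sense distribution in one time unit.

    Assumption: one sense label per occurrence of the target word.
    """
    counts = Counter(sense_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def word_status_series(occurrences):
    """Turn {time_unit: [sense label, ...]} into a time-ordered series.

    Returns a list of (time_unit, entropy) pairs sorted by time unit,
    i.e. the Word Status time series that change detection would consume.
    """
    return [(t, sense_entropy(occurrences[t])) for t in sorted(occurrences)]


# Hypothetical toy data: sense labels observed for one word in three time units.
toy_occurrences = {
    2001: ["s1", "s2", "s3", "s4"],
    1951: ["s1", "s1", "s1", "s1"],
    1976: ["s1", "s1", "s2", "s2"],
}

series = word_status_series(toy_occurrences)
for time_unit, status in series:
    print(time_unit, round(status, 3))
```

In this toy series the entropy rises as the occurrences spread over more senses, which is the kind of trend that a later pattern-detection step could parameterize.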
Semantic change is the change in a word's inner semantic structure and in the outward usage of its senses, together called Word Status in this paper. For a target word T, two important issues need to be considered to obtain its Word Status: (1) the representation of senses for T and (2) an account of the usage of the senses in a time-unit corpus. Sense Representation. This paper represents word senses with the Word-Context Model, one of the subtypes of Vector Space Models [18]. The model is an explicit form of the Distributional Hypothesis, originating from [19–22], in which word senses are distinguished and represented by the contexts in which they occur. According to the model, a sense of a word in a sentence can be a tuple c = < T, W_max >, in which T is the target word and W_max is the co-occurring word with the strongest association strength within a window size of 9 in the sentence. To better represent the denotation of the word, this paper narrows the "Word" to "Noun" and employs the Noun-Word-Context model, denoted by a tuple < T, N_max >, to represent the target word's senses in every sentence in the corpora. In the tuple, N_max is the noun with the maximum association strength. The association strength is computed with the Likelihood Ratio Test [23], a method widely used for collocation extraction. Formula 1 gives the null hypothesis for the distributions of two words, and Formula 2 gives the alternative ...
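To make the selection of N_max concrete, the sketch below scores candidate nouns with a log-likelihood ratio statistic and keeps the strongest one. It rests on several assumptions that the excerpt does not confirm: the statistic is the common G² form computed over a 2x2 contingency table of sentence counts, candidate nouns inside the window have already been extracted, and only nouns that co-occur with the target more often than chance are eligible. The names `llr_g2` and `strongest_noun` and all counts are hypothetical.

```python
import math


def llr_g2(k11, k12, k21, k22):
    """G^2 log-likelihood ratio statistic for a 2x2 contingency table.

    k11: sentences with both T and N    k12: sentences with T but not N
    k21: sentences with N but not T     k22: sentences with neither
    """
    total = k11 + k12 + k21 + k22
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    cells = ((k11, row[0], col[0]), (k12, row[0], col[1]),
             (k21, row[1], col[0]), (k22, row[1], col[1]))
    g2 = 0.0
    for observed, r, c in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            g2 += observed * math.log(observed / (r * c / total))
    return 2.0 * g2


def strongest_noun(target_count, noun_counts, cooc_counts, total, candidates):
    """Return the candidate noun with the highest G^2, or None.

    Assumption: `candidates` are the nouns found within the context
    window of one sentence; counts are numbers of sentences.
    """
    best, best_score = None, 0.0
    for noun in candidates:
        k11 = cooc_counts.get(noun, 0)
        n_count = noun_counts[noun]
        # Skip nouns that co-occur no more often than expected by chance.
        if k11 * total <= target_count * n_count:
            continue
        score = llr_g2(k11, target_count - k11, n_count - k11,
                       total - target_count - n_count + k11)
        if score > best_score:
            best, best_score = noun, score
    return best


# Hypothetical counts: 1000 sentences, the target word occurs in 100 of them.
noun_counts = {"A": 50, "B": 500}
cooc_counts = {"A": 40, "B": 50}
n_max = strongest_noun(100, noun_counts, cooc_counts, 1000, ["A", "B"])
print(n_max)
```

The chance-level filter is a design choice of this sketch: G² is also large when two words co-occur far less often than expected, so without the filter a strongly avoided noun could be mistaken for a strongly associated one.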

Similar publications

Article
The prevalence of creativity in the emergent online media language calls for a more effective computational approach to semantic change. Two divergent metaphysical understandings underlie the task: the juxtaposition view of change and the succession view of change. This paper argues that the succession-view better reflects the essence of semantic chang...

Citations

... Temporal syntactic and semantic shifts are called diachronic changes [1]. Several probabilistic approaches tackle the problem of modeling the temporal evolution of a vocabulary by converting a set of timestamped documents into a latent variable model [15][16][17][18]. Other approaches model diachronic changes using Parts of Speech features [19] or using graphs where the edges between nodes (that represent words) are stronger based on context information [20]. ...
Article
Semantics in natural language processing is largely dependent on contextual relationships between words and entities in a document collection. The context of a word may evolve. For example, the word “apple” currently has two contexts—a fruit and a technology company. The changes in the context of words or entities in text data such as scientific publications and news articles can help us understand the evolution of innovation or events of interest. In this work, we present a new diffusion-based temporal word embedding model that can capture short- and long-term changes in the semantics of entities in different domains. Our model captures how the context of each entity shifts over time. Existing temporal word embeddings capture semantic evolution at a discrete/granular level, aiming to study how a language developed over a long period. Unlike existing temporal embedding methods, our approach provides temporally smooth embeddings, facilitating prediction and trend analysis better than those of existing models. Extensive evaluations demonstrate that our proposed temporal embedding model performs better in sense-making and predicting relationships between words and entities in the future compared to other existing models.