Figure 5 - uploaded by Philippe Cudre-Mauroux
Example showing the prediction of the number of signatures of one petition, after 3 days of observation.

Source publication
Conference Paper
Applying classical time-series analysis techniques to online content is challenging, as web data tends to have data quality issues and is often incomplete, noisy, or poorly aligned. In this paper, we tackle the problem of predicting the evolution of a time series of user activity on the web in a manner that is both accurate and interpretable, using...

Context in source publication

Context 1
... accuracy. Figure 5 shows an example of an actual time series of signatures and the predictions made by our models and by the baselines. We show the advantage of incorporating information from social media: it yields a prediction that more closely follows the actual evolution of the number of signatures. ...

Similar publications

Conference Paper
This study presents a workflow for 3D geological soil modelling using a combination of AEM data, borehole information, and soil maps. The workflow aims to use all available information in a time-efficient way. The overall goal of the workflow is to define a bedrock surface and in turn model the geology between terrain and this model, con...
Preprint
To obtain interpretable machine learning models, either interpretable models are constructed from the outset - e.g. shallow decision trees, rule lists, or sparse generalized linear models - or post-hoc interpretation methods - e.g. partial dependence or ALE plots - are employed. Both approaches have disadvantages. While the former can restrict the...
Preprint
The widespread use of machine learning models, especially within the context of decision-making systems impacting individuals, raises many ethical issues with respect to fairness and interpretability of these models. While the research in these domains is booming, very few works have addressed these two issues simultaneously. To solve this shortcom...
Conference Paper
In the Software Defined Networking (SDN) and Network Function Virtualization (NFV) era, it is critical to enable dynamic network access control. Traditionally, network access control policies are statically predefined as router entries or firewall rules. SDN enables more flexibility by reactively installing flow rules into the switches to achieve...
Article
Vortex-induced vibrations (VIVs) have been observed on a long-span suspension bridge. The nonstationary wind in the field characterized by the time-varying mean wind speed is likely to lead to time-varying aerodynamics of the wind-bridge system during VIVs, which is different from VIVs induced by stationary or even steady wind in wind tunnels. In t...

Citations

... Previous works have dealt with multi-modal posts [41], images [42], movies [1], petitions [35], etc., and have used user-behavior-based markers such as the number of comments [32,36] and shares [25,38] as surrogates of social-media popularity. There has also been limited work on time-aware [14] and time-series prediction of social-media popularity [20,29]. ...
Preprint
Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity
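The nDCG figure quoted above scores a model's ranking of sentences against the ideal ordering by true popularity. A minimal Python sketch of nDCG follows, assuming linear gains and the common log2(rank + 1) discount with no rank cutoff; whether InfoPop's evaluation uses these choices or exponential (2^rel - 1) gains is not stated here, and the relevance values are hypothetical.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: linear gain, discounted by log2(rank + 1) with 1-based ranks."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances_in_predicted_order: Sequence[float]) -> float:
    """nDCG: DCG of the predicted order divided by DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0


# Hypothetical popularity labels of four sentences, listed in the order a model ranked them.
print(round(ndcg([3.0, 2.0, 1.0, 0.0]), 3))  # perfect ranking -> 1.0
print(round(ndcg([0.0, 1.0, 2.0, 3.0]), 3))  # fully reversed ranking scores lower
```

Because the score is normalised by the ideal DCG, it lies in [0, 1] for non-negative labels, which is what makes a threshold such as 0.8 comparable across documents of different lengths.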
... Large-scale studies aiming to increase accuracy and the spatiotemporal resolution of responses require advanced survey techniques. In recent years, human activity has been increasingly mediated by digital devices, leaving footprints that can be exploited to assess the population's health and opinions [13][14][15][16][17]. In the context of COVID-19, social media data have been used to predict the number of new cases (incidence) [18,19] and to interpret the public perception of the pandemic [20]. ...
Article
Background: Vaccines are promising tools to control the spread of COVID-19. An effective vaccination campaign requires government policies and community engagement, sharing experiences for social support, and voicing concerns about vaccine safety and efficiency. The increasing use of online social platforms allows us to trace large-scale communication and infer public opinion in real time.
Objective: This study aims to identify the main themes in COVID-19 vaccine-related discussion on Twitter in Japan and track how the popularity of the tweeted themes evolved during the vaccination campaign. Furthermore, we aim to understand the impact of critical social events on the popularity of the themes.
Methods: We collected more than 100 million vaccine-related tweets written in Japanese and posted by 8 million users (approx. 6.4% of the Japanese population) from January 1 to October 31, 2021. We used Latent Dirichlet Allocation to perform automated topic modeling of tweet texts during the vaccination campaign. In addition, we performed an interrupted time series regression to evaluate the impact of four critical social events on public opinion.
Results: We identified 15 topics grouped into 4 themes: Personal issue, Breaking news, Politics, and Conspiracy and humour. The evolution of the popularity of themes revealed a shift in public opinion: attention was initially split among personal issues (individual aspect), information from the news (knowledge acquisition), and criticism of the government, and later focused on personal issues. Our analysis showed that the Tokyo Olympic Games affected public opinion more than other critical events but not the course of the vaccination. Public opinion about politics was significantly affected by various social events, with attention shifting positively in the early stages of the vaccination campaign and negatively later.
Conclusions: This study showed a striking shift in public interest in Japan, with users splitting their attention among various themes early in the vaccination campaign and then focusing only on personal issues as trust in vaccines and policies built up. An interrupted time series regression analysis showed that the vaccination rollout to the general population (under 65) increased the popularity of tweets about practical advice and personal vaccination experience, and that the Tokyo Olympic Games disrupted public opinion but not the course of the vaccination campaign. The methodology developed here allowed us to monitor the evolution of public opinion and evaluate the impact of social events on public opinion from large-scale Twitter data.
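The interrupted time series regression mentioned above is commonly written as a segmented linear regression, y_t = b0 + b1*t + b2*D_t + b3*(t - T)*D_t, where D_t = 1 from the event time T onward, so b2 is the immediate level change and b3 the change in slope. The Python sketch below fits that standard form by ordinary least squares. It is an assumption that the study used exactly this specification (covariates, seasonality, and autocorrelation handling are not described here), and the series is synthetic, not data from the study.

```python
import numpy as np


def segmented_regression(y: np.ndarray, interruption: int) -> np.ndarray:
    """Fit y_t = b0 + b1*t + b2*D_t + b3*(t - T)*D_t by ordinary least squares.

    D_t is 1 for t >= T (the interruption index) and 0 before it.
    Returns [b0, b1, b2, b3]: baseline level, baseline slope,
    level change at T, and slope change after T.
    """
    t = np.arange(len(y), dtype=float)
    d = (t >= interruption).astype(float)
    X = np.column_stack([np.ones_like(t), t, d, (t - interruption) * d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic daily series (illustrative only): the level jumps by 5 and the
# slope rises by 0.2 at day 60, plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(120, dtype=float)
d = (t >= 60).astype(float)
y = 10 + 0.1 * t + 5 * d + 0.2 * (t - 60) * d + rng.normal(0, 0.5, t.size)
print(np.round(segmented_regression(y, 60), 2))
```

In practice, several events (the study evaluates four) would add one level-change and one slope-change column per event to the design matrix.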
... Time series segmentation (TSS). Traffic data are essentially periodic time series [28]. According to the relevant literature in the spatiotemporal field [29], a small number of critical data points can determine a small number of future values. ...
Article
Predicting the regularity of and demand for shared cycling is both necessary and challenging for the management and development of urban pedestrian and bicycle traffic. Demand in bicycle-sharing systems fluctuates in space and time and follows highly complex nonlinear patterns. The demand for shared bicycles is affected by many factors, including time, space, weather, and the COVID-19 situation. This study proposes a new bicycle-sharing demand forecasting model (USTARN) that accounts for the impact of COVID-19 and combines urban computing with a spatiotemporal attention residual network. USTARN consists of two parts. In the first part, a spatiotemporal attention residual network is built to learn the temporal and spatial correlations of shared-bicycle demand. The temporal feature branches of each small spatial region are trained separately to predict, from historical data, shared-bicycle demand in batches for different regions and periods. To improve prediction accuracy, the second part adjusts and redistributes the predictions of the first part by learning other information about the city, such as the severity of COVID-19, weather, temperature, wind speed, and holidays. The model can thus predict the demand for shared bicycles in different urban areas, in different periods, and at different COVID-19 severity levels. For validation, this study uses shared-bicycle order data from the COVID-19 period in 2020, obtained from the open data platform of the Shenzhen municipal government, analyzes the spatiotemporal regularity of system demand, and discusses the impact of the number of newly diagnosed patients and the daily minimum temperature on the demand for shared bicycles.
The results show that USTARN fully captures the effects of time, space, the epidemic situation, weather, and temperature, and that its predictions of the impact of wind speed and other factors on shared-bicycle demand are better than those of classical methods.
... The increasing volume of online user activity provides vital new opportunities to measure and understand the collective behavior of social and economic evolutions such as influenza prediction [17], the impact of individual performance [16], user activity modeling [36], and other problems [28,44,57]. A record of online user activity can play the role of a social sensor and offers important insights into people's decision-making. ...
Preprint
Large quantities of online user activity data, such as weekly web search volumes that co-evolve under the mutual influence of several queries and locations, serve as an important social sensor. Accurately forecasting future activity by discovering latent interactions in such data, i.e., the ecosystem among queries and the flow of influence between areas, is an important task. However, it is a difficult problem because of the data volume and the complex patterns covering the dynamics. To tackle the problem, we propose FluxCube, an effective mining method that forecasts large collections of co-evolving online user activity and provides good interpretability. Our model extends a combination of two mathematical models: a reaction-diffusion system provides a framework for modeling the flow of influence between local area groups, and an ecological system models the latent interactions between queries. In addition, by leveraging the concept of physics-informed neural networks, FluxCube achieves both high interpretability, obtained from its parameters, and high forecasting performance. Extensive experiments on real datasets showed that FluxCube outperforms comparable models in forecasting accuracy and that each component of FluxCube contributes to the improved performance. We then present case studies showing that FluxCube can extract useful latent interactions between queries and area groups.
... Large-scale studies aiming to increase accuracy and the spatiotemporal resolution of responses require advanced survey techniques. In recent years, human activity has been increasingly mediated by digital devices, leaving footprints that can be exploited to assess population health and opinions [13][14][15][16][17]. In the context of COVID-19, social media data have been used to predict the number of new cases (incidence) [18,19] and to interpret the public perception of the pandemic [20]. ...
Preprint
Vaccines are promising tools to control the spread of COVID-19. An effective vaccination campaign requires government policies and community engagement, sharing experiences for social support, and voicing concerns about vaccine safety and efficiency. The increasing use of online social platforms allows us to trace large-scale communication and infer public opinion in real time. We collected more than 100 million vaccine-related tweets posted by 8 million users and used the Latent Dirichlet Allocation model to perform automated topic modeling of tweet texts during the vaccination campaign in Japan. We identified 15 topics grouped into 4 themes: Personal issue, Breaking news, Politics, and Conspiracy and humour. The evolution of the popularity of themes revealed a shift in public opinion: attention was initially split among personal issues (individual aspect), information from the news (knowledge acquisition), and criticism of the government, and shifted towards personal experiences once confidence in the vaccination campaign was established. An interrupted time series regression analysis showed that the Tokyo Olympic Games affected public opinion more than other critical events but not the course of the vaccination. Public opinion on politics was significantly affected by various events, with attention shifting positively in the early stages of the vaccination campaign and negatively later. Tweets about personal issues were retweeted most when the vaccination reached the younger population. The associations between the vaccination campaign stages and tweet themes suggest that public engagement on the social platform contributed to speeding up vaccine uptake by reducing anxiety via social learning and support.
... A central question of computational social science is to understand the mechanisms by which individuals, taken as groups, exhibit collective behaviours (Lazer et al., 2009). This relationship is particularly striking when considering the emergence and subsequent decline in popularity, or success, on the web and in social media (Szabo and Huberman, 2010;Goel et al., 2010;Bandari, Asur, and Huberman, 2012;Proskurnia et al., 2017;Candia et al., 2019). Take the adoption of certain hashtags, instead of others, when confronted with new social phenomena (Lin et al., 2013); or the fact that certain songs become extremely popular while most remain in the dark (Salganik, Dodds, and Watts, 2006). ...
... In this study, we investigate the question of how people respond to planned events in an online setting. While several previous studies have focused on the dynamics of popularity, they have mostly considered specific events, e.g., movie release (Mestyán, Yasseri, and Kertész, 2013), elections (Yasseri and Bright, 2016), and airplane crashes (García-Gavilanes, Tsvetkova, and Yasseri, 2016;García-Gavilanes et al., 2017), or, looking for universal patterns, have not paid attention to the type of events (Crane and Sornette, 2008;Matsubara et al., 2012;Zhao et al., 2015;Kobayashi and Lambiotte, 2016;Proskurnia et al., 2017). For these reasons, it remains unclear how event-related information (e.g., category and outcome of an event) influences its peaks of popularity. ...
... Important works include Matsubara et al. (2012), who proposed SpikeM, a time series model that incorporates an exponential rise, power-law decay, and circadian rhythms. In Proskurnia et al. (2017), the authors proposed a time-series model that incorporates reinforcement and circadian rhythms, which makes it possible to predict the popularity dynamics on thepetitionsite.com. While previous works developed time series models for predicting popularity dynamics, our work additionally uses a model to investigate the relationship between event-related information and popularity dynamics. ...
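To make the three ingredients attributed to SpikeM concrete (exponential rise, power-law decay, circadian rhythm), the Python sketch below composes them into a toy activity curve around a single peak. It is a hypothetical illustration only, not SpikeM's actual formulation or the reinforcement model of Proskurnia et al. (2017); the function name and all parameters are assumptions.

```python
import numpy as np


def toy_attention_curve(t_hours, t_peak, peak, growth_rate, decay_exponent,
                        circadian_amplitude, phase=0.0):
    """Toy curve: exponential rise before t_peak, power-law decay after it,
    multiplied by a 24-hour sinusoidal rhythm (hypothetical, not SpikeM).

    circadian_amplitude should be below 1 so the curve stays positive.
    """
    t = np.asarray(t_hours, dtype=float)
    before = np.minimum(t - t_peak, 0.0)   # <= 0, so exp() cannot overflow after the peak
    after = np.maximum(t - t_peak, 0.0)    # >= 0, hours elapsed since the peak
    rise = peak * np.exp(growth_rate * before)          # exponential anticipatory growth
    decay = peak * (1.0 + after) ** (-decay_exponent)   # power-law relaxation
    base = np.where(t < t_peak, rise, decay)
    rhythm = 1.0 + circadian_amplitude * np.sin(2.0 * np.pi * t / 24.0 + phase)
    return base * rhythm


hours = np.arange(0, 240)
curve = toy_attention_curve(hours, t_peak=120, peak=1000.0, growth_rate=0.05,
                            decay_exponent=1.2, circadian_amplitude=0.3)
print(curve[::24].round(1))  # one sample per day
```

Keeping the rhythm multiplicative means the daily oscillation scales with the overall activity level, so quiet periods far from the peak also oscillate less in absolute terms.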
Preprint
The dynamics of popularity in online media are driven by a combination of endogenous spreading mechanisms and responses to exogenous shocks, including news and events. However, little is known about the dependence of temporal patterns of popularity on event-related information, e.g., which types of events trigger long-lasting activity. Here we propose a simple model that describes the dynamics around peaks of popularity by incorporating key features, i.e., the anticipatory growth and the decay of collective attention together with circadian rhythms. The proposed model allows us to develop a new method for predicting future page view activity and for clustering time series. To validate our methodology, we collect a corpus of Wikipedia page view data associated with a range of planned events, i.e., events known in advance to take place on a fixed future date, such as elections and sports events. Our methodology outperforms existing models in both prediction and clustering tasks. Furthermore, restricting to Wikipedia pages associated with association football, we observe that the specific realization of the event, in our case which team wins a match or the type of the match, has a significant effect on the response dynamics after the event. Our work demonstrates the importance of appropriately modeling all phases of collective attention, as well as the connection between temporal patterns of attention and the characteristic underlying information of the events they represent.
... A central question of computational social science is to understand the mechanisms by which individuals, taken as groups, exhibit collective behaviours (Lazer et al. 2009). This relationship is particularly striking when considering the emergence and subsequent decline in popularity, or success, on the web and in social media (Szabo and Huberman 2010; Goel et al. 2010; Bandari, Asur, and Huberman 2012; Proskurnia et al. 2017; Candia et al. 2019). Take the adoption of certain hashtags, instead of others, when confronted ... [Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org).]
... In this study, we investigate the question of how people respond to planned events in an online setting. While several previous studies have focused on the dynamics of popularity, they have mostly considered specific events (e.g., movie release: Mestyán, Yasseri, and Kertész (2013); elections: Yasseri and Bright (2016); and airplane crashes: García-Gavilanes, Tsvetkova, and Yasseri (2016); García-Gavilanes et al. (2017)) or, looking for universal patterns, have not paid attention to the type of events (Crane and Sornette 2008; Matsubara et al. 2012; Zhao et al. 2015; Kobayashi and Lambiotte 2016; Proskurnia et al. 2017). For these reasons, it remains unclear how event-related information (e.g., category and outcome of an event) influences its peaks of popularity. ...
... Important works include Matsubara et al. (2012), who proposed SpikeM, a time series model that incorporates an exponential rise, power-law decay, and circadian rhythms. In Proskurnia et al. (2017), the authors proposed a time-series model that incorporates reinforcement and circadian rhythms, which makes it possible to predict the popularity dynamics on thepetitionsite.com. While previous works developed time series models for predicting popularity dynamics, our work additionally uses a model to investigate the relationship between event-related information and popularity dynamics. ...
Conference Paper
The dynamics of popularity in online media are driven by a combination of endogenous spreading mechanisms and responses to exogenous shocks, including news and events. However, little is known about the dependence of temporal patterns of popularity on event-related information, e.g., which types of events trigger long-lasting activity. Here we propose a simple model that describes the dynamics around peaks of popularity by incorporating key features, i.e., the anticipatory growth and the decay of collective attention together with circadian rhythms. The proposed model allows us to develop a new method for predicting future page view activity and for clustering time series. To validate our methodology, we collect a corpus of Wikipedia page view data associated with a range of planned events, i.e., events known in advance to take place on a fixed future date, such as elections and sports events. Our methodology outperforms existing models in both prediction and clustering tasks. Furthermore, restricting to Wikipedia pages associated with association football, we observe that the specific realization of the event, in our case which team wins a match or the type of the match, has a significant effect on the response dynamics after the event. Our work demonstrates the importance of appropriately modeling all phases of collective attention, as well as the connection between temporal patterns of attention and the characteristic underlying information of the events they represent.
... Analysis and modeling of the popularity dynamics of online content has been an active area of research [1][2][3][4][5][6][7][8][9][10][11]. A popular method for extracting a topic is to collect all the tweets that mention a specific word (keyword) or hashtag and analyze their temporal patterns [1,2,7,10]. ...
Article
During a disaster, social media can be both a source of help and a source of danger: social media has the potential to diffuse rumors, and officials involved in disaster mitigation must react quickly to the spread of rumors on social media. In this paper, we investigate how topic diversity (i.e., homogeneity of opinions in a topic) depends on the truthfulness of a topic (whether it is a rumor or a non-rumor) and how topic diversity changes over time after a disaster. To do so, we develop a method for quantifying the topic diversity of tweet data based on text content. The proposed method is based on clustering a tweet graph using Data polishing, which automatically determines the number of subtopics. We perform a case study of tweets posted after the Great East Japan Earthquake on March 11, 2011. We find that rumor topics exhibit more homogeneity of opinions during diffusion than non-rumor topics. Furthermore, we evaluate the performance of our method and demonstrate that it reduces data-processing runtime compared with existing methods.
... For example, Matsubara et al. [16] proposed SpikeM to reproduce temporal activities on blogs, Google Trends, and Twitter. In addition, Proskurnia et al. [17] proposed a time series model that considers a promotion effect (e.g., promotion through social media and the front page of the petition site) to predict the popularity dynamics of an online petition. A point process model describes the posted times in a probabilistic way by incorporating the self-exciting nature of information spreading [18,19]. ...
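The point process models mentioned above treat each post as an event that temporarily raises the rate of further posts (self-excitation). A minimal Python sketch of one common form follows: a Hawkes process with an exponential kernel, lambda(t) = mu + sum over past events t_i of alpha * beta * exp(-beta * (t - t_i)), simulated with Ogata's thinning method. The kernel choice, the parameter names `mu`, `alpha`, `beta`, and both helper functions are assumptions for illustration; the cited works [18,19] may use other kernels or formulations.

```python
import numpy as np


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity with an exponential kernel.

    mu: background rate; alpha: expected number of events directly triggered
    by each event (branching ratio); beta: decay rate of the excitation.
    Only events strictly before t contribute.
    """
    past = np.asarray([ti for ti in event_times if ti < t], dtype=float)
    return float(mu + alpha * beta * np.exp(-beta * (t - past)).sum())


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Ogata thinning: propose from an upper bound, accept with ratio lambda(t)/bound."""
    rng = np.random.default_rng(seed)
    events = []
    t = 0.0
    while True:
        # With an exponential kernel the intensity only decays between events,
        # so its value right after the current time bounds it until the next event.
        lam_bar = mu + alpha * beta * sum(np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            return events
        if rng.uniform() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)


events = simulate_hawkes(mu=0.5, alpha=0.6, beta=2.0, horizon=50.0, seed=1)
print(len(events), [round(e, 2) for e in events[:5]])
```

With alpha below 1 the process is stationary; each background event then spawns on average alpha / (1 - alpha) descendants, which is the "self-exciting" amplification that such models use to describe information cascades.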
Article
Fake news can have a significant negative impact on society because of the growing use of mobile devices and the worldwide increase in Internet access. It is therefore essential to develop a simple mathematical model to understand the online dissemination of fake news. In this study, we propose a point process model of the spread of fake news on Twitter. The proposed model describes the spread of a fake news item as a two-stage process: initially, fake news spreads as a piece of ordinary news; then, when most users start recognizing the falsity of the news item, that itself spreads as another news story. We validate this model using two datasets of fake news items spread on Twitter. We show that the proposed model is superior to the current state-of-the-art methods in accurately predicting the evolution of the spread of a fake news item. Moreover, a text analysis suggests that our model appropriately infers the correction time, i.e., the moment when Twitter users start realizing the falsity of the news item. The proposed model contributes to understanding the dynamics of the spread of fake news on social media. Its ability to extract a compact representation of the spreading pattern could be useful in the detection and mitigation of fake news.
... The majority of work on modeling petition popularity has focused on predicting popularity growth over time based on an initial popularity trajectory (Hale et al., 2013;Yasseri et al., 2017;Proskurnia et al., 2017), e.g. given the number of signatures a petition gets in the first x hours, predicting the total number of signatures at the end of its lifetime. ...
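A simple baseline for the early-to-final prediction task described above uses the log-linear relationship between early and later popularity reported by Szabo and Huberman (2010): ln N_final is approximately ln N_early + c. The Python sketch below fits the offset c by least squares (which reduces to the mean log ratio) and scales a new petition's early count by exp(c). It is a baseline under that assumption, not the model of Hale et al. (2013), Yasseri et al. (2017), or Proskurnia et al. (2017), and the petition counts are hypothetical.

```python
import numpy as np


def fit_log_offset(early_counts, final_counts):
    """Least-squares offset c in ln(final) = ln(early) + c, i.e. the mean log ratio.

    Counts must be positive, since the model works on the log scale.
    """
    early = np.asarray(early_counts, dtype=float)
    final = np.asarray(final_counts, dtype=float)
    return float(np.mean(np.log(final) - np.log(early)))


def predict_final(early_count, c):
    """Predicted final count: early count scaled by the fitted growth factor exp(c)."""
    return float(early_count * np.exp(c))


# Hypothetical training petitions: signatures after the first x hours, and final totals.
early = [120, 40, 900, 15]
final = [600, 210, 4300, 80]
c = fit_log_offset(early, final)
print(round(predict_final(200, c)))
```

A single global growth factor ignores petition-specific effects such as the Twitter sharing studied by Asher et al. (2017) and Proskurnia et al. (2017); those would enter as extra regressors on the log scale.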
... given the number of signatures a petition gets in the first x hours, predicting the total number of signatures at the end of its lifetime. Since the popularity of a petition also depends on its author's campaign strategies, Asher et al. (2017) and Proskurnia et al. (2017) examined the impact of sharing petitions on Twitter, as a time series regression task. However, none of this work analyzed the petition's content, which is a primary focus in this work, in addition to making the prediction at the time of submission rather than based on early social indicators. ...
... In each case, we regress over the petition signatures, and use fully-connected hidden layers with a ReLU activation function before the final output layer. Note that we log-transform the signature count, consistent with previous work (Elnoshokaty et al., 2016;Proskurnia et al., 2017;Subramanian et al., 2018). ...
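The regression setup described above (log-transformed signature counts as the target, fully-connected hidden layers with ReLU activations, then a final output layer) can be sketched in NumPy as a forward pass. Using `log1p` rather than a plain logarithm is an assumption made so that zero counts stay finite; the layer widths, feature count, and random weights are hypothetical, and training (for example, minimising squared error on the log targets) is omitted.

```python
import numpy as np


def to_log_target(signatures):
    """Map raw signature counts to log(1 + count); zero counts map to 0."""
    return np.log1p(np.asarray(signatures, dtype=float))


def from_log_target(log_pred):
    """Invert to_log_target to get predictions back on the count scale."""
    return np.expm1(log_pred)


def mlp_forward(x, weights, biases):
    """Fully-connected layers: ReLU after each hidden layer, linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    return h @ weights[-1] + biases[-1]


rng = np.random.default_rng(42)
features = rng.normal(size=(5, 8))  # 5 hypothetical petitions, 8 features each
sizes = [8, 16, 16, 1]              # hypothetical layer widths
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

log_pred = mlp_forward(features, weights, biases)  # untrained network, shape (5, 1)
print(from_log_target(log_pred).ravel())
```

Regressing on the log scale keeps a handful of very popular petitions from dominating the squared-error loss, which is the usual motivation for the transform in this line of work.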
Conference Paper
Speech visualisations are known to help language learners acquire correct pronunciation and to promote a better study experience. We present a two-step approach, based on two established techniques, that displays tongue tip movements derived from an acoustic speech signal on a vowel space plot. First, we use the Energy Entropy Ratio to extract vowels; then, we apply the Linear Predictive Coding root method to estimate Formant 1 and Formant 2. We recruited one Modern Standard Arabic (MSA) lecturer and four MSA students and collected acoustic data from them. Our proof of concept was able to reflect differences between the tongue tip movements of a native MSA speaker and those of an MSA language learner at the vocabulary level. This paper addresses principal methods for generating features that reflect bio-physiological features of speech and thus facilitates an approach that can be adapted to languages other than MSA.
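The Linear Predictive Coding root method named above estimates formants from the angles of the complex roots of the LPC polynomial. The Python sketch below computes LPC coefficients with the autocorrelation method (Hamming window, then the Yule-Walker equations) and converts upper-half-plane roots to frequencies. It is checked on a synthetic second-order resonance rather than on speech; the function names are hypothetical, the vowel-extraction step (Energy Entropy Ratio) is not sketched, and for real speech one would typically add pre-emphasis, use a higher order (a common rule of thumb is about 2 + fs/1000), and discard roots with wide bandwidths.

```python
import numpy as np


def lpc_autocorrelation(frame, order):
    """LPC predictor coefficients a[1..p], with x[n] ~ sum_k a[k] x[n-k],
    via the autocorrelation method: window the frame, then solve R a = r."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])


def formants_from_lpc(a, fs):
    """Root method: roots of z^p - a1 z^(p-1) - ... - ap in the upper half-plane
    map to resonance frequencies angle * fs / (2 pi), returned in ascending order."""
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a))))
    roots = roots[np.imag(roots) > 0]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))


# Sanity check on a synthetic AR(2) signal with a single resonance at 700 Hz.
fs, f_true, radius = 8000.0, 700.0, 0.95
theta = 2.0 * np.pi * f_true / fs
a1, a2 = 2.0 * radius * np.cos(theta), -radius ** 2
rng = np.random.default_rng(0)
noise = rng.normal(size=20000)
signal = np.zeros_like(noise)
for n in range(2, len(noise)):
    signal[n] = a1 * signal[n - 1] + a2 * signal[n - 2] + noise[n]

estimated = formants_from_lpc(lpc_autocorrelation(signal, order=2), fs)
print(estimated)  # close to 700 Hz
```

On a vowel frame, the two lowest surviving frequencies would be taken as F1 and F2 and plotted on the vowel space chart.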