Scientific Reports | (2023) 13:3809 | https://doi.org/10.1038/s41598-023-30367-8
www.nature.com/scientificreports
Understanding who talks about what: comparison between the information treatment in traditional media and online discussions
Hendrik Schawe1, Mariano G. Beiró2,3, J. Ignacio Alvarez‑Hamelin2,3, Dimitris Kotzinos4 & Laura Hernández1*
We study the dynamics of interactions between a traditional medium, the New York Times journal,
and its followers in Twitter, using a massive dataset. It consists of the metadata of the articles
published by the journal during the first year of the COVID‑19 pandemic, and the posts published
in Twitter by a large set of followers of the @nytimes account along with those published by a set
of followers of several other media of different kinds. The dynamics of discussions held in Twitter by
exclusive followers of a medium show a strong dependence on the medium they follow: the followers
of @FoxNews show the highest similarity to each other and a strong differentiation of interests from
the general group. Our results also reveal the difference in the attention paid to U.S. presidential
elections by the journal and by its followers, and show that the topic related to the “Black Lives
Matter” movement started in Twitter, and was addressed later by the journal.
The debate about the influence of mass media on social opinion has shown peaks of interest each time that a
technological breakthrough modified the media ecosystem, mainly by increasing the number of people that
can be reached by broadcasters1. The first important one, the invention of the printing press by Gutenberg,
indeed played an important role in the rapid expansion of Calvinism in Europe2, although its general influence
on the formation of social opinion was mitigated by the fact that most of the population was illiterate. Later,
around the beginning of the 20th century, when wireless radio transmissions appeared and rapidly became
a popular entertainment medium, discussions about the foreseeable consequences of the popularization of this
new medium were carried in the written press, which by that time had become a traditional one. A review in
the New York Times from May 7th 1899 entitled “Future of Wireless Telegraphy” warned: “All the nations of
the earth would be put upon terms of intimacy and men would be stunned by the tremendous volume of news and
information that would ceaselessly pour in upon them”3. Needless to say, the same kind of debate took place
at the arrival of TV broadcasting4.
The rapid growth of digital media certainly triggered the same kind of discussions again, but this time with a
major difference: the massive data accumulated on social media platforms allows us to perform measurements
of the opinion evolution of large numbers of people. Countless articles have addressed different
aspects of opinion dynamics based on social networks. A few recent ones are the study of opinion evolution on
different selected topics5,6, and the characterisation of structural properties of the interaction networks that result
from the different functionalities offered by the platforms (like mentions, retweets, follower-friend in Twitter)7,8.
In particular, there is a recent interest in the formation of information bubbles and echo chambers, i.e., strongly
connected clusters of people that communicate only weakly with others9–11. Special attention has also been given
to the diffusion of rumours and fake news in relation to the COVID-19 pandemic12, to the extent that the term
infodemics was coined to highlight the parallelism with the diffusion of the virus13–15.
1Laboratoire de Physique Théorique et Modélisation, UMR-8089 CNRS, CY Cergy Paris Université, 95300 Paris,
France. 2Universidad de Buenos Aires, Facultad de Ingeniería, Paseo Colón 850, C1063ACV Buenos Aires,
Argentina. 3CONICET, Universidad de Buenos Aires, INTECIN, Paseo Colón 850, C1063ACV Buenos
Aires, Argentina. 4ETIS UMR 8051 CY Cergy Paris Université, ENSEA, CNRS, 95300 Paris, France. *email:
Laura.Hernandez@cyu.fr
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Nowadays, it seems clear that if media exert an influence on social opinion it is mainly by setting the terms
of debate or, in the words of B. Cohen16, the press “may not be successful much of the time in telling people what to
think, but it is stunningly successful in telling its readers what to think about”. This notion is known as the Agenda
Setting Problem17 and is a long-lasting subject of discussion in Political Sciences, Communication, Social Psychology,
Cognitive Sciences, and Media studies. In particular, an open debate concerns the relationship between the
notion of issues (the subjects that are addressed) and that of frames (the attributes assigned to these subjects
when they are addressed)18–20.
In this work we investigate the agenda setting problem by studying the dynamics of the different topics
treated by a traditional medium, The New York Times (NYT) journal, and their relationship with the dynamics
of the public discussions that take place in Twitter among its followers. Here, the term topic designates the subjects
treated in both media without attempting to differentiate between issues and frames. This is the standard
meaning given in textual corpora analysis, which has also been used to address the agenda setting problem21,22.
We center our study on the first year of the COVID-19 pandemic, which by its very nature can be expected
to become an important driver of public attention. Several works studied the evolution of opinion in Twitter
(and other platforms) during this period, mainly focusing on discussions directly related to health issues, or
public policies related to them23–25. Here, on the contrary, we aim at understanding how the different topics
that interested society during this period were addressed both by the media and by the public that is in direct
relation to them, without assuming a priori the existence of any influence in either direction.
While some recent studies have compared how traditional media and social networks treat a particular topic
of discussion26–30, in this work we search for global patterns characterizing each of them. We have collected a large
number of tweets corresponding to a randomized sample of the over 46M followers of the New York Times (NYT)
official Twitter account (@nytimes), during the first year of the pandemic, along with the metadata of the articles
published by the journal during that period. This sampling guarantees that we are reaching the topics discussed
by users that have expressed an interest in that journal by following its Twitter account. In order to compare with
the behaviour of the followers of different media, we have also collected a sample of the tweets published by the
followers of other important media of different kinds: written press, radio, television, press agencies.
With this data, we build a semantic network representative of the discussion taking place in social media,
based on the co-occurrence of hashtags (tagging words starting with the symbol “#”) chosen by Twitter users.
By community detection on this semantic network, we identify the topics of interest discussed in the platform.
Additionally, the keywords chosen by the journalists to tag their articles allow us to identify the topics treated
by the journal.
In the general context of agenda-setting, the extracted topics from a text corpus might operationalize either
frames or social issues, depending on each specific context or dataset, as suggested by different works that use this
approach: Danner et al. specifically study the correlation between media and public agendas related to organic
food, assuming that each extracted topic is a sub-issue inside the general ‘organic food’ issue31; Albanese et al.
perform non-negative matrix factorization (NMF) of the document-term matrix to study the coverage of different
issues during the 2016 US presidential campaign32; Barberá et al. use LDA to study the issues discussed
by members of Congress, ordinary citizens, and media outlets on Twitter, a priori assuming that the extracted
topics represent issues33.
With a different approach, topic detection has also been used to identify frames: Blei et al. use LDA to analyze
8000 articles about the US government support for arts between 1986 and 1997, assuming that “when applied
to corpora that cover specific issue domains (like government funding for the arts), topic modeling has some
decisive advantages for rendering operational the idea of frame in media research”34; Walter and Ophir apply a
two-step approach based on LDA and community detection to extract frames on three domain-specific corpora35.
This differs from our case study, where we are not interested in a specific subject but investigate the whole
news treatment during a period.
Our general aim is to look for differences in patterns of information treatment between traditional media
and social media. This will be done by analyzing global measures such as issue salience, attention diversity, rank
diversity, and reaction times.
Through an extensive analysis of these data we aim to gain insight into the following questions:
Who talks about what? Do people that follow a journal talk about the same subjects that are published in the
journal? In that case, is it possible to quantify to what extent?
How does the attention that the followers of the journal pay to different topics compare to the attention that
followers of other media pay to those topics?
Can we observe any evidence about the agenda setting problem? If so, in which sense?
As discussed by Scheufele et al., agenda setting is an inherently causal theory; however, the research designs and
statistical methods employed to study it are seldom suited to make causal inferences36. There is no mystery in
this, as determining causal inferences in a non-stationary time series of events is a difficult problem and we will
not address it here. However, one can ask whether the online discussions of the followers of a medium are similar,
in general and in their temporal evolution, to the salience of the different subjects as treated by the medium itself.
Moreover, we show that comparing the time evolution of the treatment of information in a top-down
medium like the NYT with the online discussions of its followers allows us to detect whether the salience of a
given subject in the NYT precedes its followers’ discussions about it.
Results
We collected data for over a year, starting in January 2020, before the outbreak of the pandemic, from followers
of the @nytimes in Twitter, and also from Twitter users who follow Twitter accounts of other media, like
@washingtonpost, @WSJ (The Wall Street Journal), @TIME, continuous information television channels like
@FoxNews, or @CNN, and also press agencies like @AP (Associated Press) and @Reuters. During the same period,
we have also collected the metadata of the publications of the NYT journal, in particular the articles’ headers
(see “Methods”).
In order to automatically determine the topics of discussion in Twitter, we build a hashtag network where two
hashtags are connected if they appear in the same tweet (see “Methods”). This link is weighted by the number of
different users that used that pair of hashtags, which diminishes the potential influence of automated accounts.
This semantic network relies on a single assumption: if two hashtags appear in the same tweet, they are likely
to refer to the same subject. As a given subject may be addressed by different hashtags, the topics of discussion
in the platform are automatically obtained by community detection in the semantic network37,38, and we consider
that each community constitutes a topic of discussion in the platform (see “Methods”). The topics treated in the
NYT articles are labelled by the keywords given by the journalists to characterize each article.
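The construction of the weighted hashtag network can be illustrated with a minimal sketch. It assumes that each tweet is available as a (user, list of hashtags) pair and that hashtags are compared case-insensitively; both the input format and the function name `build_hashtag_network` are hypothetical and not taken from the paper. The community detection step is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_hashtag_network(tweets):
    """Map each hashtag pair to the number of distinct users who used
    both hashtags in the same tweet (hypothetical helper)."""
    users_per_pair = defaultdict(set)
    for user_id, hashtags in tweets:
        # Lowercasing is an assumption made for this illustration.
        unique_tags = sorted({tag.lower() for tag in hashtags})
        for pair in combinations(unique_tags, 2):
            users_per_pair[pair].add(user_id)
    return {pair: len(users) for pair, users in users_per_pair.items()}


# Made-up tweets, for illustration only.
tweets = [
    ("alice", ["#covid19", "#stayhome"]),
    ("alice", ["#covid19", "#stayhome"]),  # same user again: counted once
    ("bob", ["#StayHome", "#covid19", "#newprofilepic"]),
    ("carol", ["#maga"]),  # a single hashtag creates no link
]
weights = build_hashtag_network(tweets)
```

Counting distinct users rather than tweets means that an account repeating the same pair of hashtags many times contributes a weight of one, which is the property the text relies on to limit the influence of automated accounts.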
Topic dynamics. The entropy of the vocabulary (hashtags for Twitter and keywords for the NYT journal)
allows for a global comparison of the dynamics of the discussion in both media (see “Methods”). Entropy is a
physical quantity widely used in Statistical Physics to characterize the width of a probability distribution, and
therefore, the information obtained by the observation of a given event of such distribution. This notion is useful
in different disciplinary fields and has already been used to capture attention diversity in previous agenda setting
studies21,38. Low values of entropy indicate that the discussion is concentrated around few hashtags or that the
information in the journal can be tagged by a few keywords, while high values reveal that a variety of hashtags
or keywords are being used.
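As an illustration of this measure, the following sketch computes the Shannon entropy of a list of hashtag or keyword usages, together with a centered rolling mean of the kind used to smooth daily values. The exact definition used by the authors is given in their “Methods”; the logarithm base and the treatment of the edges of the time series are assumptions made here, as are the function names.

```python
import math
from collections import Counter


def shannon_entropy(tokens, base=2.0):
    """Entropy of the empirical usage distribution of the given tokens.
    The logarithm base is an assumption, not taken from the paper."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def centered_rolling_mean(values, half_window=3):
    """Average each value with up to `half_window` neighbours on each side;
    windows are simply truncated at the edges (an assumption)."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half_window): i + half_window + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up usages: one dominant hashtag versus four equally used ones.
h_low = shannon_entropy(["#covid19"] * 8)
h_high = shannon_entropy(["#covid19", "#maga", "#love", "#nfl"] * 2)
smoothed = centered_rolling_mean([1, 1, 1, 8, 1, 1, 1])
```

In this toy case the concentrated usage gives zero entropy, while the evenly spread usage gives the maximum value for four items, which matches the qualitative reading of low and high entropy given above.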
Figure1 (top) shows the temporal evolution of the entropy of the hashtags used by the followers of dierent
media. e dates of important events are marked as temporal references. As the entropy is an extensive variable,
it is not surprising to see that the corresponding values are, in general, larger for Twitter than for the NYT, since
the number of hashtags is much larger than the number of keywords.
Also, the entropy curves corresponding to the @nytimes and @CNN followers, who are significantly more
numerous than those following the remaining media accounts, are globally larger than the other curves, as more
users naturally lead to more hashtag usages. The unexpected, opposite situation is observed for the @FoxNews
followers, whose entropy is always the lowest in spite of the fact that this is not the smallest group, revealing that
@FoxNews followers use fewer hashtags than expected given their number. This is not due to a particular event that
could have interested these followers, but is constant in time, which indicates a characteristic of those users.
Figure 1. Entropy of the vocabulary as a function of time. Top panel: Time variation of the hashtag
distribution entropy corresponding to the frequencies of usage of each hashtag by the Twitter followers of the
accounts of the media listed in Table 3 (curves: @AP, @nytimes, @CNN, @FoxNews, @Reuters, @TIME,
@washingtonpost, @WSJ). Bottom panel: Time variation of the keyword distribution entropy
corresponding to the keyword usages in the articles of the New York Times. The vertical lines indicate, as a
reference, the time location of important events during the studied period (lockdown of Wuhan, first European
dies of COVID-19, Italian lockdown, NYC lockdown, killing of George Floyd, Biden nominated, first TV debate,
US presidential elections). In both cases the computation has
been done daily with a 7-day rolling average (3 days before, 3 days after) to remove the effect of weekdays.
Figure1 (bottom) shows a dierent dynamics for the evolution of the entropy of keywords of the NYT jour-
nal. Although the publications in both supports are naturally attached to real life events, a detailed inspection of
the most popular topics in both media conrms also structural dierences. For instance, the discussions about
the ‘Black Lives Matter movement’ notably give an earlier signal in the entropy of Twitter, while its inuence is
hardly detectable aer the killing of George Floyd in the entropy of the NYT keywords. is observation may
be related to the fact that, unlike Twitter, a journal follows editorial policies which mostly lead to a balanced
reporting about dierent topics.
As expected, the “Coronavirus” topic dominated both online discussions and the journal articles, capturing
attention during a long period, as shown by the wide entropy decrease in March-April. As COVID-19
influences most aspects of life, it appears in many sections of the journal and has considerable influence on the
entropy.
The ‘Presidential Elections’ topic, visible in both entropy panels, shows a steeper valley for the NYT curve
(deeper than the dominating “Coronavirus” topic). This reflects the importance of the coverage of elections by
the journal, as the NYT publishes, among others, one article on the election results for each of about 400 districts of
the United States, really focusing on the subject during this period.
A closer look at the topics’ evolution allows us to confirm that the remarkable decreases observed in the
entropy curves around the dates of important social events are indeed caused by topics related to them.
Figure2 shows the evolution of the eight most popular topics which are labelled by the most used hashtags in
the corresponding community. We can also detect the eect that the pandemic had on the public discussion about
subjects that seem a priori completely unrelated to it. For example, the topic labelled by the hashtag #newpro-
lepic, includes other hashtags related to locations and also the hashtag ‘#ashbackfriday’ which is used to tag
pictures. We nd that this topic becomes connected to the coronavirus pandemic via the #stayhome hashtag,
presumably because of the changing nature of pictures posted under these hashtags due to lock-down period.
The fact that our method lets the topics emerge, instead of following a set of hashtags or keywords chosen a
priori, reveals interesting facts. We find a topic whose popularity may appear surprising in US society, labelled
by the ‘#endsars’ hashtag. In fact, this topic refers to the demonstrations against police violence sparked by videos
showing brutality of the Nigerian police organization SARS (Special Anti-Robbery Squad, not to be confused
with the SARS-Coronavirus). After detecting the users who talked about this subject we found that most of their
accounts were tagged abroad (see Supplementary Material).
The dynamics of the topics treated by the NYT journal are shown in Fig. 3. As for Twitter, we also observe topics
reporting events, like ‘Presidential Election’ or ‘Coronavirus’, and those which correspond to regular reporting
of different aspects of social life, like ‘Books and Literature’. As signaled by the entropy curves, the ‘Black Lives
Matter’ topic, which dominates in Twitter (cf. Fig. 2), is not even among the leading keywords depicted here. On
the contrary, the ‘Presidential Election’ and ‘Elections’ keywords, which also include articles about the results
of the presidential election for all districts, show that this subject dominates the journal’s attention even against the
background of the pandemic, while it is much less important for its followers, who refer to it by the topics
labelled by #maga and #trump.
Rank diversity. The rank-diversity measure was introduced to study the evolution of languages, which takes
place over long periods39, by analysing the evolution of the rank of single words or n-grams40. By definition (see
“Methods”), it has a low value when few hashtags or keywords have occupied the observed rank during the
corresponding period. For the first ranks, this low value reflects that the leading subjects were represented by few
hashtags or keywords.
Figure 2. Dynamics of the eight largest topics of the discussion in Twitter by the followers of the @nytimes
account. The topics are identified in the semantic network of co-occurring hashtags by Infomap (one level down,
i.e., path length of 2). They are labelled by the most used hashtags belonging to each topic (#covid19, #coronavirus;
#newprofilepic, #stayhome; #blacklivesmatter, #georgefloyd; #maga, #usa, #america; #love, #quote, #life;
#trump, #biden, #gop; #endsars, #endswat; #nfl, #superbowl). The vertical axis
shows the number of unique users using a hashtag belonging to the community of the corresponding topic on a
given day (counts), smoothed with a rolling average over seven days to eliminate the cycles introduced by weekends.
e plots of Fig.4,which concern a much shorter time-scale, dier from the sigmoid curves characteristic
of language evolution. In Twitter and in this particular period, one could have expected to have rst ranks
completely dominated the few variations of COVID-19 hashtags. However, this is not exactly the case. Only
rank one and two have a diversity below
d(r)<0.5
(with
d(1)0.3
and
d(2)0.45
), while the remaining
ones have
d(r)>0.5
. is implies that even the rst ranks are occupied by many dierent hashtags (notice that
d(r)=0.5
corresponds to 180 dierent hashtags in position r; see “Methods”). is doesn’t necessarily mean
that the users were ignoring the pandemic in their discussions but, as it aected very dierent aspects of society,
it can be addressed to by many dierent hashtags. In fact, we have found that the COVID-19 topic is composed
of about 700 hashtags (see Supplementary Material), several of which are very popular and contribute to the
relative variability of the rst ranks.
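A minimal sketch of the rank-diversity computation follows. It assumes that the diversity at a given rank is the number of distinct items that occupied that rank divided by the number of daily rankings; this reading agrees with the remark that a value of 0.5 corresponds to 180 different hashtags (which would imply about 360 daily rankings), but the authoritative definition is the one in the paper's “Methods”. The function name and the toy rankings are hypothetical.

```python
def rank_diversity(daily_rankings, rank):
    """Fraction of days on which a new distinct item is needed to describe
    the given rank: distinct items at `rank` (1-based) / number of days.
    The normalization is an assumption made for this illustration."""
    items_at_rank = {
        ranking[rank - 1] for ranking in daily_rankings if len(ranking) >= rank
    }
    return len(items_at_rank) / len(daily_rankings)


# Made-up daily rankings, ordered from most to least used hashtag.
daily_rankings = [
    ["#covid19", "#maga", "#love"],
    ["#covid19", "#trump", "#nfl"],
    ["#coronavirus", "#blacklivesmatter", "#quote"],
    ["#covid19", "#endsars", "#life"],
]
d1 = rank_diversity(daily_rankings, 1)  # two distinct hashtags over four days
d2 = rank_diversity(daily_rankings, 2)  # a different hashtag every day
```

Under this definition a rank held by the same item every day has the smallest possible diversity, while a rank occupied by a new item each day has a diversity of one.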
The rank diversity clearly captures the structural difference between these two media, showing a completely
different shape for the keywords of the NYT journal. Since the keywords are curated, the bottom panel of Fig. 4
reveals that dominant topics of the journal are addressed by very few keywords. This shows that the NYT has a
narrower focus on a selected group of topics than its followers on Twitter.
Reaction of the followers of NYT to its publications. The measurement of reposting latency between
the initial issue of a piece of news in a medium and the reaction of the receivers has been studied in several contexts
as a proxy of different aspects of social behaviour. The interpretations of the measured lag depend on the context
of the study; for example, it may relate to cognitive processing speed, associative strength in memory, and
spontaneous cognitive formation of a construct41, but it can also reveal properties of the media where the information
is diffused, like email and online forums42, or social media43,44. Reaction times have also been related to public
issue attention, given that the public has a limited capacity to follow the different debates that take place in the
public arena, where one issue chases another45,46. In the context of agenda setting theory, shorter/longer reaction
times have been associated with cognitive congruence/dissonance, giving rise to the agenda melding process47;
also, a very recent statistical study of a sample of Twitter users showed that multiple issues could distract users’
attention, thus leading to low reposting speed48.
Figure 3. Dynamics for the 8 most used primary keywords tagging the NYT articles (shortened names used in
the labels: Coronavirus, Pres. Election, Books and Literature, Elections, Real Estate and Housing, Movies,
US Politics and Government, Television). The vertical axis shows the number of primary keyword usages on each
day (counts), smoothed with a rolling
average over 7 days to eliminate structure introduced by weekends (see Supplementary Material).
Figure 4. Daily rank diversity $d(r)$ as a function of the rank $r$ for the 50 most used hashtags (top panel) and keywords (bottom panel).
Here we investigate the patterns that characterize the reactions of the followers of the @nytimes account to
the articles and tweets published by the journal. We observe two different kinds of reaction: a direct one takes
place on the Twitter platform, when the followers retweet, quote or reply to the tweets published by the
journal’s account. An indirect reaction, instead, takes place by means of the ‘share on Twitter’ button of the
website nytimes.com, where the followers of the journal can tweet a link to the article of their choice. These two
kinds of interaction between the journal and its followers are characterized by different regular patterns.
Figure5 shows the distribution of reaction times,
t
, the delay between either the tweet of the @nytimes
account and the direct reaction of the user, or the delay between the publication of an article online and the
tweet published by the follower using the website button. e rst observation is the very broad range spanned
by the reaction times, going from seconds to a week. ere is a striking dierence in the shape of the curves cor-
responding to direct reactions, where the distribution of reaction delays seem to t a power-law with a breaking
of the slope around 10 hours, and that of indirect reactions which are much slower and start in general by a very
broad shallow peak followed by a power law decrease again around 10 hours.
The numerous retweets happening within the first second of the original tweet suggest the presence of automated
users. The qualitative behaviour of the reaction times is the same for the three direct reaction curves.
Fitting a power law after the maxima of the distributions, we find a break in the slope from $\Delta t^{-1}$ to a faster
decrease, $\Delta t^{-2.5}$.
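To make the idea of reading a power-law slope from a delay distribution concrete, here is a hedged sketch that estimates the exponent from a logarithmically binned histogram by least squares in log-log space. This is not the authors' fitting procedure, which is not described in this section; the binning, the fitting range and the helper names are assumptions, and the delays are synthetic Pareto samples whose density decays with exponent 2.5, chosen only to mimic the steeper regime mentioned above.

```python
import math
import random


def log_binned_density(samples, t_min, t_max, n_bins):
    """Histogram on logarithmically spaced bins, normalized to a density."""
    log_min, log_max = math.log(t_min), math.log(t_max)
    step = (log_max - log_min) / n_bins
    counts = [0] * n_bins
    for t in samples:
        if t_min <= t < t_max:
            index = min(int((math.log(t) - log_min) / step), n_bins - 1)
            counts[index] += 1
    centers, densities = [], []
    for i, count in enumerate(counts):
        if count == 0:
            continue  # empty bins cannot be shown on a log scale
        left = math.exp(log_min + i * step)
        right = math.exp(log_min + (i + 1) * step)
        centers.append(math.sqrt(left * right))  # geometric bin center
        densities.append(count / (len(samples) * (right - left)))
    return centers, densities


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x, mean_y = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


# Synthetic delays: Pareto samples with shape 1.5 have density ~ t^(-2.5).
rng = random.Random(0)
delays = [rng.paretovariate(1.5) for _ in range(100_000)]
centers, densities = log_binned_density(delays, 1.0, 100.0, 10)
slope = loglog_slope(centers, densities)
```

A break in the slope, as described in the text, could be examined by applying the same fit separately to the ranges before and after the suspected break. Maximum-likelihood estimators are often preferred over histogram fits for power laws; the least-squares version is used here only because it is short.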
The extremely similar behavior of all direct interactions suggests that this process may be strongly influenced
by the way in which the platform presents the tweets of followed users, where older tweets are pushed
out quickly from a user’s timeline by newer tweets. In this case, the first retweets of an article may trigger more
retweets from the users that might have lost the original tweet from @nytimes in their timeline, in the manner of
a self-exciting process49,50.
The longer reaction times observed for indirect reactions are expected, assuming that followers who are
on the NYT website are more likely to read the article before sharing it, such that the most probable reaction
time is shifted to multiple minutes or hours. Here too, we observe a strong decrease around the 10-hour mark. An
important difference with the direct reactions is that the distribution of response times for link sharing does not
look universal, showing a different shape for different sections (see Supplementary Material).
Despite the extremely similar shape of the delay distributions of the direct Twitter interactions, the median
delay time fluctuates by more than a factor of two for different sections, as shown in Table 1. Links to articles
about books and art are posted for a longer time (median delay of over one day) than national (U.S.) or international
(World) news (median delay of about half a day), which seems expected considering that book reviews
should remain of interest for longer times than the typical everyday news item. However, the behaviour of the
direct reactions shows the opposite tendency: Books and Arts are among the sections with the shortest median delays
before retweets, while national and international news are amongst the slowest sections regarding retweets. We
remark that we only consider those tweets that reply directly to the original tweet of @nytimes and not replies
to other replies. Therefore, the shorter median delay observed for replies is not caused by fast back and forth
discussions.
Figure 5. Distribution of the delay times, $\Delta t$: elapsed time between the issue of a tweet from the @nytimes
account and the direct reaction of the users (retweet, reply or quote), or between the appearance of the article in
the journal and the indirect reaction, i.e., the tweet of the article via the “share” button on the website (violet lines,
labeled link in the figure). We show here the distributions $p(\Delta t)$ corresponding to two sections of the journal in which
the corresponding articles appeared, with the horizontal axis running from seconds to weeks and curves for link, RT,
reply and quote. Top: “U.S.” section, annotated with the slopes $\Delta t^{-1}$ and $\Delta t^{-2.5}$. Bottom: “Opinion” section (for other sections, see
Supplementary Material).
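The medians in Table 1 are reported with a standard error obtained by bootstrap resampling. The following sketch shows one common way to obtain such an error: resample the delays with replacement, recompute the median each time, and take the standard deviation of the resampled medians. The number of resamples, the seed and the function name are assumptions, and the delays below are made-up values rather than data from the paper.

```python
import random
import statistics


def bootstrap_median_se(values, n_resamples=1000, seed=0):
    """Return (median, bootstrap standard error of the median)."""
    rng = random.Random(seed)
    resampled_medians = [
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.median(values), statistics.stdev(resampled_medians)


# Made-up delays in minutes, spanning a broad range as in the text.
delays_minutes = [5, 12, 30, 45, 60, 82, 95, 130, 400, 1500, 4000]
median_delay, std_err = bootstrap_median_se(delays_minutes)
```

The median is a natural summary here because the delay distributions are very broad, so a few week-long delays would dominate a mean but barely move the median.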
Characterization of the users. The dynamical study of the discussion taking place in Twitter during the
considered period shows that some groups of users synchronize in phase or in anti-phase at some particular
moments, revealing that most of them are discussing the same subject or talking about completely different
ones, respectively.
A dynamical topic vector, whose dimension is equal to the total number of detected topics, is associated with
each user. Each component of this vector indicates whether the corresponding user has tweeted more or less
than the population average about the corresponding topic, as a function of time (see “Methods”). As we have
determined the communities (topics) in a semantic network that includes the tweets of the followers of all
considered media, this procedure ensures that we do not miss topics which might be of scarce importance for
followers of @nytimes, but relevant for followers of other media.
Users are divided into groups according to the media they are following, among the 10 most followed media
in the US, listed in Table 3 (see “Methods”). Figure 6 shows that many users follow more than a single medium.
Users following a single medium are called exclusive followers. Most of the media considered here hold a neutral
or liberal position on the political spectrum with a similar entropy of their vocabulary, as shown in Fig. 1; the
exception being @FoxNews, which is considered politically conservative and whose entropy is the lowest, as
discussed in subsection ‘Topic dynamics’.
Users in each media group are compared by measuring how similar their dynamical topic vectors are at
a point in time; this self-similarity measure is described in the “Methods”. The top panel of Fig. 7 shows two
remarkable peaks in the self-similarity curves, one by the end of March 2020, which corresponds to New York
City’s lockdown, and the other by the end of October 2020, the exception being the self-similarity curve of the
exclusive followers of @FoxNews, which has only one. The bottom panel shows the self-similarity recomputed
after suppressing the “COVID” topic from the topic vectors; the disappearance of the peak of March 2020 confirms
that the synchronization of the discussion corresponds to this event.
Due to the large overlap of followers of different media, illustrated in Fig. 6, it is not surprising that the
self-similarity curves of non-exclusive followers of different media show a qualitatively similar behaviour. However,
scrutinizing the exclusive followers of @nytimes and the exclusive followers of @FoxNews, we observe that they
behave differently. When the “COVID” component has been removed from the topic vectors of the users, the
self-similarity of the exclusive followers of @FoxNews is higher than that of the rest of the users (including that
of the exclusive followers of @nytimes), except for the large peak at the end of October that we will discuss later.
Remarkably, the top panel shows that while the followers of @nytimes undergo the synchronization period related
to the “COVID” topic, those of @FoxNews, on the contrary, decrease their similarity, indicating that the “COVID”
topic does not act as a synchronizing event for them.
Table 1. Median of the delay $\Delta t$ in minutes for different sections and different types of interactions, shown for
the eleven largest sections sorted by decreasing number of articles assigned to the sections. Despite the shape
of the distribution being very similar (see Supplementary Material), the median delay fluctuates by more than
a factor of two depending on the section. The number in parentheses specifies the standard error in units of the
least significant digit, obtained via bootstrap resampling51.

Section         $\Delta t_{RT}$   $\Delta t_{reply}$   $\Delta t_{quote}$   $\Delta t_{link}$
U.S.            81.9(5)           52.3(6)              71.0(9)              643(3)
World           90.5(9)           45.9(7)              73.4(17)             827(6)
Opinion         68(3)             38(2)                83(7)                952(4)
Arts            53(2)             37(2)                46(2)                1526(18)
Business Day    75(1)             49(1)                89(4)                814(7)
Sports          42(2)             27(2)                44(4)                713(15)
New York        85(1)             45(1)                73(2)                597(6)
Books           45(2)             36(3)                40(5)                1646(28)
Style           103(5)            50(3)                107(7)               1065(19)
Movies          53(3)             34(2)                52(4)                1259(35)
Real Estate     29(4)             40(7)                62(20)               1735(38)
All articles    76.7(3)           47.0(3)              67.8(5)              858(2)
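As an illustration of how a standard error such as those quoted in Table 1 can be obtained, the following minimal sketch estimates the standard error of a median by bootstrap resampling. It is only a generic version of the technique: the number of resamples, the seed and the function name are assumptions, and the exact procedure of the study is the one described in ref. 51.

```python
import random
import statistics


def bootstrap_median_se(delays, n_resamples=1000, seed=0):
    """Return (median, bootstrap standard error of the median).

    delays: list of response delays (e.g., in minutes).
    n_resamples and seed are illustrative defaults, not values from the study.
    """
    rng = random.Random(seed)
    n = len(delays)
    # Resample with replacement and record the median of each resample.
    medians = [
        statistics.median(rng.choices(delays, k=n)) for _ in range(n_resamples)
    ]
    # The spread of the resampled medians estimates the standard error.
    return statistics.median(delays), statistics.stdev(medians)


median, std_err = bootstrap_median_se([10, 20, 30, 40, 50, 60, 70], n_resamples=200, seed=1)
```

With the notation of Table 1, an entry such as 81.9(5) reads as a median of 81.9 minutes with a standard error of 0.5 minutes.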
It should not be concluded that exclusive @FoxNews followers do not talk about the #covid topic at this time,
but rather that the selection of topics they talk about becomes more inhomogeneous. In fact, we have found that
the covid topic is not the most used one among this subset of users (see Supplementary Material).
The very high peak at the end of October is present in all the curves in Fig. 7; it corresponds to the #endsars
topic, mentioned above, and it disappears when the corresponding topic is suppressed from the topic vectors.
The cross-similarity curve between exclusive followers of @FoxNews and a randomized sample of all users
is near zero most of the time, and becomes negative around the end of March, when all the other
self-similarities were increasing. After July and before the #endsars-related peak mentioned above, the cross-similarity
approaches zero and so do all the self-similarity curves, with the exception of that of the @FoxNews exclusive
followers. This shows again that those users talk, in general, about the same topics in the same terms, regardless
of the external events that may drive the attention of other users.
Discussion
We have studied the dynamics of interactions between the information agenda of a traditional medium, The
New York Times, and the discussions that its followers hold on Twitter. We also compare them with the discussions
held by the followers of other media among the most followed in the U.S., involving TV news channels, newspapers,
bi-weekly magazines, and press agencies.
Building a semantic network of hashtags with the only assumption that two hashtags used in the same tweet
refer to the same subject, we are able to automatically detect the topics discussed in Twitter by community
Figure6. Venn diagram of “followship” relations showing the intersection among the followers of @nytimes
(pink) and @FoxNews (green) and the other media (blue).
[Figure 7: two panels; vertical axis “Similarity” (ticks at 0, 0.05 and 0.1); horizontal axis: months, January to
November; curves: @nytimes excl., @FoxNews excl., all × @FoxNews excl., @nytimes, @FoxNews, all.]
Figure 7. Dynamics of self- and cross-similarities corresponding to sub-populations that follow different
media accounts in Twitter. For clarity we concentrate on the curves involving the followers of @nytimes and
@FoxNews, along with a randomized sample that includes followers of all media, labelled “all” (more curves in the
Supplementary Material). The labels ‘@nytimes excl.’ and ‘@FoxNews excl.’ refer to the sub-populations whose
members only follow the cited medium. ‘all × @FoxNews excl.’ is the cross-similarity between the exclusive
followers of @FoxNews in our dataset and all users (including the followers of @FoxNews) in our dataset. Top
panel: self-similarities of the different sub-populations along with the cross-similarity of exclusive followers of
@FoxNews against the set of all users. Bottom panel: same data recomputed after the suppression of the #covid
topic from the topic vectors.
detection in this network. For the NYT journal, the topics are identified by the keywords chosen by the
journalists to label their articles.
The entropy of hashtag and keyword usage captures the structural differences between these two kinds of
media: the curves of the entropy of the vocabulary used by the followers of all the media in Twitter show very
similar dynamics, including minor details, but all of them show a dynamical behaviour that is different from
that of the NYT journal. We observe that the journal is much more concerned with political news than its own
followers, as shown by the sudden decreases of keyword entropy located around key political dates, for example,
during the electoral period. Our results show that the entropy of the vocabulary of the set of @FoxNews followers
is significantly lower than for any other media at any time.
Regarding the agenda-setting question, a relevant signal is found around the hashtag #Blacklivesmatter,
referring to the killing of a black citizen during a police intervention. We show that this discussion originated
online and was treated by the journal shortly afterwards.
The analysis of rank diversity of hashtags and keywords uncovers a counterintuitive result: instead of finding
the first ranks completely dominated by the few forms of COVID-19 hashtags in Twitter, a high variability of
the used hashtags dominates, and only the first two ranks have relatively low variability, which is nevertheless
high enough to contain hundreds of hashtags. The situation is completely different for the journal, which
shows a slowly growing rank diversity of keywords, starting from very low values. This difference is expected as
keywords, unlike hashtags, are curated and correspond to the sections of the journal, which obey a hierarchical
order. Interestingly, the rank diversity in Twitter is also very different from that observed in Weibo (the Chinese
version of Twitter)52, which looks more like the rank diversity in the journal where keywords are curated.
The interaction between the journal and its followers has also been explored by studying the patterns observed
in the distribution of time delays of direct and indirect responses of the followers to the articles and tweets posted
by the journal. The main observation is the broad spectrum spanned by the time delays of the responses, going
from seconds up to a week, which may be surprising given the continuous flow of posts in Twitter.
Similar heavy-tailed behaviour has been identified in studies of the distribution of delays in cascading
processes53,54, where the models proposed to explain these patterns mainly combine preferential attachment
mechanisms with queuing processes55,56. However, here we identify a similar distribution of response times in
a different setup: instead of following a single cascading effect triggered by an initial seed, which requires
the source tweet to be detected by the users who will potentially retweet (hence the preferential attachment
mechanism proposed), we study the behaviour of users who are, in principle, automatically exposed to each of
the source tweets because they have decided to follow the journal’s account. This questions the pertinence of the
preferential attachment hypothesis to explain the observed pattern.
On the contrary, the extremely similar behaviour of all direct interactions suggests that this process may
be strongly influenced by a queuing process in the users’ timeline, where older tweets might be pushed out
quickly by newer tweets. In this case, the first retweets of an article may trigger more retweets from users
who might have lost the original tweet from @nytimes in their timeline, in the manner of a self-exciting process49,50.
It is not straightforward to foresee a single general hypothesis to explain the heavy-tailed shape of the delay
time distribution. A detailed analysis conditioned on the section of the NYT in which the articles were
published shows a dependence of the delay times on the sections, suggesting that some types of news have longer
lifetimes than others. On the other hand, our analysis of indirect reactions, where users post tweets containing
links to articles of the NYT, i.e., by clicking the ‘share via Twitter’ button on the NYT website, shows reaction
times that are, as expected, much longer.
Finally, the dynamical similarity among groups of users allows us to detect that, while most of the users
synchronize their discussions around the date of the lockdown, a singular behaviour is observed for exclusive followers
of @nytimes and of @FoxNews. The similarity of the former, although increasing in this period, is noticeably
lower than the similarity of the global population, while the latter shows in this period its only
long-lasting decrease of similarity (about a month).
The relatively high and constant values of similarity (except for the large peak related to #endsars), along with
the low entropy of the vocabulary of the exclusive followers of @FoxNews, strongly suggest that this group
constitutes an echo chamber. Moreover, the cross-similarity between exclusive users of @nytimes and @FoxNews is
almost always negative (except for the singular #endsars peak), which is an objective measurement of the strong
separation of the subjects of interest of these two groups.
Conclusion
We present a dynamical study of the interactions between a traditional medium, the NYT journal, and its
followers in Twitter, and we compare them with the behaviour of Twitter users who follow other media of different
kinds (written press, television, and press agencies). It is important to stress that we are not interested in the
behaviour of a random sample of Twitter users; we focus instead on Twitter users who are interested
in news, who could be thought to be a priori more susceptible to media influence.
Our results show that as long as the users follow different media, the similarity among them is almost
independent of the media sources they follow. On the contrary, the similarity becomes significantly different when
observing sub-populations of exclusive users, those who follow one medium account exclusively. We also show
that this difference between sub-populations is dominant around the first wave of COVID in the U.S., which, in
spite of being a public health topic that affects all populations, induces a differential behaviour in sub-populations
who exclusively follow different media.
One important feature of our study is the fact that we avoid introducing selection bias by choosing a priori
some group of words. Here we keep the whole discussion as it is and we let the topics emerge from the community
detection process on the semantic networks.
Finally, we cannot stress enough the importance of choosing different independent quantities to analyse the
data: it is the combination of the entropy of vocabulary with the similarity among the users which allows us to
objectively show the singularity of the exclusive followers of @FoxNews with respect to the baseline population.
In the same way, the comparison of the dynamics of entropy, topic evolution, and similarity shows that although
#elections is a hot topic for the journal, the synchronization of its followers around it, although measurable, is
relatively low compared with the #Blacklivesmatter topic. Moreover, in spite of the general difficulty of
detecting causality, the comparison of the dynamics of entropy and topic evolution shows that the #Blacklivesmatter
topic originated on Twitter before being treated by the NYT.
In summary, we present an automatic detection method of discussion topics on social networks, which, along
with a set of independent measures on the obtained data, yields a lot of information with a minimum of
assumptions (here the semantic link among hashtags and among keywords), and should be the gateway to more
detailed analyses that could focus on the treatment of specific topics or the detailed behaviour of specific groups.
Methods
In this section we present the data set used in this work, explaining the rationale leading to this particular choice,
along with the procedure used for its collection from different data sources. We define the semantic networks built
with these data and we explain how we automatically detect the set of topics under discussion and the evolution
of the attention each user pays to them.
Moreover, we give the mathematical definition of the observables used to characterize the dynamics of
discussion in Twitter and that of the treatment of the news by the NYT over the studied period.
Data collection. Data from Twitter.
We first recall briefly the standard vocabulary used to name different elements of the Twitter micro-blogging
platform. Users can engage on many different levels with each other. Each user has a Twitter handle, which starts
with ‘@’. They can write tweets, short messages consisting of up to 280 characters, which may also contain images,
videos or sound, and which are shown to their followers (other users subscribing to their accounts) on their
timeline, the list of latest posted tweets. However, even non-followers can see and interact with them (except for
private tweets, which are not part of our dataset). Users can retweet the tweet of another user, which means that
they share this tweet with all their followers. They can quote a tweet, meaning that they republish the original
tweet with a comment. Finally, they can reply to a tweet, which starts a discussion connected to the original tweet.
Tweets may contain hashtags, which are arbitrary strings of characters prefixed by the character ‘#’, often used
to tag the tweet. Tweets can contain a URL, which typically links to an external website.
Due to the very large number of followers (about 46 million) of @nytimes, the official Twitter account of
the NYT, we have chosen for this study a random sample of them, according to the following procedure:
1. We first obtained the list of the user ids of all followers of @nytimes, using Twitter’s official REST API.
This list was collected over a few days in the last week of June 2020.
2. We randomized the obtained list.
3. On July 1st 2020, we requested up to the last 3200 tweets (this number is a limitation of the Twitter API) of
a sample of about 8M of these accounts.
4. Roughly every 2 months we requested, for all users in our sample for which we had already found tweets for the
year 2020, the new tweets they had published since our last query.
Table2 gives the main characteristics of the data used for this study.
At the beginning of March 2021, we had collected up to half a billion tweets published by more than 8M
(8,151,587) followers.
As it is well known that only a minority of Twitter users include their geolocalization in their prole, we
have chosen not to control for this variable so as to avoid articially diminishing the number of collected users.
However, since the US is the largest market both for Twitter and NYT, we expect that most followers are indeed
located in US. As a consequence, although we cannot rule out that the dataset contains tweets of users living
abroad, we will naturally focus on events that are relevant to the US in order to tag the chronology of the study.
e pertinence of this choice is supported by the fact that topics which are popular in the US are dominating
the discussion, and we show that it is possible to identify the rare exceptions.
Table 2. Data collected from Twitter. Top panel: random sample of the about 46M followers of the NYT.
Bottom panel: followers of the other media described in the text.

@nytimes       Total collected users                8,151,587
               Total collected tweets               502,647,015
               Number of tweets with #              83,237,523
               Number of distinct #                 12,937,293
               Number of users quoting/rt/reply     226,630

Other media    Total collected users                1,771,170
               Total collected tweets               96,551,331
This dataset, in the form of user and tweet ids, is available at57.
Although our method enables us to collect a large sample of a specific subpopulation of Twitter, avoiding
biases that may be introduced by filtering, for example by hashtags, we discuss below some limitations that might
still remain in this data set, along with an estimate of their potential influence on our study.
Due to the limit set by the API (it delivers only the last 3200 tweets of the requested user), we risk
systematically missing tweets of very active accounts: those who would have tweeted more than 3200 tweets between
January 1st and July 1st, or those who would have exceeded that limit during the 2-month period of each
collection step held after July 1st 2020 until the end. Although most of such accounts are automated (bots)
or institutional ones, like @nytimes itself, one cannot rule out a priori the existence of accounts of very active
individuals. Notice that such users need to write at least about 18 tweets per day, on average, in the first six
months and many more in the following collection periods (every two months), which is certainly possible
but not typical of the standard user. Nevertheless, in order to evaluate to what extent our sample is likely to
contain incomplete users (accounts for which we could not get the full set of the content they published), we
set a conservative criterion to detect them. We count the number of users for which we collected more than
3000 tweets. This strict bound leaves generous room for deleted tweets, which, although not downloaded,
still count against the 3200 limit. Since we can collect at most 3200 tweets at each point in time, we cannot
exclude a priori that a user wrote all these tweets and even more during one of our collection cycles (and
few or none in the other cycles). However, we do not observe such inhomogeneous behaviour, in spite of the
fact that our sample contains users who exceed 18000 tweets over the whole period. We are therefore confident
that the strict bound set here overestimates the fraction of incomplete users considerably. According to this
strict criterion, we estimate that less than 0.4% of all accounts are incomplete. Thus, the potential error
should be small, in particular considering that our study relies more on the number of unique
users than on the number of tweets.
The list of followers was fixed at the beginning of the study, such that we do not include users who started
following @nytimes after July 1st 2020, slightly underestimating the influence of new and short-lived accounts.
In the same way, we cannot exclude that some accounts we sampled stopped following @nytimes at some
point during our period of study.
Naturally, we do not consider in our sample tweets from deleted, suspended and private accounts.
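The activity threshold quoted above (about 18 tweets per day over the first six months) follows from a simple division, sketched below. The 3200-tweet API limit and the 3000-tweet criterion come from the text; the function names and the dates chosen for a later two-month cycle are illustrative assumptions.

```python
from datetime import date

API_LIMIT = 3200             # most recent tweets returned per user (from the text)
INCOMPLETE_THRESHOLD = 3000  # conservative criterion used in the text


def min_daily_rate(start, end, limit=API_LIMIT):
    """Average number of tweets per day needed to reach `limit` between two dates."""
    return limit / (end - start).days


def possibly_incomplete(n_collected, threshold=INCOMPLETE_THRESHOLD):
    """Flag an account whose collected tweet count exceeds the conservative bound."""
    return n_collected > threshold


# First collection: January 1st to July 1st 2020 (182 days in a leap year).
first_cycle_rate = min_daily_rate(date(2020, 1, 1), date(2020, 7, 1))
# A later cycle of roughly two months (illustrative dates, not from the study).
later_cycle_rate = min_daily_rate(date(2020, 7, 1), date(2020, 9, 1))
```

The first rate is slightly below 18 tweets per day, and the rate needed to hit the limit within a two-month cycle is roughly three times larger, which is why such accounts are expected to be rare.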
Following a similar technique, we also collected a smaller sample of about a million users who do not follow
@nytimes but who follow at least one of the other seven most followed US news media accounts. We do not include
followers of secondary accounts (e.g., those of “breaking news”, like @CNNbreaking).
Table 3 describes the different sources from which we have collected the sample of Twitter users interested
in US news that we have studied in this work.
We collected a uniform sample of these users, proportional to the number of followers each medium has, in
the last weeks of March 2021. This means that the problem of missing tweets from very active accounts is worse
for this data set. However, the fraction of incomplete accounts remains small, <0.3% (even smaller than for the
@nytimes dataset, because we only had one cycle, causing fewer false positives). Again, this dataset, in the form
of user and tweet ids, is available at58.
Finally, we also collected all tweets of the @nytimes account for the period, referenced by retweets, quotes
or replies of their followers.
In this study we only use metadata of the tweets: hashtags normalized to lower case (i.e., we treat #covid-19
and #COVID-19 as the same hashtag) and URLs. We do not extract further data from the remainder of the tweet,
neither text nor images nor videos. Nevertheless, we will show that this minimalist information contained in the
tweets already provides a rich image of the public discussion in the platform.
Data from the NYT. Table4 describes the main gures involved in the analysis of the publications of the NYT
journal during the same period.
In addition to the data from Twitter users, we collected the metadata of all articles published by the NYT either
in print or online using their archive API. is dataset includes in particular, a set of keywords for each article,
which lists subjects, persons and locations referred to in the article. Moreover, it provides unique identiers,
Table 3. Number of followers of the Twitter accounts of the studied media.

Name                   Media type            Followers
CNN                    TV news               53,242,242
FoxNews                TV news               20,121,721
Reuters                news agency           23,238,148
Associated Press       news agency           15,127,593
TIME                   bi-weekly magazine    18,065,949
Wall Street Journal    newspaper             18,705,760
The Washington Post    newspaper             17,791,609
The New York Times     newspaper             46,808,154
which we used to connect URLs encountered in tweets to an NYT article, an otherwise non-trivial task, since an
article can have multiple valid URLs.
The dataset that indicates which tweets link to which articles is also available at57.
Observables. We detail in this section the quantities or observables that we used in this study.
Entropy. e entropy of the hashtag distribution over a timeframe t is dened as:
where
pt(i)
is the probability distribution calculated as the ratio between the number of unique users that have
used hashtag i and the number of dierent pairs (hashtag, user) within the time frame t. By considering unique
users we diminish the inuence of very active accounts (e.g., spammers). We calculate the entropy daily with a
rolling time frame of seven days to remove the well known inuence of the lower activity on weekends.
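A minimal sketch of this entropy computation is given below. It follows the definition above (unique users per hashtag, divided by the number of distinct (hashtag, user) pairs), but the record layout, the function names and the choice of a trailing seven-day window are assumptions made for illustration only.

```python
import math
from collections import Counter


def hashtag_entropy(usages):
    """Entropy of the hashtag distribution for one time frame.

    usages: iterable of (user, hashtag) records; repeated records of the same
    pair count once, so very active accounts do not dominate.
    """
    pairs = set(usages)
    users_per_tag = Counter(tag for _, tag in pairs)
    total = len(pairs)  # number of distinct (hashtag, user) pairs
    return -sum((n / total) * math.log(n / total) for n in users_per_tag.values())


def rolling_entropy(records, day, window=7):
    """Entropy over the `window` days ending at `day` (trailing window, an assumption).

    records: iterable of (day_index, user, hashtag) with integer day indices.
    """
    recent = [(user, tag) for d, user, tag in records if day - window < d <= day]
    return hashtag_entropy(recent)


sample = [("a", "#x"), ("a", "#x"), ("b", "#x"), ("a", "#y")]
entropy = hashtag_entropy(sample)
```

In the sample, user "a" uses #x twice but is counted once, so #x has two unique users and #y one, giving probabilities 2/3 and 1/3.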
Topic detection. In this study we are interested in comparing the dynamics of subjects published by a traditional
medium, like the NYT, where professionals choose the information to be issued, with the dynamics of the discussion
that its followers hold on the Twitter platform. To do so, one needs to identify the topics that are discussed in both
media. The literature on topic modeling is quite extensive, and several unsupervised models exist that can extract
topics from textual corpora, based on semantic network analysis and/or topic modeling59,60.
In Twitter we can adopt hashtags, which are used to tag the messages, as a proxy for the subject of the tweet.
However, multiple hashtags may address the same topic. A common strategy to follow the discussion about a
topic is to pre-select the hashtags that are supposed to be related to the topic. Here we use a different approach
where the topics emerge from a semantic network of hashtags37,61. The vertices of this network are the hashtags
found in our dataset, and the weight of the edge between two nodes is the number of different users that used
those hashtags together in at least one of their tweets. That is, if the same user publishes many tweets including
the same pair of hashtags, this user contributes to the weight only once. The rationale behind this construction is that
two hashtags used in the same tweet refer to the same subject; in fact, previous work has shown that hashtag
co-occurrences in tweets are mostly coupled with semantic relations62. Finally, in order to avoid spurious
relationships, we set a threshold for the link to be meaningful and we prune all edges whose weight is below 10. In
this way, hashtags talking about the same topic should be strongly connected, and synonymous hashtags, which
only seldom appear in the same tweet, should be strongly connected to the same common nodes.
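The construction of the weighted hashtag network described above can be sketched as follows. The input layout (a list of (user, hashtags) records) and the function name are hypothetical; the lower-casing of hashtags, the counting of distinct users per pair, and the pruning threshold of 10 are taken from the text.

```python
from collections import defaultdict
from itertools import combinations


def build_hashtag_network(tweets, min_weight=10):
    """Build the weighted co-occurrence network of hashtags.

    tweets: iterable of (user, hashtags) records, one per tweet (assumed layout).
    Returns a dict {(tag_a, tag_b): weight}, where the weight is the number of
    distinct users who used both hashtags together in at least one tweet.
    Edges with weight below `min_weight` are pruned.
    """
    users_per_pair = defaultdict(set)
    for user, hashtags in tweets:
        # Normalize to lower case and drop duplicates within the tweet.
        tags = sorted({tag.lower() for tag in hashtags})
        for pair in combinations(tags, 2):
            users_per_pair[pair].add(user)  # a user counts once per pair
    return {
        pair: len(users)
        for pair, users in users_per_pair.items()
        if len(users) >= min_weight
    }


toy_tweets = [
    ("u1", ["#Covid", "#lockdown"]),
    ("u1", ["#covid", "#lockdown"]),   # same user, same pair: counted once
    ("u2", ["#COVID", "#Lockdown"]),
    ("u3", ["#covid", "#news"]),
]
network = build_hashtag_network(toy_tweets, min_weight=2)
```

The toy example uses a threshold of 2 instead of 10 only to keep the data small; the pair (#covid, #lockdown) survives with weight 2, while (#covid, #news) is pruned.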
By performing community detection on the semantic network, we detect the groups of hashtags that are more
tightly connected among them than with the rest63. We identify each community with a topic of discussion in
the platform.
This topic-community identification may suffer from some ambiguities because some hashtags can belong
to multiple topics. For example, if we use OSLOM264 for community detection, which allows for community
overlap, we find that #covid19, which influences most aspects of life, is associated with more than 10
communities. Since overlapping communities are hard to interpret, we finally chose a community detection algorithm,
Infomap65, which provides a disjoint partition. In this case #covid19 will be assigned to one topic. To illustrate
the density of this network, a small fraction of it (the induced subgraph of 1.5% of the most co-used hashtags)
is represented in Fig. S5 of the Supplementary Material.
For keywords obtained from NYT articles, we do not need to perform such a topic analysis, since they are
manually curated to already describe topics.
Rank diversity. e rank r of an entity (here either hashtags or keywords) is its position in the list of all enti-
ties occurring within a time period sorted by decreasing number of usages. Following39,40,52, we dene the rank
diversity d(r) over a time frame
with a time resolution
δ
as the number of of dierent entities occupying rank
r over the
k
=
�/δ
time spans normalized by k. It therefore can assume values in [1/k,1], where 1/k signals that
only a single entity was observed on the corresponding rank and 1 that the entity changed for every period. Here,
we study the
=366
days of 2020 and use a resolution of
δ=1
day (starting at 0:00 UTC).
is measures how consistent topics of interests are. Low values signal little uctuations in the importance of
the entities, while high values suggest high uctuation. If d(r) increases with the rank r, it signals that the really
important topics are more consistent than minor topics. A decrease could happen if the entities are articially
curated, e.g., limited to a certain number.
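The definition of rank diversity above translates into a few lines of code. The sketch below assumes that the usages are already grouped by time span; the function name, the input layout and the alphabetical tie-breaking rule are assumptions, since the text does not specify how ties in the ranking are resolved.

```python
from collections import Counter


def rank_diversity(usages_per_period, max_rank):
    """Rank diversity d(r) for r = 1..max_rank.

    usages_per_period: list with one entry per time span (k entries in total),
    each entry being an iterable of the entities (hashtags or keywords) used
    during that span. Returns a list whose element r-1 is d(r).
    """
    k = len(usages_per_period)
    seen = [set() for _ in range(max_rank)]
    for usages in usages_per_period:
        counts = Counter(usages)
        # Sort by decreasing number of usages; ties are broken alphabetically
        # (an assumption made for reproducibility of this sketch).
        ranking = sorted(counts, key=lambda entity: (-counts[entity], entity))
        for r, entity in enumerate(ranking[:max_rank]):
            seen[r].add(entity)
    # Number of different entities seen at each rank, normalized by k.
    return [len(entities) / k for entities in seen]


periods = [["a", "a", "b"], ["a", "a", "b"], ["b", "b", "a"], ["c", "c", "a"]]
diversity = rank_diversity(periods, max_rank=2)
```

In the toy data, three different entities occupy the first rank over four periods and two occupy the second rank, so d(1) = 0.75 and d(2) = 0.5.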
$$ S_t = -\sum_i p_t(i) \ln p_t(i). \tag{1} $$
Table 4. Data concerning the publications of the NYT journal in the considered period.

Number of articles                      62,138
Number of tweets posted by @nytimes     33,446
Links to articles in @nytimes tweets    20,496
Number of distinct keywords             45,016
User similarity. To study how similar users are in regard to the interest they pay to different topics, we applied
the method used in61. We describe the interests of each user i by means of a user description vector $d_i$ of
dimension $N_T$, the number of topics (communities) found, which informs about the topic preferences of user i.
This description vector is computed in the following way:
1. We build a user-topic matrix, U, where each element, $u_{ij}$, gives the absolute number of times that user i has
used a hashtag that belongs to the community identified as topic j.
2. We compute the global topic vector $T = \sum_{i=1}^{N} u_i$, where $u_i$ is the i-th row vector in the user-topic matrix, and
N the size of the population. This vector gives the total number of times that each topic has been used by all
the users in the dataset.
3. We define the vector $v_i$ of Eq. (2), which gives the difference between the frequency of usage of each topic by
user i and its global frequency of usage in the population. Here the norm $\|\cdot\|_1$ must be understood as the sum
over all the components in the space of dimension $N_T$. The vectors of Eq. (2) thus inform about whether user i
has addressed each of the identified topics more or less than on average.
4. As we are only interested in the orientation of the description vectors, they are normalized as in Eq. (3),
where $\|v_i\|_2$ is the standard Euclidean norm in the topic hyperspace of dimension $N_T$.
Then, in order to track the evolution of the users’ interests, we apply the aforementioned procedure to sliding
time windows of 7 days, thus producing a series of matrices $U^t$, one for each day. We shall call $d_i^t$ the description
vector for user i at discrete time t.
We define the similarity between a pair of users i and j as the cosine similarity between the corresponding
description vectors. As the latter are normalized, the similarity reduces to the inner product of Eq. (4).
We also define the average description vector $D_G$ of a group of users G, of cardinality |G| (Eq. (5)).
Now we can introduce two indices measuring collective similarities:
The cohesion of a group of users, intra-group similarity or self-similarity, s(G,G), is defined as the average
similarity between all its users, and computed as in Eqs. (6) and (7).
The cross-group similarity, $s(G_1,G_2)$, is the average similarity between members of different groups $G_1$ and
$G_2$, computed as in Eqs. (8) and (9).
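The steps of this subsection can be sketched in plain Python as follows. The sketch assumes a small dense user-topic matrix given as a list of rows, and it assumes that every user has used at least one hashtag and does not coincide exactly with the population average (otherwise a normalization would divide by zero). All function names are illustrative.

```python
import math


def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def description_vectors(U):
    """Unit description vectors d_i computed from the user-topic matrix U."""
    n_topics = len(U[0])
    T = [sum(row[j] for row in U) for j in range(n_topics)]  # global topic vector
    T_sum = sum(T)
    vectors = []
    for u in U:
        u_sum = sum(u)
        # Eq. (2): user frequency minus global frequency, topic by topic.
        v = [u[j] / u_sum - T[j] / T_sum for j in range(n_topics)]
        norm = math.sqrt(dot(v, v))
        vectors.append([x / norm for x in v])  # Eq. (3)
    return vectors


def mean_vector(group):
    """Average description vector D_G of a group (Eq. (5))."""
    return [sum(d[j] for d in group) / len(group) for j in range(len(group[0]))]


def group_similarity(g1, g2):
    """s(G1, G2) as the inner product of the average vectors (Eqs. (7) and (9))."""
    return dot(mean_vector(g1), mean_vector(g2))


def pairwise_average(g1, g2):
    """Brute-force average of s(i, j) over all pairs (Eqs. (6) and (8))."""
    return sum(dot(a, b) for a in g1 for b in g2) / (len(g1) * len(g2))


U = [[5, 1, 0], [4, 2, 0], [0, 1, 6], [1, 0, 7]]
D = description_vectors(U)
group_a, group_b = D[:2], D[2:]
self_sim = group_similarity(group_a, group_a)
cross_sim = group_similarity(group_a, group_b)
```

The shortcut through the average vectors avoids the quadratic number of pairwise products, which matters for groups of millions of users. In the toy matrix, the first two users favour the first topic and the last two the third one, so the self-similarity of the first group is positive and the cross-similarity between the groups is negative.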
Data availability
Dataset of user and tweet ids of followers of @nytimes is available at: https://doi.org/10.5281/zenodo.4736651.
Dataset of user and tweet ids of followers of other news media is available at:
https://doi.org/10.5281/zenodo.4736816.
Received: 30 August 2022; Accepted: 21 February 2023
References
1. Hard, W. Radio and public opinion. Ann. Am. Acad. Polit. Soc. Sci. 177, 105–113 (1935).
2. Gilmont, J.-F. La Réforme et le livre : l’Europe de l’imprimé (1517-v. 1570) (Paris: Les Editions du Cerf, 1990).
https://www.persee.fr/doc/bec_0373-6237_1991_num_149_1_450612_t1_0174_0000_001.
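$$ v_i = \frac{u_i}{\|u_i\|_1} - \frac{T}{\|T\|_1}. \tag{2} $$
$$ d_i = \frac{v_i}{\|v_i\|_2}, \tag{3} $$
$$ s(i,j) = \langle d_i, d_j \rangle. \tag{4} $$
$$ D_G = \frac{\sum_{i \in G} d_i}{|G|}. \tag{5} $$
$$ s(G,G) = \frac{\sum_{i,j \in G} s(i,j)}{|G|^2} = \frac{\sum_{i \in G} \langle d_i, D_G \rangle}{|G|} \tag{6} $$
$$ = \langle D_G, D_G \rangle = \|D_G\|_2^2, \tag{7} $$
$$ s(G_1,G_2) = \frac{\sum_{i \in G_1,\, j \in G_2} s(i,j)}{|G_1| \cdot |G_2|} \tag{8} $$
$$ = \frac{\sum_{i \in G_1} \langle d_i, D_{G_2} \rangle}{|G_1|} = \langle D_{G_1}, D_{G_2} \rangle. \tag{9} $$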
3. https://www.nytimes.com/1899/05/07/archives/future-of-wireless-telegraphy.html.
4. Douglas, S. J. Public radio and television in America: A political history. Public Opin. Q. 63, 439–441 (1999).
5. Gaumont, N., Panahi, M. & Chavalarias, D. Reconstruction of the socio-semantic dynamics of political activist Twitter networks-
Method and application to the 2017 French presidential election. PLoS ONE 13, 1–38 (2018).
6. Boutet, A., Kim, H. & Yoneki, E. What’s in Twitter, I know what parties are popular and who you are supporting now!. Soc. Netw.
Anal. Min. 3, 1379–1391 (2013).
7. Himelboim, I., Smith, M. & Shneiderman, B. Tweeting apart: Applying network analysis to detect selective exposure clusters in
Twitter. Commun. Methods Meas. 7, 195–223 (2013).
8. Barberá, P. Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Polit. Anal. 23, 76–91
(2015).
9. Nikolov, D., Oliveira, D. F., Flammini, A. & Menczer, F. Measuring online social bubbles. PeerJ Comput. Sci. 1, e38 (2015).
10. Cinelli, M., Morales, G. D. F., Galeazzi, A., Quattrociocchi, W. & Starnini, M. The echo chamber effect on social media. Proc. Natl.
Acad. Sci. USA 118, e2023301118 (2021).
11. Choi, D. et al. Rumor propagation is amplified by echo chambers in social media. Sci. Rep. 10, 1–10 (2020).
12. Lazer, D. M. J. et al. The science of fake news. Science 359, 1094–1096 (2018).
13. Gallotti, R., Valle, F., Castaldo, N., Sacco, P. & De Domenico, M. Assessing the risks of infodemics in response to COVID-19
epidemics. Nat. Hum. Behav. 4, 1285–1293 (2020).
14. Yang, K.-C. et al. The COVID-19 Infodemic: Twitter versus Facebook. Big Data Soc. 8, 20539517211013860 (2021).
15. Shahi, G. K., Dirkson, A. & Majchrzak, T. A. An exploratory study of COVID-19 misinformation on Twitter. Online Soc. Netw.
Media 22, 100104 (2021).
16. Cohen, B. C. Press and Foreign Policy (Princeton University Press, 2015).
17. McCombs, M. E. & Shaw, D. L. The agenda-setting function of mass media. Public Opin. Q. 36, 176–187. https://doi.org/10.1086/267990 (1972).
18. Price, V. & Tewksbury, D. News values and public opinion: A theoretical account of media priming and framing. Prog. Commun.
Sci. 13, 173–212 (1997).
19. McCombs, M. & Valenzuela, S. The agenda-setting theory. Cuadernos de información 44–50 (2007).
20. Aruguete, N. Agenda setting y framing: un debate teórico inconcluso. Más Poder Local (2017).
21. Pinto, S., Albanese, F., Dorso, C. O. & Balenzuela, P. Quantifying time-dependent media agenda and public opinion by topic
modeling. Physica A 524, 614–624 (2019).
22. Dehler-Holland, J., Schumacher, K. & Fichtner, W. Topic modeling uncovers shifts in media framing of the German renewable
energy act. Patterns 2, 100169 (2021).
23. Sacco, P. L., Gallotti, R., Pilati, F., Castaldo, N. & Domenico, M. D. Emergence of knowledge communities and information centralization during the COVID-19 pandemic. Soc. Sci. Med. 285, 114215 (2021).
24. Cinelli, M. et al. The COVID-19 social media infodemic. Sci. Rep. 10, 16598 (2020).
25. Ferrara, E., Cresci, S. & Luceri, L. Misinformation, manipulation, and abuse on social media in the era of covid-19. J. Comput. Soc.
Sci. 3, 271–277 (2020).
26. Vargo, C. J., Basilaia, E. & Shaw, D. L. Event versus issue: Twitter reflections of major news, a case study. Stud. Media Commun. 9,
215–239 (2015).
27. Morris, D. S. Twitter versus the traditional media: A survey experiment comparing public perceptions of campaign messages in
the 2016 U.S. presidential election. Soc. Sci. Comput. Rev. 36, 456–468 (2018).
28. Bridgman, A. et al. Infodemic pathways: Evaluating the role that traditional and social media play in cross-national information transfer. Front. Polit. Sci. 3, 20 (2021).
29. Su, Y. & Borah, P. Who is the agenda setter? Examining the intermedia agenda-setting effect between Twitter and newspapers. J.
Inf. Technol. Polit. 16, 236–249 (2019).
30. Ceron, A. Twitter and the traditional media: Who is the real agenda setter? In APSA 2014 Annual Meeting Paper (2014). https://ssrn.com/abstract=2454310.
31. Danner, H., Hagerer, G., Pan, Y. & Groh, G. The news media and its audience: Agenda setting on organic food in the United States
and Germany. J. Clean. Prod. 354, 131503 (2022).
32. Albanese, F., Pinto, S., Semeshenko, V. & Balenzuela, P. Analyzing mass media influence using natural language processing and
time series analysis. J. Phys. 1, 025005 (2020).
33. Barberá, P. et al. Who leads? who follows? Measuring issue attention and agenda setting by legislators and the mass public using
social media data. Am. Polit. Sci. Rev. 113, 883–901 (2019).
34. DiMaggio, P., Nag, M. & Blei, D. Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics 41, 570–606 (2013).
35. Walter, D. & Ophir, Y. News frame analysis: An inductive mixed-method computational approach. Commun. Methods Meas. 13,
248–266 (2019).
36. Scheufele, D. A. & Tewksbury, D. Framing, agenda setting, and priming: The evolution of three media effects models. J. Commun.
57, 9–20 (2007).
37. Cardoso, F. M., Meloni, S., Santanchè, A. & Moreno, Y. Topical alignment in online social systems. Front. Phys. 7, 58 (2019).
38. Boydstun, A. E., Bevan, S. & Thomas, H. F. III. The importance of attention diversity and how to measure it. Policy Stud. J. 42,
173–196 (2014).
39. Cocho, G., Flores, J., Gershenson, C., Pineda, C. & Sánchez, S. Rank diversity of languages: Generic behavior in computational
linguistics. PLoS ONE 10, e0121898. https://doi.org/10.1371/journal.pone.0121898 (2015).
40. Morales, J. A. et al. Rank dynamics of word usage at multiple scales. Front. Phys. 6, 45 (2018).
41. Fazio, R.H. A practical guide to the use of response latency in social psychological research. Research Methods in Personality and
Social Psychology 74–97 (1990).
42. Avrahami, D. & Hudson, S.E. Responsiveness in instant messaging: predictive models supporting inter-personal communication.
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2006).
43. Fan, C., Jiang, Y., Yang, Y., Zhang, C. & Mostafavi, A. Crowd or hubs: Information diffusion patterns in online social networks in
disasters. Int. J. Disast. Risk Reduct. 46, 101498 (2020).
44. Zhu, X., Kim, Y. & Park, H. Do messages spread widely also diffuse fast? Examining the effects of message characteristics on
information diffusion. Comput. Hum. Behav. 103, 37–47 (2020).
45. Newig, J. Public attention, political action: The example of environmental regulation. Ration. Soc. 16, 149–190 (2004).
46. Ripberger, J. T. Capturing curiosity: Using internet search trends to measure public attentiveness. Policy Stud. J. 39, 239–259 (2011).
47. Aruguete, N. & Calvo, E. Time to #protest: Selective exposure, cascading activation, and framing in social media. J. Commun. 68,
480–502 (2018).
48. Guan, L., Liang, H. & Zhu, J. J. Predicting reposting latency of news content in social media: A focus on issue attention, temporal
usage pattern, and information redundancy. Comput. Hum. Behav. 127, 107080 (2022).
49. Hawkes, A. G. Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 83–90 (1971).
50. Rizoiu, M.-A., Lee, Y., Mishra, S. & Xie, L. A Tutorial on Hawkes Processes for Events in Social Media.
51. Young, P. Everything You Wanted to Know About Data Analysis and Fitting but Were Afraid to Ask. SpringerBriefs in Physics.
(Springer International Publishing, 2015).
52. Cui, H. & Kertész, J. Attention dynamics on the Chinese social media Sina Weibo during the COVID-19 pandemic. EPJ
Data Sci. 10, 8 (2021).
53. Lu, Y., Zhang, P., Cao, Y., Hu, Y. & Guo, L. On the frequency distribution of retweets. Procedia Comput. Sci. 31, 747–753 (2014).
https://www.sciencedirect.com/science/article/pii/S1877050914005006. 2nd International Conference on Information Technology
and Quantitative Management, ITQM 2014.
54. Zhao, Q., Erdogdu, M.A., He, H.Y., Rajaraman, A. & Leskovec, J. Seismic: A self-exciting point process model for predicting tweet
popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’15,
1513–1522 (Association for Computing Machinery, 2015). https://doi.org/10.1145/2783258.2783401.
55. Mathews, P., Mitchell, L., Nguyen, G. & Bean, N. The nature and origin of heavy tails in retweet activity. In Proceedings of the 26th
International Conference on World Wide Web Companion, WWW ’17 Companion, 1493–1498 (International World Wide Web
Conferences Steering Committee, Republic and Canton of Geneva, CHE, 2017).
56. Crane, R. & Sornette, D. Robust dynamic classes revealed by measuring the response function of a social system. Proc. Natl. Acad.
Sci. USA 105, 15649–15653 (2008).
57. Schawe, H. Dataset of User and Tweet Ids of Followers of @nytimes (2021). https://zenodo.org/record/4736651.
58. Schawe, H. Dataset of User and Tweet Ids of Followers of News Outlet Media Accounts (2021). https://zenodo.org/record/4736816.
59. Landauer, T. K., Foltz, P. W. & Laham, D. An introduction to latent semantic analysis. Discourse Process. 25, 259–284 (1998).
60. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).
61. Reyero, T. M., Beiró, M. G., Alvarez-Hamelin, J. I., Hernández, L. & Kotzinos, D. Evolution of the political opinion landscape
during electoral periods. EPJ Data Sci. 10, 31 (2021).
62. Türker, İ & Sulak, E. E. A multilayer network analysis of hashtags in twitter via co-occurrence and semantic links. Int. J. Mod. Phys.
B 32, 1850029 (2018).
63. Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
64. Lancichinetti, A., Radicchi, F., Ramasco, J. J. & Fortunato, S. Finding statistically significant communities in networks. PLoS ONE
6, 1–18 (2011).
65. Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci.
USA 105, 1118–1123 (2008).
Acknowledgements
The authors acknowledge the OpLaDyn grant obtained in the 4th round of the Trans-Atlantic Platform Digging
into Data Challenge (2016-147 ANR OPLADYN TAP-DD2016). H.S. acknowledges grant Labex MME-DII (Grant
No. ANR reference 11-LABEX-0023). J.I.A.H. and M.G.B. acknowledge the financial support of UBACyT-2018
20020170100421BA and the OpLaDyn grant HJ-253570 Annex IF-2017-14123506-APN-DNCEII#MCT.
Author contributions
L.H., M.G.B. and J.I.A.H. proposed the research questions and the methodology. H.S. collected, curated and processed
the data. H.S. and M.G.B. wrote the code for processing the data. H.S., L.H., M.G.B., J.I.A.H. and D.K. analysed the
results. L.H., M.G.B. and H.S. wrote the manuscript. All authors reviewed the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-023-30367-8.
Correspondence and requests for materials should be addressed to L.H.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2023