SocialTruth Project Approach to Online
Disinformation (Fake News) Detection and Mitigation
Michał Choraś
Marek Pawlicki
Rafał Kozik
UTP University of Science and
Technology
Bydgoszcz, Poland
Konstantinos Demestichas
Pavlos Kosmides
ICCS, National Technical University
of Athens
Athens, Greece
Manik Gupta
London South Bank University
London, United Kingdom
ABSTRACT
The extreme growth and adoption of Social Media, in combination with their poor governance and the lack of quality control over the digital content being published and shared, has led information veracity to a continuous deterioration. Current approaches entrust content verification to a single centralised authority, lack resilience towards attempts to successfully "game" verification checks, and make content verification difficult to access and use. In response, our ambition is to create an open, democratic, pluralistic and distributed ecosystem that allows easy access to various verification services (both internal and third-party), ensuring scalability and establishing trust in a completely decentralized environment. In fact, this is the ambition of the EU H2020 SocialTruth project. In this paper, we present the innovative project approach and the vision of effective online disinformation detection for various practical use-cases.
KEYWORDS
pattern recognition, security, safety, detection, fake news,
networks.
ACM Reference Format:
Michał Choraś, Marek Pawlicki, Rafał Kozik, Konstantinos Demestichas, Pavlos Kosmides, and Manik Gupta. 2019. SocialTruth Project Approach to Online Disinformation (Fake News) Detection and Mitigation. In Proceedings of the 14th International Conference on Availability, Reliability and Security (ARES 2019) (ARES '19), August 26–29, 2019, Canterbury, United Kingdom. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3339252.3341497
1 INTRODUCTION
During the last decade, there has been an unprecedented revolution in how people interconnect and socialize. From the early days of Facebook to today's proliferation of Social Media, people have been embracing this new form of socialization. Social networks, media and platforms are becoming the standard means by which our societies communicate, exchange information, conduct business, co-create, learn and acquire knowledge. However, the extreme growth and adoption of Social Media, in combination with the lack of control over the digital content being published and shared, has brought information veracity into dispute. Establishing synergies with innovative information and communication technologies (such as semantic analysis tools, blockchains, emotional descriptors, lifelong learning) can enhance the auditability, reliability and accuracy of the information being shared in Social Media, leading to a more veritable society. The key is to safeguard the distributed and open nature of Social Media, strengthening pluralism and participation and mitigating censorship.
According to a recent MIT study [20], false information spreads six times faster than truth and reaches more people than true stories, often with devastating impact. A single rumour spread in 2013 by a compromised Associated Press account on Twitter caused an estimated $136.5 billion to be briefly wiped off the S&P 500 index [10]. Over the past two years, fake and hoax news have grown to tremendous proportions, particularly around Donald Trump's presidential campaign in the United States, as many people used social networks as a distribution system to spread highly inaccurate or completely erroneous stories [14]. Fake news cases are becoming countless [11], and the motives for spreading them are often financial or political.
In the Freedom on the Net 2017 report [19], Freedom House reaches the same conclusion. The report studied 65 countries worldwide between June 2016 and May 2017 and found that online manipulation and disinformation tactics played an important role in elections in at least 18 of the 65 countries during this period, including the United States.
Because of this high rate of false information spread, large media organisations face increasing pressure to respond quickly and accurately to breaking news stories. Although established workflows and editorial structures, such as the use of copytasters, have been able to deal with this task in the past, the challenge has grown severely as news sources have multiplied in both number and diversity in the era of social media. In such an environment, publishing organizations face the increasingly difficult task of identifying a breaking news story early, confirming its accuracy, providing appropriate background, and publishing or broadcasting it as quickly as possible, thus delivering high-quality journalism [17].
Therefore, in this paper, we present an overview of the approach adopted by the EU H2020 SocialTruth project. The project develops innovative tools to fight online disinformation and tests them in specific use-cases, namely:
(1) journalists and news editors (the solution will be tested at ADNKRONOS, Italy)
(2) search engines (the solution will be tested at QWANT, France)
(3) citizens and web users (the solution will be supported by Infocons, Romania)
(4) teaching material providers (the solution will be tested by De Agostini, Italy)
The paper is structured as follows: in Section 2, the challenges and the vision on how to solve the problem are presented. Section 3 details the SocialTruth means and techniques to counter the online disinformation problem. Conclusions are given thereafter.
2 STATE OF AFFAIRS, CHALLENGES AND VISION
Facebook plans to use improved machine learning methods to identify potential fake news articles, which can be passed on to external fact checkers. Other attempts to deal with the problem also exist, namely those of FakeBox [1], FightHoax [2] and Truly Media [3]. However, after examining current approaches, SocialTruth advocates that:
a) content verification cannot be entrusted to a single centralised authority;
b) the aim should not be to devise the "single most perfect verification algorithm", since even the most sophisticated deep learning classification model is optimized at the time it is created; as a result, its accuracy deteriorates as new sources of fake news arise every day and the writing style of fake news changes in order to successfully "game" and bypass verification checks;
c) content verification should be easy and flexible to use "as a service" by individual users and professional organisations alike.
In response to these unmet challenges, it is necessary to take into consideration the existing approaches with their limitations, but also to strongly focus on creating an open, democratic, pluralistic and distributed ecosystem that enables easy access to various verification services, ensuring scalability and establishing trust in a completely decentralized environment. An ideal system uniquely combines several cutting-edge ICT technologies, including social mining, multimodal content analytics, blockchains and lifelong learning machines, enabling deep analysis, contextualization and understanding of mass-volume digital content, collected in real time from different social networks and web sources. By bringing together and incorporating business insights not just from the ICT domain but also from several other fields, such as digital journalism, mass communication media and social media, such a system promises a flexible, robust and pluralistic digital content verification and trust establishment solution that is open and easily adoptable by the concerned stakeholders, service providers and user communities alike.
A. Fake News Data Analysis Strategies and Sources
The analyzed news can contain a variety of data types, such as unstructured text, images/videos, references to other sources, etc. One method for dealing with textual information is proposed in [15]. Accordingly, one should consider the following principles to address fake news detection:
- Indexing and gathering of information published on the Internet in order to cross-reference current news with previous ones (e.g. to detect duplicates or pictures/photos used in a different context). We consider:
  - State-of-the-art image processing techniques for content analysis and image comparison
  - State-of-the-art text analysis techniques (e.g. document-term matrices, etc.)
- Reputation scoring to identify the reliability of the person and/or information source providing the news. We propose to consider reputation evaluation of:
  - The webpage providing/forwarding the news
  - The person publishing the information/news via a social network, etc.
  - The content itself
- Comparison of similar news published by different information sources
- Machine learning techniques for content feature analysis
- Analysis of semantics by means of applied ontologies

Machine learning in such a complex and dynamic environment as fake news detection requires effective techniques and the efficient analysis of heterogeneous data sources, as presented in Fig. 1 [8].
Figure 1: Online Disinformation and Fake News Detection data analysis strategies and sources
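As a concrete illustration of the text-analysis item listed above, the following sketch cross-references a new story against previously indexed ones using a document-term matrix and cosine similarity. It uses scikit-learn; the corpus and the similarity threshold are illustrative assumptions, not project components.

```python
# Minimal sketch: flag a story as a near-duplicate of previously indexed
# news using a TF-IDF document-term matrix and cosine similarity.
# The corpus and the 0.5 threshold are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

indexed_news = [
    "Flood waters rise in the city centre after record rainfall.",
    "Mayor announces new public transport plan for the suburbs.",
]
new_story = "Record rainfall leaves the city centre under rising flood waters."

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(indexed_news + [new_story])  # document-term matrix

# Compare the new story (last row) against every indexed story.
new_vec = matrix[len(indexed_news)]
scores = cosine_similarity(new_vec, matrix[: len(indexed_news)]).ravel()
for text, score in zip(indexed_news, scores):
    if score > 0.5:  # hypothetical duplicate threshold
        print(f"Possible duplicate (sim={score:.2f}): {text}")
```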
The advantages of an open, democratic, pluralistic and distributed ecosystem are straightforward:
- No "one size fits all" solution and no vendor lock-in: an open ecosystem that provides access to configurable combinations of content analytics and verification services (with support for text, image and video content) through standard Application Programming Interfaces (APIs).
- Distributed trust and reputation establishment powered by blockchain technology, strengthening auditability and revealing information cascades.
- Integration of Lifelong Learning Machines (LLM) that constantly accumulate experience and learn new paradigms of fake news.
- A Digital Companion for convenient everyday access of individual users to verification services from within their browsers.

In this context, ICT engineers, data scientists and blockchain experts need to work together with media specialists and user communities, in an effective and constructive manner, to co-create such an open and innovative content verification solution. This will help:
- Individual users to verify the validity of Social Media content and stop the spreading of false information. Besides this, individual users will be able to check and verify the original author/source of the content.
- Media organisations, story writers, content authors and journalists to boost their investigative and creative capabilities by enabling them to cross-check and combine various multimedia information sources, retrieve and use relevant and verifiable background information, and maintain a stream of real-time updates.
- Search engines, social media platforms and online advertising networks to improve information veracity and contribute to a healthier and more sustainable web and social media ecosystem.
A system performing these functions needs to fulfil the following set of objectives:
- Develop a distributed content verification solution with a complex-free Digital Companion for online credibility verification of digital content found on the web and social media.
- Compose a digital content analytics and verification ecosystem with support for text, image and video, open to third-party service providers.
- Leverage blockchain technologies to establish distributed reputation and trust in digital content sharing.
- Deploy a distributed and thoroughly validated architecture (TRL-7) for the delivery of the credibility evaluation services.
- Introduce innovative business models for news, web and social media stakeholders, and provide support to the EU strategic agendas and policies.
The main ambition of the project is to provide multiple, extensible and reusable capabilities for verifying the credibility of digital content and detecting hoaxes and chains of news-scams spread in social media. A distributed architecture would allow scanning and processing of vast amounts of digital content from social media and the Web in order to identify fake news and to provide individuals and professionals with a certain degree of confidence about its accuracy and credibility.

One of the crucial components is the Digital Companion, an open-source browser plugin that enables individuals to invoke one or more verification services and customise their use. When multiple verification services are to be combined, an open-design software engine assists in carrying out the meta-verification process. Together with the Digital Companion, this increases the credibility of the content shared in social media and improves the reliability of search engines, online advertising and click-stream analytics by limiting click percentages on hoaxes and falsified stories.

The project enables third-party service providers to plug into the ecosystem and make available their own content analytics and verification services through standard APIs. The system exploits a distributed architecture and blockchain technology in order to provide a decentralised mechanism for content verification. This decentralised approach will radically facilitate the assessment of the shared content itself and will pave the way for a wider adoption of decentralised and community-based approaches in next-generation Social Media platforms.
This foundation enhances the role of prosumers, communities and small businesses, mastering technological barriers, introducing innovative and participatory forms of quality journalism, and using various data in a secure manner. The focus is on a series of advanced technologies to significantly enhance the production, management, use and reuse of digital content in Social Media, including social mining, lifelong learning machines, blockchains, multi-level content analytics and multimedia verification services. This enables the exploitation of the different data sources in order to introduce a novel and effective content verification mechanism. Such a mechanism promotes high-quality journalism, enhancing the role of prosumers, communities and small businesses in the field of media.

The use of novel social mining technologies, such as emotional content descriptors, advanced machine learning techniques, multimedia verification algorithms and blockchain technologies, enables the emergence of a highly efficient and highly distributed solution for identifying falsified information and discovering information cascades across Social Media. The scope of the project is to provide a distributed solution, in order to avoid the current situation where information is collected centrally by big data companies outside Europe. Such companies harvest individual users' data in order to provide personalised or easily "clickable" content to the users, thus increasing their revenue when the users click on the provided content.
As to the development of intermediary-free solutions addressing information veracity for Social Media, the solutions contribute to the understanding of information cascades, the spreading of information and the identification of information sources, the openness of algorithms, and users' access to and control of their personal data (such as profiles, images, videos, biometric, geolocation and local data).

Personal data is better protected by limiting clicks on dubious digital content and suspicious web sources. Furthermore, the solution is based on open algorithms (namely, those of the expert meta-verification engines, the lifelong learning models, as well as part of the verification services), so that the community of developers is able to evolve them to tackle the future evolution of fake news industries. The entire ecosystem is open to third-party verification service providers through open interfaces.
C. Design Principles
The distributed architecture we propose follows an open and modular design, embodying the following core capabilities:
- Digital Companion: This is an easy-to-use browser plugin that allows a non-professional user to invoke a meta-verification process upon some form of digital content (e.g. an article), passing its URI as input to a meta-verification engine. In the case of non-professional use, the Digital Companion can be used by the author of the digital article, by a reproducer (who shares the article in Social Media) or even by a simple reader of the article, who wishes to get an estimation of the credibility of the content before or after reading it. In the case of professional use, the Digital Companion is a web front-end for medium/large organisations (e.g. news agencies, search engines, etc.) that allows several calls per day to the APIs of the meta-verification engine(s). SocialTruth follows a user-centred design approach for the Digital Companion.
- Distributed Verification Services: This is a set of heterogeneous verification services, each one providing a specific type of content analytics (e.g. for text, image, video) or verification-relevant functionality (e.g. emotional descriptors, social influence mapping). Each service can be deployed at a different hosting facility (e.g. different servers or clouds), hence there is no imposed centralization. All of them use the same standard interfaces to allow them to be easily accessible, reusable and interchangeable. The registry of service providers and the services they offer is stored and maintained in the blockchain.
- Expert Meta-Verification Engine(s): Much like a meta-search engine combines and presents results from multiple search engines, the expert meta-verification engine combines verification results from various sources to compute a meta-score that reflects the credibility of the digital content under consideration. It follows an open design, open algorithms and an expert-systems approach, while most of its settings and weights (e.g. which verification services to prefer or to avoid, with which priority, etc.) can optionally be configured through its standard web-service interfaces. An example of a fake-news verification engine based on a two-level convolutional neural network, with reasoning based on collective user intelligence, can be found in [16].
- Blockchain: The blockchain is used as a distributed system of record with respect to digital content verification history. Since the complex web and social media landscape is characterised by several competing content creators and distributors, each with their own motives, interests, strategies and practices, the blockchain is an ideal tool to establish reputation and trust without the need of a central authority or intermediary (thus also avoiding centralising even more regulatory power in the US Internet giants, such as Facebook or Google). Hence, a public distributed ledger provides an auditable and immutable trail of verification actions and reputation scores. The blockchain stores article identification information, article descriptors (e.g. hash codes for digital content integrity), author identification information, verification and meta-verification scores, as well as identification information for the verification services that have been used to calculate them. It also holds the registry of verification service providers and the services they offer.
D. Meta-Verification Process
For our purposes, digital content is defined as text, photo and video, produced, uploaded and/or edited by individuals, journalists, reporters, writers, bloggers, etc., in news sites, social networks and web channels. The concept is to break down the digital content into its constituent elements and subsequently call individual verification services, ultimately combining their results.

The verification process will scan the content shared through Social Media and will identify its sources, metadata, media elements and writing style, as well as how it has spread across the Internet. It will attempt to verify these elements and will flag the content as fake or not, with a degree of certainty about its accuracy, providing relevant background if applicable, in a process partially similar to [13]. In this light, the verification process encompasses the following capabilities:
- Seamless data crawling and social mining from multiple, heterogeneous external web sources and media.
- Streamlined multi-layer semantic analysis, forgery detection of multimedia content, and indexing of corresponding sources.
- Deep understanding and analysis of authors' writing styles and content semantics, based on multidimensional contextual reasoning and inference.

The procedure will integrate these capabilities through an intelligent meta-verification algorithm and make them easily usable through a flexible end-user tool (the Digital Companion), designed via an efficient user-centred methodology. By integrating and interfacing with advanced services in digital content analysis, it promises to open new applications and opportunities to wider audiences and stakeholders, including but not limited to digital content production professionals, aggregation and supply professionals, journalists and editors, online advertising, search engine and e-commerce companies, content prosumers, social media sharing networks, etc.
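One way to read the combination step described above is as weighted voting over per-service credibility scores. The sketch below is a minimal interpretation of that idea, with hypothetical service names and weights; the project's actual expert meta-verification engine follows an expert-systems approach and is more elaborate.

```python
# Illustrative sketch of a meta-verification engine: individual services
# each return a credibility score in [0, 1]; the engine combines them
# into a weighted meta-score. Service names and weights are hypothetical.
from typing import Callable, Dict

Verifier = Callable[[str], float]

def text_style_service(uri: str) -> float:
    return 0.7   # stand-in for a real writeprint/stylometry service

def image_forensics_service(uri: str) -> float:
    return 0.4   # stand-in for a real ELA/image verification service

def meta_verify(uri: str, services: Dict[str, Verifier],
                weights: Dict[str, float]) -> float:
    """Weighted average of per-service credibility scores."""
    total = sum(weights[name] for name in services)
    return sum(weights[name] * svc(uri) for name, svc in services.items()) / total

services = {"style": text_style_service, "image": image_forensics_service}
weights = {"style": 0.6, "image": 0.4}  # user-configurable preferences
print(f"meta-score: {meta_verify('https://example.org/article', services, weights):.2f}")
```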
3 SOCIALTRUTH APPROACH, TECHNIQUES AND
METHODS FOR ONLINE DISINFORMATION
DETECTION
This section provides insights into specific aspects of content verification, employed on the basis of the general architecture and the meta-verification concept presented earlier.
A. Understanding Author and Writing Style
The semantic technologies are about understanding a story's author and writing style, which helps in the credibility evaluation process. To address this, the solution uses as a starting point, and extends, the semantic technology of an existing tool developed by Expert System (ESF), namely COGITO, which facilitates deep understanding of language.

The solution extends this capability through advanced and innovative features like "writeprint" or stylometric analysis, which makes it possible to analyse the style of writing behind each story acquired from the web, social networks and any other textual source, in order to:
- Understand whether an individual who is publishing a story has a style of writing that can be related and mapped to another style from a historical database.
- Understand whether contents published on the web using different accounts or nicknames are actually related to the same person (i.e. clustering of different virtual IDs).
This stylometric analysis is based on a series of technical parameters, such as usage of short words, conjunctions, vocabulary richness and complexity, lexical differentiation, etc., which are strictly related to specific human factors and behaviour, thus effectively defining a sort of "fingerprint" (writeprint) of the writer.
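A minimal sketch of how such writeprint parameters can be computed follows; the feature set (short-word ratio, conjunction rate, type-token ratio, average word length) is a small illustrative subset, not the COGITO feature set.

```python
# Sketch of a few "writeprint" features named above: short-word usage,
# conjunction rate and vocabulary richness (type-token ratio).
# This is a toy subset for illustration, not ESF's COGITO.
import re

CONJUNCTIONS = {"and", "but", "or", "so", "yet", "for", "nor"}

def writeprint_features(text: str) -> dict:
    words = re.findall(r"[a-zA-Z']+", text.lower())
    if not words:
        return {}
    return {
        "short_word_ratio": sum(len(w) <= 3 for w in words) / len(words),
        "conjunction_ratio": sum(w in CONJUNCTIONS for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),   # vocabulary richness
        "avg_word_length": sum(map(len, words)) / len(words),
    }

print(writeprint_features("John is driving his car home, but the road is long and wet."))
```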
B. Semantic Analysis and Clustering of Similar News
Using the semantic analysis capability of COGITO as a starting point, the goal is to identify the primary and secondary subjects of articles and stories, as well as other elements relevant for classification and entity extraction (such as People, Organizations and Places). Thus, the solution aims at being able to classify text according to a detailed, customizable tree of categories, enabling modification according to the end-user's requirements. These capabilities make it possible to:
(1) Make information management automatic, more efficient and independent of subjective criteria.
(2) Immediately identify useful information and reduce search time, by simplifying access to content and enabling search by subject/topic.
(3) Cluster information according to customizable taxonomies and categories.

Clustering using categorisation and entity extraction is one approach. Extracting more semantic tags, such as relations, verbs and events, can lead the proposed solution to find correlations based on different analogy or similarity criteria. This is valuable since, for example, the following two sentences are different from the statistical point of view, but similar from the semantic point of view: "John is driving his car to home" versus "John is reaching home by his vehicle".
C. Sentiment/Emotional Analysis
The solution needs the capability to extract a full set of emotions from textual content, not just using a standard positive/negative/neutral evaluation approach (sentiment), but also providing a finer granularity of the different kinds of feelings (i.e. stress, fear, trust, anger, etc.). When applied to sources like social networks and the web, this feature can be used, amongst others, as a bias estimator (e.g. detecting users that may have bad intentions towards an event, a person, an infrastructure, etc.) or for assessing public opinion about a specific topic of interest.
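A toy sketch of finer-grained emotion extraction via lexicon lookup follows; the mini-lexicon is invented for illustration, whereas a real system would use a curated resource such as the NRC Emotion Lexicon.

```python
# Toy sketch of finer-grained emotion extraction via lexicon lookup.
# The mini-lexicon below is invented for illustration only.
EMOTION_LEXICON = {
    "attack": {"fear", "anger"}, "threat": {"fear"},
    "betrayed": {"anger", "sadness"}, "safe": {"trust"},
    "promise": {"trust", "anticipation"},
}

def emotions(text: str) -> dict:
    counts: dict = {}
    for word in text.lower().split():
        for emotion in EMOTION_LEXICON.get(word.strip(".,!?"), ()):
            counts[emotion] = counts.get(emotion, 0) + 1
    return counts

print(emotions("They betrayed us! This attack is a threat to everyone."))
# -> {'anger': 2, 'sadness': 1, 'fear': 2}
```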
D. Natural Language Processing Baseline
For the implementation of the aforementioned advanced Natural Language Processing (NLP) and semantic functionalities, the solution uses the semantic engine COGITO. This allows it to provide advanced semantic solutions, including semantic search, text analytics, ontology and taxonomy management, automatic categorization, automated self-help solutions, extraction of unstructured information and natural language processing. COGITO is powered by two main components:
- The Disambiguator, a multi-level linguistic engine able to disambiguate the meaning of a word by recognising the context in which that word occurs. Disambiguation can be described as the process of resolving the conflicts that arise when a term can express more than one meaning, leading to different interpretations of the same string of text. The ultimate aim of such a process is to associate each term with the author's intended use of the term itself.
- The Sensigrafo, a semantic network that represents and stores the different semantic relations between the words of a language. Unlike traditional dictionaries, where words are listed in alphabetical order, the words contained in this database are arranged in groups of items expressing identical or similar meaning. These groups are connected to each other by millions of logical and language-related links.
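The disambiguation idea can be illustrated with NLTK's classic Lesk algorithm, with WordNet playing the role of a semantic network like the Sensigrafo; this is a generic stand-in for illustration, not COGITO's Disambiguator.

```python
# Sketch of word-sense disambiguation with the classic Lesk algorithm.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.wsd import lesk

sentence = "The author opened a new bank account for the royalties".split()
sense = lesk(sentence, "bank")  # picks the sense whose gloss best overlaps the context
print(sense, "->", sense.definition())
# Intended outcome: the financial-institution sense, not the riverbank.
```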
The COGITO semantic engine starts from this consolidated architecture, but will require further development to customize it for the domain of disinformation and fake news, in terms of the taxonomies/categories to be defined, the entities to be extracted, and the emotions and writeprint to be identified from text. In order to apply this dedicated tuning to the semantic network, a manual or semi-automatic approach can be used. The second approach is preferable and relates to the use of machine learning techniques to create new ontologies automatically, which can then be validated by human beings. This technique can also be used in the opposite direction, so that the output of the semantic analysis serves as an input to the machine learning algorithms; these bidirectional processes allow the creation of a truly high-quality hybrid engine.
E. Image/Photo/Video Verification
Nowadays, photo and video content is an essential part of media coverage, often attracting the consumer's attention more than the actual information provided as text. Thus, image and video data can be used as powerful tools to create untruthful information or to unreasonably strengthen a message provided to a recipient. For example, fake images illustrating the textual content can be considered common elements of fake news. Fake photos can be linked both to factual (true) news and to totally fake information shared online. Fabricated photos might be posted for propaganda reasons, but also for strengthening the intended message and for inducing emotions. Often, publishers highlight the textual content of news by means of a random photo stored in the publisher's archives that is contextually related to an ongoing situation, but not taken at the time and place of the presented or analysed event. Other possible indicators of fake photos are: excessively drastic and unrealistic scenes, emotional messages, and the time elapsed between the described event and photo publication (sometimes seconds). Sometimes, poor quality of the image/video content in relation to the context (e.g. excessively low or excessively high quality) can indicate that the image/video could not have been captured during the given situation. Hence, image verification becomes crucial for fake news analysis and can be used to determine whether a given image is similar or not to previously posted images. To verify the trustworthiness of photos, one can employ and combine approaches such as the following:
- Person re-identification: This relates to the problem of identifying people across images that have been taken using different cameras, or across time using a single camera. Due to variations in viewpoint, pose, illumination, expression, aging, cosmetics and occlusion, a given individual may appear considerably different across different camera views. Hence, re-identification is an important capability for fake image analysis. Deciding whether two patches correspond to each other is a challenging problem, because large variations in viewpoint and lighting across different views can cause two images of the same person to look quite different, and can cause images of different people to look very similar. Typically, methods for re-identification include two components: a method for extracting invariant and discriminative features from input images, and a similarity metric for comparing those features across images. Research on re-identification will focus on finding an improved set of features, finding an improved similarity metric for comparing features, or a combination of both.
- Contextual image analysis: Inconsistencies in elements such as the actual weather or season, the architecture of the depicted place, or time indicators in the photo (e.g. clocks) can help verify that the presented photo was taken at another place or time than it pretends. Therefore, techniques like image geolocalization, which use recognition techniques from computer vision, can be used to estimate the location (at city, region, or global scale) of ordinary ground-level photographs and to determine the actual context within which the image was taken. Additionally, GPS information and metadata annotations can be useful in geolocating images.
Verication of the publisher - Examining the history
of posts/news can be helpful in anomaly detection
in publication history and possible identication of
an invalid publisher, since a fake piece of news or
photo is often the rst piece of content published by
a malevolent user, or even his/her rst online activity
ever.
- Analysis of image features: Services such as http://fotoforensics.com allow for online analysis of modifications made to a given photo, using Error Level Analysis (ELA) of submitted JPEG files. In general, these algorithms detect the modified regions in a photo, exploiting the fact that these regions generate more errors during the (lossy) JPEG compression applied when the photo is re-saved (a minimal sketch of this idea appears after this list).
- Reverse image searching: Finding similar or slightly modified photos in relation to the searched photo, including photos with changed ratio and resolution, changed colours, horizontally flipped, trimmed, etc. Reverse image search engines also allow ordering the search results by time of online publication, making it possible to find that a photo illustrating current events has already been circulating on the web for months or years. The most popular online tools for reverse image searching include images.google.com and tineye.com. On the other hand, there is a limited number of online tools available for the analysis of video similarities, and thus for the detection of modified, fake video content. One such example is the YouTube DataViewer developed by Amnesty International [4], which allows for the detection of older versions of the same video. It can be used to determine the original video when multiple copies of the same video from the same date are available. Given an arbitrary target object marked with a bounding box at the beginning of a video, the goal of visual tracking is to localize this target in subsequent video frames. An ideal tracker should be fast, accurate and robust to appearance changes of the target; this is a challenging problem that needs to be addressed as part of the video verification innovation activities.
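The ELA technique mentioned in the image-features item above can be sketched in a few lines with Pillow: re-save the JPEG at a known quality and amplify the per-pixel differences, so that edited regions tend to stand out. File names are placeholders.

```python
# Minimal Error Level Analysis (ELA) sketch, in the spirit of services
# like fotoforensics.com: re-save the JPEG at a known quality and amplify
# per-pixel differences; edited regions tend to stand out brighter.
# Requires: pip install Pillow. File names are placeholders.
from PIL import Image, ImageChops, ImageEnhance

def error_level_analysis(path: str, quality: int = 95, scale: float = 20.0) -> Image.Image:
    original = Image.open(path).convert("RGB")
    original.save("resaved.jpg", "JPEG", quality=quality)   # lossy re-save
    resaved = Image.open("resaved.jpg")
    diff = ImageChops.difference(original, resaved)         # compression error per pixel
    return ImageEnhance.Brightness(diff).enhance(scale)     # amplify for inspection

error_level_analysis("suspect_photo.jpg").save("suspect_photo_ela.png")
```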
4 SOCIALTRUTH - DETECTION SYSTEM DESIGN
A. Distributed Reputation and Trust through Blockchain Technologies
Blockchain is a security architecture allowing different users to share sensitive data in a secure and decentralised manner, without a central authority. This architecture will be used to share the reputation or credibility (verification scores) of digital content between the different end-users in a trusted way. These end-users, interested in the details behind this scoring, will be able to read the verification history and decide whether or not they trust the specific content. The solution bases its blockchain algorithm on a public blockchain. With that, it is able to directly use a complete, functional infrastructure, with a specifically developed part on top of it. The solution has to manage remuneration and data transactions simultaneously. These data transactions will integrate data and metadata (like Article ID, Author ID, Article Descriptors and different verification results) with different links between them. To that end, a blockchain implementation with smart contracts or chaincode is mandatory (such as Ethereum or Hyperledger). One also has to take into account privacy (including General Data Protection Regulation (GDPR) compliance) and confidentiality requirements (for example, to encrypt any sensitive data). Different solutions exist but need to be adapted for this kind of data (like the zk-SNARKs solution used for the ZCash blockchain). One can complement the solution with an access control mechanism, also based on blockchain, which will manage access control to the data. For that, one can use and extend the innovative Access Control - BlockChain (AC-BC), based on the Ethereum blockchain. Another important aspect is auditability, i.e. using the data and the access-control logs stored publicly in the blockchain. These data will not only enable the inference of abnormal behaviours, but could also be used by one or several Expert Meta-Verification Engines to improve the data processing performed on the multimedia data.

In conclusion, the proposed innovation strikes a balance between the different blockchain aspects (public vs private, Proof-of-Work vs Proof-of-Stake consensus, smart contract vs chaincode) to reach a commercially viable solution.
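To make the stored record structure concrete, here is a self-contained toy sketch of a hash-chained ledger holding the fields named earlier (article ID, content hash, author ID, verification score). A real deployment would rely on Ethereum smart contracts or Hyperledger chaincode rather than this in-memory chain.

```python
# Simplified sketch of the kind of record the ledger holds, chained by
# SHA-256 hashes. Illustrative only; not the project's actual chaincode.
import hashlib, json, time

class VerificationLedger:
    def __init__(self):
        self.chain = [{"prev": "0" * 64, "record": "genesis"}]

    def add_record(self, article_id: str, content: bytes,
                   author_id: str, meta_score: float) -> dict:
        block = {
            "prev": self._hash(self.chain[-1]),  # link to the previous block
            "record": {
                "article_id": article_id,
                "content_hash": hashlib.sha256(content).hexdigest(),  # integrity
                "author_id": author_id,
                "meta_score": meta_score,
                "timestamp": time.time(),
            },
        }
        self.chain.append(block)
        return block

    @staticmethod
    def _hash(block: dict) -> str:
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

ledger = VerificationLedger()
ledger.add_record("article-42", b"<article body>", "author-7", meta_score=0.31)
print(len(ledger.chain), "blocks;", ledger.chain[-1]["record"]["meta_score"])
```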
B. Security and Privacy by Design
The design of the solution architecture is performed in accordance with the Security-by-Design and Privacy-by-Design principles, in order to:
- implement basic security controls at the design phase, so as to minimise the number of vulnerabilities discovered in later phases, decreasing their potential impact on the prototype system's security, and
- ensure the privacy of users whose data is analysed using the system (open data available on the web, social media data, end-user data).

For the purposes of system security assurance, important Security-by-Design principles defined by OWASP are implemented, by means of establishing appropriate security defaults, minimising the attack surface area by reducing the number of authorised users for any given functionality, realising the so-called "least privilege" principle by assigning only the minimum amount of user rights needed to operate the services, and applying separation of duties (considering various user roles with various levels of trust). Also, the use of third-party components will be thoroughly analysed in terms of security, and its impact on the overall level of protection will be evaluated.
In the design phase, applicable Privacy-by-Design principles [5] are also implemented, including privacy-preserving means as default system features embedded in the design, end-to-end data protection and privacy preservation management through the entire data lifecycle, and consideration of privacy and data protection without an impact on the platform functionalities. From the technical viewpoint, encryption by default will be considered, in order to mitigate the security concerns associated with unauthorised access to the data, integrating it into data workflows in an automatic and seamless manner, whereas mechanisms for the secure destruction or removal of stored data will be established. The implementation of appropriate Privacy Enhancing Technologies (PETs) is also considered, such as: the use of privacy keys, digital signatures, secure authentication and the adoption of secure communication protocols. For the purposes of disassociating data or generated content that might allow the identity of the data owner to be detected, applicable techniques will be employed, such as anonymisation, pseudonymisation and data clustering, to prevent the correlation of Personally Identifiable Information (PII) for purposes other than the detection of fake news considered in the project. Also, the data gathered, stored and analysed (e.g. during use-case execution) will be used only for the predefined purposes according to the defined goals of the use cases. On the other hand, ethics management in the project will ensure that the stored or analysed data will not be subject to any commercial use and will not be shared publicly without the consent of the data owner.
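One of the PETs mentioned above, pseudonymisation, can be sketched with keyed hashing: records remain correlatable for fake-news analysis within a key epoch without exposing the underlying identity. The key handling shown is illustrative only.

```python
# Sketch of keyed pseudonymisation of PII with HMAC-SHA256: the same
# account always maps to the same pseudonym, so analyses can correlate
# records without seeing the identity. Real deployments need proper key
# management; os.urandom here is an illustrative stand-in.
import hashlib, hmac, os

PSEUDONYM_KEY = os.urandom(32)  # in practice: stored in a secrets manager

def pseudonymise(pii: str) -> str:
    return hmac.new(PSEUDONYM_KEY, pii.encode(), hashlib.sha256).hexdigest()[:16]

# Deterministic within one key epoch:
print(pseudonymise("@some_user"), pseudonymise("@some_user"))
```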
C. Web and Social Media Data Crawling
Within the project, work in the area of web and social media data crawling uses the relevant software tools and services. For the search engine use-case, the hardware and software architecture hosted in the search-engine partner's data centres in Paris could be employed. This architecture makes it possible to process over 70 million queries every day, to crawl 0.5 billion Web pages, and to run an index of 16 billion Web pages. The architecture is scalable and can grow rapidly.
D. Verification in Social Media through Deep Learning
With the increasing popularity of the various social media platforms, detecting and dealing with misinformation and its creators becomes a critical problem. Malicious users can be tracked down, and their online linguistic expressions can be used to infer their personal attributes and diverse geographical, sociological and political demographics. Deep learning models can be used to construct behavioural representations for user personality identification, and private traits and attributes can be predicted from digital records of human behaviour. Moreover, social network connections and flow observations can be examined to detect misinformation, and false content can be discovered using geosemantic information. Similarly, the problem of detecting bots has gained tremendous attention, since bots can be used to automate social media accounts and post fake content, hampering data credibility. For example, bots can be used to sway political elections by distorting the online discourse, to manipulate the stock market, or to push anti-vaccine conspiracy theories that might cause health epidemics. There are techniques to detect bots at the account level, by processing large amounts of social media posts and leveraging information from the network graph structure, temporal dynamics, sentiment analysis, etc. A repository of fake news with in-depth analysis of this kind of data can be found in [18]. Deep neural networks based on the long short-term memory (LSTM) architecture can be used to exploit both content and metadata to detect bots at the tweet level.
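A minimal sketch of a tweet-level bot detector in this spirit follows: an LSTM over tweet tokens fused with simple account metadata. Layer sizes and the choice of metadata features are illustrative assumptions, not tuned project values.

```python
# Sketch: LSTM over tweet text concatenated with account metadata,
# producing P(bot) per tweet. Sizes/features are illustrative only.
# Requires: pip install tensorflow
import tensorflow as tf

VOCAB, SEQ_LEN, N_META = 20000, 50, 3  # 3 assumed metadata features,
                                       # e.g. followers, account age, posting rate

text_in = tf.keras.Input(shape=(SEQ_LEN,), name="tweet_tokens")
meta_in = tf.keras.Input(shape=(N_META,), name="account_metadata")

x = tf.keras.layers.Embedding(VOCAB, 64)(text_in)
x = tf.keras.layers.LSTM(32)(x)                          # content representation
x = tf.keras.layers.Concatenate()([x, meta_in])          # fuse content + metadata
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(bot)

model = tf.keras.Model([text_in, meta_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```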
E. Lifelong Learning Intelligent Systems
One of the current challenges in machine learning is to develop intelligent systems that are able to learn consecutive tasks and to transfer knowledge from what was previously learnt to new tasks. This capability is termed Lifelong Learning [12] and tries to mimic "human learning". In principle, humans are capable of accumulating knowledge from the past and using it to solve future problems; currently, classical machine learning algorithms are not able to achieve that. Our ambition is to adapt Lifelong Machine Learning techniques in order to gradually improve detection models and incrementally update the knowledge, so that the system can learn faster (reusing historical knowledge), e.g. by transferring knowledge between different news topics and tasks [9]. In fact, there is an existing line of research [6] showing promising results (also in the area of text mining). Moreover, there are also initiatives (such as DARPA L2M - Lifelong Learning Machines) aiming at fostering research in these directions. Therefore, the project's ambition is to use the Lifelong Learning paradigm to address challenges in fake news detection. In particular, hybrid classifier systems (HCS) and ensemble learning will be considered, as these have already been successfully applied to solve complex machine learning problems in various other domains. In fact, the multi-classifier paradigm is akin to some of the algorithms proposed for Lifelong Learning, which build and maintain a kind of reservoir of latent models/modules that may be useful or reused in the future.
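The incremental-update idea can be sketched with scikit-learn's partial_fit: the classifier keeps accumulating knowledge as batches from new news topics arrive, instead of being retrained from scratch. The random data below is a stand-in for extracted text features of labelled news.

```python
# Sketch of incremental model updating in the lifelong-learning spirit:
# the model's coefficients are updated batch by batch, never discarded.
# Random data stands in for real text features of labelled news.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()
classes = np.array([0, 1])  # 0 = credible, 1 = fake

for task in range(3):  # e.g. successive news topics / time periods
    X = rng.normal(size=(200, 20))
    y = rng.integers(0, 2, size=200)
    model.partial_fit(X, y, classes=classes)  # update, don't retrain from scratch
    print(f"task {task}: trained on a new batch, earlier knowledge retained")
```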
5 PROTOTYPE
In [7], a prototype of a solution geared towards detecting forged images is presented. The proposed method revolves around image assessment, with the supposition that if an image is forged, the whole piece might be fake. Three factors are taken into account: ELA analysis, copycat search and metadata analysis. A publicly available dataset of 800 original and over 900 forged images was used. The detection of pasted elements is performed by dividing the image into overlapping blocks and applying the SURF and FLANN algorithms. Metadata makes it possible to spot whether the image was modified with any image processing tools. A fusion of decisions allows our method to spot image forgeries with 64% accuracy.
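The pasted-element detection can be sketched as self-matching of local keypoints: strong matches between distant regions of the same image suggest copy-move forgery. The prototype uses SURF and FLANN; ORB with a brute-force matcher is substituted in this sketch because SURF is patented and ships only with opencv-contrib. The thresholds and file name are assumptions.

```python
# Sketch of copy-move detection via keypoint self-matching (ORB stands in
# for the paper's SURF+FLANN). Thresholds and the file name are assumed.
# Requires: pip install opencv-python numpy
import cv2
import numpy as np

img = cv2.imread("suspect_photo.jpg", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=2000)
keypoints, descriptors = orb.detectAndCompute(img, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(descriptors, descriptors, k=3)

for _, second, third in matches:
    # The best match is the trivial self-match, so test whether the
    # next-nearest descriptor is distinctively close (Lowe-style ratio test).
    if second.distance < 0.7 * third.distance:
        p1 = np.array(keypoints[second.queryIdx].pt)
        p2 = np.array(keypoints[second.trainIdx].pt)
        if np.linalg.norm(p1 - p2) > 40:  # distant duplicated regions (assumed threshold)
            print(f"possible copied region: {p1.astype(int)} -> {p2.astype(int)}")
```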
6 CONCLUSION
The "fake news" phenomenon is a serious issue in modern media and communication, concerning the spreading of false information within society about current events and incidents. The classification of fake news is challenging due to its vague definition and tensions related to freedom of speech.

Therefore, the EU H2020 SocialTruth project tries to tackle this important problem facing modern society. In this paper, we have presented the project approach, its design principles, and the techniques used. Our idea is that the selected hands-on trials will cover a wide range of requirements and operational conditions, allowing for a multi-faceted system evaluation from the perspective of the various end-users enumerated in the paper.
ACKNOWLEDGMENTS
This work is funded under the SocialTruth project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 825477.
REFERENCES
[1] [n.d.]. FakeBox project homepage. https://machinebox.io/docs/fakebox. Accessed 24 March 2019.
[2] [n.d.]. FightHoax project website. http://fighthoax.com. Accessed 24 March 2019.
[3] [n.d.]. Truly Media tackles fake news ahead of German elections. https://www.truly.media/truly-media-tackles-fake-news-ahead-of-german-elections/. Accessed 24 March 2019.
[4] [n.d.]. YouTube DataViewer. http://www.amnestyusa.org/sites/default/customscripts/citizenevidence/. Accessed 24 March 2019.
[5] Ann Cavoukian. 2012. Operationalizing Privacy by Design: A Guide to Implementing Strong Privacy Practices. http://www.ontla.on.ca/library/repository/mon/26012/320221.pdf.
[6] Zhiyuan Chen, Nianzu Ma, and Bing Liu. 2018. Lifelong Learning for Sentiment Classification.
[7] Michał Choraś, Agata Giełczyk, Konstantinos Demestichas, Damian Puchalski, and Rafał Kozik. 2018. Pattern Recognition Solutions for Fake News Detection. In 17th International Conference, CISIM 2018, Olomouc, Czech Republic, September 27-29, 2018, Proceedings. 130–139. https://doi.org/10.1007/978-3-319-99954-8_12
[8] Michał Choraś, Agata Giełczyk, Konstantinos P. Demestichas, Damian Puchalski, and Rafał Kozik. 2018. Pattern Recognition Solutions for Fake News Detection. In Computer Information Systems and Industrial Management - 17th International Conference, CISIM 2018, Olomouc, Czech Republic, September 27-29, 2018, Proceedings. 130–139. https://doi.org/10.1007/978-3-319-99954-8_12
[9] Michał Choraś, Rafał Kozik, Rafał Renk, and Witold Hołubowicz. 2017. The Concept of Applying Lifelong Learning Paradigm to Cybersecurity. 663–671. https://doi.org/10.1007/978-3-319-63315-2_58
[10] Patti Domm. [n.d.]. False Rumor of Explosion at White House Causes Stocks to Briefly Plunge; AP Confirms Its Twitter Feed Was Hacked. https://www.cnbc.com/id/100646197.
[11] Chloe Farand. [n.d.]. French social media awash with fake news stories from sources 'exposed to Russian influence' ahead of presidential election. https://tinyurl.com/y5gdzvz4.
[12] Patxi Galán-García, José Gaviria de la Puerta, Carlos Laorden, Igor Santos, and Pablo García Bringas. 2013. Supervised machine learning for the detection of troll profiles in Twitter social network: application to a real case of cyberbullying. Logic Journal of the IGPL 24 (2013), 42–53.
[13] Mykhailo Granik and Volodymyr Mesyura. 2017. Fake news detection using naive Bayes classifier. In 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON). 900–903.
[14] Mathew Ingram. [n.d.]. Google's Fake News Problem Could Be Worse Than on Facebook. http://fortune.com/2017/03/06/google-facebook-fake-news/.
[15] Jiawei Zhang, Limeng Cui, Yanjie Fu, and Fisher B. Gouza. 2018. Fake News Detection with Deep Diffusive Network Model.
[16] Feng Qian, Chengyue Gong, Karishma Sharma, and Yan Liu. 2018. Neural User Response Generator: Fake News Detection with Collective User Intelligence. https://doi.org/10.24963/ijcai.2018/533
[17] Kevin Rawlinson. [n.d.]. How newsroom pressure is letting fake stories on to the web. https://www.theguardian.com/media/2016/apr/17/fake-news-stories-clicks-fact-checking.
[18] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2018. FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media.
[19] Olivia Solon. [n.d.]. Tim Berners-Lee: we must regulate tech firms to prevent 'weaponised' web.
[20] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151. https://doi.org/10.1126/science.aap9559
... SocialTruth project [20] integrates its content verification services which provide a specific type of content analytics (e.g. for text, image, video) and verification-relevant functionality (e.g. emotional descriptors, social influence mapping) with various platforms such as web search, journalist tools and a browser add-on. ...
... Recently, a project called WeVerify [24] is developing cross-modal disinformation detection and content verification tools. Similar to SocialTruth [20], WeVerify allows only professionals (journalists/Fact checkers) to verify and determine the reliability of the content without providing the users details about professionals and their reputation. So a leap of faith is required as users cannot check the level of trustworthiness of the people involved. ...
Chapter
Full-text available
Disinformation has become a worrisome phenomenon at a global scale, spreading rapidly thanks to the growth of social media and frequently causing serious harm. For instance, it can perplex and manipulate users, fuel scepticism on crucial issues such as climate change, jeopardize a variety of human rights, such as the right to free and fair elections, the right to health, to non-discrimination, etc.Among the most used tools and techniques to spread disinformation are social bots, deep-fakes, and impersonation of authoritative media, people, or governments through false social media accounts. To deal with these issues, in this paper, we suggest TruthSeekers Chain, a platform which add a layer on top of the existing social media networks where I) the feed is augmented with new functionalities and reliable information retrieved from a blockchain II) a bot screening mechanism is used to allow only human generated content and engagement to be posted, III) the platform is open to integration of 3rd-party content verification tools helping the user to identify the manipulated or tampered content and IV) a self sovereign identity model is used to ensure accountability and to contribute building a reliable portable reputation system.KeywordsFake newsSocial MediaReputation systemSSINFTCAPTCHABlockchain
... The platform requires the consent of the user to have the posts in their social media analyzed for trustworthiness [44,43]. SOMA members are also given access to SocialTruth, which provides individuals with access to "fake news" detection based on AI technology and content verification trust and integrity based on blockchain technology [45,46]. SocialTruth integrates its content verification with various platforms such as web search, journalist tools, content development, and a browser add-on [45,46]. ...
... SOMA members are also given access to SocialTruth, which provides individuals with access to "fake news" detection based on AI technology and content verification trust and integrity based on blockchain technology [45,46]. SocialTruth integrates its content verification with various platforms such as web search, journalist tools, content development, and a browser add-on [45,46]. WeVerify aims to address content verification challenges through participatory verification, open source algorithms, human-in-the-loop machine learning, and visualizations [47,48]. ...
Article
There is an abundance of misinformation, disinformation, and “fake news” related to COVID-19, leading the director-general of the World Health Organization to term this an ‘infodemic’. Given the high volume of COVID-19 content on the Internet, many find it difficult to evaluate veracity. Vulnerable and marginalized groups are being misinformed and subject to high levels of stress. Riots and panic buying have also taken place due to “fake news”. However, individual research-led websites can make a major difference in terms of providing accurate information. For example, the Johns Hopkins Coronavirus Resource Center website has over 81 million entries linked to it on Google. With the outbreak of COVID-19 and the knowledge that deceptive news has the potential to measurably affect the beliefs of the public, new strategies are needed to prevent the spread of misinformation. This study seeks to make a timely intervention to the information landscape through a COVID-19 “fake news”, misinformation, and disinformation website. In this article, we introduce CoVerifi, a web application which combines both the power of machine learning and the power of human feedback to assess the credibility of news. By allowing users the ability to “vote” on news content, the CoVerifi platform will allow us to release labelled data as open source, which will enable further research on preventing the spread of COVID-19-related misinformation. We discuss the development of CoVerifi and the potential utility of deploying the system at scale for combating the COVID-19 “infodemic”.
... To date, many different works on the topic have been developed-research has focused mainly on linguistic analyses through natural language processing techniques [93], or network and behavior analysis [94]. Other proposed solutions commonly applied include direct application of ML techniques [95,96]. ...
Article
The development of data-driven Artificial Intelligence systems has seen successful application in diverse domains related to social platforms; however, many of these systems cannot explain the rationale behind their decisions. This is a major drawback, especially in critical domains such as those related to cybersecurity, of which malicious behavior on social platforms is a clear example. In light of this problem, in this paper we make several contributions: (i) a proposal of desiderata for the explanation of outputs generated by AI-based cybersecurity systems; (ii) a review of approaches in the literature on Explainable AI (XAI) under the lens of both our desiderata and further dimensions that are typically used for examining XAI approaches; (iii) the Hybrid Explainable and Interpretable Cybersecurity (HEIC) application framework that can serve as a roadmap for guiding R&D efforts towards XAI-based socio-technical systems; (iv) an example instantiation of the proposed framework in a news recommendation setting, where a portion of news articles are assumed to be fake news; and (v) exploration of various types of explanations that can help different kinds of users to identify real vs. fake news in social platform settings.
... For example, a project called SocialTruth proposed a decentralized model to combat with fake news [2]. Key idea is, social media like Facebook, Twitter implemented their own fake news strategy using machine language technology. ...
Conference Paper
Now a days, people spend most of their times in social media. Due to availability of news and also for the free scope of sharing, most of the time rumors are being extensive in a short period of time. Detecting and preventing rumors and false information remains a significant challenge for social network. The introduction of blockchain technology has paved the way for the development of decentralized apps in order to address this issue. In this technology any information is recorded permanently. We will explore a strategy to eliminate bogus news on social media by utilizing the benefits of peer-to-peer network ideas. By issuing non-fungible token content rating we can detect and ensure appropriate news. The findings revealed that the suggested technique has a satisfactory performance and efficiency in recognizing rumors and preventing their spread.
... Still, disinformation events in recent years have transformed this discipline and focused it on false content propagated mainly through the Internet (Choi & Haigh, 2019). This raises the ambition to create a democratic, open, and pluralistic information verification system that is decentralized, as Choras et al. (2019) stated. According to these authors, this ecosystem should combine the latest technological advances, such as data mining and machine learning; index information; cross-reference content; consider the publisher websites; compare publications with other publications; and use semantic analysis to detect fake news. ...
Chapter
Fact-checkers have grown recently, facing the decline of journalism and the acceleration of disinformation flows on the internet. Given the recent scholarly attention to these journalistic outlets, some authors have pointed to diverse criticisms, such as political bias and the low impact of fact-checking initiatives. In line with research approaching the weaponization of disinformation in politics, this chapter reflects on the instrumentalization of verification practices as a factor to consider when studying fact-checking. The investigation applies a combined methodology to compare the Bendita and Maldita initiatives. While the latter is internationally recognized as a fact-checking entity, the former arises as an imitation of it and lacks recognition and scholarly attention. Conclusions suggest that fact-checking implies more complex activities than refuting specific facts, and that alt-right positions can instrumentalize fact-checking for political objectives. The authors call for definitions that exclude this type of misuse of verification.
... " [5]. As a result, the EU has funded a range of FP7, H2020 and other projects to combat misinformation and disinformation, including WeVerify [6,7], SocialTruth [8], PHEME [9,10], EUNOMIA [11], Fandango [12,13] and the European Digital Media Observatory (EDMO) [14]. Many other international organisations have also identified misinformation and disinformation as a threat and have increased efforts to combat it. ...
Preprint
Full-text available
The threat posed by misinformation and disinformation is one of the defining challenges of the 21st century. Provenance is designed to help combat this threat by warning users when the content they are looking at may be misinformation or disinformation. It is also designed to improve media literacy among its users and ultimately reduce susceptibility to the threat among vulnerable groups within society. The Provenance browser plugin checks the content that users see on the Internet and social media and provides warnings in their browser or social media feed. Unlike similar plugins, which require human experts to provide evaluations and can only provide simple binary warnings, Provenance's state-of-the-art technology does not require human input; it analyses seven aspects of the content users see and provides warnings where necessary.
... Even though tweets' content is mostly analysed in order to uncover fake pieces of information (as presented in [10]), the authors of [11] based their prediction of the results on sentiment analysis of the tweets. ...
Article
Full-text available
Background: machine learning (ML) techniques have been implemented in numerous applications, including healthcare, security, entertainment, and sports. In this article, we present how ML can be used for building a professional football team and planning player transfers. Methods: in this research, we defined numerous parameters for player assessment and three definitions of a successful transfer. We used the Random Forest, Naive Bayes, and AdaBoost algorithms to predict player transfer success, training and testing the classifiers on realistic, publicly available data. Results: we present numerous experiments that differ in the weights of parameters, the definitions of a successful transfer, and other factors, and we report promising results (accuracy = 0.82, precision = 0.84, recall = 0.82, and F1-score = 0.83). Conclusion: the presented research proves that machine learning can be helpful in professional football team building. The proposed algorithm will be developed further and may be implemented as a professional tool for football talent scouts.
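A minimal sketch of the classifier comparison this abstract describes, using the three named scikit-learn algorithms; the synthetic features and labels below are illustrative assumptions standing in for the article's real player data:

```python
# Compare Random Forest, Naive Bayes, and AdaBoost on stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))            # e.g. age, goals, minutes played...
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in "successful transfer" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("Random Forest", RandomForestClassifier(random_state=0)),
                  ("Naive Bayes", GaussianNB()),
                  ("AdaBoost", AdaBoostClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    print(name, "F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```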
Chapter
Due to its potential for causing considerable social and national harm, fake news is a growing problem in today’s media landscape, especially on social media, and it has previously been the focus of much research. This paper describes a supervised machine learning method for classifying news as genuine or fake, developed using Python’s scikit-learn library and natural language processing (NLP) for textual analysis. The scikit-learn module contains utilities such as CountVectorizer and TfidfVectorizer that help with tokenization and feature extraction from text data. Based on the findings from the confusion matrix, we use feature selection methods to examine the candidate features and select the most accurate ones. The Internet and social media have made it easier for anyone to access a wealth of information; while these tools have made communication and information flow simpler and quicker, they have also threatened the authenticity of the news being disseminated. Fake news has had such an effect on society that it even influenced the 2016 US presidential election. In our model, we compare different classifiers, implemented with the sklearn module, to find out which provides the highest accuracy for detecting fake news. Keywords: machine learning, random forest, Naïve Bayes, fake news detection
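For readers unfamiliar with the pipeline this abstract outlines, the sketch below shows the general shape of such a scikit-learn workflow with both feature extractors; the toy corpus and labels are placeholders, not the chapter's dataset:

```python
# A toy version of the described pipeline; a real experiment would use a
# labelled fake-news corpus and a proper train/test split.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix

texts = ["scientists publish peer-reviewed study results",
         "shocking secret cure they do not want you to see",
         "parliament passes the annual budget bill",
         "celebrity reveals miracle weight loss trick"]
labels = [0, 1, 0, 1]  # 0 = genuine, 1 = fake (toy labels)

# Compare the two feature extractors named in the abstract.
for name, vec in [("CountVectorizer", CountVectorizer(stop_words="english")),
                  ("TfidfVectorizer", TfidfVectorizer(stop_words="english"))]:
    X = vec.fit_transform(texts)
    clf = MultinomialNB().fit(X, labels)
    print(name, confusion_matrix(labels, clf.predict(X)), sep="\n")
```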
Article
Countering the fake news phenomenon has become one of the most important challenges for democratic societies, governments and non-profit organizations, as well as for researchers coming from several domains. This is not a local problem, and it demands a holistic approach to analyzing heterogeneous data and storing the results. The research problem we address in this paper is the proposition of an innovative distributed architecture to tackle the above-mentioned problems. The architecture uses state-of-the-art technologies with a focus on efficiency, scalability and also openness, so that community-created components and digital content analyzers can be added. Moreover, we prove the usability of the prototype on the Kaggle Fake News dataset. In particular, we consider different configurations of the proposed deep neural network and present results reflecting the effectiveness, scalability and transferability of the proposed solution.
Article
Full-text available
Recent progress in the area of modern technologies confirms that information is not only a commodity but can also become a tool for competition and rivalry among governments and corporations, or can be exploited by ill-willed people in their hate speech practices. The impact of information is overpowering and can lead to many socially undesirable phenomena, such as panic or political instability. To eliminate the threats of fake news publishing, modern computer security systems need flexible and intelligent tools. The design of models meeting the above-mentioned criteria is enabled by artificial intelligence and, above all, by the state-of-the-art neural network architectures applied in NLP tasks. The BERT neural network belongs to this type of architecture. This paper presents Transformer-based hybrid architectures applied to create models for detecting fake news.
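As a minimal sketch of how a BERT-based fake news classifier is typically set up with the Hugging Face transformers library (the checkpoint, example texts, and labels are assumptions for illustration; the paper's hybrid architectures are more elaborate):

```python
# Minimal shape of a BERT-based fake news classifier; a real setup would
# add a dataset, an optimizer, and a full training loop.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = genuine, 1 = fake

texts = ["The central bank held interest rates steady on Tuesday.",
         "Aliens secretly endorse presidential candidate, insiders say."]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)   # forward pass, loss included
outputs.loss.backward()                   # gradients for one training step
print(outputs.logits.softmax(dim=-1))     # per-class probabilities
```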
Conference Paper
Full-text available
Fake news on social media is a major challenge and studies have shown that fake news can propagate exponentially quickly in early stages. Therefore, we focus on early detection of fake news, and consider that only news article text is available at the time of detection, since additional information such as user responses and propagation patterns can be obtained only after the news spreads. However, we find historical user responses to previous articles are available and can be treated as soft semantic labels, that enrich the binary label of an article, by providing insights into why the article must be labeled as fake. We propose a novel Two-Level Convolutional Neural Network with User Response Generator (TCNN-URG) where TCNN captures semantic information from article text by representing it at the sentence and word level, and URG learns a generative model of user response to article text from historical user responses which it can use to generate responses to new articles in order to assist fake news detection. We conduct experiments on one available dataset and a larger dataset collected by ourselves. Experimental results show that TCNN-URG outperforms the baselines based on prior approaches that detect fake news from article text alone.
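A loose sketch, in PyTorch, of the word-level convolutional encoder underlying the TCNN component; the sentence-level representation and the user response generator (URG) are omitted, and all dimensions are illustrative assumptions:

```python
# Word-level CNN text encoder, loosely after the TCNN part of TCNN-URG.
import torch
import torch.nn as nn

class WordCNN(nn.Module):
    """Embeds tokens, convolves over positions, and classifies the text."""
    def __init__(self, vocab_size=20000, embed_dim=128, n_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=5)
        self.fc = nn.Linear(n_filters, 2)  # genuine vs. fake logits

    def forward(self, token_ids):            # (batch, seq_len)
        x = self.embed(token_ids)             # (batch, seq_len, embed_dim)
        x = self.conv(x.transpose(1, 2))      # (batch, n_filters, seq_len-4)
        x = torch.relu(x).max(dim=2).values   # global max pool over positions
        return self.fc(x)                     # (batch, 2)

logits = WordCNN()(torch.randint(0, 20000, (8, 50)))
print(logits.shape)  # torch.Size([8, 2])
```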
Article
Full-text available
The use of new technologies, along with the popularity of social networks, has given the power of anonymity to users. The ability to create an alter ego with no relation to the actual user creates a situation in which no one can certify the match between a profile and a real person. This problem generates situations, repeated daily, in which users with fake accounts, or accounts at least not related to their real identity, publish news, reviews or multimedia material trying to discredit or attack other people, who may or may not be aware of the attack. These acts can have a great impact on the affected victims' environment, generating situations in which virtual attacks escalate into fatal consequences in real life. In this article, we present a methodology to detect fake profiles on the Twitter social network which are employed for defamatory activities, and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles. Accompanying this approach, we also present a successful real-life use case in which this methodology was applied to detect and stop a cyberbullying situation in a real elementary school.
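One simple way to approximate the comment-content comparison this methodology performs at a far richer level is TF-IDF cosine similarity between the aggregated comments of two profiles; the texts and threshold below are illustrative assumptions:

```python
# Compare the aggregated comments of two profiles by TF-IDF cosine
# similarity; a real system would use many comments and richer features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

real_profile = "Great match yesterday, see you at school. Love my hiking photos."
suspect_profile = "Great match!! see you at school, loser. hiking is for losers."

vec = TfidfVectorizer().fit([real_profile, suspect_profile])
sim = cosine_similarity(vec.transform([real_profile]),
                        vec.transform([suspect_profile]))[0, 0]
print(f"cosine similarity: {sim:.2f}")
if sim > 0.5:  # illustrative threshold, not the article's criterion
    print("comment content suggests the profiles may be linked")
```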
Conference Paper
Full-text available
This paper proposes a novel lifelong learning (LL) approach to sentiment classification. LL mimics the human continuous learning process, i.e., retaining the knowledge learned from past tasks and using it to help future learning. In this paper, we first discuss LL in general and then LL for sentiment classification in particular. The proposed LL approach adopts a Bayesian optimization framework based on stochastic gradient descent. Our experimental results show that the proposed method outperforms baseline methods significantly, which demonstrates that lifelong learning is a promising research direction.
Article
Lies spread faster than the truth. There is worldwide concern over false news and the possibility that it can influence political, economic, and social well-being. To understand how false news spreads, Vosoughi et al. used a data set of rumor cascades on Twitter from 2006 to 2017. About 126,000 rumors were spread by ∼3 million people. False news reached more people than the truth; the top 1% of false news cascades diffused to between 1000 and 100,000 people, whereas the truth rarely diffused to more than 1000 people. Falsehood also diffused faster than the truth. The degree of novelty and the emotional reactions of recipients may be responsible for the differences observed. Science, this issue p. 1146
Conference Paper
One of the current challenges in machine learning is to develop intelligent systems that are able to learn consecutive tasks and to transfer knowledge from previously learnt tasks to new ones. Such a capability is termed lifelong learning and, as we believe, it matches very well the current problems in the cybersecurity domain, where each new cyber attack can be considered a new task. One of the main motivations for our research is the fact that many cybersecurity solutions adopting machine learning treat detection as a single-task learning (STL) problem, which in our opinion is not the optimal approach (particularly in the area of malware detection) to solving the classification problem. Therefore, in this paper we present the concept of applying the lifelong learning approach to cybersecurity (attack detection).