SocialTruth Project Approach to Online
Disinformation (Fake News) Detection and Mitigation
Michał Choraś
Marek Pawlicki
Rafał Kozik
UTP University of Science and
Technology
Bydgoszcz, Poland
Konstantinos Demestichas
Pavlos Kosmides
ICCS, National Technical University
of Athens
Athens, Greece
Manik Gupta
London South Bank University
London, United Kingdom
ABSTRACT
The extreme growth and adoption of Social Media, in combination with their poor governance and the lack of quality control over the digital content being published and shared, has led to a continuous deterioration of information veracity. Current approaches entrust content verification to a single centralised authority, lack resilience towards attempts to successfully "game" verification checks, and make content verification difficult to access and use. In response, our ambition is to create an open, democratic, pluralistic and distributed ecosystem that allows easy access to various verification services (both internal and third-party), ensuring scalability and establishing trust in a completely decentralized environment. This is the ambition of the EU H2020 SocialTruth project. In this paper, we present the innovative project approach and the vision of effective online disinformation detection for various practical use-cases.
KEYWORDS
pattern recognition, security, safety, detection, fake news,
networks.
ACM Reference Format:
Michał Choraś, Marek Pawlicki, Rafał Kozik, Konstantinos Demestichas, Pavlos Kosmides, and Manik Gupta. 2019. SocialTruth Project Approach to Online Disinformation (Fake News) Detection and Mitigation. In Proceedings of the 14th International Conference on Availability, Reliability and Security (ARES 2019) (ARES '19), August 26–29, 2019, Canterbury, United Kingdom. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3339252.3341497
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ARES ’19, August 26–29, 2019, Canterbury, United Kingdom
©2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-7164-3/19/08. . . $15.00
https://doi.org/10.1145/3339252.3341497
1 INTRODUCTION
During the last decade, there has been an unprecedented revolution in how people interconnect and socialize. From the early days of Facebook to today's proliferation of Social Media, people have been embracing this new form of socialization. Social networks, media and platforms have become the standard means by which our societies communicate, exchange information, conduct business, co-create, learn and acquire knowledge. However, the extreme growth and adoption of Social Media, in combination with the lack of control over the digital content being published and shared, has called information veracity into question. Establishing synergies with innovative information and communication technologies (such as semantic analysis tools, blockchains, emotional descriptors, lifelong learning) can enhance the auditability, reliability and accuracy of the information being shared in Social Media, leading to a society better grounded in truth. The key is to safeguard the distributed and open nature of Social Media, strengthening pluralism and participation and mitigating censorship.
According to a recent MIT study [20], false information spreads six times faster than truth and reaches more people than true stories, often with devastating impacts. A single rumour spread in 2013 by a compromised Associated Press account on Twitter resulted in an estimated $136.5 billion drop in the S&P 500 index [10]. Over the past two years, fake and hoaxed news have gained tremendous proportions, particularly with Donald Trump's presidential campaign in the United States, as many people used the social networks as a distribution system to spread highly inaccurate or completely erroneous stories [14]. Cases of fake news are becoming countless [11], and the motives for spreading them are often financial or political.
In the Freedom on the Net 2017 report [19], Freedom House reaches the same conclusion. The report studied 65 countries worldwide between June 2016 and May 2017 and found that online manipulation and disinformation tactics played an important role in elections in at least 18 of the 65 countries during this period, including the United States.
Because of this high rate of false information spread, large media organisations face increasing pressure to respond quickly and accurately to breaking news stories. Although established workflows and editorial structures, such as the use of copytasters, have been able to deal with the task in the past, the challenge has severely increased, as news sources have multiplied both in number and diversity in the era of social media. In such an environment, publishing organizations face the increasingly difficult task of identifying a breaking news story early, confirming its accuracy, providing appropriate background, and publishing or broadcasting it as quickly as possible, thus providing high-quality journalism [17].
Therefore, in this paper we give an overview of the approach adopted by the EU H2020 SocialTruth project. The project develops innovative tools to fight online disinformation and tests them in specific use-cases, namely:
(1) journalists and news editors (the solution will be tested at ADNKRONOS, Italy)
(2) search engines (the solution will be tested at QWANT, France)
(3) citizens and web users (the solution will be supported by Infocons, Romania)
(4) teaching material providers (the solution will be tested by De-Agostini, Italy)
The paper is structured as follows: in Section 2 the challenges and the vision on how to solve the problem are presented. Section 3 details the SocialTruth project's means and techniques to counter the online disinformation problem. Conclusions are given thereafter.
2 STATE OF AFFAIRS, CHALLENGES AND VISION
Facebook plans to use improved machine learning methods to identify potential fake news articles, which can be passed on to external fact checkers. Other attempts to deal with the problem also exist, namely those of FakeBox [1], FightHoax [2] and Truly Media [3]. However, after examining current approaches, SocialTruth advocates that:
a) content verification cannot be entrusted to a single centralised authority;
b) the aim should not be to devise the "single most perfect verification algorithm", since even the most sophisticated deep learning classification model is optimized at the time it is created - and as a result its accuracy deteriorates as new sources of fake news arise every day and the writing style of fake news changes, in order to successfully "game" and bypass verification checks;
c) content verification should be easy and flexible to use "as a service" by individual users and professional organisations alike.
In response to these unmet challenges, it is necessary to take into consideration the existing approaches with their limitations, but also to focus strongly on creating an open, democratic, pluralistic and distributed ecosystem that enables easy access to various verification services, ensuring scalability and establishing trust in a completely decentralized environment. An ideal system uniquely combines several cutting-edge ICT technologies, including social mining, multimodal content analytics, blockchains and lifelong learning machines, enabling deep analysis, contextualization and understanding of mass-volume digital content, collected in real time from different social networks and web sources. By bringing together and incorporating business insights not just from the ICT domain but also from several other fields, such as digital journalism, mass communication media and social media, such a system promises a flexible, robust and pluralistic digital content verification and trust establishment solution that is open and easily adoptable by the concerned stakeholders, service providers and user communities alike.
A. Fake news data analysis strategies and sources
The analyzed news can contain a variety of data types, such as unstructured text, images/videos, references to other sources, etc. One of the methods for dealing with textual information is proposed in [15]. One should consider the following principles to address fake news detection:
• Indexing and gathering of information published on the Internet in order to cross-reference current news with previous ones (e.g. to detect duplicates or pictures/photos used in a different context). We consider:
– State-of-the-art image processing techniques for content analysis and image comparison
– State-of-the-art text analysis techniques (e.g. document-term matrices, etc.)
• Reputation scoring to identify the reliability of the person and/or information source providing the news. We propose to consider reputation evaluation of:
– The webpage providing/forwarding the news
– The person publishing information/news via a social network, etc.
– The content itself
• Comparison of similar news published by different information sources
• Machine learning techniques for content feature analysis
• Analysis of semantics by means of applied ontologies
Machine learning in such a complex and dynamic environment as fake news detection requires effective techniques and the efficient analysis of heterogeneous data sources, as presented in Fig. 1 [8].
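The cross-referencing principle above (comparing a new story against previously indexed ones via document-term representations) can be sketched as follows. This is an illustrative sketch only, not the SocialTruth implementation; the function names and the similarity threshold are our own assumptions.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def find_near_duplicates(new_article, archive, threshold=0.8):
    """Flag archived articles whose wording closely matches the new one."""
    new_vec = term_vector(new_article)
    return [doc_id for doc_id, text in archive.items()
            if cosine_similarity(new_vec, term_vector(text)) >= threshold]
```

A real index would use an inverted index and TF-IDF weighting rather than a linear scan, but the comparison step is the same in spirit.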
Figure 1: Online Disinformation and Fake News Detection data analysis strategies and sources
The advantages of an open, democratic, pluralistic and distributed ecosystem are straightforward:
• No "one size fits all" solution and no vendor lock-in: an open ecosystem that provides access to configurable combinations of content analytics and verification services (with support for text, image and video content) through standard Application Programming Interfaces (APIs).
• Distributed trust and reputation establishment powered by blockchain technology, strengthening auditability and revealing information cascades.
• Integration of Lifelong Learning Machines (LLM) that constantly accumulate experience and learn new paradigms of fake news.
• A Digital Companion for convenient everyday access of individual users to verification services from within their browsers.
In this context, ICT engineers, data scientists and blockchain experts need to work together with media specialists and user communities, in an effective and constructive manner, to co-create such an open and innovative content verification solution. This will help:
– Individual users to verify the validity of Social Media content and stop the spreading of false information. Besides this, individual users will be able to check and verify the original author/source of the content.
– Media organisations, story writers, content authors and journalists to boost their investigative and creative capabilities by enabling them to cross-check and combine various multimedia information sources, retrieve and use relevant and verifiable background information, and maintain a stream of real-time updates.
– Search engines, social media platforms and online advertising networks to improve information veracity and contribute to a healthier and more sustainable web and social media ecosystem.
A system performing these functions needs to fulfil the following set of objectives:
• Develop a distributed content verification solution with a complexity-free Digital Companion for online credibility verification of digital content found on the web and social media.
• Compose a digital content analytics and verification ecosystem with support for text, image and video, open to third-party service providers.
• Leverage blockchain technologies to establish distributed reputation and trust in digital content sharing.
• Deploy a distributed and thoroughly validated architecture (TRL-7) for the delivery of the credibility evaluation services.
• Introduce innovative business models for news, web and social media stakeholders, and provide support to the EU strategic agendas and policies.
The main ambition of the project is to provide multiple, extensible and reusable capabilities for verifying the credibility of digital content and detecting hoaxes and chains of news-scams spread in social media. A distributed architecture would make it possible to scan and process vast amounts of digital content from social media and the Web to identify fake news and to provide individuals and professionals with a certain degree of confidence about its accuracy and credibility.
One of the crucial components is the Digital Companion - an open source browser plugin that enables individuals to invoke one or more verification services and customise their use. When multiple verification services are to be combined, an open-design software engine assists in carrying out the meta-verification process. Together with the Digital Companion, it increases the credibility of the content shared in social media and improves the reliability of search engines, online advertising and click-stream analytics by limiting click percentages on hoaxes and falsified stories.
The project enables third-party service providers to plug into the ecosystem and make their own content analytics and verification services available through standard APIs. The system exploits a distributed architecture and blockchain technology in order to provide a decentralised mechanism for content verification. This decentralised approach will radically facilitate the assessment of the shared content itself and will pave the way for a wider adoption of decentralised and community-based approaches in the next generation of Social Media platforms.
This foundation enhances the role of prosumers, communities and small businesses, overcoming technological barriers, introducing innovative and participatory forms of quality journalism, and using various data in a secure manner. The focus is on a series of advanced technologies to significantly enhance the production, management, use and reuse of digital content in Social Media, including social mining, lifelong learning machines, blockchains, multi-level content analytics and multimedia verification services. This enables the exploitation of the different data sources in order to introduce a novel and effective content verification mechanism. Such a mechanism promotes high-quality journalism, enhancing the role of prosumers, communities and small businesses in the field of media.
The use of novel social mining technologies, such as emotional content descriptors, advanced machine learning techniques, multimedia verification algorithms and blockchain technologies, enables the emergence of a highly efficient and highly distributed solution for identifying falsified information and discovering information cascades across Social Media. The scope of the project is to provide a distributed solution, in order to avoid the current situation where information is collected in a centralized manner by big data companies outside Europe. Such companies harvest individual users' data in order to provide personalised or easily "clickable" content to the users, thus increasing their revenue when the users click on the provided content.
As for the development of intermediary-free solutions addressing information veracity for Social Media, the solutions contribute to the understanding of information cascades, the spreading of information and the identification of information sources, the openness of algorithms, and users' access to and control of their personal data (such as profiles, images, videos, biometric, geolocation and local data).
Personal data is better protected by limiting clicks on dubious digital content and suspicious web sources. Furthermore, the solution is based on open algorithms (namely, those of the expert meta-verification engines, the lifelong learning models, as well as part of the verification services), in order for the community of developers to be able to evolve them to tackle the future evolution of fake news industries. The entire ecosystem is open to third-party verification service providers through open interfaces.
C. Design Principles
The distributed architecture we propose follows an open and
modular design, embodying the following core capabilities:
• Digital Companion: This is an easy-to-use browser plugin that allows a non-professional user to invoke a meta-verification process upon some form of digital content (e.g. an article), passing its URI as input to a meta-verification engine. In the case of non-professional use, the Digital Companion can be used by the author of the digital article, by a reproducer (who shares the article in Social Media) or even by a simple reader of the article, who wishes to get an estimation of the credibility of the content before or after reading it. In the case of professional use, the Digital Companion is a web front-end for medium/large organisations (e.g. news agencies, search engines, etc.) that allows several calls per day to the APIs of the meta-verification engine(s). SocialTruth follows a user-centred design approach for the Digital Companion.
• Distributed Verification Services: This is a set of heterogeneous verification services, each one providing a specific type of content analytics (e.g. for text, image, video) or verification-relevant functionality (e.g. emotional descriptors, social influence mapping). Each service can be deployed at a different hosting facility (e.g. different servers or clouds), hence there is no imposed centralization. All of them use the same standard interfaces, which allows them to be easily accessible, reusable and interchangeable. The registrar of service providers and the services they offer is stored and maintained in the blockchain.
• Expert Meta-Verification Engine(s): Much like a meta-search engine combines and presents results from multiple search engines, the expert meta-verification engine combines verification results from various sources to compute a meta-score that reflects the credibility of the digital content under consideration. It follows an open design, open algorithms and an expert-systems approach, while most of its settings and weights (e.g. which verification services to prefer or to avoid, with which priority, etc.) can optionally be configured through its standard web-service interfaces. An example of a fake-news verification engine based on a two-level convolutional neural network with reasoning based on collective user intelligence can be found in [16].
• Blockchain: The blockchain is used as a distributed system of record with respect to digital content verification history. Since the complex web and social media landscape is characterised by several competing content creators and distributors, each with their own motives, interests, strategies and practices, the blockchain is an ideal tool to establish reputation and trust without the need for a central authority or intermediary (thus also avoiding centralising even more regulatory power in the US Internet giants, such as Facebook or Google). Hence, a public distributed ledger provides an auditable and immutable trail of verification actions and reputation scores. The blockchain stores article identification information, article descriptors (e.g. hash codes for digital content integrity), author identification information, verification and meta-verification scores, as well as identification information for the verification services that have been used to calculate them. It also holds the registrar of verification service providers and the services they offer.
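The idea of a meta-score combining results from several verification services, with configurable per-service weights, can be sketched as follows. This is a minimal illustrative sketch under our own assumptions (a simple weighted average over scores in [0, 1]); the actual engine follows an expert-systems approach and is more elaborate.

```python
def meta_score(service_scores, weights=None):
    """
    Combine per-service credibility scores (0.0 = surely fake,
    1.0 = surely credible) into a single weighted meta-score.
    `weights` lets a user prefer or avoid particular services,
    mirroring the configurable settings described above.
    """
    if weights is None:
        # Default: treat every responding service equally.
        weights = {name: 1.0 for name in service_scores}
    total = sum(weights.get(name, 0.0) for name in service_scores)
    if total == 0:
        raise ValueError("no active verification services")
    return sum(score * weights.get(name, 0.0)
               for name, score in service_scores.items()) / total
```

The service names here ("text_style", "image_forensics", etc.) are hypothetical labels for illustration; doubling the weight of one service pulls the meta-score towards its verdict.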
D. Meta-Verification process
For our purposes, digital content is defined as text, photos and videos, produced, uploaded and/or edited by individuals, journalists, reporters, writers, bloggers, etc., on news sites, social networks and web channels. The concept is to break down the digital content into its constituent elements and subsequently call individual verification services, ultimately combining their results.
The verification process will scan the content shared through Social Media and will identify its sources, metadata, media elements and writing style, as well as how it has spread across the Internet. It will attempt to verify these elements and will flag the content as fake or not, with a degree of certainty about its accuracy, providing relevant background if applicable, in a process that is partially similar to [13]. In this light, the verification process encompasses the following capabilities:
• Seamless data crawling and social mining from multiple, heterogeneous external web sources and media. Streamlined multi-layer semantic analysis, forgery detection of multimedia content and indexing of corresponding sources.
• Deep understanding and analysis of authors' writing styles and content semantics based on multidimensional contextual reasoning and inference.
The procedure will integrate these capabilities through an intelligent meta-verification algorithm and make them easily usable through a flexible end-user tool (the Digital Companion), designed via an efficient user-centred methodology. By integrating and interfacing with advanced services in digital content analysis, it promises to open new applications and opportunities to wider audiences and stakeholders, including but not limited to digital content production professionals, aggregation and supply professionals, journalists and editors, online advertising, search engine and e-commerce companies, content prosumers, social media sharing networks, etc.
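The decomposition step described above (break content into constituent elements, call an individual verification service per element, then combine results) can be sketched as a simple dispatch loop. This is our own illustrative sketch, not the project's implementation; the element types and service callables are assumptions.

```python
def verify_article(article, services):
    """
    Break a piece of digital content into its constituent elements
    and call the matching verification service for each element,
    collecting per-element results for later meta-verification.
    `article` maps element types ("text", "image", ...) to content;
    `services` maps the same types to callables returning a
    credibility score in [0, 1].
    """
    results = {}
    for element_type, content in article.items():
        service = services.get(element_type)
        if service is not None:  # skip elements with no registered service
            results[element_type] = service(content)
    return results
```

In a distributed deployment each callable would wrap a remote API call to a registered verification service rather than a local function.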
3 SOCIALTRUTH APPROACH, TECHNIQUES AND
METHODS FOR ONLINE DISINFORMATION
DETECTION
This section provides insights into specific aspects of content verification, employed on the basis of the general architecture and the meta-verification concept presented earlier.
A. Understanding Author and Writing Style
Semantic technologies are about understanding a story's author and writing style, which helps in the credibility evaluation process. To address this, the solution uses as a starting point, and extends, the semantic technology of an existing tool developed by Expert System (ESF), namely COGITO, which facilitates deep understanding of language.
The solution extends this capability through advanced and innovative features like "writeprint" or stylometric analysis, which make it possible to analyse the style of writing behind each story acquired from the web, social networks and any other textual source, with the aim to:
• Understand whether an individual who is publishing a story has a style of writing that can be related and mapped to another style from a historical database.
• Understand whether contents published on the web using different accounts or nicknames are actually related to the same person (i.e. clustering of different virtual IDs).
This stylometric analysis is based on a series of technical parameters, such as the usage of short words, conjunctions, vocabulary richness and complexity, lexical differentiation, etc., which are strictly related to specific human factors and behaviour, thus effectively defining a sort of "fingerprint" (writeprint) of the writer.
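A few of the stylometric parameters just mentioned can be computed very directly. The sketch below is illustrative only: it covers short-word usage, conjunction frequency and vocabulary richness (type-token ratio), whereas a real writeprint uses many more features; the conjunction list is our own assumption.

```python
import re

# Small illustrative list; a production system would use a fuller one.
CONJUNCTIONS = {"and", "but", "or", "nor", "for", "so", "yet",
                "because", "although", "while", "since"}

def writeprint_features(text):
    """Compute a handful of stylometric features of a text."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    if not words:
        return {"short_word_ratio": 0.0, "conjunction_ratio": 0.0,
                "type_token_ratio": 0.0}
    return {
        # Share of words with three letters or fewer.
        "short_word_ratio": sum(len(w) <= 3 for w in words) / len(words),
        # How often the author uses conjunctions.
        "conjunction_ratio": sum(w in CONJUNCTIONS for w in words) / len(words),
        # Vocabulary richness: distinct words over total words.
        "type_token_ratio": len(set(words)) / len(words),
    }
```

Comparing such feature vectors across texts (e.g. with a distance metric) is one way to relate an unknown story to styles in a historical database.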
B. Semantic analysis and Clustering of similar news
Using the semantic analysis capability of COGITO as a starting point, the goal is to identify the primary and secondary subjects of articles and stories, as well as other elements relevant for classification and entity extraction (such as people, organizations and places). Thus, the solution aims at being able to classify text according to a detailed, customizable tree of categories, enabling modification according to the end-user's requirements. These capabilities make it possible to:
(1) Make information management automatic, more efficient and independent of subjective criteria.
(2) Immediately identify useful information and reduce search time, by simplifying access to content and enabling search by subject/topic.
(3) Cluster information according to customizable taxonomies and categories.
Clustering using categorisation and entity extraction is one element. Extracting more semantic tags, such as relations, verbs and events, can lead the proposed solution to find correlations based on different analogy or similarity criteria. This is valuable since, for example, the following two sentences are different from the statistical point of view, but similar from the semantic point of view: "John is driving his car to home" versus "John is reaching home by his vehicle".
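The gap between statistical and semantic similarity in this sentence pair can be demonstrated numerically. The sketch below is illustrative only: the tiny hand-made synonym map stands in for a full semantic network and is entirely our own assumption.

```python
def tokens(sentence):
    """Lower-cased token set of a sentence."""
    return set(sentence.lower().split())

def jaccard(a, b):
    """Purely lexical (statistical) overlap between two token sets."""
    return len(a & b) / len(a | b)

# Hand-made synonym groups standing in for a semantic network;
# each surface form maps to a shared concept label.
SYNONYMS = {"car": "vehicle", "driving": "travelling",
            "reaching": "travelling"}

def normalise(token_set):
    """Map each token to its concept label before comparing."""
    return {SYNONYMS.get(t, t) for t in token_set}

s1 = tokens("John is driving his car to home")
s2 = tokens("John is reaching home by his vehicle")
```

On the raw tokens the Jaccard overlap is only 0.4, while after mapping words to shared concepts it rises to 0.75, illustrating why semantic tags find correlations that plain statistics miss.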
C. Sentiment/Emotional Analysis
The solution needs to have the capability to extract a full set of emotions from textual content, not just using a standard positive/negative/neutral evaluation approach (sentiment) but also providing a finer granularity of the different kinds of feelings (i.e. stress, fear, trust, anger, etc.). When applied to sources like social networks and the web, this feature can be used, amongst others, as a bias estimator (e.g. detecting users that may have bad intentions towards an event, a person, an infrastructure, etc.) or for assessing public opinion about a specific topic of interest.
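A minimal way to go beyond a positive/negative/neutral label is lexicon-based emotion counting, sketched below. The toy lexicon is entirely our own assumption; real systems rely on large curated emotional resources and contextual disambiguation.

```python
# A toy emotion lexicon for illustration only.
EMOTION_LEXICON = {
    "fear":  {"afraid", "threat", "panic", "danger"},
    "anger": {"outrage", "furious", "disgrace", "attack"},
    "trust": {"reliable", "official", "confirmed", "verified"},
}

def emotion_profile(text):
    """Count lexicon hits per emotion, giving a finer-grained signal
    than a plain positive/negative/neutral sentiment label."""
    words = set(text.lower().split())
    return {emotion: len(words & vocab)
            for emotion, vocab in EMOTION_LEXICON.items()}
```

A profile dominated by fear or anger terms on an otherwise factual topic could serve as one input to the bias estimator mentioned above.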
D. Natural Language Processing baseline
For the implementation of the aforementioned advanced Natural Language Processing (NLP) and semantic functionalities, the solution uses the semantic engine COGITO. This allows it to provide advanced semantic solutions, including semantic search, text analytics, ontology and taxonomy management, automatic categorization, automated self-help solutions, extraction of unstructured information and natural language processing. COGITO is powered by two main components:
• The Disambiguator, a multi-level linguistic engine able to disambiguate the meaning of a word by recognising the context in which that word occurs. Disambiguation can be described as the process of resolving conflicts that arise when a term can express more than one meaning, leading to different interpretations of the same string of text. The ultimate aim of such a process is to associate each term with the author's intended use of the term itself.
• The Sensigrafo, a semantic network that represents and stores the different semantic relations between the words of a language. Unlike traditional dictionaries, where words are listed in alphabetical order, words contained in this database are arranged in groups of items expressing identical or similar meaning. These groups are connected to each other by millions of logical and language-related links.
The COGITO semantic engine starts from this consolidated architecture, but will require further development to customize it for the domain of disinformation and fake news, in terms of taxonomy/category definition, entities to be extracted, as well as emotions and writeprints to be identified from text. In order to apply this dedicated tuning to the semantic network, a manual or semi-automatic approach can be used. The second approach is preferable and relies on the use of machine learning techniques to create new ontologies automatically, which can then be validated by human beings. This technique can also be used in the opposite direction, so that the output of the semantic analysis serves as an input to the machine learning algorithms; these bidirectional processes allow the creation of a truly high-quality hybrid engine.
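The context-based disambiguation idea behind the Disambiguator can be illustrated with a simplified Lesk-style algorithm: pick the sense whose gloss shares the most words with the surrounding context. This is a generic textbook sketch, not COGITO's proprietary method; the senses and glosses are our own toy data.

```python
# Toy sense inventory: each sense of an ambiguous word has a short gloss.
SENSES = {
    "bank": {
        "finance": "an institution that accepts deposits and lends money",
        "river":   "the sloping land alongside a body of water",
    }
}

def disambiguate(word, context):
    """Return the sense label whose gloss overlaps the context most
    (a simplified Lesk algorithm)."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.get(word, {}).items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

A semantic network like the Sensigrafo replaces the flat glosses here with millions of interlinked concept groups, but the principle of resolving a term via its context is the same.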
E. Image/Photo/Video Verification
Nowadays, photo and video content is an essential part of media coverage, often attracting the consumer's attention more than the actual information provided as text. Thus, image and video data can be used as powerful tools to create untruthful information or to unreasonably strengthen a message provided to a recipient. For example, fake images illustrating the textual content can be considered common elements of fake news. Fake photos can be linked to both factual (true) news as well as to totally fake information shared online. Fabricated photos might be posted for propaganda reasons, but also for strengthening the intended message and for inducing emotions. Often, publishers highlight the textual content of news by means of a random photo stored in the publisher's archives that is contextually related to an ongoing situation, yet not taken at the time and place of the presented or analysed event. Other possible indicators of fake photos are: excessively drastic and unrealistic scenes, emotional messages, and the time elapsed between the described event and photo publication (sometimes seconds). Sometimes, poor quality of the image/video content in relation to the context (e.g. excessively low or excessively high quality) can indicate that the image/video could not have been captured during the given situation. Hence, image verification becomes crucial for fake news analysis and can be used to determine whether a given image is similar or not to previously posted images. To verify the trustworthiness of photos, one can employ and combine approaches such as the following:
• Person re-identification - This relates to the problem of identifying people across images that have been taken using different cameras, or across time using a single camera. Due to variations in viewpoint, pose, illumination, expression, aging, cosmetics and occlusion, a given individual may appear considerably different across different camera views. Hence, re-identification is an important capability for fake image analysis. Deciding whether two image patches correspond to each other is quite a challenging and difficult problem, because large variations in viewpoint and lighting across different views can cause two images of the same person to look quite different and can cause images of different people to look very similar. Typically, methods for re-identification include two components: a method for extracting invariant and discriminative features from input images, and a similarity metric for comparing those features across images. Research on re-identification will focus either on finding an improved set of features, finding an improved similarity metric for comparing features, or a combination of both.
• Contextual image analysis - Inconsistencies in elements such as the actual weather or season, the architecture of the depicted place, or time indicators visible in the photo (e.g. clocks) can help verify that the presented photo was taken at another place or at another time than it pretends. Therefore, techniques like image geolocalization, which use recognition techniques from computer vision, can be used to estimate the location (at city, region, or global scale) of ordinary ground-level photographs and to determine the actual context within which the image was taken. Additionally, GPS information and metadata annotations can be useful in geolocating images.
• Verification of the publisher - Examining the history of posts/news can help detect anomalies in the publication history and possibly identify an invalid publisher, since a fake piece of news or photo is often the first piece of content published by a malevolent user, or even his/her first online activity ever.
• Analysis of image features - Services such as http://fotoforensics.com allow for online analysis of modifications made to a given photo, using Error Level Analysis (ELA) of submitted JPEG files. In general, these algorithms detect the modified regions in a photo, exploiting the fact that such regions generate larger errors during repeated (lossy) JPEG re-savings.
• Reverse image searching - Finding photos similar to, or slightly modified versions of, the searched photo, including photos with changed ratio and resolution, altered colours, horizontal flips, trimming, etc. Reverse image search engines also allow ordering the search results by the time of online publication; it is therefore possible to discover that a photo illustrating current events has already been shared on the web for months or years. The most popular online tools for reverse image searching include images.google.com and tineye.com. On the other hand, there is a limited number of online tools available for analysing video similarities, and thus for detecting modified, fake video content. One such example is the YouTube Data Viewer developed by Amnesty International [4], allowing for detection of older versions of the same video. It can be used to determine the original video when multiple copies of the same video from the same date are available. Given an arbitrary target object marked with a bounding box at the beginning of a video, the goal of visual tracking is to localize this target in subsequent video frames. An ideal tracker should be fast, accurate and robust to appearance changes of the target; this is a challenging problem that needs to be addressed as part of the video verification innovation activities.
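The two-component recipe from the person re-identification bullet above (a feature extractor plus a similarity metric) can be sketched in a few lines. This is a minimal illustration: the toy feature vectors and the 0.9 decision threshold are assumptions, and in practice the features would come from a learned embedding network.

```python
import math

def cosine_similarity(a, b):
    """Similarity metric for comparing feature vectors extracted from two images."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_same_person(features_a, features_b, threshold=0.9):
    """Decide whether two image patches depict the same individual by
    thresholding the similarity of their invariant, discriminative features."""
    return cosine_similarity(features_a, features_b) >= threshold
```

Research on improved features changes what goes into the vectors; research on improved metrics replaces the cosine step with a learned distance.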
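The Error Level Analysis idea from the image-features bullet can likewise be illustrated with a small sketch. Real ELA re-saves a JPEG with an imaging library and compares the compression error per region; here flat lists of grayscale values stand in for images, and the amplification factor and threshold are illustrative assumptions.

```python
def error_level_map(original, resaved, scale=10):
    """Per-pixel absolute difference between an image and its lossily
    re-saved copy, amplified for inspection. Edited regions tend to show
    a markedly different error level than the rest of the image."""
    return [min(255, abs(o - r) * scale) for o, r in zip(original, resaved)]

def suspicious_regions(ela, threshold=50):
    """Pixel indices whose error level stands out, hinting at a locally
    modified (pasted or retouched) region."""
    return [i for i, e in enumerate(ela) if e > threshold]
```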
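Reverse image search engines typically rely on perceptual hashing to find slightly modified copies of a photo. The following difference-hash (dHash) sketch operates on an already downscaled grayscale grid; real implementations first resize the image (commonly to 9x8 pixels), which is assumed to have happened here.

```python
def dhash_bits(pixels):
    """Difference hash over a grayscale pixel grid: one bit per horizontal
    neighbour comparison. Re-scaled, re-compressed or brightness-shifted
    copies of a photo yield hashes close to the original's."""
    return [1 if left > right else 0
            for row in pixels
            for left, right in zip(row, row[1:])]

def hamming_distance(a, b):
    """Number of differing bits; a small distance indicates near-duplicates."""
    return sum(x != y for x, y in zip(a, b))
```

Because only neighbour comparisons are hashed, a uniform brightness change leaves the hash untouched, which is exactly the robustness reverse search needs.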
4 SOCIALTRUTH - DETECTION SYSTEM DESIGN
A. Distributed reputation and trust through Blockchain technologies
Blockchain is a security architecture allowing different users to share sensitive data in a secure and decentralised manner, without a central authority. This architecture will be used to share the reputation or credibility (verification scores) of digital content between the different end-users in a trusted way. These end-users, interested in the details behind this
scoring, will be able to read the verification history and to decide whether or not they trust the specific content. The solution bases its blockchain algorithm on a public blockchain. This way, it is able to directly use a complete, functional infrastructure, with a specifically developed part on top of it. The solution has to manage remuneration and data transactions simultaneously. These data transactions will integrate data and metadata (such as Article ID, Author ID, Article Descriptors and different verification results) with different links between them. To that end, a blockchain implementation with smart contracts or chaincode is mandatory (such as Ethereum or Hyperledger). One also has to take into account privacy (including General Data Protection Regulation - GDPR - compliance) and confidentiality requirements (for example, to encrypt any sensitive data). Different solutions exist but need to be adapted for this kind of data (like the zk-SNARKs solution used for the Zcash blockchain). One can complement the solution with an access control mechanism, also based on blockchain, able to manage access control to the data. For that, one can use and extend the innovative Access Control - BlockChain (AC-BC) mechanism, based on the Ethereum blockchain. Another important aspect is auditability, i.e. using the data and the access-control logs stored publicly in the blockchain. These data will not only enable inference of abnormal behaviours, but could also be used by one or several Expert Meta-verification Engines to improve the data processing performed on the multimedia data.
In conclusion, the proposed innovation strikes a balance between the different blockchain aspects (public vs private, Proof-of-Work vs Proof-of-Stake consensus, smart contract vs chaincode) to reach a commercially viable solution.
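The hash-chained sharing of verification scores described above can be illustrated with a toy in-memory ledger that records score transactions, exposes an auditable per-article history, and detects tampering. This is a sketch only: a production deployment would use Ethereum or Hyperledger smart contracts/chaincode, and the field names (article, author, score) are illustrative.

```python
import hashlib
import json

class VerificationLedger:
    """Toy hash-chained ledger of content verification scores."""

    def __init__(self):
        # Genesis block anchors the chain.
        self.blocks = [{"prev": "0" * 64, "data": {"genesis": True}}]

    def _digest(self, block):
        # Canonical JSON so the hash is deterministic.
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    def record(self, article_id, author_id, score):
        """Append a verification-score transaction linked to the previous block."""
        self.blocks.append({
            "prev": self._digest(self.blocks[-1]),
            "data": {"article": article_id, "author": author_id, "score": score},
        })

    def history(self, article_id):
        """Auditability: any end-user can read an article's verification history."""
        return [b["data"] for b in self.blocks
                if b["data"].get("article") == article_id]

    def chain_valid(self):
        """Tampering with any past block breaks every later `prev` link."""
        return all(self.blocks[i]["prev"] == self._digest(self.blocks[i - 1])
                   for i in range(1, len(self.blocks)))
```

Reading `history()` corresponds to the end-user deciding whether to trust the content; `chain_valid()` is what decentralisation buys, since no single authority can silently rewrite past scores.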
B. Security and privacy by design
The design of the solution architecture is performed in accordance with the Security-by-Design and Privacy-by-Design principles to:
• implement basic security controls at the design phase, so as to minimise the number of vulnerabilities discovered in later phases, decreasing their potential impact on the prototype system's security, and
• ensure the privacy of users whose data is analysed using the system (open data available on the web, social media data, end-user data).
For the purposes of system security assurance, important Security-by-Design principles defined by OWASP are implemented, by means of establishing appropriate security defaults, minimising the attack surface area by reducing the number of authorised users for any given functionality, realising the so-called "least privilege" principle by assigning only the minimum set of user rights needed to operate the services, and applying separation of duties (considering various user roles with various levels of trust). Also, the use of third-party components will be thoroughly analysed in terms of security, and its impact on the overall level of protection will be evaluated.
In the design phase, applicable Privacy-by-Design principles [5] are also implemented, including privacy-preserving means as default system features, embedded in its design, end-to-end data protection and privacy preservation management through the entire data lifecycle, and considering privacy and data protection without an impact on the platform functionalities. From the technical viewpoint, encryption by default will be considered, in order to mitigate the security concerns associated with unauthorised access to the data, integrating it into data workflows in an automatic and seamless manner, whereas mechanisms for secure destruction or removal of stored data will be established. The implementation of appropriate Privacy Enhancing Technologies (PETs) is also considered, such as the use of privacy keys, digital signatures, secure authentication and the adoption of secure communication protocols. For the purposes of disassociating the data / generated content that might allow detection of the identity of the data owner, applicable techniques will be employed, such as anonymisation, pseudonymisation and data clustering, to prevent the correlation of Personally Identifiable Information (PII) for purposes other than the detection of fake news considered in the project. Also, the data gathered, stored and analysed (e.g. during use-case execution) will be used only for the predefined purposes according to the defined goals of the use cases. On the other hand, ethics management in the project will ensure that the stored or analysed data will not be subject to any commercial use and will not be shared publicly without the consent of the data owner.
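One of the PETs mentioned above, pseudonymisation, can be sketched with keyed hashing: HMAC-SHA256 under a secret key is one standard realisation. The key value and the example PII field are illustrative; the point is that equal inputs map to equal tokens (so records can still be correlated for fake-news detection) while the original value cannot be recovered without the key.

```python
import hashlib
import hmac

def pseudonymise(pii_value, secret_key):
    """Replace a PII value with a stable, irreversible pseudonym using
    HMAC-SHA256. The same input always yields the same 64-hex-char token."""
    return hmac.new(secret_key, pii_value.encode("utf-8"), hashlib.sha256).hexdigest()
```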
C. Web and social media data crawling
Within the project, work in the area of web and social media data crawling uses the relevant software tools and services. For the search engine, the hardware and software architecture hosted in Data Centers located in Paris could be employed. This architecture processes over 70 million queries every day, crawls 0.5 billion Web pages and maintains an index of 16 billion Web pages. The architecture is scalable and can grow rapidly.
D. Verification in social media through deep learning
With the increasing popularity of the various social media
platforms, detecting and dealing with misinformation and
their creators becomes a critical problem. Malicious users
can be tracked down and their online linguistic expressions
can be used to infer their personal attributes and diverse
geographical, sociological and political demographics. Deep learning models can be used to construct behavioural representations for user personality identification, and private traits/attributes can be predicted from digital records of human behaviour. Moreover, social network connections and flow observations can be examined to detect misinformation, and false content can be found using geosemantic information. Similarly, the problem of detecting bots has gained tremendous attention, since bots can be used to automate social media accounts and post fake content, hampering data credibility. For example, bots can be used to sway political elections by distorting the online discourse, to manipulate the stock market, or to push anti-vaccine conspiracy theories that might cause health epidemics. There are techniques to detect bots at the account level, by processing large amounts of social media posts and leveraging information from the network graph structure, temporal dynamics, sentiment analysis, etc. A repository of fake news with in-depth analysis of this kind of data can be found in [18]. Deep neural networks based on the long short-term memory (LSTM) architecture can be used to exploit both content and metadata to detect bots at the tweet level.
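The account-level signals listed above (posting rate, network structure, profile metadata) can be sketched as a simple heuristic score. The thresholds and weights below are illustrative assumptions, not the project's trained model; production systems learn such decision boundaries from labelled data, e.g. with an LSTM over tweet content and metadata.

```python
def bot_indicators(tweets_per_day, followers, following, default_avatar):
    """Combine a few account-level bot heuristics into a score in [0, 1]."""
    score = 0.0
    if tweets_per_day > 50:                 # inhuman posting rate
        score += 0.4
    if following > 10 * max(followers, 1):  # heavily skewed follow ratio
        score += 0.3
    if default_avatar:                      # no profile customisation
        score += 0.3
    return score
```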
E. Lifelong Learning Intelligent Systems
One of the current challenges in machine learning is to develop intelligent systems that are able to learn consecutive tasks and to transfer knowledge from previously learnt tasks when learning new ones. Such a capability is termed Lifelong Learning [12] and tries to mimic "human learning". In principle, humans are capable of accumulating knowledge from the past and using it to solve future problems. Currently, existing classical machine learning algorithms are not able to achieve that. Our ambition is to adapt Lifelong Machine Learning techniques in order to gradually improve detection models and incrementally update the knowledge, so that the system can learn faster (reusing historical knowledge), e.g. by transferring knowledge from different news topics and tasks [9]. In fact, there is an existing line of research [6] showing promising results (also in the area of text mining). Moreover, there are also competitions (such as DARPA L2M - Lifelong Learning Machines) aiming at fostering research in these directions. Therefore, the project's ambition is to use the Lifelong Learning paradigm to address challenges in fake news detection. In particular, hybrid classifier systems (HCS) and ensemble learning will be considered, as these have already been successfully applied to solve complex machine learning problems in various other domains. In fact, the multi-classifier paradigm is akin to some of the algorithms proposed for Lifelong Learning that build and maintain a kind of reservoir of latent models/modules that may be useful or reused in the future.
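The "reservoir of latent models" idea can be sketched as follows. This is a minimal illustration under stated assumptions: the task-similarity function is supplied by the caller, and plain strings stand in for trained models.

```python
class ModelReservoir:
    """Lifelong-learning sketch: keep previously trained detectors and
    reuse the best-matching one as the starting point for a new task
    (e.g. a new news topic)."""

    def __init__(self):
        self.models = {}  # task descriptor -> trained model object

    def add(self, task, model):
        self.models[task] = model

    def transfer_from(self, new_task, similarity):
        """Return the stored model whose task is most similar to `new_task`,
        or None when the reservoir is empty."""
        if not self.models:
            return None
        best = max(self.models, key=lambda task: similarity(task, new_task))
        return self.models[best]
```

Training the returned model further on the new task is the transfer step; the reservoir itself is what lets the system "learn faster by reusing historical knowledge".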
5 PROTOTYPE
In [7], a prototype of a solution geared towards detecting forged images is presented. The proposed method revolves around image assessment, with the supposition that if the image is forged, the whole piece might be fake. Three factors are taken into account: ELA analysis, copycat search and metadata analysis. A publicly available dataset of 800 original and over 900 forged images was used. The detection of pasted elements is performed by dividing the image into overlapping blocks and running the SURF and FLANN algorithms. Metadata analysis allows spotting whether the image was modified with any image processing tools. A fusion of decisions allows our method to spot image forgeries with 64% accuracy.
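The decision-fusion step can be sketched as a vote over the three detector outputs. Majority voting is an illustrative assumption here; the cited prototype fuses decisions but does not mandate this particular rule.

```python
def fuse_decisions(ela_flag, copymove_flag, metadata_flag):
    """Report an image as forged when at least two of the three detectors
    (ELA, copy-move/copycat search, metadata analysis) raise a flag."""
    return sum([ela_flag, copymove_flag, metadata_flag]) >= 2
```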
6 CONCLUSION
The "fake news" phenomenon is a serious issue in modern media and communication, with false information about current events and incidents spreading through society. The classification of fake news is challenging due to its vague definition and tensions related to freedom of speech.
Therefore, the EU H2020 SocialTruth project tries to tackle this important problem facing modern society. In this paper, we have presented the project approach, the design principles and the techniques used. Our idea is that the selected hands-on trials will cover a wide range of requirements and operational conditions, allowing for a multi-faceted system evaluation from the perspective of the various end-users enumerated in the paper.
ACKNOWLEDGMENTS
This work is funded under the SocialTruth project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 825477.
REFERENCES
[1] [n. d.]. FakeBox project homepage. https://machinebox.io/docs/fakebox. Accessed 24-Mar-2019.
[2] [n. d.]. FightHoax project website. http://fighthoax.com. Accessed 24-Mar-2019.
[3] [n. d.]. Truly Media tackles fake news ahead of German elections. https://www.truly.media/truly-media-tackles-fake-news-ahead-of-german-elections/. Accessed 24-Mar-2019.
[4] [n. d.]. YouTube Data Viewer. http://www.amnestyusa.org/sites/default/customscripts/citizenevidence/. Accessed 24-Mar-2019.
[5] Ann Cavoukian. 2012. Operationalizing Privacy by Design: A Guide to Implementing Strong Privacy Practices. http://www.ontla.on.ca/library/repository/mon/26012/320221.pdf.
[6] Zhiyuan Chen, Nianzu Ma, and Bing Liu. 2018. Lifelong Learning for Sentiment Classification. (01 2018).
[7] Michał Choraś, Agata Giełczyk, Konstantinos Demestichas, Damian Puchalski, and Rafał Kozik. 2018. Pattern Recognition Solutions for Fake News Detection. In Computer Information Systems and Industrial Management - 17th International Conference, CISIM 2018, Olomouc, Czech Republic, September 27-29, 2018, Proceedings. 130–139. https://doi.org/10.1007/978-3-319-99954-8_12
[8] Michal Choras, Agata Gielczyk, Konstantinos P. Demestichas, Damian Puchalski, and Rafal Kozik. 2018. Pattern Recognition Solutions for Fake News Detection. In Computer Information Systems and Industrial Management - 17th International Conference, CISIM 2018, Olomouc, Czech Republic, September 27-29, 2018, Proceedings. 130–139. https://doi.org/10.1007/978-3-319-99954-8_12
[9] Michał Choraś, Rafał Kozik, Rafal Renk, and Witold Hołubowicz. 2017. The Concept of Applying Lifelong Learning Paradigm to Cybersecurity. 663–671. https://doi.org/10.1007/978-3-319-63315-2_58
[10] Patti Domm. [n. d.]. False Rumor of Explosion at White House Causes Stocks to Briefly Plunge; AP Confirms Its Twitter Feed Was Hacked. https://www.cnbc.com/id/100646197.
[11] Chloe Farand. [n. d.]. French social media awash with fake news stories from sources 'exposed to Russian influence' ahead of presidential election. https://tinyurl.com/y5gdzvz4.
[12] Patxi Galán-García, José Gaviria de la Puerta, Carlos Laorden, Igor Santos, and Pablo García Bringas. 2013. Supervised machine learning for the detection of troll profiles in Twitter social network: application to a real case of cyberbullying. Logic Journal of the IGPL 24 (2013), 42–53.
[13] Mykhailo Granik and Volodymyr Mesyura. 2017. Fake news detection using naive Bayes classifier. 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON) (2017), 900–903.
[14] Mathew Ingram. [n. d.]. Google's Fake News Problem Could Be Worse Than on Facebook. http://fortune.com/2017/03/06/google-facebook-fake-news/.
[15] Jiawei Zhang, Limeng Cui, Yanjie Fu, and Fisher B. Gouza. 2018. Fake News Detection with Deep Diffusive Network Model.
[16] Feng Qian, Chengyue Gong, Karishma Sharma, and Yan Liu. 2018. Neural User Response Generator: Fake News Detection with Collective User Intelligence. https://doi.org/10.24963/ijcai.2018/533
[17] Kevin Rawlinson. [n. d.]. How newsroom pressure is letting fake stories on to the web. https://www.theguardian.com/media/2016/apr/17/fake-news-stories-clicks-fact-checking.
[18] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2018. FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media.
[19] Olivia Solon. [n. d.]. Tim Berners-Lee: we must regulate tech firms to prevent 'weaponised' web.
[20] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151. https://doi.org/10.1126/science.aap9559