Identifying Personality-Based Communities in Social Networks

November 2013

DOI:10.1007/978-3-319-14139-8_2

Conference: International Workshop on Legal and Social Aspects in Web Modeling (LSAWM) in International Conference on Conceptual Modeling (ER)

Authors:

Eleanna Kafeza

Zayed University

Andreas Kanavos

Ionian University

Christos Makris

University of Patras

Dickson K. W. Chiu

The University of Hong Kong

In this paper we present a novel algorithm for forming communities in a graph representing social relations as they emerge from the use of services like Twitter. The main idea centers on the careful use of features to characterize the members of the community, and on the hypothesis that well-formed communities are those that exhibit diversity in the features of the participating members.

Eleanna Kafeza1, Andreas Kanavos2, Christos Makris2 and Dickson Chiu3

1. Athens University of Economics and Business, Greece, kafeza@aueb.gr

2. Computer Engineering and Informatics Department, University of Patras, Greece, {kanavos, makri}@ceid.upatras.gr

3. The University of Hong Kong, Hong Kong, dchiu88@hku.hk

1 Introduction

This paper presents a novel methodology for characterizing interesting communities as they arise in social networks, such as those that are formed in Twitter. The novelty of our approach lies in the fact that we look for emerging communities according to the diversity among the characters of the involved users.

Until now, most practices for message transmission have been based on finding the influential users and trying to use them to transmit a message. Moreover, recent work on data flow in social networks deals with the problem of predicting the information current. Our approach is different in the sense that we examine ways to "drive" the information within the network. As a first step, we look for sub-networks that demonstrate a high degree of information flow; as a second step, we aim to use these networks to increase information continuance.

There is a large body of work from different areas on creating communities from graphs; for a thorough survey, we refer the reader to [5]. In our approach we argue that communities in social media, e.g. Twitter, are more likely to transmit information easily if they are not "biased" with respect to user personality. A balanced community can handle information flow more quickly and more deeply. Hence, here we partition the Twitter graph according to the personality of users, which is extracted from their behavior.

2 Related Work

Analysis of social networks has a long history, which is related to graph clustering algorithms, web searching algorithms, as well as bibliometrics; for a complete review of this area one should consult [4], [5], [10], [12], [14] and [17]. The field is related to link analysis on the web, whose cornerstones are the analysis of the significance of web pages in Google using the PageRank citation metric [3], the HITS algorithm proposed by Kleinberg [9], and their numerous variants proposed in [11]. PageRank employs a simple metric based on the importance of the incoming links, while HITS uses two metrics emphasizing the dual role of a web page as a hub and as an authority for information. Both metrics have been improved in various forms, and a related review can be found in [11].
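As a rough illustration of the hub/authority iteration described above, the following is a minimal sketch and not the implementation from [9]. The function name `hits`, the toy edge list and the fixed iteration count are assumptions made here for illustration.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of directed edges."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of the authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


# Hypothetical toy graph: "a" and "b" both link to "c"; "c" links to "d".
hub, auth = hits([("a", "c"), ("b", "c"), ("c", "d")])
```

In this toy graph, "c" receives the highest authority score because two pages point to it, while "a" and "b" act as equally good hubs.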

Concerning community detection, various algorithms have been proposed in the literature. It should be noted that HITS by itself, when exploring non-principal eigenvectors, can be used to compute communities. The problem of finding communities, as one encounters it in the literature, is related to graph partitioning. A breakthrough in the area is the algorithm proposed in [6], which identifies the edges lying between communities and removes them successively; after some iterations this procedure leads to the isolation of the communities. The majority of the algorithms proposed in the area are related to spectral partitioning techniques, that is, techniques that partition objects by using the eigenvectors of matrices formed over the specific set [8], [15], [18] and [19]. One should also mention techniques that use modularity, a metric that compares the density of links inside communities with the density of links between communities [5], [13], the most popular being the algorithm proposed in [2].
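For reference, modularity is commonly written in the following standard form. The notation is not taken from this paper and is introduced here only for illustration: $A_{ij}$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the number of edges, $c_i$ the community of node $i$, and $\delta$ the Kronecker delta.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

A partition scores a high $Q$ when more edges fall inside communities than would be expected if edges were placed at random with the same node degrees.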

Besides finding emerging communities, estimating authorities has also attracted attention. The authors of [1] extracted several graph features, such as the users' degree distribution and hub and authority scores, in order to model a user's relative importance. Other works in this area include Expertise Ranking [7] and [21], which identify authorities using link analysis on the graph induced by interactions between users.

Also of interest is the work presented in [20], which employs Latent Dirichlet Allocation and a variant of the PageRank algorithm that clusters according to topics and finds the authorities of each topic; the proposed metric is called TwitterRank. A method proposed in [16], though similar to TwitterRank, differs in its use of additional features, in its employment of clustering, and in its applicability to real-time scenarios, since it can be easily implemented.

3 A Methodology for Identifying Personality-based Communities in Social Networks

In our work we address the problem of identifying networks that can potentially exhibit maximum flow of information regarding a subject matter. As already mentioned, in most cases in the literature authors deal with the problem of finding influential nodes, and several metrics have been developed to address this issue. Our approach differs in two aspects. First, we identify influential networks rather than influential individuals. Second, we extract the networks related to a specific subject and compute the influence based on user personality, as extracted and computed from quantitative metrics retrieved from Social Networks.

Social Networks provide metrics to measure different aspects of user behavior. In this paper we use Twitter as a case study, but our approach can be easily extended to any Social Network.

3.1 Basic Twitter Metrics

In this section we examine the basic metrics that we have extracted from Twitter in order to derive users' personality. Primarily, we can divide users' tweets into two categories, direct tweets and indirect tweets:

– Direct tweets (D): tweets that are produced by an author. This category comes from the option Compose new Tweet, through which a user can potentially start a new conversation.

– Indirect tweets (I1, I2): tweets that originate from another user, in one of the following two ways: a user copies or forwards a specific tweet so as to spread it in his network (retweets), or a user comments on another tweet, which may start a conversation (conversations). More specifically, I1 represents the number of retweets of a user in a specific time interval, whereas I2 represents the number of times a conversation actually took place upon a tweet.

Other metrics we look into are:

– Number of followers (F): the number of users that follow a specific user.

– Frequency (FR): the frequency of a user's tweets, i.e. how often an author posts tweets. It is calculated as the number of times the user tweeted within a given time window, e.g. half an hour.

– Number of hashtag keywords (HK): hashtag keywords are words starting with the symbol #. With this symbol, anyone can assign a specific tweet to a certain thematic category. This metric counts the number of hashtags a user has used in a set of tweets posted within a specific time interval.
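The metrics above can be gathered into a single per-user record. The following is a minimal sketch under stated assumptions: the class name `TwitterMetrics`, the field names and the choice of hours as the time unit are illustrative and do not come from the paper, and FR is taken to be the number of direct tweets divided by the length of the observation window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterMetrics:
    """Raw per-user counts collected over one observation window."""
    followers: int        # F
    direct_tweets: int    # D
    retweets: int         # I1
    conversations: int    # I2
    hashtags: int         # HK
    window_hours: float   # length of the observation window (assumed unit)

    @property
    def frequency(self) -> float:
        """FR: direct tweets per hour over the observation window."""
        if self.window_hours <= 0:
            raise ValueError("window_hours must be positive")
        return self.direct_tweets / self.window_hours


# Hypothetical user observed for 7 days (168 hours).
user = TwitterMetrics(followers=120, direct_tweets=14, retweets=5,
                      conversations=3, hashtags=9, window_hours=168.0)
rate = user.frequency  # direct tweets per hour
```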

3.2 Using metrics to extract user personality

Using the above metrics, we identify a user's personality as it appears in Twitter. We classify users into four basic categories based on their personality as perceived by their peers and as reflected by their behavior. We call these categories personality traits. A personality trait is a type of behavior exhibited by a Twitter user and can take one of the following values:

1. Popular: a user who is followed by many other users (i.e. has many followers).

2. Energetic: a user who posts tweets frequently, i.e. one who enjoys talking and hence tweets on a regular basis.

3. Conversational: a user who takes part in conversations, either by commenting on other people's posts or by republishing them.

4. Multi-systemic: a user who has a high number of interests and likes to state his opinion on a variety of subjects.

Table 1. Twitter basic metrics

Metric   Meaning
F        Number of followers
D        Number of direct tweets
I1       Number of indirect tweets (retweets)
I2       Number of indirect tweets (conversations)
FR       Frequency of user's tweets
HK       Number of hashtag keywords

Given the above basic behavioral characteristics that a user can show in any social network, we associate features with each one of them so as to have a qualitative insight.

An atomic personality trait for a user $x$ is a tuple $(F_1, F_2, F_3, F_4)$, where each $F_i$ is defined as follows:

1. Atomic Popular ($F_1$): the number of followers, computed as F.

2. Atomic Energetic ($F_2$): the number of direct tweets divided by the time interval, computed as FR.

3. Atomic Conversational ($F_3$): the number of retweets plus the number of conversations, computed as I1 + I2.

4. Atomic Multi-systemic ($F_4$): the number of hashtags found in a given set of tweets that occur in a specific time interval, computed as HK.
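The four definitions above translate directly into a small function. This is a sketch only: the function name `atomic_trait`, its parameter names and the sample counts are hypothetical, and the time interval is expressed in whatever unit the caller chooses.

```python
def atomic_trait(followers, direct_tweets, retweets, conversations,
                 hashtags, interval):
    """Return the atomic personality trait (F1, F2, F3, F4) of one user."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    f1 = followers                  # Atomic Popular: F
    f2 = direct_tweets / interval   # Atomic Energetic: FR
    f3 = retweets + conversations   # Atomic Conversational: I1 + I2
    f4 = hashtags                   # Atomic Multi-systemic: HK
    return (f1, f2, f3, f4)


# Hypothetical counts for one user over an interval of 7 time units.
trait = atomic_trait(followers=14, direct_tweets=21, retweets=1,
                     conversations=1, hashtags=1, interval=7)
```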

Following the above definitions, given a Twitter user $x_i$, the atomic personality trait is a tuple $(F_1, F_2, F_3, F_4)$ where each $F_i$, $1 \le i \le 4$, holds the degree to which the user's personality is associated with the corresponding personality trait. As a next step, we need to identify the dominant characteristics of each user. To this end, for each metric we set a range of values such that, if the atomic value of $F_i$ lies within the given range, we characterize the user as having the corresponding behavior. For example, let us assume that we have the user $\mathit{Helen}_{14,3,2,1}$ and that the ranges for the $F_i$ are set to (10-14, 1-5, 3-7, 0-4); then Helen is Popular, Energetic as well as Multi-systemic, but not Conversational, since her value $F_3 = 2$ falls outside the range 3-7.
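The Helen example can be checked mechanically. The sketch below assumes that range bounds are inclusive, which the text does not state explicitly; the names `TRAITS` and `dominant_traits` are hypothetical.

```python
TRAITS = ("Popular", "Energetic", "Conversational", "Multi-systemic")


def dominant_traits(atomic, ranges):
    """Return the traits whose atomic value lies inside its range.

    `atomic` is the tuple (F1, F2, F3, F4); `ranges` holds one
    (low, high) pair per trait, with both bounds assumed inclusive.
    """
    return tuple(
        name
        for name, value, (low, high) in zip(TRAITS, atomic, ranges)
        if low <= value <= high
    )


helen = (14, 3, 2, 1)
ranges = ((10, 14), (1, 5), (3, 7), (0, 4))
helen_traits = dominant_traits(helen, ranges)
print(helen_traits)  # ('Popular', 'Energetic', 'Multi-systemic')
```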

Let $P = \{\text{Popular } (p_1), \text{Energetic } (p_2), \text{Conversational } (p_3), \text{Multi-systemic } (p_4)\}$ be the set of personality traits and $x_{F_1,F_2,F_3,F_4}$ the atomic personality trait of the user $x$. Moreover, we define $R_{R_1,R_2,R_3,R_4}$ as a set of values that determine the dominance of personalities; then $color(x)$ is a tuple $(c_1, c_2, c_3, c_4)$ (the personality tuple) such that $c_i$ has the value $p_i$ if $F_i \le R_i$, for $1 \le i \le 4$.

3.3 The community extraction algorithm

Based on the above, we can now derive the personality traits of each Twitter user. We conceptualize Twitter as a graph where users are the nodes, and we color each node of the graph according to the user's personality.

We map Twitter as a graph where each node is a user and there is an edge between two users if there is a relation between them, i.e. one user follows the other or vice versa. We then associate each node with one of the 15 possible personality tuples. Based on the above description of personality, there are 4 possible personality traits, and each user can have any of the $2^4 - 1 = 15$ possible values (the non-empty combinations of the four traits) as his personality tuple. We use a breadth-first approach to traverse the graph, and for each node we use the definition of color(x) to decide upon the color of that node/user.
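The count of 15 can be verified by enumerating every non-empty combination of the four traits. This is an illustrative sketch; the names `TRAITS` and `personality_tuples` are hypothetical.

```python
from itertools import combinations

TRAITS = ("Popular", "Energetic", "Conversational", "Multi-systemic")


def personality_tuples(traits=TRAITS):
    """List every non-empty combination of traits (2**n - 1 of them)."""
    return [
        combo
        for size in range(1, len(traits) + 1)
        for combo in combinations(traits, size)
    ]


tuples = personality_tuples()
count = len(tuples)  # 4 + 6 + 4 + 1 combinations
```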

After having decided upon the color/personality of each user, we define "personality-balanced networks". A personality-balanced network is a network that contains at least one node of every possible personality tuple. Based on observations regarding the flow of information in human networks, we note that balanced networks, which are composed of users with different personality traits, tend to demonstrate higher degrees of information flow. In our approach we create sub-graphs based on the coloring of the nodes. Given the initial Twitter graph, we traverse the graph using BFS until we find nodes from each one of the 15 personality tuples. The algorithm then extracts that sub-graph.
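The extraction step can be sketched as a breadth-first traversal that stops as soon as every required color has been seen. This is one possible reading of the description, not the authors' implementation: the function name, the adjacency-dictionary representation and the choice of start node are assumptions, and the toy example uses three placeholder colors ("A", "B", "C") instead of the 15 personality tuples to stay small.

```python
from collections import deque


def extract_balanced_subgraph(adjacency, color, start, required_colors):
    """BFS from `start` until every color in `required_colors` is seen.

    Returns (nodes in visit order, set of undirected edges among them),
    or None if the reachable part of the graph never becomes balanced.
    """
    required = set(required_colors)
    seen_colors = set()
    visited = []
    discovered = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        visited.append(node)
        seen_colors.add(color[node])
        if required <= seen_colors:
            kept = set(visited)
            # Keep only the edges whose two endpoints were both visited.
            edges = {
                frozenset((u, v))
                for u in kept
                for v in adjacency[u]
                if v in kept
            }
            return visited, edges
        for neighbor in adjacency[node]:
            if neighbor not in discovered:
                discovered.add(neighbor)
                queue.append(neighbor)
    return None


# Hypothetical toy graph with placeholder colors.
adjacency = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u4"],
    "u3": ["u1"],
    "u4": ["u2", "u5"],
    "u5": ["u4"],
}
color = {"u1": "A", "u2": "A", "u3": "B", "u4": "C", "u5": "B"}
result = extract_balanced_subgraph(adjacency, color, "u1", {"A", "B", "C"})
```

Here the traversal stops at "u4", the first node that completes the set of colors, so "u5" is left out of the extracted sub-graph.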

4 Experimental Evaluation and Results

We examined the validity of our approach through experiments. We built the Twitter graph using Twitter4J, colored the graph according to our methodology, and finally extracted the personality-based community graph. Twitter4J is a Java library for the Twitter API, with which one can easily integrate a Java application with the Twitter service.

First, we created a Twitter graph as follows: we issued a query on Twitter on the subject of #SocialNetworks and retrieved all the associated information regarding users and tweets for a time interval of 7 days (01/07/2013 to 08/07/2013).

We also defined the dominance ranges of the personality traits as follows. Initially, we specified the ranges for a user that has all the personality traits as (15%, 35%, 35%, 25%), and we identified the users that satisfy these ranges. As a next step, for the next most similar personality tuples, we increased each particular range by (33%, 25%, 25%, 33%) respectively.

For example, suppose that the initial user has (500, 2, 3, 6) and is characterized as Popular, Energetic, Conversational as well as Multi-systemic. Then, in order to be characterized as Popular, Energetic and Conversational, a user has to demonstrate the tuple (750, 3, 4, < 8).

We use this approach to set ranges in order to incorporate the concept that these personality traits are inter-related. For example, a user with 1000 tweets is characterized as Popular, but a user with 500 tweets and a number of retweets plus conversations is also characterized as Popular.

Our results show that the selected nodes are approximately 3% of our graph for the time interval we use. Furthermore, the number of tweets that the users of this sub-graph exchange, divided by the total number of tweets for the given time interval, is approximately 10%. Tweets consist of direct tweets, retweets, as well as conversational tweets.
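The two percentages above are simple ratios, and comparing them shows how much more traffic the community carries than its size alone would suggest. The sketch below uses hypothetical absolute counts chosen only to reproduce the reported 3% and 10%; the paper does not report the underlying counts, and the function name `community_shares` is an assumption.

```python
def community_shares(community_nodes, total_nodes,
                     community_tweets, total_tweets):
    """Return (node share, tweet share, tweet share / node share)."""
    if total_nodes <= 0 or total_tweets <= 0 or community_nodes <= 0:
        raise ValueError("counts must be positive")
    node_share = community_nodes / total_nodes
    tweet_share = community_tweets / total_tweets
    return node_share, tweet_share, tweet_share / node_share


# Hypothetical counts: 30 of 1000 nodes exchanging 500 of 5000 tweets.
node_share, tweet_share, ratio = community_shares(30, 1000, 500, 5000)
```

With these illustrative counts, the community's share of tweets is a little over three times its share of nodes.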

5 Conclusions and Future Work

Our conclusion is that, although the community graph comprised only 3% of the whole graph (number of community nodes divided by the total number of nodes in the graph), this network carried almost 10% of the tweets (direct tweets, retweets, conversations) of the whole Twitter traffic. Hence we can conclude that our assumption, that personality-based communities play a dominant role in data traffic, has been verified.

This is preliminary work. Further work is necessary to identify other personality traits, different clusters of networks and different ranges for dominant personalities. Moreover, this work could be extended to other Social Networks as well.

References

1. E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne. Finding high-quality content in social media. WSDM 2008:183-194.

2. V.D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of community hierarchies in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008. 2008.

3. S. Brin and L. Page. The PageRank Citation Ranking: Bringing Order to the Web. Stanford Digital Library. 1998.

4. P.J. Carrington, J. Scott, and S. Wasserman. Models and Methods in Social Network Analysis. Cambridge University Press. 2005.

5. S. Fortunato. Community detection in graphs. Physics Reports 486, 75-174. 2010.

6. M. Girvan and M.E.J. Newman. Community Structure in Social and Biological Networks. Proceedings of the National Academy of Sciences, Vol. 99, No. 12, pp. 7821-7826. 2002.

7. P. Jurczyk and E. Agichtein. Discovering Authorities in Question Answer Communities by Using Link Analysis. CIKM 2007:919-922.

8. B.W. Kernighan and S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. The Bell System Technical Journal, Vol. 49, No. 1, pp. 291-307. 1970.

9. J.M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. SODA 1998:668-677.

10. A. Lancichinetti and S. Fortunato. Community detection algorithms: A comparative analysis. Physical Review E 80, 056117. 2009.

11. A.N. Langville and C.D. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press. 2006.

12. J. Leskovec, K.J. Lang, and M.W. Mahoney. Empirical Comparison of Algorithms for Network Community Detection. WWW 2010:631-640.

13. M.E.J. Newman. Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133. 2004.

14. M.E.J. Newman. Networks: An Introduction. Oxford University Press. 2010.

15. A.Y. Ng, M.I. Jordan, and Y. Weiss. On Spectral Clustering: Analysis and an algorithm. NIPS 2001:849-856.

16. A. Pal and S. Counts. Identifying topical authorities in microblogs. WSDM 2011:45-54.

17. J.G. Scott. Social Network Analysis: A Handbook. SAGE Publications Ltd. 2000.

18. J. Shi and J. Malik. Normalized Cuts and Image Segmentation. CVPR 1997:731-737.

19. J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8):888-905. 2000.

20. J. Weng, E.-P. Lim, J. Jiang, and Q. He. TwitterRank: Finding Topic-sensitive Influential Twitterers. WSDM 2010:261-270.

21. J. Zhang, M.S. Ackerman, and L.A. Adamic. Expertise Networks in Online Communities: Structure and Algorithms. WWW 2007:221-230.
