Identifying Personality-Based Communities in Social Networks

Eleanna Kafeza¹, Andreas Kanavos², Christos Makris² and Dickson Chiu³
¹ Athens University of Economics and Business, Greece, kafeza@aueb.gr
² Computer Engineering and Informatics Department, University of Patras, Greece,
{kanavos, makri}@ceid.upatras.gr
³ The University of Hong Kong, Hong Kong, dchiu88@hku.hk
Abstract. In this paper we present a novel algorithm for forming communities
in a graph that represents social relations as they emerge from the use of
services like Twitter. The main idea centers on the careful use of features
to characterize the members of a community, and on the hypothesis that
well-formed communities are those that exhibit diversity in the features of
their participating members.
1 Introduction
This paper presents a novel methodology for characterizing interesting
communities as they arise in social networks, such as those formed in
Twitter. The novelty of our approach is that we look for emerging
communities according to the diversity among the personalities of the
involved users.
Until now, most practices for message transmission have been based on finding
influential users and using them to transmit a message. Moreover, recent
work on data flow in social networks deals with the problem of predicting the
flow of information. Our approach is different in the sense that we examine
ways to "drive" the information within the network. We first look for
sub-networks that demonstrate a high degree of information flow; as a second
step, we aim at using these networks to sustain the flow of information.
There is a large body of work from different areas on extracting communities
from graphs; for a thorough survey, we refer the reader to [5]. In our
approach we argue that communities in social media, e.g. Twitter, are more
likely to transmit information easily if they are not "biased" with respect
to user personality. A balanced community can propagate information faster
and deeper. Hence, we partition the Twitter graph according to the
personality of users, which is extracted from their behavior.
2 Related Work
Analysis of social networks has a long history, which is related to graph
clustering algorithms, web searching algorithms, and bibliometrics; for a
complete review of this area one should consult [4], [5], [10], [12], [14]
and [17]. The field is related to link analysis in the web, whose
cornerstones are the analysis of the significance of web pages in Google
using the PageRank citation metric [3] and the HITS algorithm proposed by
Kleinberg [9], as well as their numerous variants proposed in [11]. PageRank
employs a simple metric based on the importance of the incoming links, while
HITS uses two metrics emphasizing the dual role of a web page as a hub and as
an authority for information. Both metrics have been improved in various
forms, and a related review can be found in [11].
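To make the PageRank idea concrete, the following is a minimal Python sketch of the power iteration on a small link graph. It is only an illustration: the damping factor of 0.85, the uniform redistribution of rank held by pages without out-links, and the names `pagerank` and `out_links` are our assumptions and are not taken from [3] or [11].

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {node: [nodes it links to]}."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly (assumption).
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Every node passes an equal share of its rank along each out-link.
        for u, targets in out_links.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


# Toy graph: "c" is linked from both "a" and "b", so it ends up ranked highest.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

The score of a node depends only on the scores of the nodes that link to it, which is the "importance of the incoming links" mentioned above.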
Concerning community detection, various algorithms have been proposed in the
literature. It should be noted that HITS by itself, when non-principal
eigenvectors are explored, can be used to compute communities. The problem
most commonly encountered in the literature in connection with communities is
graph partitioning. A breakthrough in the area is the algorithm proposed in
[6], which identifies the edges lying between communities and successively
removes them; after some iterations this procedure leads to the isolation of
the communities. The majority of the algorithms proposed in the area are
related to spectral partitioning techniques, which partition objects by using
the eigenvectors of matrices formed over the specific set of objects [8],
[15], [18] and [19]. One should also mention techniques that use modularity,
a metric that compares the density of links inside communities against the
density of links between communities [5], [13], with the most popular being
the algorithm proposed in [2].
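For reference, modularity is commonly written in the following standard form for an undirected graph. The notation is ours and is not used elsewhere in this paper: $A_{ij}$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the number of edges, $c_i$ the community of node $i$, and $\delta$ the Kronecker delta.

```latex
\[
Q \;=\; \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
\]
```

The first term counts the links that actually fall inside communities, and the second term is the number expected if links were placed at random while preserving node degrees.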
Besides finding emerging communities, estimating authorities has also
attracted attention. The authors of [1] extracted several graph features,
such as the users' degree distribution and hub and authority scores, in order
to model a user's relative importance. Other works in this area include
Expertise Ranking [7] and [21], which identify authorities using link
analysis on the graph induced by interactions between users.
Also of interest is the work presented in [20], which employs Latent
Dirichlet Allocation and a variant of the PageRank algorithm that clusters
according to topics and finds the authorities of each topic; the proposed
metric is called TwitterRank. A method proposed in [16], though similar to
TwitterRank, differs in its use of additional features, in its employment of
clustering, and in its applicability to real-time scenarios, since it can be
easily implemented.
3 A methodology for identifying personality-based communities in Social Networks
In our work we address the problem of identifying networks that can
potentially exhibit maximum flow of information regarding a subject matter.
As already mentioned, in most cases in the literature authors deal with the
problem of finding influential nodes, and several metrics have been developed
to address this issue. Our approach is different in two respects. First, we
identify influential networks rather than influential individuals. Second, we
extract the networks related to a specific subject and compute their
influence based on user personality, as extracted and computed from
quantitative metrics retrieved from Social Networks.
Social Networks provide metrics to measure different aspects of user behavior.
In this paper, we will use Twitter as a case study but our approach can be easily
extended to any Social Network.
3.1 Basic Twitter Metrics
In this section we examine the basic metrics that we have extracted from
Twitter in order to derive users' personality. First, we group users' tweets
into two categories, direct tweets and indirect tweets:
Direct tweets (D): Tweets that are authored by the user. This category comes
from the option Compose new Tweet, through which a user can potentially
start a new conversation.
Indirect tweets (I1, I2): Tweets that originate from another user. They
arise in one of two ways: a user copies or forwards a specific tweet so as
to spread it in his/her network (retweets), or a user comments on another
tweet, so that a conversation may start (conversations). More specifically,
I1 represents the number of retweets of a user in a specific time interval,
whereas I2 represents the number of times a conversation actually took
place upon a tweet.
Other metrics we look into are:
Number of followers (F): The number of users that follow a specific user.
Frequency (FR): The frequency of a user's tweets, i.e. how often an author
posts tweets. It is calculated as the number of times the user tweeted
within a given time interval, e.g. half an hour.
Number of hashtag keywords (HK): Hashtag keywords are words starting with
the symbol #. With this symbol, anyone can assign a tweet to a certain
thematic category. This metric counts the number of hashtags a user has
used in the set of tweets posted during a specific time interval.
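The metrics above can be collected in a small record per user. The sketch below is one possible Python representation; the class name `TwitterMetrics`, the field names, and the choice of hours as the time unit are our assumptions. Following the definition used in Section 3.2, FR is computed here from direct tweets only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterMetrics:
    """Basic per-user metrics collected over one observation interval."""

    followers: int         # F
    direct_tweets: int     # D
    retweets: int          # I1
    conversations: int     # I2
    hashtags: int          # HK
    interval_hours: float  # length of the observation interval (assumed unit: hours)

    def __post_init__(self):
        if self.interval_hours <= 0:
            raise ValueError("interval_hours must be positive")

    @property
    def frequency(self) -> float:
        """FR: direct tweets per hour over the observation interval."""
        return self.direct_tweets / self.interval_hours


# Hypothetical user observed for half an hour.
user = TwitterMetrics(followers=120, direct_tweets=3, retweets=5,
                      conversations=2, hashtags=4, interval_hours=0.5)
print(user.frequency)  # 6.0
```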
3.2 Using metrics to extract user personality
Based on the above metrics, we identify users' personality as it appears in
Twitter. We classify users into four basic categories based on their
personality as perceived by their peers and as reflected by their behavior.
We call these categories personality traits. A personality trait is a type of
behavior exhibited by a Twitter user and can take one of the following
values:
1. Popular: when a user is followed by many other users (i.e. has many
followers).
2. Energetic: when a user posts tweets frequently. Such a user is energetic
and enjoys talking, hence he/she tweets on a regular basis.
3. Conversational: when a user takes part in conversations either by
commenting on other people's posts or by republishing them.
4. Multi-systemic: when a user has a high number of interests and likes to
state his/her opinion on a variety of subjects.
Table 1. Twitter basic metrics
Metric   Meaning
F        Number of followers
D        Number of direct tweets
I1       Number of indirect tweets (retweets)
I2       Number of indirect tweets (conversations)
FR       Frequency of user's tweets
HK       Number of hashtag keywords
Given the above basic behavioral characteristics that a user can show in
any social network, we associate a feature with each one of them so as to
have a qualitative insight.
An atomic personality trait for a user x is a tuple $(F_1, F_2, F_3, F_4)$,
where each $F_i$ is defined as follows:
1. Atomic Popular ($F_1$): the number of followers, computed as F.
2. Atomic Energetic ($F_2$): the number of direct tweets divided by the time
interval, computed as FR.
3. Atomic Conversational ($F_3$): the number of retweets plus the number of
conversations, computed as I1 + I2.
4. Atomic Multi-systemic ($F_4$): the number of hashtags found in a given set
of tweets posted in a specific time interval, computed as HK.
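The four definitions above translate directly into code. The following Python sketch is illustrative only: the names `AtomicTrait` and `atomic_trait` and the argument names are hypothetical, and the time unit of `interval` is left to the caller.

```python
from typing import NamedTuple


class AtomicTrait(NamedTuple):
    popular: float         # F1 = F
    energetic: float       # F2 = FR
    conversational: float  # F3 = I1 + I2
    multi_systemic: float  # F4 = HK


def atomic_trait(followers, direct_tweets, retweets, conversations,
                 hashtags, interval):
    """Build the tuple (F1, F2, F3, F4) from the basic Twitter metrics."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return AtomicTrait(
        popular=followers,
        energetic=direct_tweets / interval,       # direct tweets per unit time
        conversational=retweets + conversations,  # I1 + I2
        multi_systemic=hashtags,
    )


# Hypothetical counts chosen so that the result is the tuple (14, 3, 2, 1).
helen = atomic_trait(followers=14, direct_tweets=6, retweets=1,
                     conversations=1, hashtags=1, interval=2.0)
```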
Following the above definitions, given a Twitter user $x_i$, the atomic
personality trait is a tuple $(F_1, F_2, F_3, F_4)$ where each $F_i$,
$1 \le i \le 4$, holds the degree to which the user's personality is
associated with the corresponding personality trait. As a next step, we need
to identify the dominant characteristics of each user. To this end, for each
metric we set a range of values such that, if the atomic value of $F_i$ lies
within the given range, we characterize the user as having the corresponding
behavior. For example, let us assume that we have the user
$\mathrm{Helen}_{14,3,2,1}$ and that the ranges for the $F_i$ are set to
(10-14, 1-5, 3-7, 0-4); then Helen is Popular, Energetic and Multi-systemic,
but not Conversational, since $F_3 = 2$ lies outside the range 3-7.
Let $P = \{\text{Popular } (p_1), \text{Energetic } (p_2), \text{Conversational } (p_3), \text{Multi-systemic } (p_4)\}$
be the set of personality traits and $x_{F_1,F_2,F_3,F_4}$ the atomic
personality trait of the user x. Moreover, let $R_{R_1,R_2,R_3,R_4}$ be a set
of ranges that determine the dominance of personalities; then color(x) is a
tuple $(c_1, c_2, c_3, c_4)$ (the personality tuple) such that $c_i$ has the
value $p_i$ if $F_i \in R_i$, for $1 \le i \le 4$.
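A minimal Python sketch of color(x) under two assumptions of ours: the ranges are closed intervals (both endpoints included), and an entry whose value falls outside its range is represented by `None`, since the text does not say what $c_i$ is in that case. It reproduces the Helen example given earlier.

```python
TRAITS = ("Popular", "Energetic", "Conversational", "Multi-systemic")


def color(atomic, ranges):
    """Return the personality tuple (c1, c2, c3, c4) for one user.

    atomic: the atomic personality trait (F1, F2, F3, F4)
    ranges: four (low, high) pairs, one per trait (closed intervals, assumed)
    """
    if len(atomic) != len(TRAITS) or len(ranges) != len(TRAITS):
        raise ValueError("expected four atomic values and four ranges")
    return tuple(
        name if low <= value <= high else None
        for name, value, (low, high) in zip(TRAITS, atomic, ranges)
    )


helen = color((14, 3, 2, 1), [(10, 14), (1, 5), (3, 7), (0, 4)])
print(helen)  # ('Popular', 'Energetic', None, 'Multi-systemic')
```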
3.3 The community extraction algorithm
Based on the above, we can now derive the personality traits of each Twitter
user. We conceptualize Twitter as a graph where users are the nodes, and we
color each node of the graph according to the user's personality.
We map Twitter to a graph where each node is a user and there is an edge
between two users if there is a relation between them, i.e. one user follows
the other or vice versa. We then associate each node with one of the 15
possible personality tuples: based on the above description of personality
there are 4 possible personality traits, so each user can have any of the
$2^4 - 1 = 15$ possible values as his/her personality tuple. We use a
breadth-first approach to traverse the graph, and for each node we use the
definition of color(x) to decide upon the color of that node/user.
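The count of 15 can be checked by enumeration: each of the 4 traits is either dominant or not, and the combination with no dominant trait is excluded (our reading of the count). The Python sketch below lists the resulting tuples as sets of trait names; the function name is hypothetical.

```python
from itertools import combinations

TRAITS = ("Popular", "Energetic", "Conversational", "Multi-systemic")


def possible_personality_tuples(traits=TRAITS):
    """Enumerate every non-empty combination of dominant traits."""
    return [
        frozenset(combo)
        for size in range(1, len(traits) + 1)
        for combo in combinations(traits, size)
    ]


all_tuples = possible_personality_tuples()
print(len(all_tuples))  # 15
```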
After having decided upon the color/personality of each user, we define
"personality-balanced networks". A personality-balanced network is a network
that contains at least one node of every possible personality tuple. Based on
observations regarding the flow of information in human networks, we note
that balanced networks, which are composed of users with different
personality traits, tend to demonstrate higher degrees of information flow.
In our approach we create sub-graphs based on the coloring of the nodes.
Given the initial Twitter graph, we traverse the graph using BFS until we
have found nodes from each one of the 15 personality tuples. The algorithm
then extracts that sub-graph.
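The extraction step can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the experiments: the function name, the choice of start node, and the behavior when the reachable part of the graph is not balanced (returning `None`) are our assumptions. The toy example uses three colors instead of 15 to keep it short.

```python
from collections import deque


def extract_balanced_community(adjacency, colors, start, required_colors):
    """Breadth-first traversal that stops once every required color was seen.

    adjacency: dict {node: iterable of neighbours}, treated as undirected
    colors: dict {node: personality tuple (any hashable label)}
    Returns the visited nodes in BFS order, or None if the nodes reachable
    from `start` do not cover all required colors.
    """
    required = set(required_colors)
    covered = set()
    community = []
    enqueued = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        community.append(node)
        if colors[node] in required:
            covered.add(colors[node])
        if covered == required:
            return community
        for neighbour in adjacency.get(node, ()):
            if neighbour not in enqueued:
                enqueued.add(neighbour)
                queue.append(neighbour)
    return None


# Toy graph with three colors P, E, C.
adjacency = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
             "d": ["b", "e"], "e": ["d"]}
colors = {"a": "P", "b": "P", "c": "E", "d": "C", "e": "E"}
community = extract_balanced_community(adjacency, colors, "a", {"P", "E", "C"})
print(community)  # ['a', 'b', 'c', 'd']
```

The traversal stops as soon as node d supplies the last missing color, so node e is not part of the extracted sub-graph.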
4 Experimental Evaluation and Results
We examined the validity of our approach through experiments. We built the
Twitter graph using Twitter4J, colored the graph according to our
methodology, and finally extracted the personality-based community graph.
Twitter4J is a Java library for the Twitter API, with which one can easily
integrate a Java application with the Twitter service.
First, we created a Twitter graph as follows: we issued a query on Twitter
on the subject of #SocialNetworks and retrieved all the associated
information regarding users and tweets for a time interval of 7 days
(01/07/2013 to 08/07/2013).
We also defined the ranges for the dominance of personality as follows.
Initially, we specified the ranges for a user that has all the personality
traits as (15%, 35%, 35%, 25%), and identified the users that satisfy these
ranges. As a next step, for the next most similar personality tuples, we
increased each particular range by (33%, 25%, 25%, 33%) respectively.
For example, suppose that the initial user has (500, 2, 3, 6) and is
characterized as Popular, Energetic, Conversational as well as
Multi-systemic. Then, in order to be characterized as Popular, Energetic and
Conversational, a user has to demonstrate the tuple (750, 3, 4, < 8).
We use this approach to set the ranges in order to incorporate the concept
that these personality traits are inter-related. For example, a user with
1000 tweets is characterized as Popular, but a user with 500 tweets and a
number of retweets plus conversations is also characterized as Popular.
Our results show that the selected nodes are approximately 3% of our graph
for the time interval we use. Furthermore, the number of tweets that the
users of this sub-graph exchange, divided by the total number of tweets for
the given time interval, is approximately 10%. Tweets consist of direct
tweets, retweets, as well as conversational tweets.
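The two reported percentages can be related as follows. The symbols are ours: $V_C$ and $V$ denote the community nodes and all nodes, and $T_C$ and $T$ the tweets exchanged by community members and all tweets in the interval. Both inputs are the approximate figures reported above, so the resulting ratio is only indicative.

```latex
\[
\frac{|V_C|}{|V|} \approx 0.03, \qquad
\frac{T_C}{T} \approx 0.10, \qquad
\frac{T_C / |V_C|}{T / |V|} \;=\; \frac{T_C / T}{|V_C| / |V|}
\;\approx\; \frac{0.10}{0.03} \;\approx\; 3.3
\]
```

Taken at face value, a member of the extracted community accounts on average for roughly three times as many tweets as an average node of the graph.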
5 Conclusions and Future Work
Although the community graph comprised only 3% of the whole graph (number of
community nodes divided by the total number of nodes in the graph), this
network carried almost 10% of the tweets (direct tweets, retweets,
conversations) of the whole Twitter traffic. Hence we can conclude that our
assumption that personality-based communities play a dominant role in data
traffic has been verified.
This is a preliminary work. Further work is necessary to identify other
personality traits, different clusters of networks and different ranges for
dominant personalities. Moreover, this work could be extended to other Social
Networks as well.
References
1. E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne. Finding
high-quality content in social media. WSDM 2008:183-194.
2. V.D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast
unfolding of community hierarchies in large networks. Journal of Statistical
Mechanics: Theory and Experiment, P10008. 2008.
3. S. Brin and L. Page. The PageRank Citation Ranking: Bringing Order to the
Web. Stanford Digital Library. 1998.
4. P.J. Carrington, J. Scott, and S. Wasserman. Models and Methods in Social
Network Analysis. Cambridge University Press. 2005.
5. S. Fortunato. Community detection in graphs. Physics Reports 486, 75-174.
2010.
6. M. Girvan and M.E.J. Newman. Community Structure in Social and Biological
Networks. Proceedings of the National Academy of Sciences, Vol. 99, No. 12,
pp. 7821-7826. 2002.
7. P. Jurczyk and E. Agichtein. Discovering Authorities in Question Answer
Communities by Using Link Analysis. CIKM 2007:919-922.
8. B.W. Kernighan and S. Lin. An Efficient Heuristic Procedure for
Partitioning Graphs. The Bell System Technical Journal, Vol. 49, No. 1,
pp. 291-307. 1970.
9. J.M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. SODA
1998:668-677.
10. A. Lancichinetti and S. Fortunato. Community detection algorithms: A
comparative analysis. Physical Review E 80, 056117. 2009.
11. A.N. Langville and C.D. Meyer. Google's PageRank and Beyond: The Science
of Search Engine Rankings. Princeton University Press. 2006.
12. J. Leskovec, K.J. Lang, and M.W. Mahoney. Empirical Comparison of
Algorithms for Network Community Detection. WWW 2010:631-640.
13. M.E.J. Newman. Fast algorithm for detecting community structure in
networks. Physical Review E 69, 066133. 2004.
14. M.E.J. Newman. Networks: An Introduction. Oxford University Press. 2010.
15. A.Y. Ng, M.I. Jordan, and Y. Weiss. On Spectral Clustering: Analysis and
an algorithm. NIPS 2001:849-856.
16. A. Pal and S. Counts. Identifying topical authorities in microblogs. WSDM
2011:45-54.
17. J.G. Scott. Social Network Analysis: A Handbook. SAGE Publications Ltd.
2000.
18. J. Shi and J. Malik. Normalized Cuts and Image Segmentation. CVPR
1997:731-737.
19. J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE
Transactions on Pattern Analysis and Machine Intelligence 22(8):888-905. 2000.
20. J. Weng, E.-P. Lim, J. Jiang, and Q. He. TwitterRank: Finding
Topic-sensitive Influential Twitterers. WSDM 2010:261-270.
21. J. Zhang, M.S. Ackerman, and L.A. Adamic. Expertise Networks in Online
Communities: Structure and Algorithms. WWW 2007:221-230.
... Second, in order to evaluate their performance, these metrics are compared in a higher-order probablistic framework. Specifically, the first-order metrics proposed in Kafeza et al. (2013Kafeza et al. ( , 2014 are extended to higher-order ones by taking into consideration the interaction of the accounts within the particular social graph using techniques from Drakopoulos et al. (2016b). ...
... Functional metrics, as their name suggests, focus on the functionality of a social network and, consequently, facilitate interpretation at the expense of universal applicability. Regarding Twitter, personality models have been used for community discovery (Kafeza et al. 2013(Kafeza et al. , 2014, probabilistic analysis predict the most trending authors for a given topic (2011 a). Concerning the digital influence of a Twitter account, it can be derived by PageRank extensions (Weng et al. 2010;TunkRank 2015), by its importance compared to that of the remaining network (Mehta et al. 2012), or by a nonlinear combination of features 3 (Razis and Anagnostopoulos 2014). ...
... In order to differentiate between the existing metrics of Kafeza et al. (2013Kafeza et al. ( , 2014 and the proposed ones, the following definitions are necessary. ...
Article
Full-text available
Ranking account influence constitutes an important challenge in social media analysis. Until recently, influence ranking relied solely on the structural properties of the underlying social graph, in particular on connectivity patterns. Currently, there has been a notable shift to the next logical step where network functionality is taken into account, as online social media such as Reddit, Instagram, and Twitter are renowned primarily for their function-ality. However, contrary to structural rankings, functional ones are bound to be network-specific since each social platform offers unique interaction possibilities. This article examines seven first order influence metrics for Twitter, defines a strategy for deriving their higher order counterparts, and outlines a probabilistic evaluation framework. Experiments with a Twitter subgraph with ground truth influential accounts indicate that a single metric combining structural and functional features outperforms the rest in said framework.
... It is worth noting that through the works [44,46,47], it becomes clear that the significance and notion of leverage beyond the user perspective to the communication system perspective, as well as personality is the main criterion for the identification of influential communication systems. This results in the creation of such communities within the graphs of Twitter, using a grouping detection strategy based on modularity, which takes into consideration the individual personality traits of users. ...
Article
Full-text available
Exploring a community is an important aspect of social network analysis because it can be seen as a crucial way to decompose specific graphs into smaller graphs based on interactions between users. The process of discovering common features between groups of users, entitled “community detection”, is a fundamental feature for social network analysis, wherein the vertices represent the users and the edges their relationships. Our study focuses on identifying such phenomena on the Twitter graph of posts and on determining communities, which contain users with similar features. This paper presents the evaluation of six established community-discovery algorithms, namely Breadth-First Search, CNM, Louvain, MaxToMin, Newman–Girvan and Propinquity Dynamics, in terms of four widely used graphs and a collection of data fetched from Twitter about man-made and physical data. Furthermore, the size of each community, expressed as a percentage of the total number of vertices, is identified for the six particular algorithms, and corresponding results are extracted. In terms of user-based evaluation, we indicated to some students the communities that were extracted by every algorithm, with a corresponding user and their tweets in the grouping and considered three different alternatives for the extracted communities: “dense community”, “sparse community” and “in-between”. Our findings suggest that the community-detection algorithms can assist in identifying dense group of users.
... Researchers have incorporated the concept of influence from the side of users to the side of networks and users' personality has been utilized as the key characteristic for identifying influential networks [8], [9], [10]. Moreover, the behavior of users on an emotional level is enhanced by introducing a new methodology that effectively aids in community detection [11], [12], [13]. ...
Conference Paper
Full-text available
Twitter is considered a major and very popular social network providing an abundance of data generated by users’ interactions through tweets. After an appropriate analysis of this information, sets consisting of users who share similar attributes, and preferences can be identified. Massive cultural content management is important because reviews can be analyzed for extracting significant representations. In this study, an aspect mining method of a cultural heritage approach by incorporating big data methods, is proposed. We propose the combination of a community detection algorithm, i.e., the Parallel Structural Clustering Algorithm for Networks (PSCAN), with topic modelling methods, i.e., the Latent Dirichlet Allocation (LDA), for performing large-scale data analysis in Twitter.
... Also, in [13][14][15], the concept of influence from the side of users to the side of networks is expanded and personality has been utilized as the key characteristic for identifying influential networks. The result is to create this type of communities in Twitter graphs using a modularity-based community detection algorithm, taking into account users' personalities. ...
Chapter
Full-text available
In social network analysis, it is crucial to discover a community through the retrospective decomposition of a large social graph into easily interpretable subgraphs. Four major community discovery algorithms, namely the Breadth-First Search, the Louvain, the MaxToMin, and the Propinquity Dynamics, are implemented. Their correctness was functionally evaluated in the four most widely used graphs with vastly different characteristics and a dataset retrieved from Twitter regarding cultural and natural heritage data because this platform reflects public perception about historical events through means such as advanced storytelling in users timelines. The primary finding was that the Propinquity Dynamics algorithm outperforms the other algorithms in terms of NMI for most graphs. In contrast, this algorithm with the Louvain performs almost the same regarding modularity.
... More precisely, y 1 is the number of Followers, y 2 is the number of Direct Tweets, y 3 is the number of Retweets, y 4 is the number of Conversations, y 5 is the Frequency of user's Tweets and y 6 is the number of Hashtag Keywords as in [29]. These metrics describe the user communication behaviour in Twitter. ...
Article
The identification of social media communities has recently been of major concern, since users participating in such communities can contribute to viral marketing campaigns. In this work we focus on users' communication considering personality as a key characteristic for identifying communicative networks i.e. networks with high information flows. We describe the Twitter Personality based Communicative Communities Extraction (T-PCCE) system that identifies the most communicative communities in a Twitter network graph considering users' personality. We then expand existing approaches in users' personality extraction by aggregating data that represent several aspects of user behaviour using machine learning techniques. We use an existing modularity based community detection algorithm and we extend it by inserting a post-processing step that eliminates graph edges based on users' personality. The effectiveness of our approach is demonstrated by sampling the Twitter graph and comparing the communication strength of the extracted communities with and without considering the personality factor. We define several metrics to count the strength of communication within each community. Our algorithmic framework and the subsequent implementation employ the cloud infrastructure and use the MapReduce Programming Environment. Our results show that the T-PCCE system creates the most communicative communities.
... More specifically, their technique uses a Gaussian Mixture Model to cluster users into two clusters over their feature space as the aim is to reduce the size of the target cluster; that is the cluster containing the most authoritative users. In addition in [11] and [12], the notion of influence from users to networks is extended and in following, personality as a key characteristic for identifying influential networks is considered. The system creates influential communities in a Twitter network graph by considering user personalities where an existing modularity-based community detection algorithm is used. ...
Conference Paper
The topic of the paper is to present a novel methodology in order to characterize influential users, such as members of Twitter, as they arise in social networks. The novelty of our approach lies in the fact that we incorporate a set of features for characterizing social media authors, including both nodal and topical metrics, along with new features concerning temporal aspects of user participation on the topic. We also take advantage of cluster-based fusion techniques for retrieved result lists for the ranking of top influential users.
Article
"Community" in social networks is a nebulous concept. A community is generally assumed to be formed by people who possess similar attributes or characteristics, also known as "homophily". Although there has been a lot of research on community detection based on network topology, the semantic interpretation of communities is rarely studied. The present work aims to understand the behavioral similarity of users present in their personal neighborhood communities formed by friends, relatives, or colleagues, and addresses two fundamental questions: (i) Are communities formed by users who possess similar behavioral traits? If so, does this apply to all those sub-networks, i.e., friends, relatives, and colleagues? (ii) Does adding behavioral node-specific attributes/features to the nodes in a network lead to better community detection? To better understand the psycho-sociological homophilic nature of personal networks, the personalities and values of Twitter users were analyzed using the well-established "Big-5 personality model" and "Schwartz sociological behavior model". Empirical results based on the psychosociological behavior show that friends networks exhibit homophily, whereas relatives and colleagues networks do not exhibit such homophilic behavior. It can also be observed that neurotic people tend to behave heterogeneously with people of various personality traits. In addition, it is shown that such empirical evidence can be used as features for the tasks of community detection and link prediction.
Conference Paper
Full-text available
Ontology has been an active research field connecting philosophy, logic, history, mathematics, and computer science to name a few. Within an ontological context defined over a domain the entities as well as their associated relationships can be represented by the vertrices and the edges of a tree. From the latter new knowledge can be then inferred through a number of techniques including Horn logic from reasoners and RDF triplets. With the advent of the Semantic Web and sophisticated associated software tools including graph databases such as Neo4j, Sparksee, and TitanDB or XML parsers such as Xerces graph mining is done efficiently on the semantic level instead of the combinatorial or algebraic ones. Multilayer graphs, namely graphs whose labeled edges belong to a number of predetermined classes, have been recently introduced in social network analysis in order to represent the different interaction options between netizens. In this work the potential of applying this new type of graphs to an ontological context creating essentially an ontological tensor is outlain and its complexity is assessed. A human readable dataset based on the late 1970s and early 1980s Apple manually constructed from the 2011 officially authorized biography of Steve Jobs and the 1999 film Pirates of Silicon Valley serves as a concrete example complete with Neo4j queries.
Article
The scientific study of networks, including computer networks, social networks, and biological networks, has received an enormous amount of interest in the last few years. The rise of the Internet and the wide availability of inexpensive computers have made it possible to gather and analyze network data on a large scale, and the development of a variety of new theoretical tools has allowed us to extract new knowledge from many different kinds of networks. The study of networks is broadly interdisciplinary and important developments have occurred in many fields, including mathematics, physics, computer and information sciences, biology, and the social sciences. This book brings together the most important breakthroughs in each of these fields and presents them in a coherent fashion, highlighting the strong interconnections between work in different areas. Subjects covered include the measurement and structure of networks in many branches of science, methods for analyzing network data, including methods developed in physics, statistics, and sociology, the fundamentals of graph theory, computer algorithms, and spectral methods, mathematical models of networks, including random graph models and generative models, and theories of dynamical processes taking place on networks.
Article
Many networked systems, including physical, biological, social, and technological networks, appear to contain ``communities'' -- groups of nodes within which connections are dense, but between which they are sparser. The ability to find such communities in an automated fashion could be of considerable use. Communities in a web graph for instance might correspond to sets of web sites dealing with related topics, while communities in a biochemical network or an electronic circuit might correspond to functional units of some kind. We present a number of new methods for community discovery, including methods based on ``betweenness'' measures and methods based on modularity optimization. We also give examples of applications of these methods to both computer-generated and real-world network data, and show how our techniques can be used to shed light on the sometimes dauntingly complex structure of networked systems.
Article
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
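The mutual reinforcement between hubs and authorities described above can be sketched as a short power iteration. This is a minimal illustration and not the authors' implementation: the function names `hits` and `_normalize`, the dictionary input format, the fixed iteration count, and the toy link graph are all assumptions made for this example.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0:
        return dict(scores)
    return {n: s / norm for n, s in scores.items()}


def hits(links, iterations=50):
    """Hub and authority scores by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to
    Returns (hub, auth), two dicts of scores keyed by page.
    """
    nodes = set(links) | {q for targets in links.values() for q in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        # A page's hub score is the sum of the authority scores it links to.
        new_hub = {n: 0.0 for n in nodes}
        for p, targets in links.items():
            for q in targets:
                new_hub[p] += new_auth[q]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


# Toy link graph (hypothetical): three hub pages pointing at two candidate authorities.
links = {"h1": ["a", "b"], "h2": ["a", "b"], "h3": ["a"]}
hub, auth = hits(links)
```

With adjacency matrix A, repeating these two updates amounts to power iteration on A^T A for the authority scores and on A A^T for the hub scores, which is the eigenvector connection the abstract mentions.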
Article
Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of web page rankings, Google's PageRank and Beyond supplies the answers to these and other questions. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. Features include many illustrative examples and entertaining asides, MATLAB code, an accessible and informal style, and a complete and self-contained section for mathematics review.