Understanding the universe of jazz by means of its musicians networks

Acknowledgements
As this document is the result of work spread over a period of one and a
half years, it seems essential to me to thank all the people who helped me in
any way. First of all, thanks to my supervisor and promoter Hugues Bersini
for his advice, his knowledge of jazz and his follow-up during this period.
Then, thanks to Lluc Bono Rosello for his various comments and for the
introduction to Gephi. Thanks also to the reviewers and the organizers of the
NetSciX 2022 conference and to the IRIDIA lab for their trust and the wonderful
memories of the trip to Porto. Thanks to the organizers of the Printemps des
Sciences for allowing me to present this work in an accessible form to
younger students. Thanks also to Laurie Goffette for her feedback and advice
on the English language. And finally, thanks to my family and friends for their
daily support and motivational speeches.
UNIVERSITÉ LIBRE DE BRUXELLES
Abstract
Faculty of Sciences
Master in Computer Sciences
Understanding the universe of jazz by means of its musicians networks
by Julien Baudru
In this paper, we will look in detail at networks formed by the collaboration
of jazz musicians. In such networks, each node represents a musician and
the edges between these nodes indicate whether they played together on an
album or at a concert. Two such networks have been built, one for artist
collaborations during the recording of an album, the other for live
collaborations during a concert. We will compare the parameters and topologies
of these networks, and then examine different meta-networks abstracted from
them. This document is a master's thesis that tries to answer the following
general question: How to better understand the universe of jazz by means
of its musicians networks?
Contents
Abstract
List of Figures
List of Tables
List of Abbreviations
1 Introduction
  1.1 Motivations
  1.2 Research questions
  1.3 A short history of jazz
2 State of the art
  2.1 Networks
  2.2 Social networks
  2.3 Scale-free networks
  2.4 Preferential attachment
  2.5 Communities
  2.6 Jazz communities
  2.7 Natural language processing (NLP)
3 Method
  3.1 Data collection
    3.1.1 Choice of dataset
    3.1.2 Problematic
    3.1.3 Process
      Collecting data for the album network using NLP
      Collecting data for the live performance network
    3.1.4 Information on datasets
  3.2 Networks construction
    3.2.1 Static network
    3.2.2 Dynamic network
    3.2.3 Meta-networks
  3.3 Networks visualization
4 Results
  4.1 Technical analysis of both networks
    4.1.1 General parameters and comparison
    4.1.2 Scale-free networks
    4.1.3 Preferential attachment
    4.1.4 Other parameters
      Rich-club coefficient
      Modularity and community
      Clustering coefficient
  4.2 Hubs analysis of both networks
    4.2.1 Definition of hub
    4.2.2 Top hubs
    4.2.3 Instrument
    4.2.4 Geographical information
    4.2.5 Birthyear
    4.2.6 Gender
  4.3 Comparison of musician’s networks at festivals
  4.4 Analysis of meta-networks
    4.4.1 Montreux instruments meta-network
    4.4.2 Years meta-networks
  4.5 Analysis of hub’s meta-networks
    4.5.1 Country
5 Discussion
  5.1 Limitations
    5.1.1 Datasets
    5.1.2 Labels
    5.1.3 Styles of jazz
    5.1.4 Time
  5.2 Further improvement
    5.2.1 Data collection
    5.2.2 Confidence in the characteristics carried by the hubs
    5.2.3 Popularity of musicians
    5.2.4 Evolution of racial segregation
    5.2.5 Meta-networks
6 Conclusion
  6.1 Topology
    6.1.1 Album and Live networks
    6.1.2 Comparison between live networks
    6.1.3 Communities
  6.2 Top hubs
    6.2.1 Archetypal hub
    6.2.2 Well known figures
    6.2.3 Evolution of mentalities
    6.2.4 Research question
References
List of Figures
3.1 General process
3.2 Steps for getting the jazzman name
3.3 Album network
3.4 Montreux network
3.5 Album network evolution sample
3.6 Example of meta-network construction
4.1 Number of new nodes per year - Wikipedia network
4.2 Number of new nodes per year - Montreux network
4.3 Degree distributions
4.4 Preferential attachment - Album network
4.5 Preferential attachment - Montreux network
4.6 Distribution of rich-club coefficient by degree
4.7 Modularity optimization comparison
4.8 Communities visualization for both networks
4.9 Network of Montreux festival: Isolated community
4.10 Community in the network of Wikipedia albums: This figure shows the separation in community around the sub-genre of jazz.
4.11 Clustering coefficient for Wikipedia network
4.12 Clustering coefficient for Montreux network
4.13 Top 3 hubs of album network
4.14 Top 3 hubs of festival network
4.15 Instruments of the top hubs
4.16 All nodes instrument distribution in the Montreux network
4.17 Countries of the top hubs
4.18 Cities of the top hubs
4.19 Birthyears of the top hubs
4.20 Genders of the top hubs
4.21 Festival networks
4.22 Degree distributions
4.23 Number of new nodes per year
4.24 Instruments meta-network for Montreux festival
4.25 Years meta-network for album collaboration network
4.26 Years meta-network for Montreux festival
4.27 Instruments meta-network for Montreux festival
List of Tables
3.1 Comparison of both dataset
4.1 General parameters
4.2 Parameters related to communities
4.3 Average clustering coefficient
4.4 Top 5 hubs and their degree for both networks
List of Abbreviations
API Application Programming Interface
BA Barabási Albert (model)
CD Compact Disc
CN Collaboration Network
CNM Clauset Newman Moore (model)
GRU Gated Recurrent Unit
HTML Hyper Text Markup Language
IoT Internet of Things
LSTM Long Short-Term Memory
NLP Natural Language Processing
NLTK Natural Language Toolkit
RNN Recurrent Neural Networks
WBPA Weighted Betweenness Preferential Attachment (model)
Chapter 1
Introduction
1.1 Motivations
Many systems take the form of networks, i.e. a set of nodes connected to each
other by edges. Among the systems most commonly studied in the literature,
we find the network of hypertext links on the World Wide Web, the network of
scientific citations [1], the network of roads between cities in a country [2],
the networks related to biology and the very famous network of film actors.
This document focuses in detail on the network formed by jazz musicians. In
this network, each node represents a musician and the links between these
nodes indicate whether they played together on an album or at a concert.
These pages will therefore focus on the study of collaborative networks (CN)
as well as on the construction of these networks and their associated
parameters.
This document is the continuation and the conclusion of a research work
extending over a year and a half. Thus, reference will be made to the results
obtained previously in order to compare them with those recently obtained.
Also, over this period of time, I had the chance to briefly present my thesis at
the NetSciX 2022 conference. During this conference, when some participants
saw my poster, they questioned the usefulness of this research, so I would like
to seize this opportunity to clarify several reasons why such a study might be
useful. First, the study of collaborative networks such as the ones studied here
can allow us to learn more about the relationships that humans involved in
these networks have. We will see that certain characteristics of these networks
often reflect historically-based realities. Then, if I had to find a purely
practical function for this study, I would suggest that labels and record
companies could use the results to identify the most influential musicians and artists. This
would allow them to sign the most interesting artists from a financial point of
view, keeping in mind that this study could be transposed to more mainstream
music styles. Moreover, it is not impossible that the results obtained in these
pages or the datasets created could be used as a basis for new studies by other
students or researchers. Finally, I would add, and this will be the only
subjective point of this thesis, that the study of networks related to music, and
more generally the study of music, remains one of the subjects that brings the
most people together. Everybody has always been affected by this phenomenon,
which some like to describe as magic; the study of all aspects of music,
whether its compositions, rhythm, dynamics, tempo or, in this case, its
networks, therefore seems essential.
1.2 Research questions
In this paper, two main questions will be addressed. The first one will focus
on a particular property of networks, namely preferential attachment; the
second one will focus on the human and contextual aspect of networks, a
necessary question when studying so-called social networks like those studied
here.
The first question this study will attempt to answer is the following: What
are the parameters favoring the preferential attachment among jazz musicians
within a collaborative network? In other words, this paper will try to find out
which parameters make musicians who have already collaborated repeatedly
more likely to collaborate with new musicians entering the network.
The second underlying question raised in this paper could be formulated
as follows: How to better understand the universe of jazz by means of its mu-
sicians networks? The latter calls for a more general analysis of networks and
jazz in order to draw conclusions about the relationships between musicians
in jazz. We will see in the following pages that these two questions are closely
related.
1.3 A short history of jazz
Before studying networks composed of jazz musicians, it seems essential to
give a brief introduction to the history of jazz, in order to better understand
the challenges and the different socio-historical contexts in which these
musicians have evolved over the years.
Many sources place the emergence of jazz in the early 20th century in the
United States, particularly in the city of New Orleans. Thanks to the cultural
diversity and the African, French, Italian, Caribbean and Mexican population
of the city, the different styles of music played by these populations (e.g.
ragtime, blues) gradually mixed and formed the basis of traditional jazz as
we know it today [3]. Some sources explain the explosion of jazz at this time
by the fact that World War I had just ended, leading to a period of peace
and economic boom, combined with the fact that the younger generation of
that time had a huge need for freedom and expression. Jazz therefore quickly
spread across the country thanks to various jazz clubs that have since become
legendary, such as the Blue Note or the Cotton Club.
Later, thanks to the growing popularity of the genre, jazz spread all around
the world giving birth to new sub-genres such as free funk, jazz fusion or cool
jazz. One of the essential components of jazz is the fact that initially, most
of the musicians were black; some of the clubs where they played even
forbade entry to non-white people. This reflects the history of the United States
and the racial segregation that took place in this country between 1877 and
1964, i.e. during the birth of jazz. Today, it would seem that the popularity
of jazz is not as strong as it was back then [4], but its influence is still evident
in many of the new productions of the modern day. The number of people
who attend jazz festivals as well as the number of its practitioners remains
significant.
Chapter 2
State of the art
In the following section, an overview of the current state of research in the
different fields that will be discussed is presented.
2.1 Networks
As explained in the introductory section, the main topic of this document is
networks. The exact mathematical term for these structures would be graph;
the creation of this concept is often attributed to the father of graph theory,
Euler, and his famous Königsberg bridge problem. Among the first authors
to be interested in the topic, it is unthinkable not to mention Hamilton, who
introduced a classical problem of graph theory, namely the Hamiltonian
circuit. Furthermore, it seems important to note that the first book dealing with
the subject is historically attributed to König, with his work entitled Theorie der
endlichen und unendlichen Graphen [Theory of Finite and Infinite Graphs] [5].
This publication served as a theoretical basis for all the scientists who
followed him, such as Erdős, Turán or the researchers cited later in this section.
2.2 Social networks
When we study networks in which the nodes are organizations or people and
the links between them represent a social interaction, we speak of a social
network. This is the exact type of network that will be studied in this paper.
This notion seems to have been introduced by the anthropologist Barnes in
1954. Later Gluckman was the first to use graph theory in social science studies.
One of the first and best known theories on social networks is undoubtedly the
one by Milgram called the Small-world phenomenon [6]. The results of his
experiment suggested that any two people in a social network are six or fewer
social connections away from each other. This idea was first theorized by
Karinthy in 1929. However, this theory was heavily questioned after its
publication, especially with the arrival of the scale-free model.
2.3 Scale-free networks
Regarding advances in the general field of network topology, one of the most
recent major discoveries is that of scale-free networks. This notion is very
often associated with that of the social network, also called collaborative
network (CN). The scale-free characteristic, put forward by Barabási and Albert
in the paper Emergence of Scaling in Random Networks [7], has challenged the
small-world network models and has also allowed a more general understanding
of network topology.
Typically, scale-free networks are characterized by the fact that they have
a small number of extremely connected nodes and a large number of weakly
connected nodes, giving a power-law degree distribution. The study of this
type of network has made it possible to highlight properties common
to many networks in sometimes very distant research fields [8]. The most
frequently cited networks in the different fields of application are the following:
the network where the nodes are proteins and genes and where the edges rep-
resent the chemical interactions between them [9]; the one where the nodes are
nerve cells and the links are axons [10]; the one where the nodes are HTML
pages connected by links pointing to other pages [11]; or the one where the
nodes are scientists who have written papers that are linked according to the
citations in the other papers [1].
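As a point of reference, the power-law degree distribution mentioned above is commonly written as follows. The symbols are standard notation rather than notation taken from this document: P(k) denotes the fraction of nodes that have degree k, and γ is the degree exponent.

```latex
P(k) \sim k^{-\gamma}, \qquad \gamma > 0
```

On a log-log plot, such a distribution appears as a straight line of slope -γ, which is how it is usually recognized in practice.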
2.4 Preferential attachment
A concept inseparable from the scale-free networks discussed earlier is
so-called preferential attachment. Indeed, preferential attachment and growth
are the two main factors explaining the appearance of the scale-free property
in networks. The study of preferential attachment thus seeks to determine which
factors influence the creation of new links during the dynamic evolution of a
social network.
The concept of preferential attachment was introduced by Barabási and
Albert [7], who showed that nodes with higher degrees tend to attract new nodes,
and thus links, during the evolution of the network. Historically, it seems that
Udny Yule was one of the first to put forward this phenomenon in order to
explain the power-law distribution, which is why it is also called the Yule
process. This process generates a so-called long-tailed distribution following
a Pareto distribution or a power law. The phenomenon of preferential attachment
could be roughly summarized by the sentence: The rich get richer. It is more
generally known as the Matthew effect or cumulative advantage process.
Since the discovery of scale-free networks, many methods have been
suggested to generate this type of network using the preferential attachment
mechanism. The most famous model for simulating preferential attachment
is the one suggested in 1999 by Barabási and Albert. In
their model (BA), each new node added to the network is connected to existing
nodes with a probability proportional to the number of links that the existing
nodes already have. Note that this first model was based on the previous work
of the physicist Derek J. de Solla Price and his Price model.
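The attachment rule of the BA model can be written as Π(k_i) = k_i / Σ_j k_j, where k_i is the degree of an existing node i. The sketch below is a minimal, standard-library illustration of that rule, not the code used in this thesis: the function name `barabasi_albert_edges`, its parameters `n`, `m` and `seed`, and the choice of a small star as the initial network are all assumptions made for the example. NetworkX, the library used later in this work, provides a comparable generator called `barabasi_albert_graph`.

```python
import random
from collections import Counter


def barabasi_albert_edges(n, m, seed=None):
    """Grow a network of n nodes in which each new node attaches to m distinct
    existing nodes, chosen with probability proportional to their degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # Assumption for this sketch: the first new node (number m) is linked to
    # nodes 0..m-1, which gives a small star as the initial network.
    targets = list(range(m))
    # 'stubs' lists every node once per link end it owns, so a uniform draw
    # from this list is a draw proportional to the current degree.
    stubs = []
    for new in range(m, n):
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
        # Pick the m distinct targets that the next new node will attach to.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        targets = list(chosen)
    return edges


edges = barabasi_albert_edges(n=2000, m=2, seed=42)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
print("average degree:", 2 * len(edges) / len(degree))
print("highest degree:", max(degree.values()))
```

Comparing the average degree with the highest degree gives a quick feel for the "rich get richer" effect: a few early nodes end up with far more links than the typical node.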
Another model that should be mentioned is the one suggested by Barabási
and Bianconi in their paper entitled Competition and multiscaling in evolving
Networks [12]. In the latter, the authors proposed a new model, this time
based on the fitness of nodes to compete for links. In addition, they
demonstrated that the competition for links between nodes results in what they
call multiscaling, which is a dynamic fitness-dependent exponent, allowing
fitter nodes to overcome more connected but less fit nodes.
In 2003, with a method to quantify preferential attachment in evolutionary
networks, Jeong, Néda and Barabási showed that this phenomenon was indeed
present in real networks [13]. In their studies on four networks, the scientific
citation, the internet, the actor collaboration and the science coauthorship
networks, they found that the rate Π(k), with which a node with k links acquires
new links, is a monotonically increasing function of k, either linear, power-law,
or sublinear.
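To make the idea of an empirical rate Π(k) more concrete, the following sketch estimates a relative attachment rate from a chronologically ordered list of edges. It is a simplified reading of the idea and should not be taken as the exact procedure of [13] or as the code of this thesis: the function name `attachment_rate` and the normalization (links received by nodes of degree k, divided by the number of degree-k nodes that were available each time a newcomer arrived) are assumptions made for the example.

```python
from collections import Counter


def attachment_rate(edges):
    """Estimate a relative attachment rate per degree k from a list of
    (u, v) edges given in chronological order."""
    degree = {}             # current degree of every node already in the network
    nodes_with = Counter()  # nodes_with[k] = number of nodes currently of degree k
    hits = Counter()        # hits[k] = links from newcomers received at degree k
    exposure = Counter()    # exposure[k] = degree-k nodes available, summed over arrivals

    def bump(node):
        """Increase the degree of a node and keep nodes_with in sync."""
        k = degree.get(node, 0)
        if node in degree:
            nodes_with[k] -= 1
        degree[node] = k + 1
        nodes_with[k + 1] += 1

    for u, v in edges:
        u_known, v_known = u in degree, v in degree
        if u_known != v_known:  # exactly one endpoint is a newcomer
            existing = u if u_known else v
            for k, count in nodes_with.items():
                exposure[k] += count
            hits[degree[existing]] += 1
        bump(u)
        bump(v)
    return {k: hits[k] / exposure[k] for k in sorted(hits)}


# Hypothetical toy history: "a" and "b" start the network, then "c", "d"
# and "e" arrive one after the other.
history = [("a", "b"), ("c", "a"), ("d", "a"), ("e", "b")]
rates = attachment_rate(history)
for k, rate in rates.items():
    print(k, round(rate, 3))
```

If preferential attachment is present, the estimated rate should tend to increase with k once enough arrivals have been observed; on a toy history such as this one the values are only illustrative.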
In 2009, based on the preferential attachment model (BA) put forward by
Barabási and Albert, Ben-Naim and Krapivsky showed that in a preferential
attachment network, the degree distribution of the nodes depends on the depth.
Here, the depth is defined as the distance between the node and its root [14].
Moreover, they showed that nodes closer to the root tended to have a larger
number of connections. They explained this phenomenon by the correlation
that exists between the depth of a node and its age, so that younger nodes,
which are further from the root because they arrived later, are the least con-
nected.
In a paper published in Nature [15], Topirceanu, Udrescu and Marculescu
showed that the degree of the node is not the main attractor of new social
links. They showed that the betweenness of the nodes and the strength of
the links play a crucial role in the preferential attachment and thanks to that,
they suggested a new model named Weighted Betweenness Preferential Attach-
ment (WBPA) model.
More recently, among the new models proposed, Ruj and Pal were
the first to put forward a preferential attachment model with a degree bound
[16]. In this model, the maximum degree is bounded above by a fixed value
and, according to the authors, this model is more suitable for IoT and
cyber-physical systems than the conventional preferential attachment model.
2.5 Communities
One of the first algorithms to detect communities within a network was based
on centrality indices to find the boundaries of different communities and was
suggested by Newman and Girvan [17]. In their article Community structure
in social and biological networks, the two authors provided a new method with
a high degree of success in identifying communities in real-world networks
whose community structure is already known.
Since the introduction of this first algorithm, many new methods have been
proposed to detect communities, each with its own particularity. The most
often mentioned are the following ones:
- The method of Clauset, Newman and Moore (CNM) [18]: the main
innovation of this method lies in the fact that it can handle very large
networks with a lower complexity than the other methods of the time.
- The method of Pons and Latapy [19]: it uses the concept of random
walk, which makes it efficient and allows it to capture the community
structure well.
- And finally, the method of Wakita and Tsurumi [20]: developed in 2007,
it proposes an improvement of the CNM method that addresses
its inefficiency on large networks, caused by merging communities
in an unbalanced manner.
The method chosen for this project is the one proposed in 2008 by Blondel
et al. [21] from the University of Louvain. This method is based on the concept
of modularity, which is a quality index for a partition of a network into
communities. Note that the Louvain algorithm is currently the method that
achieves the best global modularity. However, in their paper titled Resolution
limit in community detection [22], Fortunato and Barthélemy analysed modularity
and its applicability to community detection and found that modularity
optimization explores possible network partitions at a coarse level, which may
favor partitions in which groups of nodes are combined into larger
communities.
It should be noted that community detection algorithms can be divided
into two families: the static and the dynamic ones. These two families can be
further subdivided into two where we find the algorithms allowing the over-
lapping of communities, meaning a node can belong to more than one com-
munity, and the algorithms that do not allow it. In this document, concerning
the communities, we will only talk about static algorithms that do not allow
overlapping. However, regardless of the algorithm chosen, it is still difficult to
interpret the communities without the help of additional information.
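To make the notion of modularity used in this section more tangible, the sketch below computes it for a given non-overlapping partition of an undirected, unweighted network, using the usual definition Q = Σ_c [ L_c / m - (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its nodes. This is only an illustration written with the standard library: the function name `modularity`, the `community_of` mapping and the toy graph are assumptions made for the example, and the sketch evaluates a partition rather than searching for the best one as the Louvain method does.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a non-overlapping partition of an undirected,
    unweighted graph given as a list of (u, v) edges."""
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    internal = defaultdict(int)    # number of edges with both ends in community c
    degree_sum = defaultdict(int)  # sum of the degrees of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}

print(modularity(toy_edges, two_groups))  # 5/14, roughly 0.357
print(modularity(toy_edges, one_group))   # 0.0
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one community, which is the kind of comparison a modularity-based algorithm relies on when it decides whether to merge groups of nodes.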
2.6 Jazz communities
Regarding the communities in collaborative networks of jazz players in
particular, the existing research is not extensive. However, Gleiser and Danon,
in their paper entitled Community structure in Jazz [23], have put forward the
essential property of communities, i.e. the grouping of nodes into sets supposed
to share common characteristics. The two authors thus highlighted the
presence of racial communities in the jazz world between 1912 and 1940. At
that time, most of the white musicians performed only with other white
musicians, and the same was true for black musicians. Moreover, based on a
geographical parameter, they also highlighted the separation of the musicians
from this period into four communities: New York, Chicago, both of these
cities, and other cities.
2.7 Natural language processing (NLP)
The field of natural language processing, which will be discussed in section
3.1.3 on data collection, is a particular field of machine learning that focuses
on the understanding of texts by algorithms, for purposes such as
classification or translation, among many others. Thus, the main goal of natural
language processing algorithms is to enable machines to understand and
interpret human speech and text.
The first modern NLP algorithms seem to date back to the end
of the 1980s, thanks to a mix of linguistic and statistical methods. Since then,
several models have been considered for a while as the best
performing. We can mention, among others, Recurrent Neural
Networks (RNN), and Long Short-Term Memory (LSTM) and Gated Recurrent
Unit (GRU) networks in particular. In 2017, researchers from Google, in their
paper named Attention Is All You Need [24], suggested a model called the
Transformer that currently seems to be the most successful in the field of
natural language processing.
Finally, from a more practical point of view, it seems that the most popular
libraries for NLP nowadays are the Transformers library, which uses Jax, PyTorch
and TensorFlow, and the Natural Language Toolkit (NLTK) library [25]. The latter
has been chosen for this study.
Chapter 3
Method
In order to answer the questions stated in section 1.2, the methodology is
divided into two main parts: the construction of the datasets (3.1) and the
construction of the two networks (3.2), one for the collaborations during album
recording and one for the collaborations during live performance at the Montreux
Jazz Festival. The first part, the data collection, uses artificial intelligence
algorithms of the NLP type and web scraping to extract useful information from
the web pages used as sources for one of the two networks. Once these data are
collected, the static networks, dynamic networks and meta-networks can be
built. This will be discussed in the second part.
For information purposes, note that all the code for the multiple functions
used during this project was written in Python 3.8.8. This choice was made
for the simplicity inherent to this language and the speed with
which it allows ideas to be prototyped (see footnote 1).
Footnote 1: All the Python files are accessible via the GitHub repository of the author:
https://github.com/jbaudru/MasterThesis-JazzNetwork. Note however that many
optimizations can still be applied to the code; the emphasis has been put on
obtaining analytical results more than on algorithmic perfection.
3.1 Data collection
3.1.1 Choice of dataset
At the beginning, this work only included a network of collaboration during
album recording. Later, a network of collaboration during live performance was
added, making it possible to compare the two networks and to highlight their
similarities and differences.
The first network was built thanks to the information available on the
French version of Wikipedia, chosen because of the way its pages are structured
compared to their English version. This facilitates the web scraping that we
will see later. Concerning the second network, the one of collaboration
between musicians during concerts, the choice was made to take the Montreux
Jazz Festival because it is one of the best known in the jazz world and
especially because it has an open database facilitating the data collection.
3.1.2 Problematic
The main problem encountered in the development of this thesis was, as is
often the case in research, the collection of data. Indeed, there was no ready-
to-use database available listing the collaborations between jazz musicians, so
most of the time spent on this research consisted in creating sufficiently large
and reliable datasets to be able to draw conclusions.
A second problem underlying the first one then appeared very early in the
project: since the first dataset is built with information provided on Wikipedia,
which is a collaborative website, it may be subject to inaccuracies and gaps.
Another essential parameter to take into account is the fact that this dataset
is constituted via web scraping and NLP methods detecting all the names of
musicians, or words looking like names, present on the Wikipedia page of an
album. Thus, although it is rarely the case, some articles mention musicians who
did not collaborate on the album; when they are mentioned, it is often for
comparison or anecdotal purposes. This has a direct consequence: some links have
been made between musicians despite the fact that they did not collaborate in
real life.
In addition, another concern with collecting the data on Wikipedia is that
these sources only took into account albums and compilations. Compilations
are a real concern because musicians on a compilation do not actually play
together.
To solve these problems encountered in the work preceding this report,
filters on the names of musicians were applied to the data from Wikipedia and
compilations were not taken into account in the final data. This is also
another reason why the second database was added to the project. Indeed,
using this database of the Montreux Jazz Festival, we can be sure that the
musicians really played together. However, it is important to note that not all
the concerts of this festival were of jazz style. We will see that this specificity
will be important in the following pages. Although the data collection for this
database uses web scraping, it does not use natural language processing
algorithms.
3.1.3 Process
For both networks the general procedure is the same, with a few additional
subtleties for the album collaboration network. First, we retrieve the raw
data (HTML files) with different web scraping methods. Then these data are
cleaned, and finally we build the networks from them using NetworkX and some
parsing functions. Figure 3.1 gives an overview of the methodology used in the
implementation of this approach.
FIGURE 3.1: General process
In addition to these two datasets associating the album/live with the list of
participating musicians, other data were collected during the different research
phases. These data included the instruments, the country, the birthdate and the
birthplace (see section 4.2.2).
Collecting data for the album network using NLP
As explained earlier, in order to build the collaborative network of jazz
musicians recording albums, data was collected from various Wikipedia pages.
The list below gives the sources that were used to build the dataset for the
album network:
1. Wikipedia: Liste des albums de jazz les plus vendus
2. Wikipedia: Album de jazz
3. Wikipedia: Album de jazz Américain
4. Wikipedia: Album de jazz français
5. Wikipedia: Album de jazz fusion
6. Wikipedia: Album de bossa nova
As briefly explained above, the main function of the NLP algorithms in this
project is to locate the names of the artists present on an album web page.
Before the NLP methods are applied, a web scraping algorithm runs in two
steps. First, starting from a Wikipedia page listing albums of a certain
country, style or other criterion, the algorithm fetches all the links leading
to the pages of the albums mentioned on this main page. Then, for each of
these links, and thus for each album, the algorithm fetches the HTML code of
the page the link points to. The NLP algorithm combined with the web scraping
is summarized in figure 3.2.
FIGURE 3.2: Steps for getting the jazzman name
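The first of the two scraping steps described above can be sketched as follows. This is a minimal illustration and not the code used in the thesis: it relies only on Python's standard library, the names AlbumLinkParser and album_links are hypothetical, the /wiki/ prefix filter is an assumption about how album links look, and the HTML is passed in as a string so that the download itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlbumLinkParser(HTMLParser):
    """Collect the href of every <a> tag of a list page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def album_links(list_page_html, base_url):
    """Step 1: return the absolute URLs of the article links of a list page."""
    parser = AlbumLinkParser()
    parser.feed(list_page_html)
    links = []
    for href in parser.hrefs:
        # Assumption: album articles live under /wiki/ and have no namespace colon.
        if href.startswith("/wiki/") and ":" not in href:
            url = urljoin(base_url, href)
            if url not in links:
                links.append(url)
    return links


# Made-up fragment of a list page, used only to exercise the function.
sample = (
    '<ul><li><a href="/wiki/Kind_of_Blue">Kind of Blue</a></li>'
    '<li><a href="/wiki/Aide:Accueil">Aide</a></li></ul>'
)
links = album_links(sample, "https://fr.wikipedia.org")
```

In the second step, each URL in links would be downloaded and its HTML handed to the name extraction stage.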
The NLP algorithm extracts from the HTML page the sets of words that are
recognized as being of type PERSON. This type is provided natively in the nltk
library. The common names that can be erroneously recognized as PERSON
by the NLP algorithm are filtered using the following word lists provided by
nltk:
    import nltk
    set(nltk.corpus.words.words("en"))
    set(nltk.corpus.words.words("fr"))
Moreover, other filters are applied to the collected data. These filters
prevent producers, authors, journalists, labels and album names, which are
often cited in the Wikipedia articles and sometimes wrongly recognized as
musicians by the NLP algorithm, from being counted. Each filter is a simple
text file; for example, one contains a list of the most frequently cited jazz
producers. Of course, producers, lyricists and journalists are closely related
to music and are of considerable importance in the jazz scene. However, they
do not fit into the question that this document tries to answer, so it was
decided to leave them out. The purpose of these filters is thus to reduce the
noise present in the collected data.
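The filtering described above can be illustrated with a short sketch. It is written under assumptions: the function name filter_candidates is hypothetical, the small sets stand in for the nltk word lists and for the text files of excluded people, and the rule "drop a candidate made only of dictionary words" is one plausible reading of the filter, not necessarily the exact rule used in the thesis.

```python
def filter_candidates(candidates, common_words, excluded_names):
    """Keep candidate names that are neither common words nor excluded people."""
    kept = []
    for name in candidates:
        tokens = name.lower().split()
        # A candidate made only of dictionary words is treated as a false positive.
        if all(token in common_words for token in tokens):
            continue
        # Producers, journalists, labels, etc. come from hand-made lists.
        if name in excluded_names:
            continue
        kept.append(name)
    return kept


# Illustrative values only; the real lists are much larger.
common_words = {"blue", "note", "records"}   # stand-in for the nltk word lists
excluded_names = {"Rudy Van Gelder"}         # stand-in for a list of non-musicians
candidates = ["Miles Davis", "Blue Note", "Rudy Van Gelder", "Bill Evans"]
musicians = filter_candidates(candidates, common_words, excluded_names)
```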
Thus we manage to recover the following information: the title of the al-
bum, the year of release of this album and the list of musicians collaborating
on this album. Note that the dates of the collaboration are used for the con-
struction of the dynamic network in section 3.2.2.
Another possible use of natural language processing algorithms would be to
recognize when several different names refer to the same musician. For
example, the jazz composer George Gershwin also appears under the name Jacob
Gershowitz. Such alternative names and pseudonyms are common among musicians.
The most famous ones were handled by hand.
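Handling pseudonyms by hand amounts to maintaining a lookup table from alternative names to a single canonical name. The sketch below is only an illustration: the names ALIASES, canonical_name and merge_aliases are hypothetical, and the table contains only the example given in the text.

```python
# Hand-maintained table: alternative name -> canonical name (example from the text).
ALIASES = {"Jacob Gershowitz": "George Gershwin"}


def canonical_name(name, aliases=ALIASES):
    """Return the canonical name of a musician, or the name itself if unknown."""
    return aliases.get(name, name)


def merge_aliases(musicians, aliases=ALIASES):
    """Replace aliases by canonical names and drop the resulting duplicates."""
    merged = []
    for name in musicians:
        canon = canonical_name(name, aliases)
        if canon not in merged:
            merged.append(canon)
    return merged


line_up = merge_aliases(["Jacob Gershowitz", "George Gershwin", "Duke Ellington"])
```

Applying such a mapping before the network is built prevents one musician from appearing as two distinct nodes.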
Moreover, another machine learning application initially present in this
project used the ethnicolr library, which is based on TensorFlow. The purpose
of this algorithm was to add the ethnic origin of the musicians to the
datasets in order to compare the results obtained by Gleiser and Danon about
racial segregation between 1920 and 1940 [23] with the data used in this
project. The aim was also to eventually show an evolution of mentalities after
1940. Machine learning was chosen for this task because data on musicians'
origins are rarely given explicitly on the Wikipedia pages used as sources and
therefore cannot be obtained via classical web scraping. More importantly,
given the large number of musicians in the datasets, collecting these data by
hand seemed too tedious. The algorithm therefore tried to guess the origin
from the musicians' names. The resulting classes were: Non-Hispanic Whites,
Non-Hispanic Blacks, Asians, and Hispanics. But since this library is based
primarily on census data from 2000 or 2010 in the United States of America,
the origins of jazz players prior to this period are often poorly predicted.
Moreover, as said before, many jazz musicians use pseudonyms, which greatly
complicates the detection of their origin by the algorithm.
The use of pseudonyms by jazz musicians also makes it difficult to detect the
gender of musicians. Indeed, the gender-guesser library in charge of this task
is based on statistics from a list of 50,000 names. In various tests, this
library identified the gender of most musicians as unknown.
Furthermore, race, ethnicity and gender are complex and controversial top-
ics, requiring knowledge that the author of this document does not possess.
For this reason, this part of the data collection has been abandoned.
Collecting data for the live performance network
The data collection for the Montreux Jazz Festival was much easier than in the
case presented above. Indeed, the site used as a source is structured in such
a manner that the user can list all the concerts that took place during a
chosen year. You can access the site via this link: Montreux concerts
database. Thus, a simple web scraping algorithm retrieving the data for each
year was developed and, as in the previous case, three pieces of information
could be collected: the name of the band, the year of the performance and the
list of musicians on stage. In addition to this information, the instruments
played by each musician during the concert were also recorded.
3.1.4 Information on datasets
The dataset for the album network is composed of three columns: the name of
the album, the release year of the album and the list of musicians present on
the album. The dataset for the live performance network is composed of the
four following columns: the name of the band, the year of the performance,
and the list of musicians present on the stage and their instruments.
                     Wikipedia dataset     Montreux dataset
Number of entries    1,038 albums          4,589 performances
Time interval        From 1928 to 2020     From 1967 to 2020
Quality score        0.89                  1
TABLE 3.1: Comparison of both datasets
It is interesting to note that, despite its smaller number of cells², the
Wikipedia-based dataset covers a longer time period than the live performance
dataset. This difference can be explained by arguments similar to those
discussed later in section 4.2.5.
Next, in order to approximate the level of confidence that can be given to
these two datasets, the author defines a very simple quality factor for each
of them. The higher this factor is, the more trustworthy the dataset is. This
quality score³ for a dataset X is defined as follows:
$$\text{QualityScore}_X = 1 - \frac{\text{total number of empty cells in } X}{\text{total number of cells in } X}$$
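The quality score defined above is straightforward to compute. The sketch below is only an illustration: the function name quality_score and the toy rows are hypothetical, and treating None, an empty string and an empty list as "empty" is an assumption about how missing values are stored.

```python
def quality_score(rows):
    """Return 1 - (number of empty cells) / (total number of cells)."""
    total = sum(len(row) for row in rows)
    if total == 0:
        raise ValueError("the dataset contains no cell")
    # Assumption: a cell is empty when it is None, "" or an empty list.
    empty = sum(1 for row in rows for cell in row if cell in (None, "", []))
    return 1 - empty / total


# Toy dataset with the three columns of the album dataset: title, year, musicians.
rows = [
    ("Album A", 1959, ["Musician 1", "Musician 2"]),
    ("Album B", None, ["Musician 3"]),   # missing release year
    ("Album C", 1971, []),               # missing line-up
    ("Album D", 1964, ["Musician 1"]),
]
score = quality_score(rows)  # 2 empty cells out of 12
```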
According to table 3.1, 11% of the data is missing for the album network,
whereas the live performance dataset is 100% complete and, by this measure,
reliable. This 11% gap can be explained by the fact that, in the early stages
of this work, a part of the album dataset (120 cells) was entered entirely by
hand, so errors may have slipped into the data. The most frequently missing
entry for this dataset seems to be the year of release of the albums. To
overcome this problem, it would have been possible to fill in the missing
values either with the average of the release years or with the mode, i.e. the
year appearing most often in the data. However, it was decided to keep the
data unchanged in order to stay as close as possible to reality. Another
solution to the problem of numerous empty entries in the dataset is presented
in section 5.2.
²Here, we define a cell as a single field associated with a column in a row;
this is the same definition as the one used by classic Excel spreadsheets.
³This formula could be improved by adding a variable accounting for the number
of duplicated entries, but there are no such entries in the present datasets.
Finally, in addition to the quality score, the data collection will always be
considered as incomplete given the huge volume of albums and live perfor-
mances that are performed each year and the many albums that are not cited
by the various websites from which the data come. Like many social networks,
the collaborative networks of jazz musicians are growing networks; each year,
new data are added to the different Wikipedia pages mentioned above and ev-
ery year, the Montreux Jazz Festival updates its database according to the con-
certs that take place there.
3.2 Networks construction
This section will show how the two collaboration networks were built, namely
the collaboration network during album recording, based on Wikipedia data,
and the collaboration network during live performance, based on Montreux
Jazz Festival data. Finally, it will also discuss how the different meta-networks
have been created.
3.2.1 Static network
In the studied networks, each musician is represented by a node and every
link between two nodes has a weight. This weight is simply determined by the
number of times two musicians have collaborated during a recording or a live
performance. In addition, the size of a node is determined by the degree of that
node, so the more a musician collaborates, the bigger the node representing
him will be in the networks. Note that the networks constructed here do not
have any directed edges.
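The construction rules above (one node per musician, undirected edges weighted by the number of collaborations, node size given by the degree) can be sketched with NetworkX. The function name build_collaboration_network and the toy records are hypothetical; the sketch assumes each record is a (title, year, musicians) tuple as in the album dataset.

```python
from itertools import combinations

import networkx as nx


def build_collaboration_network(records):
    """Build an undirected graph whose edge weights count shared records."""
    G = nx.Graph()
    for _title, _year, musicians in records:
        G.add_nodes_from(musicians)
        # Every pair of musicians on the same album or concert collaborated once more.
        for u, v in combinations(sorted(set(musicians)), 2):
            if G.has_edge(u, v):
                G[u][v]["weight"] += 1
            else:
                G.add_edge(u, v, weight=1)
    return G


records = [
    ("Album A", 1959, ["M1", "M2", "M3"]),
    ("Album B", 1961, ["M1", "M2"]),
]
G = build_collaboration_network(records)
node_sizes = dict(G.degree())  # the degree drives the displayed size of each node
```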
FIGURE 3.3: Album network
Figure 3.3 shows the network obtained for the data collected on Wikipedia.
We already notice that one node seems much more important than the others;
we will analyze this in detail later in section 4.2.2.
FIGURE 3.4: Montreux network
Figure 3.4 represents the network obtained for the Montreux Jazz Festival.
The experienced reader will immediately notice the striking difference between
the topology of this network and that of the previous one; we will also look
at this point in detail in section 4.1.1 of this thesis.
3.2.2 Dynamic network
In order to study the preferential attachment phenomenon that occurs in the
networks, a dynamic version of each network was developed. Indeed, the study
of preferential attachment requires adding a time variable to the networks in
order to see how the links between their nodes evolve. The DyNetx library was
initially chosen for this purpose and was later replaced by the visualization
tool Gephi, which performs this type of task with less effort. Each link thus
carries the time at which the collaboration appeared in the network. In the
context presented here, this temporal information is the album release date or
the date of the live performance. Note that the dates attached to the links
are obtained with the web scraping method presented in section 3.1.3. As a
consequence, some nodes are not present in the dynamic network because the web
pages used as sources gave no date for the corresponding album or performance.
FIGURE 3.5: Album network evolution sample
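The idea of a dynamic network can be illustrated by storing a year on each edge and extracting the state of the network at a given date. This sketch uses NetworkX rather than DyNetx or Gephi; the function name snapshot, the attribute name year and the toy data are assumptions made for the example.

```python
import networkx as nx


def snapshot(G, year):
    """Return the graph formed by the edges dated at or before the given year."""
    H = nx.Graph()
    H.add_edges_from(
        (u, v, data)
        for u, v, data in G.edges(data=True)
        # Edges without a date cannot be placed in time and are skipped.
        if data.get("year") is not None and data["year"] <= year
    )
    return H


G = nx.Graph()
G.add_edge("M1", "M2", year=1959)
G.add_edge("M2", "M3", year=1971)
G.add_edge("M3", "M4", year=None)  # release date missing in the source

early = snapshot(G, 1960)
full = snapshot(G, 2020)
```

In this toy example M4 never enters the dynamic network, which mirrors the loss of nodes described above when no date is available.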
In practice, to allow a better experience for the reader and a better visual-
ization of the networks evolution, a video for each of the networks has been
made. You can access them by clicking here:
Video of the evolution of the album network (YouTube)
Video of the evolution of the live network (YouTube)
3.2.3 Meta-networks
The term meta-network, as used in this paper, can be seen as a network of a
network. The idea behind this concept is to build a new network based on
common characteristics of the nodes composing a first network. The
characteristics treated in these pages are the instruments of the musicians,
for all the nodes of the festival network, and the geographical origin of the
musicians, for the hubs of the festival network. Finally, the last
characteristic taken into account is the year: album release years for the
Wikipedia network and concert years for the Montreux Jazz Festival network.
To better understand how these networks are built, consider the following
example: if we take the instrument played by the musicians as the
characteristic on which the meta-network is built, the nodes are the different
existing instruments, and two instruments A and B are linked if a musician
playing instrument A has collaborated with a musician playing instrument B.
Figure 3.6 gives an example of this type of meta-network construction with a
few nodes.
FIGURE 3.6: Example of meta-network construction
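The instrument example above can be turned into a short sketch. The function name instrument_meta_network and the toy data are hypothetical; counting the number of supporting collaborations as an edge weight and ignoring links between identical instruments are assumptions, since the text only states when two instruments are linked.

```python
import networkx as nx


def instrument_meta_network(G, instruments_of):
    """Link instruments A and B when players of A and B have collaborated."""
    M = nx.Graph()
    for u, v in G.edges():
        for a in instruments_of.get(u, ()):
            for b in instruments_of.get(v, ()):
                if a == b:
                    continue  # assumption: no self-loops in the meta-network
                if M.has_edge(a, b):
                    M[a][b]["weight"] += 1
                else:
                    M.add_edge(a, b, weight=1)
    return M


G = nx.Graph([("M1", "M2"), ("M2", "M3")])
instruments_of = {"M1": ["piano"], "M2": ["trumpet"], "M3": ["piano", "drums"]}
M = instrument_meta_network(G, instruments_of)
```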
Note that, except for the meta-network of the years, the meta-networks
described here have been built only for the Montreux Jazz Festival network,
for which we have information on the instruments of the musicians, and for the
most important nodes of this same network, for which we have information on
the country of origin.
3.3 Networks visualization
The visual representation of the networks was improved during the different
steps of this research. Initially, this project only used the NetworkX library
for the creation and visualization of the networks. However, due to the
well-known speed limitations of Python and the large size of the live
performance network, this library was almost unusable for a correct
visualization. Tools such as neo4j, graphistry, Cytoscape3 or BioFabric seem
to be suitable solutions for a more efficient network construction, although
this is not a priority given the relatively small number of nodes in the
networks studied here. For the visualization of the networks, however, the
choice was made to switch to the open-source software Gephi, which gives the
user greater freedom of movement in the network, a much clearer view and a set
of very useful tools.
Chapter 4
Results
In this chapter, we present the results obtained regarding the topologies of
the two studied networks as well as the results concerning their hubs. We also
present the results of comparing the live performance network with a similar
network, namely another collaboration network built from a jazz festival.
Finally, the results obtained for the different meta-networks are discussed.
These results are presented as follows: for each point, the metrics used are
explained, then they are put into perspective with the subject studied, namely
the world of jazz, and finally the values obtained are discussed.
4.1 Technical analysis of both networks
The different parameters studied in this section provide an overview of the net-
works. These parameters are those usually studied in other documents dealing
with collaborative networks, such as the rich-club and clustering coefficients,
the gamma factor, and the notion of communities.
Chapter 4. Results 26
4.1.1 General parameters and comparison
Table 4.1 shows the different parameters typically studied in the exploration of
networks.
                                  Wikipedia network    Montreux network
Number of nodes                   1,540                14,090
Maximum degree                    153                  251
Average degree                    30.32                65.39
Diameter                          12                   15
Density                           0.006                0.001
Modularity                        0.83                 0.931
Num. of weakly connected nodes    196                  1,136
TABLE 4.1: General parameters
Several references to these parameters will be made in the following sections,
but it is important to note here that the live performance network is much
larger than the album network: it has 14,090 nodes versus 1,540.
Another parameter that is consistent with this simple observation is the
diameter of the networks. The diameter is the length of the longest of the
shortest paths between any two vertices of a graph. This parameter, denoted δ,
is calculated with the following formula, where s(i, j) is the number of edges
in the shortest path from vertex i to vertex j:
$$\delta = \max_{i,j} s(i,j)$$
Thus we observe that the diameter of the live performance collaboration
network is larger than that of the album collaboration network.
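As an illustration of the definition above, the diameter can be obtained with NetworkX. Since the shortest path between two disconnected nodes is undefined, this sketch restricts the computation to the largest connected component; this restriction, like the function name largest_component_diameter, is an assumption and the text does not say how disconnected parts were handled.

```python
import networkx as nx


def largest_component_diameter(G):
    """Return the diameter of the largest connected component of G."""
    largest = max(nx.connected_components(G), key=len)
    return nx.diameter(G.subgraph(largest))


G = nx.path_graph(5)   # chain 0-1-2-3-4: the longest shortest path has 4 edges
G.add_edge(10, 11)     # small isolated component, ignored by the computation
delta = largest_component_diameter(G)
```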
In addition, in order to study how these two networks evolve, the nodes should
be coupled with a time factor; an interesting parameter to study is thus the
number of new nodes as a function of time. Figures 4.1 and 4.2 represent the
number of new nodes entering the networks each year.
FIGURE 4.1: Number of new nodes per year - Wikipedia
network
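Counting the new nodes per year amounts to finding, for each musician, the first year in which they appear. The sketch below assumes records of the form (title, year, musicians); the function name new_nodes_per_year is hypothetical, and skipping undated records is an assumption consistent with the missing release dates mentioned in this thesis.

```python
from collections import Counter


def new_nodes_per_year(records):
    """Count, for each year, the musicians appearing for the first time."""
    first_year = {}
    for _title, year, musicians in records:
        if year is None:
            continue  # undated records cannot be placed on the time axis
        for musician in musicians:
            if musician not in first_year or year < first_year[musician]:
                first_year[musician] = year
    return Counter(first_year.values())


records = [
    ("Album A", 1959, ["M1", "M2"]),
    ("Album B", 1961, ["M1", "M3"]),
    ("Album C", None, ["M4"]),
]
counts = new_nodes_per_year(records)
```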
In general, we notice a sharp increase in the number of nodes in the album
network starting in 1955. After that, the number of new nodes decreases as
time passes. Two factors could explain this. First, since around 2000 to 2010,
sales of the compact disc (CD), the format often associated with albums, have
dropped as it was gradually replaced by streaming services such as YouTube or
Spotify, to name a few. This drop in the popularity of the CD album during
those years may have made the album format less attractive to artists, which
could explain why the number of albums released decreased. Second, some
entries in the album database have no release date, which potentially biases
the result.
FIGURE 4.2: Number of new nodes per year - Montreux network
Concerning the live performance network, the general trend seems to be the
opposite of the one described for the album network: the number of new nodes
in the network seems to increase progressively with time. Two factors seem to
explain this phenomenon. The first is the increasing popularity of the
festival over time, which allows the organizers to increase the number of
invited artists and/or stages at the festival. The second is that the
organizers of the Montreux Jazz Festival try to invite new artists every year
in order to renew their line-up, whereas artists releasing an album at regular
intervals tend to collaborate with the same musicians.
Finally, from a more practical point of view, our calculations show that the
average number of musicians taking part in an album recording is 3.28 for the
Wikipedia network, while the average number of musicians taking part in a
concert is 6.46 for the Montreux Jazz Festival network. The number of
participants on stage is thus higher than the number of musicians in the
studio. This could be explained by the fact that fewer musicians are needed to
record an album than to perform it live. Indeed, one can easily imagine a
musician recording several instruments one after the other for an album,
whereas it would be impossible to play them simultaneously during a concert.
4.1.2 Scale-free networks
A common feature of many collaborative networks, or social networks, is the
fact that they are scale-free. This is what we will check in this section. In the
studied networks, the degree distribution of the nodes seems indeed to follow
a power law distribution. Thus, there is a high occurrence of low degree nodes
and a low number of high degree nodes.
We can see in figures 4.3a and 4.3b that both networks exhibit this property.
(A) Degree distribution of the Wikipedia network
(B) Degree distribution of the Montreux network
FIGURE 4.3: Degree distributions
The gamma factor γ, which is used to determine whether a network is
scale-free, is the exponent of the following power law:
$$P(k) \sim c\,k^{-\gamma}$$
where P(k) = n_k / n is the fraction of the n nodes that have degree k (n_k
being the number of such nodes), c is a proportionality constant and k is the
degree. P(k) can also be interpreted as the probability that a node has k
links. It is generally accepted that a network is considered to be scale-free
if the value of γ is between 2 and 3.
The value of the γ factor found for the network of album collaborations is
1.682, and for the network of Montreux Jazz Festival collaborations it is
1.827. These values were obtained with the curve fitting class Fit of the
powerlaw library. For both networks, the values of the gamma factor are close
to each other and, although slightly below the usual lower bound, not far from
2. Combined with the results obtained for their degree distributions, this
leads us to conclude that these two networks have the characteristics expected
of scale-free networks.
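To make the quantities P(k) and γ concrete, the sketch below computes the empirical degree distribution and a simple estimate of the exponent. It does not reproduce the powerlaw library call used in the thesis: the estimator is the standard approximate maximum-likelihood formula for discrete power laws, γ ≈ 1 + n / Σ ln(k_i / (k_min - 1/2)), and the function names, the value of k_min and the toy degree sequence are assumptions.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return P(k) = n_k / n for every degree k present in the sequence."""
    n = len(degrees)
    return {k: count / n for k, count in sorted(Counter(degrees).items())}


def estimate_gamma(degrees, k_min=1):
    """Approximate maximum-likelihood estimate of the power-law exponent."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degree is greater than or equal to k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1 + len(tail) / log_sum


# Toy degree sequence: many low-degree nodes, few high-degree nodes.
degrees = [1] * 8 + [2] * 4 + [4] * 2 + [8]
P = degree_distribution(degrees)
gamma = estimate_gamma(degrees)
```

On real data the choice of k_min has a strong influence on the estimate, which is one reason why a dedicated fitting library is preferable for actual analyses.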
4.1.3 Preferential attachment
As explained in section 3.2.2, studying preferential attachment requires
following the evolution of the network over time, i.e. working with what is
commonly called a dynamic network.
The preferential_attachment method of the NetworkX library computes the
preferential attachment score between two nodes u and v. This type of
function, like adamic_adar_index or jaccard_coefficient, is often used to
predict the next links that nodes will form in the network. Many similar
methods are described in the paper The Link Prediction Problem for Social
Networks [26], written by Liben-Nowell and Kleinberg. The preferential
attachment score proposed by Newman and Barabási is calculated according to
the following formula:
$$p_{score}(u, v) = |\Gamma(u)|\,|\Gamma(v)|$$
where Γ(u) gives the set of neighbors of u. The higher this score is, the more
likely the two nodes u and v are to become linked.
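The score above can be computed with the preferential_attachment function of NetworkX, which yields (u, v, score) triples. The toy graph is hypothetical, and averaging the score over all unconnected pairs is only one possible way of obtaining an average value; the exact averaging used in the thesis is not specified.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")])

# Score of two chosen pairs: product of the numbers of neighbors of u and v.
scores = {
    (u, v): p
    for u, v, p in nx.preferential_attachment(G, [("a", "e"), ("b", "e")])
}

# Without a list of pairs, NetworkX scores every pair that is not yet linked.
all_scores = [p for _u, _v, p in nx.preferential_attachment(G)]
average_score = sum(all_scores) / len(all_scores)
```

Here a has three neighbors and e has one, so the pair (a, e) obtains a higher score than the pair (b, e), whose members both have a single neighbor.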
Thus, this formula allows us to calculate that over the entire period of time
studied extending from 1967 to 2020, the average score of preferential attach-
ment for the nodes is 64.91 for the collaboration network related to albums and
82.69 for that related to live performance. According to the definition given to
the preferential attachment score, on average, we notice that musicians of the
live collaboration network have a higher probability to start a new collabora-
tion than those of the album network.
Figures 4.4a and 4.4b show the evolution of the average and maximum
preferential attachment scores per year within the album network.
(A) Maximal preferential attachment score per year - Album network
(B) Average preferential attachment score per year - Album network
FIGURE 4.4: Preferential attachment - Album network
For the album network, we clearly see a spike in preferential attachment
around the year 1971. This can be explained by the fact that, for this year,
our database lists 7 albums, including 4 on which many musicians collaborated.
This increase in the number of musicians can be verified in figure 4.1. This
result suggests that, during this year, musicians in the album network had a
greater chance of creating new collaborations with other musicians.
Moreover, from 2009 to the present, we notice that the preferential attachment
is significantly lower than during the period from 1967 to 2009. This is due
to the smaller number of nodes in the network for this period, which can also
be verified in figure 4.1. Note that the data concerning the albums' release
years are not complete for this network, meaning that these results should be
taken with a certain amount of caution.
Figures 4.5a and 4.5b show the evolution of the average and maximum
preferential attachment scores per year within the live performance network.
(A) Maximal preferential attachment score per year - Montreux network
(B) Average preferential attachment score per year - Montreux network
FIGURE 4.5: Preferential attachment - Montreux network
As in the case of the album network, we notice preferential attachment spikes
for the live collaboration network. These occur in the years 1991, 1996 and
2008. As in the previous case, we can interpret this by saying that during
these three years, musicians within the network had, on average, a higher
probability of attracting new collaborators. As before, this seems to be
explained by the larger number of nodes in the network for these years, since
a higher number of nodes inevitably increases the chances of obtaining a
higher preferential attachment score. Indeed, we notice in figure 4.2 that
from 1991 to 2018, the number of nodes in the networks of these years is on
average higher than for the period before 1991. Note that here, the data
concerning the years of the concerts are complete.
4.1.4 Other parameters
In the following section, we will look at parameters that are somewhat out of
the ordinary compared to those usually studied in the network literature.
These are particularly interesting because they allow us to draw conclusions
about the topology of the two networks studied in this paper.
Rich-club coefficient
The rich-club coefficient allows us to check whether vertices of high degree
tend to be strongly connected to each other. This coefficient is calculated
via the following equation:
$$\phi(k) = \frac{2E_{>k}}{N_{>k}(N_{>k} - 1)}$$
where E_{>k} is the number of links among the N_{>k} nodes of degree higher
than k and N_{>k}(N_{>k} - 1)/2 is the maximum number of links between these
N_{>k} nodes. This coefficient is normalized using the rich-club coefficient
of a random graph of the same order as the one studied. The normalized
indicator is therefore the following:
$$\rho_{ran}(k) = \frac{\phi(k)}{\phi_{ran}(k)}$$
For this metric, if for certain values of k we have ρ_ran(k) > 1, this denotes
the presence of the rich-club effect.
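The non-normalized coefficient ϕ(k) can be written directly from its definition and compared with the value returned by NetworkX. The helper name rich_club and the toy graph are hypothetical; normalized=False is passed to rich_club_coefficient so that no random reference graph is involved in the comparison.

```python
import networkx as nx


def rich_club(G, k):
    """Return phi(k) = 2 E_>k / (N_>k (N_>k - 1)), or None if undefined."""
    rich = [node for node, degree in G.degree() if degree > k]
    n_rich = len(rich)
    if n_rich < 2:
        return None
    e_rich = G.subgraph(rich).number_of_edges()
    return 2 * e_rich / (n_rich * (n_rich - 1))


# Toy graph: a triangle of well-connected nodes, each with one extra neighbor.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)])
phi = {k: rich_club(G, k) for k in range(3)}
reference = nx.rich_club_coefficient(G, normalized=False)
```

In this example the three nodes of degree 3 are all linked to each other, so the coefficient reaches 1 as soon as the low-degree nodes are excluded.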
(A) Wikipedia network (B) Montreux network
FIGURE 4.6: Distribution of rich-club coefficient by degree
For both networks, the rich-club coefficient is calculated via the
rich_club_coefficient() method of the NetworkX library; its average value is
0.37 for the Wikipedia network and 0.581 for the Montreux Jazz Festival
network. It can be noticed that for nodes of degree k close to k_max, the
rich-club coefficient is close to 1, which means that the nodes of high degree
are well connected to each other. In concrete terms, this result can be
interpreted as follows: the network is robust, and removing a hub would not
affect its general connectivity. For example, in the network formed by
websites linking to each other, deleting an extremely visited page, i.e. a
hub, would not prevent us from reaching the sites that were linked to this
hub, because they remain reachable through other sites.
Moreover, we can see that the distributions of the rich-club coefficients with
respect to the degrees of the nodes take the same general form for both net-
works. However, it seems that the evolution of the rich-club coefficient is
clearly more abrupt for the album network, from degree 58 to 60, whereas
it appears to be progressive for the other network.
We could transpose these results to the world of jazz as follows: if one
of the highly connected musicians, a rich node, present in the studied net-
works, passed away, most of the collaborations between musicians would not
be greatly affected. History has given us some famous examples with George
Duke or Toots Thielemans. Indeed, after their death, the Montreux Jazz Festival
network continued to evolve and the connectivity of the network did not de-
crease.
Modularity and community
Most social networks highlight the notion of community [17], which brings
together different members, or nodes, of the network. As explained in section
2, there exist different methods to detect communities within a network, such
as the method of Clauset, Newman and Moore [18], the method of Pons and Latapy
[19], the method of Wakita and Tsurumi [27] or the Louvain method [28]. The
one used in this document is the Louvain method, created by V. D. Blondel. It
was chosen because it is easily implemented thanks to the python-louvain
library and especially because it seems to be the most efficient method to
date.
FIGURE 4.7: Modularity optimization comparison
This method partitions a network by optimizing the modularity. The modularity
is a value between -0.5 and 1 which measures the density of edges inside the
communities compared to the density of edges connecting the communities. The
formula to calculate the modularity is the following:
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
where A_{ij} gives the weight of the edge between the nodes i and j, k_i and
k_j are the sums of the weights of the edges attached to nodes i and j, m is
the sum of all the weights of the edges of the graph, c_i and c_j are the
communities of nodes i and j, and δ is the following Kronecker delta function:
$$\delta(x, y) = \begin{cases} 1, & \text{if } x = y \\ 0, & \text{otherwise} \end{cases}$$
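The modularity formula can be evaluated term by term for a given partition. The sketch below is an illustration under assumptions: the function name modularity_from_formula and the toy graph are hypothetical, the partition is supplied by hand rather than found by the Louvain method, and the modularity function of NetworkX is imported only to check the result.

```python
import networkx as nx
from networkx.algorithms.community import modularity as nx_modularity


def modularity_from_formula(G, community_of, weight="weight"):
    """Evaluate Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    two_m = 2 * G.size(weight=weight)          # twice the total edge weight
    strength = dict(G.degree(weight=weight))   # k_i: summed weight around node i
    q = 0.0
    for i in G:
        for j in G:
            if community_of[i] != community_of[j]:
                continue                       # the Kronecker delta is 0
            a_ij = G[i][j].get(weight, 1) if G.has_edge(i, j) else 0
            q += a_ij - strength[i] * strength[j] / two_m
    return q / two_m


# Two triangles joined by a single edge, split into their natural communities.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
Q = modularity_from_formula(G, community_of)
Q_reference = nx_modularity(G, [{0, 1, 2}, {3, 4, 5}])
```

The double loop is quadratic in the number of nodes, which is acceptable for an illustration but not for networks of the size studied here.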
Table 4.2 shows the data concerning the communities of the two networks. The
choice was made to use the visuals produced by the Gephi tool rather than
those obtained at the beginning of this research with the NetworkX library.
Indeed, the Gephi tool gives a clear and clean visualization of the different
communities, which is not always the case with NetworkX. Note that the Gephi
tool also uses the Louvain algorithm to detect communities.
                          Wikipedia network    Montreux network
Modularity value          0.83                 0.931
Number of communities     217                  1,650
TABLE 4.2: Parameters related to communities
Figure 4.8 shows the graphs obtained for the detection of communities in
the two networks using the Gephi tool.
(A) Communities in the Wikipedia network (B) Communities in the Montreux network
FIGURE 4.8: Communities visualization for both networks
The Louvain method revealed 217 different communities for the album network
and 1,650 communities for the live performance network. These communities are
highlighted here with different colors. The number of communities can vary
from one run to another because the Louvain algorithm [28] is not
deterministic. Indeed, the placement of nodes in the different communities
depends, among other things, on the order in which the nodes are evaluated
[29].
As mentioned in section 4.1.1 and shown in figure 4.9, the different colors of
the communities make it easier to notice that the number of disconnected
components, i.e. isolated communities, is higher for the Montreux Jazz
Festival collaboration network than for the album collaboration network.
FIGURE 4.9: Network of Montreux festival: Isolated community
As mentioned in section 2, it is often difficult to draw conclusions from the
communities highlighted by the algorithm without having additional informa-
tion available. However, regarding the collaboration networks between jazz
musicians, we notice that, for both networks, most of the communities are built
around a very connected node. However, as there are clusters of nodes poorly
connected to the main network, it appears that sets of nodes logically form
communities without the presence of highly connected nodes within them.
Moreover, since the musicians making up these two networks come from many
different countries, there appears to be a link between the communities and
the geographical origin of the musicians. For example, one notices the presence
of several communities comprising exclusively French musicians. The same is
true for the United States of America, Brazil and most of the countries
mentioned in section 4.2.4 of this document. When there are several communities
for one country, the difference between these communities seems to be
determined by the age of the musicians and by the sub-genre of jazz they usually
play.
To give an example, figure 4.10 shows how the album recording collaboration
network separates into communities according to the musical sub-genre usually
played by the musicians. In green, with Duke Ellington, we observe the community
associated with swing; in pink, with Barney Bigard and Louis Armstrong, the
community associated with Dixieland¹; and finally in blue, with Roy Eldridge
and John Lewis, the community associated with the big-band style.
FIGURE 4.10: Communities in the network of Wikipedia albums:
This figure shows the separation into communities around the
sub-genre of jazz.
¹ This sub-genre of jazz is also called traditional jazz. It essentially includes the music
produced in New Orleans at the beginning of the 20th century. The name refers to the music
originally developed by the Original Dixieland Jass Band.
Apart from the difference in the number of communities detected in the two
networks, these communities differ in that those of the Montreux Jazz Festival
network are generally larger: they include, on average, more nodes. Like the
larger number of communities, this can be explained by the simple fact that
the Montreux Jazz Festival network contains more nodes.
Clustering coefficient
As defined by Girvan and Newman [17], the clustering, or network transitivity,
is the property that two vertices, which are both neighbors of the same third
vertex, have a heightened probability of also being neighbors of one another.
The global clustering coefficient is defined as follows:
$$C_i = \frac{3 \times \text{number of triangles on the graph}}{\text{number of connected triples of vertices}}$$
The coefficient $C_i$ is the probability that two nodes are connected given
that they have a neighbor in common. In other words, this coefficient indicates
the probability that two musicians who have each collaborated with the same
musician on an album have themselves collaborated on another album. Based on
this coefficient, it is possible to calculate the average clustering
coefficient of the two networks as follows:
$$C = \frac{1}{n} \sum_{i \in G} C_i$$
Another parameter strongly related to the previous one is the transitivity
value, which gives the fraction of all possible triangles that are actually
present in the network. This value, T, can be calculated with the following
formula, where possible triangles are identified by the number of triads (i.e.
two edges with a shared vertex):
$$T = \frac{3 \times \text{number of triangles}}{\text{number of triads}}$$
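To make these two statistics concrete, the following sketch computes them on a small graph using only the Python standard library. It assumes that the per-node coefficient is read as the local clustering coefficient (the fraction of pairs of neighbors of a node that are themselves connected), which is how libraries such as NetworkX compute the values that are then averaged. The helper names and the toy graph are illustrative assumptions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return a dict node -> set of neighbours for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closed_pairs(adj, node):
    """Number of pairs of neighbours of `node` that are linked together."""
    return sum(1 for a, b in combinations(adj[node], 2) if b in adj[a])


def local_clustering(adj, node):
    k = len(adj[node])
    if k < 2:
        return 0.0  # no pair of neighbours, so no triangle is possible
    return closed_pairs(adj, node) / (k * (k - 1) / 2)


def average_clustering(adj):
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def transitivity(adj):
    # Summing closed pairs over all nodes counts every triangle 3 times.
    closed = sum(closed_pairs(adj, n) for n in adj)
    triads = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triads if triads else 0.0


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3.
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
avg_c = average_clustering(adj)
t = transitivity(adj)
```

On this toy graph the average clustering coefficient is 7/12 while the transitivity is 3/5. The two values differ because the average gives every node the same weight, whereas the transitivity gives more weight to high-degree nodes.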
Table 4.3 shows the values obtained for the transitivity ratio and the
average clustering coefficient for both networks, the two most popular
statistics that measure the number of triangles in a network.
                                 Wikipedia network   Montreux network
Average clustering coefficient   0.745               0.917
Transitivity value               0.606               0.753
TABLE 4.3: Average clustering coefficient
Thanks to these two coefficients, we can see that the nodes of the live
collaboration network have a clearly stronger tendency to cluster together than
the nodes of the album collaboration network. However, both networks have a
relatively high average clustering coefficient, which is often the case for
real-world networks, and in particular for social networks.
Figure 4.11 shows the clustering coefficient obtained for each node of the
Wikipedia collaboration network, while figure 4.12 shows the results obtained
for the Montreux Jazz Festival network.
FIGURE 4.11: Clustering coefficient for Wikipedia network
According to figure 4.11, it appears that most of the nodes (800) in the
album network have a clustering coefficient between 0.9 and 1.
FIGURE 4.12: Clustering coefficient for Montreux network
According to figure 4.12, it appears that most of the nodes (6,000) in the
Montreux network have a clustering coefficient between 0.9 and 1.
Thus, from the definition of the clustering coefficient, it appears that more
than 90% of the nodes in both networks have a high probability of being
connected to another node given that they have a neighbor in common. We can
therefore conclude that both networks studied here are strongly clustered.
4.2 Hubs analysis of both networks
4.2.1 Definition of hub
In the following section, we will analyze the hubs of the two networks
presented earlier. To do this, it seems appropriate to start by defining what we
mean by hub. We define this term as a node of the network with a higher
number of connections (i.e. degree) than the average node. This type of node
is characteristic of scale-free networks and is not observed in random networks.
For the sake of simplicity, and given the available information, we will focus
here on the 50 most connected hubs. Indeed, the various characteristics that
follow are difficult to collect for all the nodes of the two networks. In
addition, the hubs are interesting to study because they are central in the
networks and they make it possible to extract the characteristics of the most
influential musicians in the jazz world.
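The definition above can be sketched in a few lines: compute the degree of every node, keep the nodes whose degree exceeds the mean, and retain the k most connected ones. This is a minimal illustration and not the code used for this thesis. The function names, the placeholder musician labels and the assumption that each collaboration appears once in the edge list are ours.

```python
from collections import Counter


def degrees(edges):
    """Degree of each node, given a list of unique undirected edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def top_hubs(edges, k):
    """The k highest-degree nodes among those above the mean degree."""
    deg = degrees(edges)
    mean_degree = sum(deg.values()) / len(deg)
    hubs = [(name, d) for name, d in deg.items() if d > mean_degree]
    # Sort by decreasing degree, breaking ties alphabetically.
    hubs.sort(key=lambda item: (-item[1], item[0]))
    return hubs[:k]


# Hypothetical toy network with placeholder musician labels.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
hubs = top_hubs(edges, k=2)
```

Here the mean degree is 2, so only musician "A" (degree 3) qualifies as a hub even though two were requested. In the two networks studied, the 50 most connected nodes all lie far above the mean degree, so the cut-off at k = 50 is what actually limits the selection.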
4.2.2 Top hubs
Thus, based on their degree, table 4.4 lists the 5 musicians who have taken
part in the most collaborations for each of the two networks². For the sake of
brevity, in this subsection we will limit our analysis to the 3 most connected
musicians. However, all the statistics in the following subsections are computed
on the 50 most connected nodes.
     Wikipedia network       Montreux network
1    Duke Ellington (152)    George Duke (251)
2    Johnny Hodges (73)      Toots Thielemans (209)
3    Johnny Griffin (60)     Herbie Hancock (209)
4    Eric Dolphy (58)        Quincy Jones (207)
5    Barney Bigard (56)      Claude Nobs (198)
TABLE 4.4: Top 5 hubs and their degree for both networks
Figures 4.13 and 4.14 are for information purposes only, putting a face to
the most important names mentioned hereafter.
(A) Duke Ellington (B) Johnny Hodges (C) Johnny Griffin
FIGURE 4.13: Top 3 hubs of album network
Regarding the album network, it is not surprising to find Duke Ellington (see
4.13a) as the most connected node. Indeed, he is an essential figure of jazz; he
² For more details, the list of the 200 top hubs of each of the two networks is available via
this link: GitHub: List of the top 200 hubs.
appears among the first results when we search for the most famous jazz
musicians on Google. Moreover, his influence on the jazz scene through his
numerous collaborations is beyond doubt. This first result can explain the
presence of the second most connected node, namely Johnny Hodges (see 4.13b).
Indeed, the latter was a soloist (alto saxophone) in Duke Ellington’s big band.
Thus, his particular connection with the most connected node of the network
allowed him to acquire many connections in common with this hub. This
particular position relative to Duke Ellington seems to explain the gap between
the degrees of the two musicians (79). Finally, the third most connected node in
the album network is Johnny Griffin (see 4.13c). This high ranking seems to be
due to the fact that he released many albums during his career and collaborated
with other famous names such as pianist Thelonious Monk, drummer Art Blakey
and tenor saxophonist Eddie "Lockjaw" Davis.
(A) George Duke (B) Toots Thielemans (C) Herbie Hancock
FIGURE 4.14: Top 3 hubs of festival network
Regarding the collaboration network during the Montreux Jazz Festival, the
musician who took part in the most collaborations is George Duke (see 4.14a),
with a degree of 251. Like Duke Ellington in the album network, this musician is
one of the most prestigious in the world of jazz: he was nominated 7 times for
the Grammy Awards and won twice. In addition, he also has
a very extensive discography. It seems that these factors allowed him to
perform many times with other musicians at the festival, bringing him to the
first place in terms of collaboration. Then we find Toots Thielemans (see 4.14b)
and Herbie Hancock (see 4.14c), each with a degree of 209. Both are very
famous, which justifies their ranking in the number of collaborations at the
festival. The former is also one of the rare musicians of this importance to
play the harmonica, in addition to being the only Belgian musician among the
top hubs. Herbie Hancock, for his part, has often played, not necessarily
during the Montreux Jazz Festival, with other famous jazz musicians such as
Clark Terry, Miles Davis or Wayne Shorter.
4.2.3 Instrument
Figure 4.15a shows the instruments most played among the most connected
nodes of the album network.
We notice that the trumpet clearly leads with 22.4% of the occurrences,
followed by the saxophone and the piano with 20.4% and 14.3% of the
occurrences respectively. It also appears that the two least represented
instruments among the most connected nodes are the French horn and the
vibraphone. Figure 4.15b shows the instruments most played among the most
connected nodes of the live performance network. For this network, we see that
the most popular instrument among the hubs is the voice with 22.4%, followed by
the trumpet with 12.4%, and finally the trombone and the saxophone with 10.2%
each.
(A) Top hub instrument in album network (B) Top hub instrument in live network
FIGURE 4.15: Instruments of the top hubs
Since the instruments played by the musicians are only available for
the hubs of the album network, it is impossible to compare the results
presented above with the percentage of all musicians playing each instrument in
that network. However, we do have this data for all the nodes of the Montreux
Jazz Festival network. Thus, figure 4.16 shows the distribution of instruments
among all the nodes of this network, not only the hubs. We notice that the most
popular instruments for the Montreux Jazz Festival are, in order, the voice with
20.4%, the bass with 10.6%, the drums with 10.2%, the guitar with 10% and the
trumpet with 9.2%. Thus, we observe that, except for the voice, the most
represented instruments for all the nodes of this network are not the same as
the most represented instruments for the hubs of the network. This is notably
the case for the guitar, the piano, the bass and the trombone. This last result
leads us to think that the instrument plays a significant role in whether a
musician becomes a hub of the network³.
FIGURE 4.16: All nodes instrument distribution in the Montreux
network
The instrumental composition of a jazz band depends heavily on the style
³ Unfortunately, this type of comparison relative to all the nodes of the network is only
possible for instruments of the Montreux Jazz Festival network. Thus, this type of analysis
will not be done in the following sections studying other parameters. But given the results
obtained here, we can assume that the characteristics studied are specific to the hubs and not
to all the nodes of the network.
of jazz played by the band and on the number of musicians. There are several
jazz ensemble compositions; the most common are duets, trios, quartets,
quintets and sextets, and beyond 12 musicians we generally speak of a big band
or orchestra. All these compositions can take very different forms. The most
common form of trio in jazz includes a pianist, a double bass player and a
drummer (e.g. The Bad Plus). A wind instrument such as a saxophone or a
trumpet is often added to this form of band.
Several hypotheses can explain the popularity of wind instruments and
voice among the hubs of the two networks. One of them is based on the fact
that they tend to be lead instruments in the compositions of the groups.
Indeed, they tend to be added to a rhythmic pattern already established by the
other instruments, making them instruments that could be described as solo
instruments suited to entertaining an audience. The most obvious example in
this group of instruments is undoubtedly the saxophone. Invented in Belgium by
Adolphe Sax, it was popularized by Sidney Bechet in the 1920s to the point of
becoming inseparable from jazz.
Finally, it appears that there is a correlation between the fact that a node is
strongly connected and the fact that the musician plays a so-called solo
instrument. However, it seems unlikely that the instrument is the only factor
favoring collaborations, given that some soloists are not very connected, for
example Lester Young (saxophone) and Chet Baker (trumpet), who do not appear
among the hubs of the two networks.
4.2.4 Geographical information
The figures below show the percentage of high-degree nodes in relation to their
birthplace (i.e. city and country). What stands out the most in figure 4.17a is
the predominance of the United States of America, with 98% of
the 50 most connected musicians coming from this country. Concerning the
most connected musicians of the Montreux Jazz Festival network (see 4.17b), as
for the previous case, we note a predominance of the USA (42.9%) among their
countries of origin. Then come England and France with respectively 16.3%
and 14.3% of the musicians, and then Germany and Switzerland with 8.2% each.
(A) Top hub country in album network (B) Top hub country in live network
FIGURE 4.17: Countries of the top hubs
It appears that New York is the birthplace of 10.4% of the most collaborative
jazz musicians of the album network (see 4.18a). This city is followed
by Chicago with 8.3% of the musicians. Concerning the cities of origin of the
Montreux Jazz Festival network’s musicians (see 4.18b), we notice that the 4
most represented cities are all in the USA, namely Chicago (6.4%), Los Angeles
(4.3%), Detroit (4.3%) and San Rafael (2.1%). This result is consistent
with the result obtained in figure 4.17b.
(A) Top hub city in album network (B) Top hub city in live network
FIGURE 4.18: Cities of the top hubs
As explained in section 1.3, it is generally accepted that the geographical
origin of jazz is in the United States of America and more precisely in New
Orleans, Louisiana. This origin could explain the significant presence of
the United States in the countries of birth of the most connected nodes.
The predominance of New York and Chicago as birthplaces of the most connected
musicians seems to be explained by the fact that the most famous jazz clubs
were located in these cities. For instance, the Savoy Ballroom and the Cotton
Club were both in Harlem, New York.
Moreover, the presence of more European countries and cities for the Montreux
Jazz Festival can be explained both by the fact that this festival is held in
Europe and by the fact that some countries such as France are also hotbeds of
jazz music. Indeed, we can cite, among the most prolific jazzmen at this
festival, names such as Louis-Herve Maton or Guillaume Dionnet. We can also
underline the presence of well-known jazz clubs in France such as the Blue Note,
the Tabou or the Club Saint-Germain. The strong presence of English musicians
among the most connected nodes seems to be explained by the pop/rock stars who
are often present at this festival, such as James Morrison and Mick Hucknall, the
singer of Simply Red.
4.2.5 Birthyear
It seems important to specify that the age of the musicians and the age of the
node in the networks are two distinct parameters. For example, a musician
might start collaborating at an older age, so the node representing him or her
will be recent and the age of that node will be low. However, the older a musi-
cian is, the more likely he or she is to join the network early in its construction
and thus have an older node as well.
This section will deal with the actual age of the musicians, rather than the
age of the nodes that represent these musicians.
(A) Top hub birthyear in album network (B) Top hub birthyear in live network
FIGURE 4.19: Birthyears of the top hubs
We notice that on average, the hubs of the Montreux Jazz Festival network
are younger (average birth year 1951) than those of the album network from
Wikipedia (average birth year 1920). Several factors seem to explain the
difference in average age between the musicians of the two networks. First,
since the Wikipedia pages used as a source list the most influential jazz albums
and these are often so-called historical albums, it is very likely that they
were released a long time ago, so their musicians are also older. Secondly, the
Montreux Jazz Festival continues to be organized, so in the long term the
musicians who participate will inevitably be born later and later.
Finally, as said before, this festival welcomes musicians of all
styles, not only jazz, for which the average age is 52 years old [30], but also
other styles like rap, for which the average age of artists charting in the
Hot 100 is 26.6 years old [31].
4.2.6 Gender
Finally, one of the last criteria that seems interesting to analyze for hubs is their
gender. Since the author of this paper is not an expert in gender studies, this
section will be limited to highlighting the male/female ratio among the hubs
of the two networks.
(A) Top hub gender in album network (B) Top hub gender in live network
FIGURE 4.20: Genders of the top hubs
There is a major presence of men among the hubs of both networks. This
is nothing more than a reflection of the weaker presence of women in the
universe of jazz, though this is greatly improving with time. There are 42 men
against 8 women within the Montreux Jazz Festival top hubs, whereas there is no
female hub in the album network. For the latter network, this can be explained
by the age of these musicians and the era in which they performed, which left
little room for women in music. As we have seen in section 4.2.5, on average,
the hubs of the Montreux Jazz Festival network are younger than those of the
album network. This suggests that mentalities evolve over time and that women
tend to be more and more represented among influential jazz musicians.
Moreover, it is interesting to notice the absence of some great ladies of
jazz in the album network such as Ella Fitzgerald, Billie Holiday or Nina
Simone. This can be explained either by incomplete data or by the fact that the
Wikipedia pages serving as sources do not properly reference and credit the
various women in the jazz world. This absence may also be explained
by the fact that the three women previously mentioned are all singers.
Finally, for informative purposes,
the first woman who appears among the hubs of the album network is Maggie
Hyams, an American vibraphonist, ranked 96th among the hubs
with a degree of 24.
4.3 Comparison of musician’s networks at festivals
In this section, the goal will be to compare the network of musicians at the
Montreux Jazz Festival to another festival in order to know if they have common
characteristics or not. To do this, a network has been created from the data
available online for the New Orleans festival. Figure 4.21a shows the network
obtained with these data.
(A) New Orleans festival network (B) Montreux festival network
FIGURE 4.21: Festival networks
The dataset used to build the network for the New Orleans Festival has a
quality score (see 3.1) equivalent to the one of the Montreux Jazz Festival, which
is 1. We can therefore compare these two networks on a solid foundation.
This new network has 11,283 nodes, which is relatively close to the number
of nodes in the Montreux Jazz Festival network. However, these two networks
differ mainly in the maximum degree among the nodes, which is 47 for the
New Orleans Festival network against 251 for the Montreux Jazz Festival
network. Thus, we notice that this new network is less connected. This
conclusion is confirmed by the number of weakly connected components, which
rises to 7,573 whereas it was only 1,136 for the Montreux Jazz Festival network.
It is also interesting to note that the average number of musicians present on
stage for the Montreux Jazz Festival is 6.46 while this number is only 1.44 for
the New Orleans Festival.
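The quantities compared in this paragraph can all be derived from the list of concert line-ups. The sketch below is a minimal illustration using only the standard library; it assumes that every concert is given as a list of musician names, that all musicians of a concert are linked pairwise, and that the average number of musicians on stage is the mean line-up size. For an undirected graph, weakly connected components are simply connected components. The function name and the toy line-ups are hypothetical.

```python
from itertools import combinations


def festival_stats(lineups):
    """lineups: list of lists of musician names, one list per concert."""
    adj = {}
    for lineup in lineups:
        for name in lineup:
            adj.setdefault(name, set())  # solo performers are nodes too
        for a, b in combinations(set(lineup), 2):
            adj[a].add(b)
            adj[b].add(a)

    # Count connected components with an iterative depth-first search.
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

    return {
        "nodes": len(adj),
        "components": components,
        "max_degree": max(len(nb) for nb in adj.values()),
        "mean_lineup_size": sum(len(lineup) for lineup in lineups) / len(lineups),
    }


# Hypothetical toy festival with placeholder musician labels.
lineups = [["A", "B", "C"], ["C", "D"], ["E"], ["F", "G"]]
stats = festival_stats(lineups)
```

A festival dominated by solo performances, like the single-musician concert of "E" above, produces many isolated nodes. This is consistent with an average line-up size close to 1 going together with a large number of components.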
In addition to highlighting the fact that both networks are scale-free,
figure 4.22 illustrates the difference that exists regarding the number of
weakly connected components.
(A) Degree distribution of New Orleans Festival network (B) Degree distribution of Montreux network
FIGURE 4.22: Degree distributions
However, we notice that these two networks have a rather similar structure.
It seems that networks of musician collaborations during jazz festivals
generally adopt the same general form.
Figure 4.23 compares the number of new nodes per year for the
two festival networks.
(A) New nodes per year - New Orleans (B) New nodes per year - Montreux
FIGURE 4.23: Number of new nodes per year
Moreover, to support the hypothesis stated above, we notice in figure 4.23
that the number of new nodes per year evolves similarly for the two festival
networks. Indeed, both tend to grow over time, in contrast to the evolution
followed by the album network illustrated in figure 4.1.
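The number of new nodes per year can be obtained by recording, for each musician, the first year in which he or she appears. The following sketch assumes that the data is available as (year, list of musicians) pairs, one per concert or album. The function name and the toy events are hypothetical.

```python
from collections import Counter


def new_nodes_per_year(events):
    """events: iterable of (year, musicians) pairs, in any order."""
    first_year = {}
    for year, musicians in sorted(events, key=lambda event: event[0]):
        for name in musicians:
            # Only the earliest appearance of a musician is recorded.
            first_year.setdefault(name, year)
    return dict(sorted(Counter(first_year.values()).items()))


# Hypothetical toy events with placeholder musician labels.
events = [(1969, ["A", "B"]), (1967, ["A"]), (1969, ["C"]), (1970, ["B", "D"])]
counts = new_nodes_per_year(events)
```

In this example, "A" enters the network in 1967, "B" and "C" in 1969 and "D" in 1970, so a musician who returns in a later year is not counted again.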
However, to confirm the hypothesis that the festival networks have the
same general form, it is necessary to have data on more than two festivals
and possibly to compare different types of festivals. Indeed, it is very likely
that an electronic music festival, where collaborations are rare, does not adopt
the same structure as the one presented here.
4.4 Analysis of meta-networks
In this section, we will analyze the meta-networks obtained for the instruments
of all the musicians of the Montreux Jazz Festival, for the countries of origin
of its top hubs, and for the years of album releases for the Wikipedia network
and the years of concert performances for the Montreux Jazz Festival.
The interest of this type of network lies in the fact that it allows us to
highlight properties that are not obvious at first sight when we simply look at
the network or the statistics of the top hubs. Indeed, these meta-networks
establish a relationship between the different instruments and countries,
making it possible to detect possible affinities between them.
As a reminder to the reader, these meta-networks are constructed as fol-
lows: if we take the musicians’ country of origin as a characteristic to build our
meta-network, the nodes will be the different existing countries. Two countries
A and B will then be linked if a musician born in country A has collaborated
with another musician born in country B. Thus the strength of the link between
countries A and B will be an indicator of the affinity that musicians from these
two countries have with each other.
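The construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the strength of a link is taken to be the number of collaborating pairs of musicians between the two attribute values, collaborations between two musicians sharing the same value are kept as a self-loop, and musicians whose attribute is unknown are skipped. The function name, the musician identifiers and the toy data are hypothetical.

```python
from collections import Counter


def build_meta_network(collaborations, attribute_of):
    """Aggregate a musician network into a weighted attribute network.

    collaborations: iterable of (musician_a, musician_b) pairs.
    attribute_of: dict musician -> attribute value (country, instrument...).
    Returns a Counter mapping sorted (value_a, value_b) pairs to a weight.
    """
    weights = Counter()
    for a, b in collaborations:
        if a not in attribute_of or b not in attribute_of:
            continue  # attribute unknown for at least one musician
        # Sort the pair so that (USA, France) and (France, USA) are one link.
        key = tuple(sorted((attribute_of[a], attribute_of[b])))
        weights[key] += 1
    return weights


# Hypothetical toy data: four musicians and their countries of birth.
country_of = {"m1": "USA", "m2": "USA", "m3": "France", "m4": "Belgium"}
collaborations = [("m1", "m2"), ("m1", "m3"), ("m2", "m3"), ("m3", "m4")]
meta = build_meta_network(collaborations, country_of)
```

In this toy example the link between France and the USA has a weight of 2, which makes it the strongest affinity, while the USA self-loop of weight 1 represents intra-country collaboration. The same function applies unchanged to instruments by passing a musician-to-instrument mapping.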
4.4.1 Montreux instruments meta-network
The meta-network shown in figure 4.24 comprises a set of 25 instruments⁴. As
explained previously, each node of this network represents an instrument, so
all the musicians playing the same instrument are grouped in the same node;
the number of musicians playing the same instrument thus defines the size
of this node. Note that the stronger the link between two instruments, the
more common the combination of these instruments is during the Montreux
Jazz Festival.
Note that since the album dataset does not have information about the in-
struments of the musicians, this meta-network could only be realized for the
Montreux Jazz Festival collaboration network.
⁴ Note that for simplicity, the orchestra conductor has been considered as an instrument
here.
FIGURE 4.24: Instruments meta-network for Montreux festival
Thus we notice that in the specific case of this festival, the most common
association is the guitar with the voice. There is nothing surprising
in this result given the popularity of this type of association in the music world.
Then, with almost equal weight or affinity, the following group of instruments
is often associated together: the drums, the bass, the keyboard, the guitar
and the voice. This first group seems to represent the classic composition of
pop/rock bands. Among the bands using this composition we can mention
Genesis, Queen or Supertramp⁵. A second group seems to be grafted onto this
first one, including the following instruments: the saxophone,
the piano, the trumpet, the trombone and the percussion. This second group
seems to be closer to the instruments traditionally associated with jazz music.
Moreover, for the remaining instruments, we note, among others, an affin-
ity between the violin and the clarinet and between the violin and the cello. In
addition, there is also a group of instruments that have almost no affinity with
⁵ Although all the bands taken as examples are British, there are obviously others such as
Lynyrd Skynyrd, The Beach Boys or Kraftwerk.
the others: the n’goni⁶, the harp, the bandoneon⁷, the accordion, the tuba, the
French horn, the harmonica and the organ. This can be explained by the lesser
popularity of these instruments in general and particularly at the Montreux Jazz
Festival. Finally, if we compare the results obtained here with those obtained
in section 4.2.3, we can deduce that, in spite of their great popularity among
the musicians of the festival, the guitar and the bass are not the most popular
among the top hubs of this same network.
4.4.2 Years meta-networks
In this section, the parameter for which we will study the affinity is the year
of release of the albums and the year of performance of the concerts. In this
network, each node represents a year, and the size of a node depends on the
number of musicians having played that year. A link between two
nodes is established if two musicians belonging to these nodes have
played together on albums or at concerts that took place in different years.
Thus, the stronger the link, the greater the relative affinity between the two
years. Note that given the way it is constructed and given the concept
of overlapping years, this meta-network is less straightforward to analyze than
the one presented in the previous section.
Figure 4.25 represents the meta-network described above for the album col-
laboration network.
⁶ A traditional guitar of Mali.
⁷ A kind of accordion popular in Uruguay and Argentina.
FIGURE 4.25: Years meta-network for album collaboration
network
The years with the most musicians are 1961 with a degree of 518, 1960 with a
degree of 574, and finally 1956 with a degree of 648. We can clearly
see here the domination of the 1960s. Indeed, we notice a strong affinity
between multiple nodes of the years going from 1956 to 1963, which seems to
indicate that the musicians who participated in the making of albums
during this period often tended to collaborate.
Moreover, it is interesting to note that the year 1976 is strongly linked to
the years 1956 and 2007, links that come in addition to the period described
earlier. These last results have not yet been explained by the author.
Figure 4.26 represents the meta-network interconnecting the years for the
collaborative network during live performance.
FIGURE 4.26: Years meta-network for Montreux festival
The years in which we count the most musicians are 1993 with a degree of 514,
1994 with a degree of 600, and finally 1991 with a degree of 666. We can clearly
see here the domination of the 1990s. Here, we observe two distinct
sets. Indeed, there is an affinity between the set of years from 1973 to 1995
and an affinity between the years that cover the period of time between 2001
and 2007. It would also seem that the year 2001 acts as a bridge between these
two sets. Thus these results seem to show that musicians who performed from
1973 to 1995 often tended to collaborate with each other and to participate in
each other’s concerts. The same is true for musicians who performed at the
festival in the first decade of the 2000s.
4.5 Analysis of hub’s meta-networks
4.5.1 Country
In this section, the same type of meta-network has been constructed but this
time on the basis of the country of the musicians’ origin rather than on the
basis of their instrument.
Note that, as said before, since this geographical data was not available for
all the musicians of the Montreux Jazz Festival network, this meta-network has
been built only from the top hubs (for which these data were available). Thus,
the results obtained should be interpreted with caution.
FIGURE 4.27: Countries meta-network for Montreux festival
As for the results obtained in section 4.2.4, figure 4.27 allows us to highlight
the omnipresence of the USA among the most important nodes of the Montreux
Jazz Festival network. Indeed, we notice that the American top hubs collaborate
with the top hubs of all the other countries except Jamaica. Moreover, apart
from the existing relationship between Switzerland and England, the top hubs of
the other countries collaborate exclusively with the American top hubs, who
also collaborate among themselves. Finally, it can be noted that there seems to
be a stronger affinity between the USA and England and between the USA and
Switzerland than between the USA and other countries.
Note also that this type of meta-network for the top hubs of the album
network would not be very useful. Indeed, the results obtained in section 4.2.4
indicate that there are only two countries among the top hubs of this network
(USA and Canada), so an affinity analysis would be of limited interest.
Chapter 5
Discussion
In this section, the different limitations encountered during the development
of this research will be discussed, whether they are of a technical or
logistical nature. In addition, we will also discuss the improvements that the
author would like to introduce to this document in the future.
5.1 Limitations
5.1.1 Datasets
As explained many times in this thesis, the main problem encountered was
the availability of online data to establish the datasets. Indeed, due to this
lack of information, it was only possible to collect the instruments associated
with all the musicians for the Montreux Jazz Festival network and not for the
album network. Thus it was not possible to compare the proportion of hubs
playing an instrument with the proportion of nodes playing that instrument for
the album network. The same is true for both networks for the other parameters
studied in section 4.2.2.
Moreover, information such as geographical data, birth years and instru-
ments had to be collected by hand for the top hubs, thus greatly limiting the
number of hubs considered for this study.
5.1.2 Labels
In the first version of this work, dealing only with the album network, a study
of the different labels with which the musicians were signed had been made.
This last point was abandoned for several reasons: firstly, because the notion
of label does not make sense for the live performance network added in this
version of the document; secondly, because it is quite possible that the same
musician has worked with many different labels, making this parameter complex
to study correctly.
5.1.3 Styles of jazz
One of the important limitations to consider is, as music lovers will have rightly
noticed, that all the different styles of jazz have been included here under one
and the same set. Thus all the different variations of this music, explained in
section 1.3, were not taken into account when studying the top hubs. The only
reference to jazz style is the one made in section 4.8 with an assumption about
the grouping into communities according to the sub-genre of jazz practiced by
the musicians. Such data could have told us whether there exists a correlation
between the fact that a musician is a top hub and the fact that this musician
plays a particular style of jazz.
This data could not be collected because on the one hand it was not avail-
able in the different datasets used here, and on the other hand, for music, as
for other artistic domains, there are many cases of hybrid creation that are im-
possible to classify in a single style, making the task even more complex.
5.1.4 Time
Obviously, as in all research work, the biggest limitation is time. Indeed, we
would always like to do more and go further. Unfortunately one day we have
to write down what we have found. Section 5.2 therefore summarizes the
different topics that the author would have loved to cover in this document.
5.2 Further improvement
5.2.1 Data collection
A possible solution to the problem of incomplete datasets presented in section
5.1.1 would have been to use the API of Spotify, or of another streaming platform,
in order to obtain data about the musicians. Moreover, with this technique,
it would also have been possible to study parameters such as the average
tempo and the main key of the albums. Above all, if this API
allows it, it would have completely solved the problem of labels explained in
section 5.1.2. However, this would not have solved the problem of lack of data
for the live performance network.
Additionally, a future improvement that would greatly serve the reliability
of this research would be to collect data related to another album network in
order to compare it with the Wikipedia network, as was done in section
4.3 for the Montreux Jazz Festival network, which was compared to
the New Orleans live network.
Finally, a last point regarding data collection that could be re-evaluated in
the future, as a substitute for the Spotify API, would be the possible use of
the DBpedia framework for the data collection of the album network. Indeed,
this tool, discovered too late by the author, would make it possible to avoid the
multiple web scraping methods presented in section 3.1, improving efficiency
and possibly limiting the errors related to the data collection. However, due to
the particular nature of the links between musicians, it is not certain that this
tool would be well suited to the task, but it remains a track to explore.
5.2.2 Confidence in the characteristics carried by the hubs
Although in section 4.2.3 the instrument characteristics of the hubs were
compared to those of the other nodes of the Montreux Jazz Festival network, thus
confirming that these results are specific to the hubs, this type of comparison
could not be performed for the other characteristics of both networks.
For both networks, a possible solution to this problem would be to
randomly take nodes in the lower part of the first quartile and in the upper part
of the third quartile of the data, and then to collect the characteristics of each
of these nodes, by hand since this information is not in the datasets,
in order to compare them with those of the hubs. This method would allow us
to make sure that the characteristics highlighted are specific to the hubs and
not common to all the nodes of the network.
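The sampling step described above could look like the following minimal Python sketch. It is only an illustration and not code used in this work: it assumes that the quartiles are computed on the node degrees, that "lower part" and "upper part" mean at or below the first quartile and at or above the third quartile, and the function name `sample_control_nodes` and its parameters are hypothetical.

```python
import random
import statistics


def sample_control_nodes(degrees, n_per_group, exclude=(), seed=0):
    """Draw random control nodes from both ends of the degree distribution.

    degrees: dict mapping each node to its degree (assumed input format).
    exclude: nodes to leave out of the draw, e.g. the top hubs themselves.
    Returns (low_sample, high_sample).
    """
    # Quartile cut points of the degree distribution.
    q1, _, q3 = statistics.quantiles(degrees.values(), n=4)
    excluded = set(exclude)
    # Sorting makes the draw reproducible for a given seed.
    low = sorted(n for n, d in degrees.items() if d <= q1 and n not in excluded)
    high = sorted(n for n, d in degrees.items() if d >= q3 and n not in excluded)
    rng = random.Random(seed)
    return (rng.sample(low, min(n_per_group, len(low))),
            rng.sample(high, min(n_per_group, len(high))))


# Made-up degrees for twenty anonymous musicians m1..m20.
degrees = {f"m{i}": i for i in range(1, 21)}
low_sample, high_sample = sample_control_nodes(degrees, 3, exclude={"m20"})
print(low_sample, high_sample)
```

The characteristics of the sampled nodes would still have to be collected by hand before being compared with those of the hubs.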
5.2.3 Popularity of musicians
One parameter that seems to be important and that has not been taken into
account in this study is the popularity of the musicians with the public.
Indeed, it would have been interesting to study the impact of the popularity of
a musician on his place in the network, i.e. whether there is a correlation
between the fact that a musician is a hub and the fact that he is popular with
the public.
The reason why this parameter has not been addressed in this paper is the
following: this data is very complex to quantify. Of course, it would be possible
to use the number of plays (streams/views) on the streaming platforms of
a certain album of a musician to try to approximate this value, but this would
remain reductive. Indeed, by proceeding in such a way, we would neglect the
popularity of musicians during live performances. To account for it, we would
have to count the number of spectators during each performance, a practically
impossible task given the inaccessibility of these data. Moreover, using this technique to
quantify the popularity of musicians, we would leave out a large number of
musicians who do not have any songs/albums on the streaming platforms.
Another existing popularity measure is the Q Score [32]. The latter is often
calculated on the basis of groups sharing a common characteristic, such as
age, which makes it possible, for example, to know how popular a celebrity is
among a certain segment of the population. However, this measure seems
difficult to apply here because it would require surveying a large group of
people about jazz musicians, which is closer to sociology than to computer science.
Thus, in a possible continuation of this study, one of the important points
would be to find a way to quantify the popularity of a musician.
5.2.4 Evolution of racial segregation
An interesting analysis to be done in a future version of this document would
be to see if the racial segregation between 1912 and 1940 mentioned by Gleiser
and Danon [23] is still happening in the jazz world today. Unfortunately, this
comparison could not be made because the data concerning the racial origin
of the musicians could not be retrieved for all the nodes of both networks. The
intuition of the author is that this segregation is less pronounced nowadays
than it was then, but this remains to be demonstrated.
5.2.5 Meta-networks
In the ideal case where the API mentioned in section 5.2.1 could fill all
the missing data, it would be interesting to produce meta-networks for all the
musicians of the two networks for the other parameters studied in section 4.2.2,
i.e. the country of origin, the city of origin and possibly the labels. Indeed, up
to now, the study of affinities between instruments is only possible for the
Montreux Jazz Festival network and the study of the affinity between the other
parameters is currently only possible for the top hubs of the two networks.
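To make this idea concrete, the following minimal Python sketch collapses a musician network into a meta-network of attribute values. It is an illustration under stated assumptions rather than the code used in this work: it assumes that a meta-network edge is weighted by the number of collaborations linking two attribute values, the function name `build_meta_network` is hypothetical, and the musicians A to D are made up.

```python
from collections import Counter


def build_meta_network(edges, attribute):
    """Collapse a musician network into a network of attribute values.

    edges: iterable of (musician_a, musician_b) collaborations.
    attribute: dict mapping a musician to an attribute value (e.g. country);
               musicians missing from the dict are skipped.
    Returns a Counter mapping an unordered pair of values to an edge count.
    """
    weights = Counter()
    for a, b in edges:
        if a not in attribute or b not in attribute:
            continue  # missing data: this collaboration cannot be placed
        pair = tuple(sorted((attribute[a], attribute[b])))
        weights[pair] += 1
    return weights


# Made-up collaborations; the country of musician D is unknown.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
country = {"A": "USA", "B": "USA", "C": "Switzerland"}
meta = build_meta_network(edges, country)
print(meta)
```

Skipping musicians whose attribute is unknown shows why complete data matters here: every missing value silently removes all the collaborations of that musician from the meta-network.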
Chapter 6
Conclusion
The final section presented here will summarize the different results obtained
in section 4. The purpose of this last part is to provide a general answer to the
second question posed in section 1.2, which is: How to better understand the
universe of jazz by means of its musicians networks?
6.1 Topology
First, we will look in detail at the conclusions that can be drawn from the topol-
ogy of the different networks studied.
6.1.1 Album and Live networks
An important observation is that both networks, the album and the concert
one, are scale-free. This is indeed an expected result for so-called social collab-
oration networks. This result has been documented many times in the litera-
ture on networks.
Furthermore, for the two collaborative networks, through the rich-club co-
efficient, it appears that the most connected nodes (i.e. hubs) of the networks
are strongly connected to each other, which could result in some robustness.
Thus, if we remove some hubs, the general connectivity of the networks would
not be significantly affected. This result can be understood as follows: if one of
the highly connected musicians present in the studied networks passed away,
the majority of the collaborations between musicians would not be greatly af-
fected.
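The two notions used in this paragraph can be illustrated with a minimal Python sketch on a made-up toy network. It is not the code used in this work: the rich-club coefficient computed here is the plain, non-normalized density of links among nodes of degree greater than k, robustness is approximated by the size of the largest connected component after removing a hub, and all function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected collaborations into an adjacency dict."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def rich_club_coefficient(adj, k):
    """Non-normalized density of links among nodes of degree > k."""
    rich = {node for node, nbrs in adj.items() if len(nbrs) > k}
    if len(rich) < 2:
        return 0.0
    # Each link between two rich nodes is counted from both ends.
    links = sum(1 for node in rich for nbr in adj[node] if nbr in rich) // 2
    return 2 * links / (len(rich) * (len(rich) - 1))


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search from start
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


# Toy network: three hubs that all played together, each with two sidemen.
edges = [("h1", "h2"), ("h1", "h3"), ("h2", "h3"),
         ("h1", "a"), ("h1", "b"), ("h2", "c"),
         ("h2", "d"), ("h3", "e"), ("h3", "f")]
adj = build_adjacency(edges)
print(rich_club_coefficient(adj, k=1))              # 1.0
print(largest_component_size(adj))                  # 9
print(largest_component_size(adj, removed={"h1"}))  # 6
```

In this toy case, removing one hub isolates only its own two sidemen, while the remaining hubs keep the rest of the network in a single component because they are linked to each other.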
However, the two collaborative networks seem to differ on several points.
Firstly, we can analyze the evolution of the number of new nodes added to the
networks: we notice that for the album network, this number has tended to
decrease since 2010, suggesting a correlation with the progressive death of the
CD format. This phenomenon has not been observed for the live performance
network. Secondly, the two networks are also distinguished by a large difference
in the number of nodes, the live performance network being much more
populated than the album network. This difference has the direct consequence
of increasing the number of communities, the average and maximum degree
and the number of connected components of the live performance network.
6.1.2 Comparison between live networks
Thanks to the results obtained in section 4.3, in which we compared two collab-
oration networks during live performances, we have shown that there seems
to be a similarity between these types of networks. Indeed, firstly, once again
these two networks are scale-free. Secondly, the evolution of the number of
new nodes entering the two networks over time is similar. This result sug-
gests that collaborative networks of jazz music festivals evolve in a similar
way. However, these two networks differ on the following points: the max-
imum degree, the number of weakly connected components and the average
number of musicians present on stage during a performance.
6.1.3 Communities
As for the communities, the two networks differ mainly in their number.
This difference is due to the fact that the Montreux Jazz Festival
collaboration network has many more nodes and weakly connected compo-
nents than the Wikipedia collaboration network.
However, it seems that for both networks, the communities are formed
around the same criteria. The intuition of the author is that the communities
detected in this study are related to the following two parameters: the style of
jazz/music played by the nodes and their geographical origin.
6.2 Top hubs
Thanks to the multiple analyses carried out on the top hubs of the two net-
works, the following conclusions can be drawn from these particular struc-
tures.
6.2.1 Archetypal hub
First, it seems that the archetypal highly connected musician for the album
collaboration network is an American male from New York born around 1920.
He would also have a particular attraction for the trumpet or the saxophone.
Then, concerning the archetype of the very connected musician in a live
collaboration network, he is very likely to be an American man
born around 1951 and originating from Chicago. This top hub would probably
be a singer and, thanks to the instrument meta-network, we can see that he
is likely to collaborate mainly with guitarists, themselves probably
American. It is also possible that his favorite instrument is the trumpet,
in which case it would be more likely that he collaborates mainly with
musicians playing the saxophone.
Thanks to the meta-network of years (see 4.25 and 4.26), we can hypothe-
size that there is a strong chance that the archetypal musician very connected
in the album network recorded the majority of his performances between 1956
and 1963. Also, there is a strong chance that the hub archetype of the live col-
laboration network has played mainly between 1973 and 1995 or between 2001
and 2016.
6.2.2 Well known figures
It also seems that the majority of top hubs of both networks are already well
known figures in the jazz world (see 4.4). However, it is difficult to know if
they have attracted many collaborations over the years because of their posi-
tion as figureheads or if, on the contrary, they have become essential musicians
thanks to their multiple collaborations.
6.2.3 Evolution of mentalities
Furthermore, thanks to the gender distribution of the two networks’ hubs (see
4.2.6) and to their average age (see 4.2.5), there seems to be an evolution of the
place left to women in the world of jazz. Indeed, we notice that no woman is
present among the hubs of the Wikipedia album network, which are older on
average, while among the hubs of the Montreux Jazz Festival live performance
network the proportion of women has increased by 16 percentage points.
6.2.4 Research question
Finally, to conclude this research, this last section will answer the first research
question introduced in section 1.2, which is: What are the parameters favoring
the preferential attachment among jazz musicians within a collaborative
network? It seems that the parameters influencing the preferential attachment
of nodes, and thus the creation of important hubs within the network,
are mainly: the instrument played by the musician, his country of origin, his
gender and possibly the reputation of the artist.
References
[1] M.E.J. Newman. “Scientific Collaboration Networks. II. Shortest Paths,
Weighted Networks, and Centrality”. In: Physical review. E, Statistical,
nonlinear, and soft matter physics 64 (Aug. 2001), p. 016132.
DOI: 10.1103/PhysRevE.64.016132.
[2] Laurent Beauguitte and César Ducruet. “Scale-free, small-world networks
et géographie”. In: Geography (June 17, 2011).
[3] Wikipedia. Histoire du jazz. 2014.
URL: https://fr.wikipedia.org/wiki/Histoire_du_jazz (visited on 01/30/2022).
[4] Google AI Blog. Explore the history of Pop and Punk, Jazz, and Folk with
the Music Timeline. 2014.
URL: https://ai.googleblog.com/2014/01/explore-history-of-pop-and-punk-jazz.html
(visited on 01/30/2022).
[5] D. König. “Theorie der endlichen und unendlichen Graphen”. In: Math.
in Monogr. und Lehrb. XVI (1936).
[6] Jeffrey Travers and Stanley Milgram. “An Experimental Study of the
Small World Problem”. In: Sociometry 32 (Dec. 1969), pp. 425–443.
DOI: 10.2307/2786545.
[7] Albert-László Barabási and Réka Albert. “Emergence of Scaling in Random
Networks”. In: Science 286.5439 (Oct. 1999), pp. 509–512.
DOI: 10.1126/science.286.5439.509.
URL: https://doi.org/10.1126%2Fscience.286.5439.509.
[8] Réka Albert and Albert-László Barabási. “Statistical mechanics of complex
networks”. In: Reviews of Modern Physics 74.1 (Jan. 2002), pp. 47–97.
DOI: 10.1103/revmodphys.74.47.
URL: https://doi.org/10.1103%2Frevmodphys.74.47.
[9] Gezhi Weng, Upinder S. Bhalla, and Ravi Iyengar. “Complexity in Biological
Signaling Systems”. In: Science 284.5411 (1999), pp. 92–96.
DOI: 10.1126/science.284.5411.92.
eprint: https://www.science.org/doi/pdf/10.1126/science.284.5411.92.
URL: https://www.science.org/doi/abs/10.1126/science.284.5411.92.
[10] Christof Koch and Gilles Laurent. Complexity and the Nervous System.
1999. DOI: 10.1126/science.284.5411.96.
eprint: https://www.science.org/doi/pdf/10.1126/science.284.5411.96.
URL: https://www.science.org/doi/abs/10.1126/science.284.5411.96.
[11] Réka Albert, Hawoong Jeong, and Albert-László Barabási. “Diameter of
the World-Wide Web”. In: Nature 401 (Sept. 1999), pp. 130–131.
DOI: 10.1038/43601.
[12] Ginestra Bianconi and Albert-László Barabási. “Competition and
multiscaling in evolving Networks”. In: EPL (Europhysics Letters) 54 (May
2001), p. 436. DOI: 10.1209/epl/i2001-00260-6.
[13] H Jeong, Z Néda, and A. L Barabási. “Measuring preferential attachment
in evolving networks”. In: Europhysics Letters (EPL) 61.4 (Feb. 2003),
pp. 567–572. DOI: 10.1209/epl/i2003-00166-9.
URL: https://doi.org/10.1209%2Fepl%2Fi2003-00166-9.
[14] E Ben-Naim and P L Krapivsky. “Stratification in the preferential attachment
network”. In: Journal of Physics A: Mathematical and Theoretical 42.47
(Nov. 2009), p. 475001. DOI: 10.1088/1751-8113/42/47/475001.
URL: https://doi.org/10.1088%2F1751-8113%2F42%2F47%2F475001.
[15] Alexandru Topirceanu, Mihai Udrescu, and Radu Marculescu. “Weighted
Betweenness Preferential Attachment: A New Mechanism Explaining
Social Network Formation and Evolution”. In: Scientific Reports 8 (July
2018). DOI: 10.1038/s41598-018-29224-w.
[16] Sushmita Ruj and Arindam Pal. “Preferential Attachment Model with
Degree Bound and its Application to Key Predistribution in WSN”. In:
(2016). DOI: 10.48550/ARXIV.1604.00590.
URL: https://arxiv.org/abs/1604.00590.
[17] M. Girvan and M. E. J. Newman. “Community structure in social and
biological networks”. In: Proceedings of the National Academy of Sciences
99.12 (June 2002), pp. 7821–7826. DOI: 10.1073/pnas.122653799.
URL: https://doi.org/10.1073%2Fpnas.122653799.
[18] Aaron Clauset, M Newman, and Cristopher Moore. “Finding community
structure in very large networks”. In: Physical review. E, Statistical,
nonlinear, and soft matter physics 70 (Jan. 2005), p. 066111.
DOI: 10.1103/PhysRevE.70.066111.
[19] Pascal Pons and Matthieu Latapy. “Computing Communities in Large
Networks Using Random Walks”. In: Computer and Information Sciences -
ISCIS 2005. Ed. by Pınar Yolum et al. Berlin, Heidelberg: Springer Berlin
Heidelberg, 2005, pp. 284–293. ISBN: 978-3-540-32085-2.
[20] Ken Wakita and Toshiyuki Tsurumi. “Finding Community Structure in
Mega-scale Social Networks”. In: vol. 105. Mar. 2007, pp. 1275–1276.
DOI: 10.1145/1242572.1242805.
[21] Vincent Blondel et al. “Fast Unfolding of Communities in Large Networks”.
In: Journal of Statistical Mechanics Theory and Experiment 2008
(Apr. 2008). DOI: 10.1088/1742-5468/2008/10/P10008.
[22] Santo Fortunato and Marc Barthélemy. “Resolution limit in community
detection”. In: Proceedings of the National Academy of Sciences 104.1 (Jan.
2007), pp. 36–41. DOI: 10.1073/pnas.0605965104.
URL: https://doi.org/10.1073%2Fpnas.0605965104.
[23] Pablo M. Gleiser and Leon Danon. “Community structure in jazz”. In:
Advances in Complex Systems 06.04 (Dec. 2003), pp. 565–573.
DOI: 10.1142/s0219525903001067.
URL: https://doi.org/10.1142%2Fs0219525903001067.
[24] Ashish Vaswani et al. “Attention Is All You Need”. In: CoRR abs/1706.03762
(2017). arXiv: 1706.03762. URL: http://arxiv.org/abs/1706.03762.
[25] Steven Bird, Ewan Klein, and Edward Loper. “Natural Language Processing
with Python”. In: (2009).
[26] David Liben-Nowell and Jon Kleinberg. “The Link Prediction Problem
for Social Networks”. In: Proceedings of the Twelfth International Conference
on Information and Knowledge Management. CIKM ’03. New Orleans, LA,
USA: Association for Computing Machinery, 2003, pp. 556–559. ISBN:
1581137230. DOI: 10.1145/956863.956972.
URL: https://doi.org/10.1145/956863.956972.
[27] Ken Wakita and Toshiyuki Tsurumi. “Finding Community Structure in
Mega-Scale Social Networks: [Extended Abstract]”. In: Proceedings of the
16th International Conference on World Wide Web. WWW ’07. Banff, Alberta,
Canada: Association for Computing Machinery, 2007, pp. 1275–1276.
ISBN: 9781595936547. DOI: 10.1145/1242572.1242805.
URL: https://doi.org/10.1145/1242572.1242805.
[28] Pasquale De Meo et al. “Generalized Louvain method for community
detection in large networks”. In: 2011 11th International Conference on
Intelligent Systems Design and Applications (2011), pp. 88–93.
[29] Rachid Djerbi, Rabah Imache, and Mourad Amad. “Communities’ Detection
in Social Networks: State of the art and perspectives”. In: 2018
International Symposium on Networks, Computers and Communications
(ISNCC) (2018), pp. 1–6. DOI: 10.1109/ISNCC.2018.8531055.
[30] Joan Jeffri. “Changing the Beat: A Study of the Worklife of Jazz Musicians”.
In: National endowment for the arts (2003).
[31] Brian Zisook. How Old is the Average Hot 100-Charting Rapper Right Now?
2018. URL: https://djbooth.net/features/2018-02-02-billboard-rapper-age#:~:text=While%20the%20average%20age%20of,over%20the%20past%20six%20decades
(visited on 01/30/2022).
[32] Wikipedia. Q Score. 2021.
URL: https://en.wikipedia.org/wiki/Q_Score#:~:text=The%20Q%20Score%20(popularly%20known,are%20aware%20of%20the%20subject.
(visited on 04/26/2022).