Understanding the universe of jazz by means of its musicians networks

June 2022

DOI:10.13140/RG.2.2.33757.49125

Thesis for: Master
Advisor: Hugues Bersini

Authors:

Julien Baudru

Université Libre de Bruxelles

Acknowledgements

This document is the result of work spread over a year and a half, so it seems essential to me to thank everyone who helped me in any way. First of all, thanks to my supervisor and promoter Hugues Bersini for his advice, his knowledge of jazz and his follow-up during this period. Then, thanks to Lluc Bono Rosello for his various comments and for the introduction to Gephi. Thanks also to the reviewers and organizers of the NetSciX 2022 conference and to the IRIDIA lab for their trust and the wonderful memories of the trip to Porto. Thanks to the organizers of the Printemps des Sciences for allowing me to present this work in an accessible way to younger students. Thanks also to Laurie Goffette for her feedback and advice on the English language. And finally, thanks to my family and friends for their daily support and motivational speeches.

UNIVERSITÉ LIBRE DE BRUXELLES

Abstract

Faculty of Sciences

Master in Computer Sciences

Understanding the universe of jazz by means of its musicians networks

by Julien Baudru

In this paper, we will look in detail at networks formed by the collaboration of jazz musicians. In such networks, each node represents a musician and the edges between these nodes indicate whether they played together on an album or at a concert. Two such networks have been built, one for artist collaborations during the recording of an album, the other for live collaborations during a concert. We will compare the parameters and topologies of these networks, and then examine different meta-networks abstracted from them. This document is a master's thesis which tries to answer the following general question: How can we better understand the universe of jazz by means of its musicians' networks?

Contents

Abstract
List of Figures
List of Tables
List of Abbreviations
1 Introduction
1.1 Motivations
1.2 Research questions
1.3 A short history of jazz
2 State of the art
2.1 Networks
2.2 Social networks
2.3 Scale-free networks
2.4 Preferential attachment
2.5 Communities
2.6 Jazz communities
2.7 Natural language processing (NLP)
3 Method
3.1 Data collection
3.1.1 Choice of dataset
3.1.2 Problematic
3.1.3 Process
Collecting data for the album network using NLP
Collecting data for the live performance network
3.1.4 Information on datasets
3.2 Networks construction
3.2.1 Static network
3.2.2 Dynamic network
3.2.3 Meta-networks
3.3 Networks visualization
4 Results
4.1 Technical analysis of both networks
4.1.1 General parameters and comparison
4.1.2 Scale-free networks
4.1.3 Preferential attachment
4.1.4 Other parameters
Rich-club coefficient
Modularity and community
Clustering coefficient
4.2 Hubs analysis of both networks
4.2.1 Definition of hub
4.2.2 Top hubs
4.2.3 Instrument
4.2.4 Geographical information
4.2.5 Birth year
4.2.6 Gender
4.3 Comparison of musician’s networks at festivals
4.4 Analysis of meta-networks
4.4.1 Montreux instruments meta-network
4.4.2 Years meta-networks
4.5 Analysis of hub’s meta-networks
4.5.1 Country
5 Discussion
5.1 Limitations
5.1.1 Datasets
5.1.2 Labels
5.1.3 Styles of jazz
5.1.4 Time
5.2 Further improvement
5.2.1 Data collection
5.2.2 Confidence in the characteristics carried by the hubs
5.2.3 Popularity of musicians
5.2.4 Evolution of racial segregation
5.2.5 Meta-networks
6 Conclusion
6.1 Topology
6.1.1 Album and Live networks
6.1.2 Comparison between live networks
6.1.3 Communities
6.2 Top hubs
6.2.1 Archetypal hub
6.2.2 Well known figures
6.2.3 Evolution of mentalities
6.2.4 Research question
References

List of Figures

3.1 General process
3.2 Steps for getting the jazzman name
3.3 Album network
3.4 Montreux network
3.5 Album network evolution sample
3.6 Example of meta-network construction
4.1 Number of new nodes per year - Wikipedia network
4.2 Number of new nodes per year - Montreux network
4.3 Degree distributions
4.4 Preferential attachment - Album network
4.5 Preferential attachment - Montreux network
4.6 Distribution of rich-club coefficient by degree
4.7 Modularity optimization comparison
4.8 Communities visualization for both networks
4.9 Network of Montreux festival: Isolated community
4.10 Community in the network of Wikipedia albums: This figure shows the separation in community around the sub-genre of jazz.
4.11 Clustering coefficient for Wikipedia network
4.12 Clustering coefficient for Montreux network
4.13 Top 3 hubs of album network
4.14 Top 3 hubs of festival network
4.15 Instruments of the top hubs
4.16 All nodes instrument distribution in the Montreux network
4.17 Countries of the top hubs
4.18 Cities of the top hubs
4.19 Birthyears of the top hubs
4.20 Genders of the top hubs
4.21 Festival networks
4.22 Degree distributions
4.23 Number of new nodes per year
4.24 Instruments meta-network for Montreux festival
4.25 Years meta-network for album collaboration network
4.26 Years meta-network for Montreux festival
4.27 Instruments meta-network for Montreux festival

List of Tables

3.1 Comparison of both dataset
4.1 General parameters
4.2 Parameters related to communities
4.3 Average clustering coefficient
4.4 Top 5 hubs and their degree for both networks

List of Abbreviations

API Application Programming Interface
BA Barabási-Albert (model)
CD Compact Disc
CN Collaboration Network
CNM Clauset-Newman-Moore (model)
GRU Gated Recurrent Unit
HTML HyperText Markup Language
IoT Internet of Things
LSTM Long Short-Term Memory
NLP Natural Language Processing
NLTK Natural Language Toolkit
RNN Recurrent Neural Networks
WBPA Weighted Betweenness Preferential Attachment (model)

Chapter 1

Introduction

1.1 Motivations

Many systems take the form of networks, i.e. a set of nodes connected to each other by edges. Among the systems most commonly studied in the literature are the network of hypertext links on the World Wide Web, the network of scientific citations [1], the network of roads between cities in a country [2], networks related to biology, and the well-known network of film actors. This document focuses in detail on the networks formed by jazz musicians. In these networks, each node represents a musician and the links between nodes indicate whether they played together on an album or at a concert. This work therefore focuses on the study of collaborative networks (CN), on their construction and on their associated parameters.

This document is the continuation and the conclusion of research carried out over a year and a half. Reference will therefore be made to earlier results in order to compare them with those obtained more recently. During this period, I had the chance to briefly present my thesis at the NetSciX 2022 conference. There, some participants who saw my poster questioned the usefulness of this research, so I would like to take this opportunity to give several reasons why such a study might be useful. First, the study of collaborative networks such as the ones studied here can teach us more about the relationships between the people involved in them. We will see that certain characteristics of these networks often reflect historical realities. Second, if I had to find a purely practical function for this study, I believe that labels and record companies could use the results to identify the most influential musicians and artists. This would allow them to sign the most interesting artists from a financial point of view, keeping in mind that this study could be transposed to more mainstream music styles. Moreover, the results obtained in these pages or the datasets created could be used as a basis for new studies by other students or researchers. Finally, I would add, and this will be the only subjective point of this thesis, that the study of networks related to music, and more generally the study of music, remains one of the subjects that brings the most people together. Music has always affected everybody, a phenomenon that some like to describe as magic; studying all of its aspects, whether compositions, rhythm, dynamics, tempo or, in this case, its networks, therefore seems essential.

1.2 Research questions

In this paper, two main questions will be addressed. The first focuses on a particular property of networks, namely preferential attachment; the second focuses on the human and contextual aspects of networks, a necessary question when studying social networks such as those considered here.

The first question this study will attempt to answer is the following: What are the parameters favoring preferential attachment among jazz musicians within a collaborative network? In other words, this paper will try to find out which parameters explain why musicians who have already collaborated extensively are more likely to collaborate with new musicians entering the network.

The second, underlying question raised in this paper can be formulated as follows: How can we better understand the universe of jazz by means of its musicians' networks? This question calls for a more general analysis of networks and jazz in order to draw conclusions about the relationships between jazz musicians. We will see in the following pages that these two questions are closely related.

1.3 A short history of jazz

In order to study networks composed of jazz musicians, it seems essential to start with a brief introduction to the history of jazz, so as to better understand the challenges and the different socio-historical contexts in which these musicians have evolved over the years.

Many sources place the emergence of jazz in the early 20th century in the United States, particularly in the city of New Orleans. Thanks to the cultural diversity of the city and its African, French, Italian, Caribbean and Mexican populations, the different styles of music played by these populations (e.g. ragtime, blues) gradually mixed and formed the basis of traditional jazz as we know it today [3]. Some sources explain the explosion of jazz at this time by the fact that World War I had just ended, leading to a period of peace and economic boom, combined with the younger generation's huge need for freedom and expression. Jazz therefore quickly spread across the country thanks to various jazz clubs that have since become legendary, such as the Blue Note or the Cotton Club.

Later, thanks to the growing popularity of the genre, jazz spread all around the world, giving birth to new sub-genres such as free funk, jazz fusion or cool jazz. One essential aspect of jazz is that, initially, most of the musicians were black; some of the clubs where they played even forbade entry to non-white people. This reflects the history of the United States and the racial segregation that took place in this country between 1877 and 1964, i.e. during the birth of jazz. Today, it would seem that jazz is not as popular as it was back then [4], but its influence is still evident in many modern productions. The number of people who attend jazz festivals, as well as the number of practitioners, remains significant.

Chapter 2

State of the art

This chapter presents an overview of the current state of research in the different fields that will be discussed.

2.1 Networks

As explained in the introduction, the main topic of this document is networks. The exact mathematical term for these structures is graph; the creation of this concept is often attributed to the father of graph theory, Euler, and his famous Königsberg bridge problem. Among the first authors to take an interest in the topic, it is impossible not to mention Hamilton, who introduced a classical problem of graph theory, namely the Hamiltonian circuit. Furthermore, it seems important to note that the first book dealing with the subject is historically attributed to Kőnig, with his work entitled Theorie der endlichen und unendlichen Graphen [Theory of Finite and Infinite Graphs] [5]. This publication served as a theoretical basis for the scientists who followed him, such as Erdős, Turán and the researchers cited later in this section.

2.2 Social networks

When we study networks in which the nodes are organizations or people and the links between them represent social interactions, we speak of a social network. This is exactly the type of network that will be studied in this paper. The notion seems to have been introduced by the anthropologist Barnes in 1954. Later, Gluckman was the first to use graph theory in social science studies. One of the first and best-known theories on social networks is undoubtedly Milgram's small-world phenomenon [6]. With his experiment, he showed that people in a social network are six or fewer social connections away from each other. This idea was first theorized by Karinthy in 1929. However, the theory has been heavily questioned since its publication, especially with the arrival of the scale-free model.

2.3 Scale-free networks

Regarding advances in the general field of network topology, one of the latest major discoveries is that of scale-free networks. This notion is very often associated with that of the social network, also called collaborative network (CN). The scale-free property, put forward by Barabási and Albert in the paper Emergence of Scaling in Random Networks [7], has challenged small-world network models and has also allowed a more general understanding of network topology.

Typically, scale-free networks are characterized by a small number of extremely connected nodes and a large number of weakly connected nodes, giving a power-law degree distribution. The study of this type of network has made it possible to highlight properties common to many networks in sometimes very distant research fields [8]. The networks most frequently cited in the different fields of application are the following: the network where the nodes are proteins and genes and the edges represent the chemical interactions between them [9]; the one where the nodes are nerve cells and the links are axons [10]; the one where the nodes are HTML pages connected by links pointing to other pages [11]; and the one where the nodes are scientists whose papers are linked through the citations in other papers [1].
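To make the notion of a degree distribution concrete, the following minimal sketch computes, for a toy edge list, the fraction of nodes having each degree k. The toy network and the function name degree_distribution are illustrative assumptions and are not taken from the thesis code; in a scale-free network the same histogram would show many low-degree nodes and a few highly connected hubs.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())  # how many nodes have each degree
    n = len(degree)                    # number of nodes that appear in an edge
    return {k: counts[k] / n for k in sorted(counts)}


# Hypothetical toy network: one hub linked to eight musicians, plus one extra link.
edges = [("hub", f"m{i}") for i in range(8)] + [("m0", "m1")]
dist = degree_distribution(edges)
for k, fraction in dist.items():
    print(f"degree {k}: {fraction:.2f} of the nodes")
```

Plotting such a distribution on log-log axes is the usual first check for a power law, although a straight line on that plot is only an indication, not a proof.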

2.4 Preferential attachment

A concept inseparable from the scale-free networks discussed earlier is preferential attachment. Indeed, preferential attachment and growth are the two main factors explaining the appearance of the scale-free property in networks. The study of preferential attachment thus seeks to determine which factors influence the creation of new links during the dynamic evolution of a social network.

The concept of preferential attachment was introduced by Barabási and Albert [7], who showed that nodes with higher degrees tend to attract new nodes, and thus links, during the evolution of the network. Historically, it seems that Udny Yule was one of the first to put forward this phenomenon in order to explain the power-law distribution, which is why it is also called the Yule process. This process generates a so-called long-tailed distribution following a Pareto distribution or a power law. The phenomenon of preferential attachment can be roughly summarized by the sentence: the rich get richer. It is more widely known as the Matthew effect or cumulative advantage process.

Since the discovery of scale-free networks, many methods have been suggested to generate this type of network using the preferential attachment mechanism. The best-known model for simulating preferential attachment is the one suggested in 1999 by Barabási and Albert. In their model (BA), each new node added to the network is connected to existing nodes with a probability proportional to the number of links that the existing nodes already have. Note that this first model was based on the previous work of the physicist Derek J. de Solla Price and his Price model.
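The attachment rule described above can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the thesis implementation: the function name barabasi_albert_edges, the seed, the small complete starting core and the rejection step used to keep the m targets distinct are all assumptions made for the example.

```python
import random
from collections import Counter


def barabasi_albert_edges(n, m, seed=0):
    """Grow a network of n nodes in which each new node links to m distinct
    existing nodes, chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Assumption: start from a complete core of m + 1 nodes so no degree is zero.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # 'stubs' holds each node once per link end, so a uniform draw from it
    # is a degree-proportional draw over the nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


edges = barabasi_albert_edges(n=2000, m=2)
degree = Counter(node for edge in edges for node in edge)
ranked = sorted(degree.values(), reverse=True)
print("max degree:", ranked[0], "median degree:", ranked[len(ranked) // 2])
```

In such a run, the largest degree is typically far above the median degree, which is the "rich get richer" signature: early, well-connected nodes keep accumulating links.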

Another model that should be mentioned is the one suggested by Bianconi and Barabási in their paper entitled Competition and multiscaling in evolving networks [12]. In it, the authors proposed a new model, this time based on the fitness of nodes to compete for links. In addition, they demonstrated that the competition for links between nodes results in what they call multiscaling, i.e. a fitness-dependent dynamic exponent, which allows fitter nodes to overtake more connected but less fit nodes.

In 2003, using a method to quantify preferential attachment in evolving networks, Jeong, Néda and Barabási showed that this phenomenon was indeed present in real networks [13]. In their study of four networks (scientific citation, the Internet, actor collaboration and science co-authorship), they found that the rate Π(k) with which a node with k links acquires new links is a monotonically increasing function of k, either linear, power-law or sublinear.
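As an illustration of what measuring such a rate can look like, the sketch below estimates, from timestamped links, the average number of links gained during a time window by nodes grouped by their degree k just before the window. It is a simplified estimate in the spirit of [13], not the exact procedure of that paper nor the one used in this thesis; the function name attachment_rate, the toy data and the choice of cutoff and window are assumptions.

```python
from collections import Counter


def attachment_rate(timed_edges, cutoff, window):
    """Average number of links gained in [cutoff, cutoff + window) by nodes,
    grouped by the degree k they had just before the cutoff."""
    degree = Counter()
    for t, u, v in timed_edges:
        if t < cutoff:
            degree[u] += 1
            degree[v] += 1
    gained = Counter()
    for t, u, v in timed_edges:
        if cutoff <= t < cutoff + window:
            for node in (u, v):
                if node in degree:       # only nodes present before the cutoff
                    gained[degree[node]] += 1
    nodes_with_k = Counter(degree.values())
    return {k: gained[k] / nodes_with_k[k] for k in sorted(nodes_with_k)}


# Hypothetical (year, musician, musician) collaboration records.
timed_edges = [
    (1960, "A", "B"), (1960, "A", "C"), (1961, "A", "D"), (1961, "B", "C"),
    (1965, "E", "A"), (1965, "F", "A"), (1966, "G", "B"), (1966, "H", "D"),
]
rates = attachment_rate(timed_edges, cutoff=1965, window=5)
print(rates)
```

An increasing trend of this quantity with k would be consistent with preferential attachment; on real data, the values are usually accumulated over k to reduce noise before any functional form is fitted.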

In 2009, building on the preferential attachment model (BA) put forward by Barabási and Albert, Ben-Naim and Krapivsky showed that in a preferential attachment network, the degree distribution of the nodes depends on the depth. Here, the depth is defined as the distance between the node and its root [14]. Moreover, they showed that nodes closer to the root tend to have a larger number of connections. They explained this phenomenon by the correlation between the depth of a node and its age: younger nodes, which are further from the root because they arrived later, are the least connected.

In a paper published in Nature [15], Topirceanu, Udrescu and Marculescu showed that the degree of a node is not the main attractor of new social links. They showed that the betweenness of the nodes and the strength of the links play a crucial role in preferential attachment and, on this basis, suggested a new model named the Weighted Betweenness Preferential Attachment (WBPA) model.

More recently, among the new models proposed, Ruj and Pal were the first to put forward a preferential attachment model with bounded degree [16]. In this model, the maximum degree is bounded above by a fixed value and, according to the authors, this model is more suitable for IoT and cyber-physical systems than the conventional preferential attachment model.

2.5 Communities

One of the first algorithms to detect communities within a network was suggested by Girvan and Newman [17] and relies on centrality indices to find the boundaries between communities. In their article Community structure in social and biological networks, the two authors provided a new method with a high degree of success in identifying communities in real-world networks whose community structure is already known.

Since the introduction of this first algorithm, many new methods have been proposed to detect communities, each with its own particularity. The most often mentioned are the following:

- The method of Clauset, Newman and Moore (CNM) [18]. Its main innovation is that it can handle very large networks with a lower complexity than the other methods of the time.
- The method of Pons and Latapy [19]. It uses the concept of random walks, which makes it efficient and allows it to capture the community structure well.
- And finally, the method of Wakita and Tsurumi [20]. Developed in 2007, it improves on the CNM method by addressing its inefficiency on large networks, which is caused by merging communities in an unbalanced manner.

The method chosen for this project is the one proposed in 2008 by Blondel et al. [21] from the University of Louvain. This method is based on the concept of modularity, which is a quality index for a partition of a network into communities. Note that the Louvain algorithm is the method that currently achieves the best global modularity. However, in their paper Resolution limit in community detection [22], Fortunato and Barthélemy analyzed modularity and its applicability to community detection and found that modularity optimization explores the possible network partitions at a coarse level, which may favor partitions in which groups of nodes are merged into larger communities.
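To show what this quality index measures, the following sketch computes modularity from its usual definition for an unweighted, undirected graph: for each community, the fraction of edges that fall inside it minus the fraction expected if links were placed at random while preserving degrees. The function name modularity, the toy graph and the two partitions are assumptions made for illustration; this is not the Louvain optimization itself, only the score that such an algorithm tries to maximize.

```python
from collections import Counter


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    inside = Counter()       # L_c
    degree_sum = Counter()   # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in a single community
print(q_split, q_merged)
```

Splitting the toy graph along the bridge scores higher than keeping all nodes in a single community, which matches the intuition that a good partition has dense groups and few links between them.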

It should be noted that community detection algorithms can be divided into two families: static and dynamic. Each family can be further subdivided into algorithms that allow communities to overlap, meaning that a node can belong to more than one community, and algorithms that do not. In this document, we will only consider static algorithms that do not allow overlapping. However, regardless of the algorithm chosen, it remains difficult to interpret the communities without additional information.

2.6 Jazz communities

Regarding communities in collaborative networks of jazz players in particular, the existing research is not extensive. However, Gleiser and Danon, in their paper entitled Community structure in Jazz [23], put forward the essential property of communities, i.e. the grouping of nodes into sets that are supposed to share common characteristics. The two authors thus highlighted the presence of racial communities in the jazz world between 1912 and 1940. At that time, most white musicians performed only with other white musicians, and the same was true for black musicians. Moreover, based on a geographical parameter, they also highlighted the separation of the musicians of this period into four communities: New York, Chicago, both of these cities, and other cities.

2.7 Natural language processing (NLP)

The field of natural language processing, which will be discussed in section 3.1.3 on data collection, is a particular field of machine learning that focuses on the understanding of texts by algorithms for purposes such as classification or translation. The main goal of natural language processing algorithms is thus to enable machines to understand and interpret human speech and text.

The first modern NLP algorithms seem to date back to the end of the 1980s, thanks to a mix of linguistic and statistical methods. Several models have since been considered, for a time, to be the best performing. We can mention, among others, Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks in particular. In 2017, researchers from Google, in their paper Attention Is All You Need [24], suggested a model called the Transformer, which currently seems to be the most successful in the field of natural language processing.

Finally, from a more practical point of view, it seems that the most popular libraries for NLP nowadays are the Transformers library, which works with JAX, PyTorch and TensorFlow, and the Natural Language Toolkit (NLTK) library [25]. The latter has been chosen for this study.

Chapter 3

Method

In order to answer the questions stated in section 1.2, the methodology is divided into two main parts: the construction of the datasets (3.1) and the construction of the two networks (3.2), one for collaborations during album recordings and one for collaborations during live performances at the Montreux Jazz Festival. The first part, the data collection, uses NLP algorithms and web scraping to extract useful information from the web pages used as sources for the networks. Once these data are collected, the static networks, dynamic networks and meta-networks can be built. This will be discussed in the second part.

For information, all the code for the functions used during this project was written in Python 3.8.8. This choice was made because of the simplicity of the language and the speed with which it allows ideas to be prototyped.¹

¹ All the Python files are accessible via the author's GitHub repository: https://github.com/jbaudru/MasterThesis-JazzNetwork. Note, however, that many optimizations can still be applied to the code; the emphasis has been put on obtaining analytical results rather than on algorithmic perfection.

3.1 Data collection

3.1.1 Choice of dataset

At the beginning, this work only included a network of collaborations during album recordings. Later, a network of collaborations during live performances was added, making it possible to compare the two networks and to highlight their similarities and differences.

The first network was built from the information available on the French version of Wikipedia, because its pages are structured in a way that makes them easier to process than their English counterparts. This facilitates the web scraping that we will see later. For the second network, the one of collaborations between musicians during concerts, the Montreux Jazz Festival was chosen because it is one of the best known in the jazz world and, above all, because it has an open database that facilitates data collection.

3.1.2 Problematic

The main problem encountered in the development of this thesis was, as is often the case in research, the collection of data. Indeed, there was no ready-to-use database listing the collaborations between jazz musicians, so most of the time spent on this research went into creating datasets that are sufficiently large and reliable to draw conclusions from.

A second problem, underlying the first one, appeared very early in the project: since the first dataset is built from information provided on Wikipedia, which is a collaborative website, it may be subject to inaccuracies and gaps. Another essential point to take into account is that this dataset is built via web scraping and NLP methods that detect all the names of musicians, or words that look like names, present on the Wikipedia page of an album. Although it is rarely the case, some articles mention musicians who did not collaborate on the album, often for comparison or anecdotal purposes. This has a direct consequence: some links have been created between musicians who did not collaborate in real life.

In addition, another concern with collecting the data on Wikipedia is that these sources list compilations alongside albums. Compilations are a real concern because the musicians on a compilation do not actually play together.

To solve these problems, which were encountered in the work preceding this report, filters on the names of musicians were applied to the data from Wikipedia, and compilations were excluded from the final data. These limitations are another reason why the second database was added to the project: with the database of the Montreux Jazz Festival, we can be sure that the musicians really played together. However, it is important to note that not all the concerts of this festival were jazz concerts; this specificity will matter in the following pages. Although the data collection for this database uses web scraping, it does not use natural language processing algorithms.

3.1.3 Process

For both networks, the general procedure remains the same, with a few more subtleties for the album collaboration network. In a first step, we retrieve the raw data (HTML files) via different web scraping methods. Then, these data are cleaned and, finally, we build the networks from them using NetworkX and some parsing functions. Figure 3.1 gives an overview of the methodology used in the implementation of this approach.

FIGURE 3.1: General process

In addition to these two datasets associating the album/live with the list of

participating musicians, other data were collected during the different research

phases. These data included the instruments, the country, the birthdate and the

birthplace (see section 4.2.2).

Collecting data for the album network using NLP

As explained earlier, in order to build the collaborative network of jazz musicians recording albums, data was collected from various Wikipedia pages. The list below compiles the sources used to build the dataset for the album network:

1. Wikipedia: Liste des albums de jazz les plus vendus

2. Wikipedia: Album de jazz

3. Wikipedia: Album de jazz Américain

4. Wikipedia: Album de jazz français

5. Wikipedia: Album de jazz fusion

6. Wikipedia: Album de bossa nova

As briefly explained above, the main function of the NLP algorithms in this project is to locate the names of the artists present on an album web page. Before using the NLP methods, a web scraping algorithm runs in two steps. First, starting from a Wikipedia page listing albums from a certain country, style or other criterion, the algorithm fetches all the links leading to the pages of the albums mentioned on this main page. Then, for each of these links, that is, for each album, the algorithm fetches the HTML code of the page the link points to. The data collection, combining web scraping and the NLP algorithm, is summarized in figure 3.2.

FIGURE 3.2: Steps for getting the jazzman name
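The first of the two scraping steps described above can be sketched as follows. This is only an illustration and not the code used in the thesis: the class name `AlbumLinkParser`, the function `extract_album_links` and the sample HTML are hypothetical, and the rule "keep `/wiki/` links without a colon" is an assumed heuristic for separating article pages from categories or files.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlbumLinkParser(HTMLParser):
    """Collect the target of every article link found in a list page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumed heuristic: article links start with /wiki/ and contain no
        # namespace separator such as "Catégorie:" or "Fichier:".
        if href and href.startswith("/wiki/") and ":" not in href:
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


def extract_album_links(list_page_html, base_url="https://fr.wikipedia.org"):
    """Step 1: return the absolute URLs of the album pages linked from a list page."""
    parser = AlbumLinkParser(base_url)
    parser.feed(list_page_html)
    return parser.links


# Tiny stand-in for a list page; a real run would download the HTML first.
sample_html = """
<ul>
  <li><a href="/wiki/Kind_of_Blue">Kind of Blue</a></li>
  <li><a href="/wiki/Time_Out_(album)">Time Out</a></li>
  <li><a href="/wiki/Cat%C3%A9gorie:Album_de_jazz">category</a></li>
  <li><a href="#top">top</a></li>
</ul>
"""
album_links = extract_album_links(sample_html)
```

The second step would then download the HTML of each URL in `album_links` (for instance with `urllib.request.urlopen`) before handing the page to the NLP stage.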

The NLP algorithm extracts from the HTML page the sets of words that are recognized as being of type PERSON. This type is provided natively by the nltk library. Common words that can be erroneously recognized as PERSON by the NLP algorithm are filtered out using the following word lists provided by nltk:

    set(nltk.corpus.words.words('en'))
    set(nltk.corpus.words.words('fr'))

Moreover, other filters are applied to the collected data. These filters avoid counting producers, authors, journalists, labels and album names, which are often cited in the Wikipedia articles and are sometimes returned as false positives by the NLP algorithm. Each filter is a simple text file containing, depending on the category, a list such as the most frequently cited jazz producers. Of course, the professions of producer, lyricist and journalist are closely related to music and are of considerable importance in the jazz scene. However, they do not fit into the question that this document tries to answer, so it was decided to leave them out. The purpose of these filters is thus to reduce the noise present in the collected data.
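The two kinds of filters described above (dictionary words and category lists) can be illustrated with a minimal sketch. It assumes that the PERSON candidates have already been extracted; the function `filter_candidates`, the tiny word set and the one-entry producer list are hypothetical stand-ins for the nltk word lists and the text files mentioned in the text.

```python
def filter_candidates(candidates, common_words, blocklist):
    """Keep the candidate names that are neither plain dictionary words nor blocklisted."""
    kept = []
    for name in candidates:
        tokens = name.lower().split()
        # A candidate made only of dictionary words is probably not a person.
        if all(token in common_words for token in tokens):
            continue
        # Producers, journalists, labels, etc. are dropped by exact match.
        if name.lower() in blocklist:
            continue
        kept.append(name)
    return kept


# Illustrative inputs only (not the lists used in the thesis).
common_words = {"blue", "note", "time", "out"}
producers = {"teo macero"}
candidates = ["Miles Davis", "Blue Note", "Teo Macero", "Bill Evans"]
musicians = filter_candidates(candidates, common_words, producers)
```

Exact matching keeps the sketch short; a real filter would probably need some normalization of accents and spelling variants.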

Thus we manage to recover the following information: the title of the al-

bum, the year of release of this album and the list of musicians collaborating

on this album. Note that the dates of the collaboration are used for the con-

struction of the dynamic network in section 3.2.2.

Another possible use of natural language processing algorithms would be to recognize when several different names refer to the same person. For example, the composer George Gershwin is also known under the name Jacob Gershowitz. Such alternative names and pseudonyms are frequent among musicians too. The most famous pseudonyms were handled by hand.

Moreover, another machine learning application initially present in this project used the ethnicolr library, based on TensorFlow. The purpose of this algorithm was to add the ethnic origin of the musicians to the datasets in order to compare the results obtained by Gleiser and Danon about racial segregation between 1920 and 1940 [23] with the data used in this project. The aim was also, eventually, to show an evolution of mentalities after 1940. Machine learning was chosen for this task because data on musicians' origins are rarely explicitly provided on the Wikipedia pages used as sources and therefore cannot be obtained via classical web scraping. More importantly, given the large number of musicians present in the datasets, this data collection seemed too tedious to be done by hand. The algorithm therefore tried to guess the origin based on the musicians' names. The resulting classes were: Non-Hispanic Whites, Non-Hispanic Blacks, Asians, and Hispanics. But since this library is based primarily on United States census data from 2000 or 2010, the origins of jazz players from before this period are often poorly predicted. Moreover, as said before, many jazz musicians use pseudonyms, which greatly complicates the detection of the musicians' origin by the algorithm.

The use of pseudonyms by jazz musicians also causes difficulties in detecting the gender of musicians. Indeed, the gender-guesser library in charge of this task is based on statistics from a list of 50,000 names. In various tests, this library identified the gender of most musicians as unknown.

Furthermore, race, ethnicity and gender are complex and controversial top-

ics, requiring knowledge that the author of this document does not possess.

For this reason, this part of the data collection has been abandoned.

Collecting data for the live performance network

The data collection for the Montreux Jazz Festival was much easier than in the case presented above. Indeed, the site used as a source is structured in such a manner that the user can list all the concerts that took place during a chosen year. The site can be accessed via this link: Montreux concerts database. Thus, a simple web scraping algorithm retrieving the data for each year was developed and, as in the previous case, the following information could be collected: the name of the band, the year of the performance and the list of musicians on stage. In addition to this information, the instruments played by each musician during the concert were also recorded.

3.1.4 Information on datasets

The dataset for the album network is composed of three columns: the name of

the album, the release year of the album and the list of musicians present on

the album. The dataset for the live performance network is composed of the

four following columns: the name of the band, the year of the performance,

and the list of musicians present on the stage and their instruments.

|                   | Wikipedia dataset | Montreux dataset   |
|-------------------|-------------------|--------------------|
| Number of entries | 1,038 albums      | 4,589 performances |
| Time interval     | From 1928 to 2020 | From 1967 to 2020  |
| Quality score     | 0.89              | 1                  |

TABLE 3.1: Comparison of both datasets

It is interesting to note that, despite its smaller number of cells², the Wikipedia-based dataset covers a longer time period than the live performance dataset. This difference can be explained by arguments similar to those discussed later in section 4.2.5.

Next, in order to approximate the level of confidence that can be given to these two datasets, the author defines a very simple quality factor for each of them. The higher this factor is, the more trustworthy the dataset is. This quality score³ for a dataset $X$ is defined as follows:

$$\text{Quality Score}_X = 1 - \frac{\text{Total number of empty cells in } X}{\text{Total number of cells in } X}$$
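As an illustration of this definition, the score can be computed in a few lines. The sketch below is not the thesis code: the function name `quality_score`, the convention that `None`, an empty string or an empty list counts as an empty cell, and the two sample rows are all assumptions made for the example.

```python
def quality_score(rows):
    """Return 1 - (number of empty cells / total number of cells)."""
    total = sum(len(row) for row in rows)
    # Assumed convention: None, "" and [] all denote an empty cell.
    empty = sum(1 for row in rows for cell in row if cell in (None, "", []))
    return 1 - empty / total


# Two illustrative rows (album name, release year, musicians); one year is missing.
albums = [
    ("Kind of Blue", 1959, ["Miles Davis", "Bill Evans"]),
    ("Time Out", None, ["Dave Brubeck", "Paul Desmond"]),
]
score = quality_score(albums)  # 1 empty cell out of 6
```

With one empty cell out of six, the score is 1 − 1/6; a dataset without any empty cell gets a score of exactly 1.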

According to table 3.1, we notice that ≈11% of the data is missing for the

album network and that the live performance data set is 100% complete and

reliable. This ≈11% difference can be explained by the fact that in the early

stages of this work a part of the data (≈120 cells) of the album dataset was

completely entered by hand and that errors may have slipped into the data.

The most frequently missing entry for this dataset seems to be the year of release of the albums. To overcome this problem, it would have been possible to fill in these missing data either with the average of the years of release or with the mode, i.e. the year appearing most often in the data. However, it was decided to keep the data unchanged in order to stay as close as possible to reality. Another solution to the problem of numerous empty entries in the dataset is presented in section 5.2.

² Here, we define a cell as a single field associated with a column in a row; this is the same definition as in classic Excel spreadsheets.

³ This formula can be improved by adding a variable accounting for the number of duplicated entries, but there are no such entries in the present datasets.

Finally, in addition to the quality score, the data collection will always be

considered as incomplete given the huge volume of albums and live perfor-

mances that are performed each year and the many albums that are not cited

by the various websites from which the data come. Like many social networks,

the collaborative networks of jazz musicians are growing networks; each year,

new data are added to the different Wikipedia pages mentioned above and ev-

ery year, the Montreux Jazz Festival updates its database according to the con-

certs that take place there.

3.2 Networks construction

This section will show how the two collaboration networks were built, namely

the collaboration network during album recording, based on Wikipedia data,

and the collaboration network during live performance, based on Montreux

Jazz Festival data. Finally, it will also discuss how the different meta-networks

have been created.

3.2.1 Static network

In the studied networks, each musician is represented by a node and every

link between two nodes has a weight. This weight is simply determined by the

number of times two musicians have collaborated during a recording or a live

performance. In addition, the size of a node is determined by the degree of that

node, so the more a musician collaborates, the bigger the node representing

him will be in the networks. Note that the networks constructed here do not

have any directed edges.
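The weighting rule described above can be sketched with the standard library alone. This is a hedged illustration, not the thesis implementation (which relies on NetworkX): the functions `build_collaboration_weights` and `degrees` and the three sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_collaboration_weights(records):
    """Map each unordered pair of musicians to the number of records they share."""
    weights = Counter()
    for musicians in records:
        # Sorting makes (a, b) and (b, a) the same key: the graph is undirected.
        for pair in combinations(sorted(set(musicians)), 2):
            weights[pair] += 1
    return weights


def degrees(weights):
    """Degree of each node = number of distinct collaborators."""
    degree = Counter()
    for u, v in weights:
        degree[u] += 1
        degree[v] += 1
    return degree


# Illustrative records: each list is the line-up of one album or concert.
records = [
    ["Miles Davis", "Bill Evans", "John Coltrane"],
    ["Miles Davis", "John Coltrane"],
    ["Bill Evans", "Scott LaFaro"],
]
weights = build_collaboration_weights(records)
node_degree = degrees(weights)
```

Each `(u, v)` pair and its count could then be loaded into a NetworkX graph with `add_edge(u, v, weight=w)`, and `node_degree` used to scale the node sizes.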

FIGURE 3.3: Album network

The ﬁgure 3.3 shows the network obtained for the data collected on Wikipedia.

We already notice that one node seems much more important than the others;

we will analyze this in detail later in the section 4.2.2.

FIGURE 3.4: Montreux network

The ﬁgure 3.4 represents the network obtained for the Montreux Jazz Festi-

val. The experienced eye of the reader will directly notice the striking differ-

ence between the topology of this network and the one presented just before;

we will also see this point in detail in the section 4.1.1 of this thesis.

3.2.2 Dynamic network

In order to study the preferential attachment phenomena that occur in the networks, dynamic versions of both networks were developed. Indeed, the study of preferential attachment requires adding a time variable to the networks in order to follow the evolution of the links between their nodes. To do this, the DyNetx library was initially chosen and then replaced by the visualization tool Gephi, which makes this type of task less laborious. Each link thus contains information about the time at which the collaboration appeared in the network. In the context presented here, this temporal information is the album release date or the date of the live performance. Note that the dates attached to the links between the nodes are obtained with the web scraping method presented in section 3.1.3. As a consequence, some nodes are not present in the dynamic network because no information was available regarding the date of the album or performance on the web pages used as sources.
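To make the idea of a time-stamped link concrete, here is a minimal sketch of how a snapshot of the network at a given year could be obtained. It does not reproduce the DyNetx or Gephi workflow; the function `snapshot` and the sample edges are hypothetical, and undated collaborations are represented by `None` as an assumption.

```python
def snapshot(timed_edges, year):
    """Return the edges that have appeared in the network up to and including `year`."""
    return {
        (u, v)
        for (u, v, first_year) in timed_edges
        # Edges without a date cannot be placed in time and are left out.
        if first_year is not None and first_year <= year
    }


# Illustrative edges: (musician, musician, year of the collaboration).
timed_edges = [
    ("Miles Davis", "John Coltrane", 1959),
    ("Bill Evans", "Scott LaFaro", 1961),
    ("Dave Brubeck", "Paul Desmond", None),  # undated entry
]
edges_1960 = snapshot(timed_edges, 1960)
edges_2020 = snapshot(timed_edges, 2020)
```

The undated edge never enters any snapshot, which mirrors the remark above that some nodes are absent from the dynamic network.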

FIGURE 3.5: Album network evolution sample

In practice, to allow a better experience for the reader and a better visual-

ization of the networks evolution, a video for each of the networks has been

made. You can access them by clicking here:

•Video of the evolution of the album network (YouTube)

•Video of the evolution of the live network (YouTube)

3.2.3 Meta-networks

The term meta-network, as used in this paper, can be seen as a network of a network. The idea behind this concept is to build a new network based on common characteristics of the nodes composing a first network. The characteristics treated in these pages are the instruments of the musicians, for all the nodes of the festival network, and the geographical origin of the musicians, for the hubs of the festival network. Finally, the last characteristic taken into account is the year of album release for the Wikipedia network and the year of the concert for the Montreux Jazz Festival network.

To provide a better understanding of how these types of networks are built, consider the following example: if we take the instrument played by the musicians as the characteristic used to build the meta-network, the nodes are the different existing instruments, and two instruments A and B are linked together if a musician playing instrument A has collaborated with a musician playing instrument B.

The ﬁgure 3.6 gives an example of this type of meta-network construction

with few nodes.

FIGURE 3.6: Example of meta-network construction
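The instrument example can be written as a short sketch. It is only an illustration under simplifying assumptions: each musician is given a single instrument, the function `instrument_meta_network` is hypothetical, and the collaborations listed are sample data rather than entries of the Montreux dataset.

```python
from collections import Counter


def instrument_meta_network(collaborations, instrument_of):
    """Link instruments A and B whenever a player of A collaborated with a player of B."""
    meta_edges = Counter()
    for u, v in collaborations:
        a, b = instrument_of[u], instrument_of[v]
        # Sorting gives one key per unordered pair; (a, a) is a self-loop.
        meta_edges[tuple(sorted((a, b)))] += 1
    return meta_edges


# Assumption for the sketch: one instrument per musician.
instrument_of = {
    "Miles Davis": "trumpet",
    "John Coltrane": "saxophone",
    "Cannonball Adderley": "saxophone",
    "Bill Evans": "piano",
}
collaborations = [
    ("Miles Davis", "John Coltrane"),
    ("Miles Davis", "Bill Evans"),
    ("Miles Davis", "Cannonball Adderley"),
    ("John Coltrane", "Cannonball Adderley"),
]
meta_edges = instrument_meta_network(collaborations, instrument_of)
```

Counting the pairs, rather than only recording their existence, gives a weight to each meta-edge; whether self-loops between identical instruments should be kept is a design choice left open here.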

Note that, except for the meta-network of the years, the meta-networks described here have been built only for the Montreux Jazz Festival network, for which we have information on the instruments of the musicians, and for the most important nodes of this same network, for which we have information on the country of origin.

3.3 Networks visualization

The visual representation of the networks was improved during the different steps of this research. Initially, this project only used the NetworkX library for the creation and visualization of the networks. However, due to the well-known speed problems of Python and the large size of the live performance collaboration network, this library was almost unusable for a correct visualization. Tools such as neo4j, graphistry, Cytoscape3 or BioFabric seem to be suitable solutions for a more efficient network construction, even if, given the relatively small number of nodes in the networks studied here, this is not a priority. For the visualization of the networks, however, the choice was made to switch to the open-source software Gephi, which gives the user greater freedom of movement in the network, a much clearer view and a set of very useful tools.

Chapter 4

Results

In this section, the different results obtained regarding the topologies of the

two studied networks will be presented as well as the results obtained concern-

ing the hubs of these two networks. The results obtained following the com-

parison of the live performance network with a new similar network, namely

another collaborative network during a jazz festival, will also be presented.

Finally, the results obtained for the different meta-networks created will be

discussed.

The approach used to present these results will be as follows: for each point,

the metrics used will be explained, then these metrics will be put in perspective

to the subject studied, namely the world of jazz, and ﬁnally the different values

obtained will be discussed.

4.1 Technical analysis of both networks

The different parameters studied in this section provide an overview of the net-

works. These parameters are those usually studied in other documents dealing

with collaborative networks, such as the rich-club and clustering coefﬁcients,

the gamma factor, and the notion of communities.

Chapter 4. Results 26

4.1.1 General parameters and comparison

Table 4.1 shows the different parameters typically studied in the exploration of

networks.

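|                                 | Wikipedia network | Montreux network |
|---------------------------------|-------------------|------------------|
| Number of nodes                 | 1,540             | 14,090           |
| Maximum degree                  | 153               | 251              |
| Average degree                  | 30.32             | 65.39            |
| Diameter                        | 12                | 15               |
| Density                         | 0.006             | 0.001            |
| Modularity                      | 0.83              | 0.931            |
| Num. of weakly connected nodes  | 196               | 1,136            |

TABLE 4.1: General parameters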

Several references to these parameters will be made in the following sections, but it is important to note here that the live performance network is much larger, i.e. it has many more nodes than the album network: 14,090 versus 1,540.

Another parameter that is consistent with this simple observation is the diameter of the networks. The diameter is the length of the longest of the shortest paths between any two vertices of a graph. This parameter, denoted $\delta$, is calculated with the following formula, where $s(i,j)$ is the number of edges in the shortest path from vertex $i$ to vertex $j$:

$$\delta = \max_{i,j} s(i,j)$$

Thus we observe that the diameter of the collaboration network during live performances is higher than that of the collaboration network during album recording.
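For a small unweighted graph, the definition of the diameter can be checked directly with a breadth-first search from every vertex. The sketch below is an illustration, not the computation used for Table 4.1; it assumes a connected graph given as a hypothetical adjacency dictionary.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of edges s(source, j) to every reachable vertex j."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def diameter(adjacency):
    """Largest shortest-path length over all pairs (assumes a connected graph)."""
    return max(
        max(shortest_path_lengths(adjacency, node).values())
        for node in adjacency
    )


# Illustrative path graph A - B - C - D.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
delta = diameter(adjacency)
```

On a disconnected network such as the ones studied here, the same computation would have to be restricted to one connected component, typically the largest.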

In addition, in order to study how these two networks evolve, the nodes should be coupled with a time factor; an interesting parameter to study is thus the number of new nodes as a function of time. Figures 4.1 and 4.2 represent the number of new nodes entering the networks each year.

FIGURE 4.1: Number of new nodes per year - Wikipedia network
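The quantity plotted in these figures can be sketched as follows: a musician counts as a new node in the year of his or her first dated appearance. The function `new_nodes_per_year` and the sample records are hypothetical and only illustrate the counting rule.

```python
from collections import Counter


def new_nodes_per_year(records):
    """Count, for each year, the musicians appearing in the data for the first time."""
    first_seen = {}
    # Processing the records chronologically ensures the earliest year is kept.
    for year, musicians in sorted(records, key=lambda record: record[0]):
        for musician in musicians:
            first_seen.setdefault(musician, year)
    return Counter(first_seen.values())


# Illustrative records: (year, line-up).
records = [
    (1959, ["Miles Davis", "John Coltrane"]),
    (1957, ["Miles Davis", "Red Garland"]),
    (1961, ["John Coltrane", "McCoy Tyner"]),
]
new_nodes = new_nodes_per_year(records)
```

Records without a year would have to be discarded before this step, which is one of the possible sources of bias for the album network.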

In general, we notice a big increase in the number of nodes in the album network starting in 1955. After that, we observe a decrease in the number of new nodes as time passes. This can be explained by two potential factors. The first is that, since around 2000 to 2010, sales of the compact disc (CD), a format often associated with albums, have dropped as it was gradually replaced by streaming services such as YouTube or Spotify, to name a few. This significant fall in the popularity of the CD album during those years, making the album format less attractive to artists, could explain why the number of albums released decreased. The second factor is that some entries in the album database have no information about the release date, which potentially biases the result.

FIGURE 4.2: Number of new nodes per year - Montreux network

Concerning the live performance network, the general trend seems to be the opposite of the one described for the album network: the number of new nodes in the network seems to increase progressively with time. This phenomenon seems to be explained by two factors. The first is the increasing popularity of the festival over time, allowing the organizers to increase the number of invited artists and/or stages at the festival. The second is that the organizers of the Montreux Jazz Festival try to invite new artists every year in order to renew their line-up, whereas artists releasing albums at regular intervals tend to collaborate with the same musicians.

Finally, from a more practical point of view, our calculations show that the average number of musicians taking part in an album recording is 3.28 for the Wikipedia network, while the average number of musicians taking part in a concert is 6.46 for the Montreux Jazz Festival network. The number of participants on stage is thus higher than the number of musicians in the studio. This could be explained by the fact that fewer musicians are needed to record an album than to play it live. Indeed, one can easily imagine a musician recording several instruments in turn for an album, while it would be impossible to play them simultaneously during a concert.

4.1.2 Scale-free networks

A common feature of many collaborative networks, or social networks, is the

fact that they are scale-free. This is what we will check in this section. In the

studied networks, the degree distribution of the nodes seems indeed to follow

a power law distribution. Thus, there is a high occurrence of low degree nodes

and a low number of high degree nodes.

We can see on the ﬁgures 4.3a and 4.3b that this property is respected by

both networks.

(A) Degree distribution of Wikipedia network

(B) Degree distribution of Montreux network

FIGURE 4.3: Degree distributions

The gamma factor is used to determine whether a network is scale-free. It is the exponent of the following relation:

$$P(k) \sim c\,k^{-\gamma}$$

where $P(k)$ is the frequency $P(k) = \frac{n_k}{n}$, with $n_k$ the number of nodes of degree $k$ and $n$ the total number of nodes, $c$ is a proportionality constant and $k$ is the degree. $P(k)$ can also be interpreted as the probability that a node has $k$ links. It is generally accepted that a network is considered to be scale-free if the value of $\gamma$ is between 2 and 3.
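The two quantities involved, the empirical frequency P(k) and the exponent γ, can be illustrated on a toy degree sequence. The estimator below is the standard continuous maximum-likelihood approximation, γ ≈ 1 + n / Σ ln(k_i / k_min); it is a swapped-in simplification and not the procedure of the powerlaw library, which also handles discrete data and selects k_min. The function names and the sample degrees are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical frequency P(k) = n_k / n for each observed degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: n_k / n for k, n_k in sorted(counts.items())}


def estimate_gamma(degrees, k_min=1):
    """Continuous maximum-likelihood approximation of the power-law exponent.

    Assumes at least one degree is strictly larger than k_min.
    """
    tail = [k for k in degrees if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / k_min) for k in tail)


# Toy degree sequence: many low-degree nodes, one hub.
degrees = [1, 1, 1, 1, 2, 2, 3, 8]
p_k = degree_distribution(degrees)
gamma = estimate_gamma(degrees)
```

On such a small sample the estimate is not meaningful in itself; the point is only to show how P(k) and γ relate to the degree sequence.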

The value of the γ factor found for the network of album collaborations is 1.682 and, for the network of Montreux Jazz Festival collaborations, 1.827. These values were found with the Fit object of the powerlaw library. For both networks, the values of the gamma factor are close to each other and, although slightly below the usual lower bound, not far from 2. Combined with the results obtained for their degree distributions, this leads us to conclude that these two networks have the characteristics expected of scale-free networks.

4.1.3 Preferential attachment

As explained in section 3.2.2, the study of preferential attachment requires following the evolution of the network over time, which is commonly called a dynamic network.

The preferential_attachment function of the NetworkX library computes the preferential attachment score between two nodes u and v. This type of function, like adamic_adar_index or jaccard_coefficient, is often used to predict the next links that nodes will form in the network. Many similar methods are described in the paper The Link Prediction Problem for Social Networks [26], written by Liben-Nowell and Kleinberg. The preferential attachment score proposed by Newman and Barabasi is calculated according to the following formula:

$$p_{score}(u,v) = |\Gamma(u)|\,|\Gamma(v)|$$

where $\Gamma(u)$ is the set of neighbors of $u$. The higher this score is, the more likely the two nodes $u$ and $v$ are to become linked.
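The formula amounts to multiplying the numbers of neighbors of the two nodes, as the following sketch shows. It is a plain re-implementation for illustration, with a hypothetical adjacency dictionary; NetworkX exposes the same score through `preferential_attachment`, which returns `(u, v, score)` triples.

```python
def preferential_attachment_score(adjacency, u, v):
    """Product of the neighborhood sizes |Γ(u)| * |Γ(v)|."""
    return len(adjacency[u]) * len(adjacency[v])


# Illustrative network: neighbors of each musician.
adjacency = {
    "Miles Davis": {"John Coltrane", "Bill Evans", "Herbie Hancock"},
    "John Coltrane": {"Miles Davis", "McCoy Tyner"},
    "Bill Evans": {"Miles Davis"},
    "Herbie Hancock": {"Miles Davis"},
    "McCoy Tyner": {"John Coltrane"},
}
score_hubs = preferential_attachment_score(adjacency, "Miles Davis", "John Coltrane")
score_leaves = preferential_attachment_score(adjacency, "Bill Evans", "McCoy Tyner")
```

Two well-connected musicians obtain a higher score than two peripheral ones, which is why the score is read as a propensity to form a new link.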

Thus, this formula allows us to calculate that over the entire period of time

studied extending from 1967 to 2020, the average score of preferential attach-

ment for the nodes is 64.91 for the collaboration network related to albums and

82.69 for that related to live performance. According to the deﬁnition given to

the preferential attachment score, on average, we notice that musicians of the

live collaboration network have a higher probability to start a new collabora-

tion than those of the album network.

Figures 4.4a and 4.4b show the evolution of the maximum and the average preferential attachment score per year within the album network.

(A) Maximal preferential attachment score per year - Album network

(B) Average preferential attachment score per year - Album network

FIGURE 4.4: Preferential attachment - Album network

For the album network, we clearly see a spike in preferential attachment around the year 1971. This can be explained by the fact that, for this year, our database lists 7 albums, including 4 on which many musicians collaborated. This increase in the number of musicians can be verified in figure 4.1. This result suggests that, during this year, musicians in the album network had the greatest chance of creating new collaborations with other musicians.

Moreover, from 2009 to the present, we notice that the preferential attachment is significantly lower than during the period from 1967 to 2009. This is due to the smaller number of nodes in the network for this period, which can also be verified in figure 4.1. Note that the data concerning the year of the albums' release are not complete for this network, meaning that these results should be taken with a certain amount of caution.

The ﬁgures 4.5a and 4.5b show the evolution of the average and the max-

imum score of preferential attachment per year within the live performance

network.

(A) Maximal preferential attachment score per year - Montreux network

(B) Average preferential attachment score per year - Montreux network

FIGURE 4.5: Preferential attachment - Montreux network

As in the case of the album network, we notice, for the live collaboration

network, preferential attachment spikes. These occur in the years 1991, 1996

and 2008. Thus, as in the previous case, we can interpret this by saying that

during these three years, musicians within the network had, on average, a

higher probability of attracting new collaborators. Like before, this seems to

be explained by the presence of a larger number of nodes in the network cor-

responding to these years, so a higher number of nodes inevitably favors the

chances to have a higher preferential attachment score. Indeed, we notice in

ﬁgure 4.2 that from 1991 to 2018, the number of nodes in the networks of these

years is on average higher than for the period before 1991. Note that here, the

data concerning the years of the concerts indicated in this network are exhaus-

tive.

4.1.4 Other parameters

In the following section, we will look at parameters that are somewhat out

of the ordinary compared to those usually studied in the network literature.

These are particularly interesting as they are able to draw conclusions on the

topology of the two networks studied in this paper.

Rich-club coefﬁcient

The rich-club coefficient allows us to check whether vertices of high degree tend to be strongly connected to each other. This coefficient is calculated via the following equation:

$$\phi(k) = \frac{2E_{>k}}{N_{>k}(N_{>k}-1)}$$

where $E_{>k}$ is the number of links among the $N_{>k}$ nodes of degree higher than $k$, and $\frac{N_{>k}(N_{>k}-1)}{2}$ is the maximum number of links between these $N_{>k}$ nodes. This coefficient is normalized using the rich-club coefficient of a random graph of the same order as the one studied. The normalized indicator is therefore the following:

$$\rho_{ran}(k) = \frac{\phi(k)}{\phi_{ran}(k)}$$

For this metric, if for certain values of $k$ we have $\rho_{ran}(k) > 1$, this denotes the presence of the rich-club effect.
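The unnormalized coefficient φ(k) can be computed by hand on a small graph, as in the sketch below. It is an illustrative re-implementation of the formula and not the NetworkX routine; the adjacency dictionary is hypothetical, and returning `None` when fewer than two nodes exceed degree k is a convention chosen for the example.

```python
def rich_club_coefficient(adjacency, k):
    """phi(k) = 2 * E_{>k} / (N_{>k} * (N_{>k} - 1)) for an undirected graph."""
    rich = {node for node, neighbors in adjacency.items() if len(neighbors) > k}
    n_rich = len(rich)
    if n_rich < 2:
        return None  # assumed convention: undefined with fewer than two rich nodes
    # Each edge between rich nodes is seen from both endpoints, hence // 2.
    e_rich = sum(1 for node in rich for nb in adjacency[node] if nb in rich) // 2
    return 2 * e_rich / (n_rich * (n_rich - 1))


# Illustrative graph: a triangle A-B-C plus two pendant nodes D and E.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"B"},
}
phi_0 = rich_club_coefficient(adjacency, 0)  # all 5 nodes, 5 edges
phi_1 = rich_club_coefficient(adjacency, 1)  # the triangle only
```

The normalized indicator would additionally require the same quantity on a random graph of the same order, which is omitted here.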

(A) Wikipedia network (B) Montreux network

FIGURE 4.6: Distribution of rich-club coefﬁcient by degree

For both networks, the rich-club coefficient is calculated via the rich_club_coefficient() function of the NetworkX library; its average value is 0.37 for the Wikipedia network and 0.581 for the Montreux Jazz Festival network. It can be noticed that, for nodes of degree k close to k_max, the rich-club coefficient is close to 1, which means that the nodes of high degree are well connected to each other. In concrete terms, this result suggests that the network is robust and that removing a hub would not affect the general connectivity of the network. For example, in the network formed by the links between websites, deleting an extremely visited page, i.e. a hub, would not prevent us from reaching the sites that were linked to this hub, since they remain reachable through other sites.

Moreover, we can see that the distributions of the rich-club coefﬁcients with

respect to the degrees of the nodes take the same general form for both net-

works. However, it seems that the evolution of the rich-club coefﬁcient is

clearly more abrupt for the album network, from degree 58 to 60, whereas

it appears to be progressive for the other network.

We could transpose these results to the world of jazz as follows: if one

of the highly connected musicians, a rich node, present in the studied net-

works, passed away, most of the collaborations between musicians would not

be greatly affected. History has given us some famous examples with George

Duke or Toots Thielemans. Indeed, after their death, the Montreux Jazz Festival

network continued to evolve and the connectivity of the network did not de-

crease.

Modularity and community

Most social networks exhibit the notion of community [17], which brings together different members, or nodes, of the network. As explained in section 2, there exist different methods to detect communities within a network, such as the method of Clauset, Newman and Moore [18], the method of Pons and Latapy [19], the method of Wakita and Tsurumi [27] or the Louvain method [28]. The one used in this document is the Louvain method, created by V. D. Blondel. This method has been chosen because it is easily implemented thanks to the python-louvain library and especially because it seems to be the most efficient method to date.

FIGURE 4.7: Modularity optimization comparison

This method partitions a network by optimizing the modularity. The modularity is a value between −0.5 and 1 which measures the density of edges inside the communities compared to the density of edges connecting the communities. The formula to calculate the modularity is the following:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where $A_{ij}$ is the weight of the edge between nodes $i$ and $j$, $k_i$ and $k_j$ are the sums of the weights of the edges attached to nodes $i$ and $j$, $2m = \sum_{i,j} A_{ij}$ (i.e. $m$ is the sum of all the edge weights of the graph), $c_i$ and $c_j$ are the communities of nodes $i$ and $j$, and $\delta$ is the Kronecker delta function:

$$\delta(x,y) = \begin{cases} 1, & \text{if } x = y \\ 0, & \text{otherwise} \end{cases}$$
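To see what the modularity measures, the formula can be evaluated on a toy partition. The sketch below uses the equivalent community-wise form Q = Σ_c [w_c / m − (d_c / 2m)²], where w_c is the total weight of the edges inside community c and d_c the sum of the weighted degrees of its nodes; this rewriting holds for graphs without self-loops. The function `modularity` and the sample graph are hypothetical, and the code evaluates a given partition rather than running the Louvain optimization.

```python
def modularity(weighted_edges, community_of):
    """Modularity Q of a given partition of an undirected weighted graph without self-loops."""
    m = sum(weight for _, _, weight in weighted_edges)
    strength = {}   # k_i: sum of the weights of the edges attached to node i
    internal = {}   # w_c: total weight of the edges inside community c
    for u, v, weight in weighted_edges:
        strength[u] = strength.get(u, 0) + weight
        strength[v] = strength.get(v, 0) + weight
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal[c] = internal.get(c, 0) + weight
    total = {}      # d_c: sum of the strengths of the nodes of community c
    for node, k in strength.items():
        c = community_of[node]
        total[c] = total.get(c, 0) + k
    return sum(internal.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)


# Two triangles joined by a single bridge c-d, all weights equal to 1.
edges = [
    ("a", "b", 1), ("a", "c", 1), ("b", "c", 1),
    ("d", "e", 1), ("d", "f", 1), ("e", "f", 1),
    ("c", "d", 1),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the graph into its two triangles gives a clearly positive Q, whereas putting every node in a single community gives Q = 0, which is the kind of difference the optimization exploits.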

Table 4.2 shows the data concerning the communities of the two networks. The choice was made to use the visuals obtained with the Gephi tool rather than those obtained at the beginning of this research with the NetworkX library. Indeed, Gephi allows a clear and clean visualization of the different communities, which is not always the case for NetworkX. Note that Gephi also uses the Louvain algorithm to detect communities.

|                       | Wikipedia network | Montreux network |
|-----------------------|-------------------|------------------|
| Modularity value      | 0.83              | 0.931            |
| Number of communities | 217               | 1,650            |

TABLE 4.2: Parameters related to communities

Figure 4.8 shows the graphs obtained for the detection of communities in

the two networks using the Gephi tool.

(A) Communities in the Wikipedia network (B) Communities in the Montreux network

FIGURE 4.8: Communities visualization for both networks

The Louvain method identified ≈217 different communities for the album network and ≈1,650 communities for the live performance network. These communities are highlighted here with different colors. The number of communities can vary from one run to another because the Louvain algorithm [28] is unstable: the placement of nodes in the different communities depends, among other things, on the evaluation order of the nodes [29].

As mentioned in section 4.1.1 and shown in figure 4.9, the different colors of the communities make it easier to notice that the number of disconnected components, i.e. isolated communities, is higher for the Montreux Jazz Festival collaboration network than for the album recording collaboration network.

FIGURE 4.9: Network of Montreux festival: Isolated community

As mentioned in section 2, it is often difficult to draw conclusions from the communities highlighted by the algorithm without additional information. Nevertheless, for both collaboration networks between jazz musicians, we notice that most of the communities are built around a highly connected node. There are exceptions: since some clusters of nodes are poorly connected to the main network, these sets of nodes logically form communities without containing any highly connected node.

Moreover, since the musicians making up these two networks come from many different countries, there appears to be a link between the communities and the geographical origin of the musicians. For example, several communities consist exclusively of French musicians. The same is true for the United States of America, Brazil and most of the countries mentioned in section 4.2.4 of this document. When there are several communities for one country, the difference between these communities seems to be determined by the age of the musicians and by the sub-genre of jazz they usually play.

To give an example, figure 4.10 shows the separation into communities of the album recording collaboration network according to the musical sub-genre usually practiced by the musicians. In green, with Duke Ellington, we observe the community associated with swing; in pink, with Barney Bigard and Louis Armstrong, the community associated with Dixieland¹; and finally in blue, with Roy Eldridge and John Lewis, the community associated with the big-band style.

FIGURE 4.10: Communities in the network of Wikipedia albums. This figure shows the separation into communities around the sub-genres of jazz.

¹ This sub-genre of jazz is also called traditional jazz. It essentially includes the music produced in New Orleans at the beginning of the 20th century. This name refers to the music developed originally by the Original Dixieland Jass Band.

Apart from the difference in the number of communities detected in the two networks, these communities differ in that those present in the Montreux Jazz Festival network are generally larger: they include, on average, more nodes. Like the larger number of communities, this can be explained by the simple fact that the Montreux Jazz Festival network contains more nodes.

Clustering coefﬁcient

As deﬁned by Girvan and Newman [17], the clustering, or network transitivity,

is the property that two vertices, which are both neighbors of the same third

vertex, have a heightened probability of also being neighbors of one another.

The global clustering coefﬁcient is deﬁned as follows:

$$C_i = \frac{3 \times \text{number of triangles on the graph}}{\text{number of connected triples of vertices}}$$

The coefficient $C_i$ is the probability that two nodes are connected given that they have a neighbor in common. In other words, this coefficient indicates the probability that two musicians who have each collaborated with the same musician on an album have also collaborated with each other on another album. Based on this coefficient, it is possible to calculate the average clustering coefficient of the two networks as follows:

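$$C = \frac{1}{n} \sum_{v \in G} C_v$$

where $n$ is the number of nodes of the graph $G$ and $C_v$ is the clustering coefficient of node $v$.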

Another parameter closely related to the previous one is the transitivity value, which gives the fraction of all possible triangles that are present in the network. This value, $T$, can be calculated with the following formula, where possible triangles are identified by the number of triads (i.e. two edges with a shared vertex):

$$T = \frac{3 \times \text{number of triangles}}{\text{number of triads}}$$
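The two statistics can be illustrated with a short pure-Python sketch. It assumes that the per-node coefficient is the usual local one (the fraction of pairs of neighbors of a node that are themselves linked), which is an assumption on our side; the function names and the toy graph are hypothetical and are not the code used to produce the values reported in this thesis.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def linked_pairs(adj, v):
    """Number of pairs of neighbours of v that are connected to each other."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])


def local_clustering(adj, v):
    """Fraction of the pairs of neighbours of v that are linked (0 if degree < 2)."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    return 2 * linked_pairs(adj, v) / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local coefficients over all the nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def transitivity(adj):
    """3 * triangles / triads, where a triad is two edges sharing a vertex."""
    closed = sum(linked_pairs(adj, v) for v in adj)  # each triangle counted 3 times
    triads = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    return closed / triads if triads else 0.0


# Hypothetical toy graph: a triangle a-b-c with one extra musician d linked to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(round(average_clustering(adj), 3))  # 7/12, about 0.583
print(transitivity(adj))                  # 3/5 = 0.6
```

On this toy graph the two values differ (0.583 against 0.6) because the average clustering coefficient gives the same weight to every node, whereas the transitivity gives more weight to high-degree nodes.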

Table 4.3 shows the values obtained for the transitivity ratio and the clustering coefficient for both networks, the two most popular statistics for measuring the prevalence of triangles in a network.

| | Wikipedia network | Montreux network |
|---|---|---|
| Average clustering coefficient | 0.745 | 0.917 |
| Transitivity value | 0.606 | 0.753 |

TABLE 4.3: Average clustering coefficient

Thanks to these two coefficients, we can see that the nodes of the live collaboration network have a clearly stronger tendency to cluster together than the nodes of the album collaboration network. However, both networks have a relatively high average clustering coefficient, which is often the case for real-world networks, and in particular for social networks.

Figure 4.11 shows the clustering coefficient obtained for each node of the Wikipedia collaboration network, while figure 4.12 shows the results obtained for the Montreux Jazz Festival network.

FIGURE 4.11: Clustering coefﬁcient for Wikipedia network

According to ﬁgure 4.11, it appears that most of the nodes (≈800) in the

album network have a clustering coefﬁcient between 0.9 and 1.

FIGURE 4.12: Clustering coefﬁcient for Montreux network

According to figure 4.12, it appears that most of the nodes (≈6,000) in the Montreux network have a clustering coefficient between 0.9 and 1.

Thus, from the definition of the clustering coefficient, it appears that more than 90% of the nodes in both networks have a high probability of being connected to another node given that they have a neighbor in common. We can therefore conclude that both networks studied here are strongly clustered.

4.2 Hubs analysis of both networks

4.2.1 Deﬁnition of hub

In the following section, we will analyze the hubs of the two networks presented earlier. To do this, it seems appropriate to start by defining what we mean by hub. We define this term as a node of the network having a higher number of connections (i.e. degree) than the average node. This type of node is characteristic of scale-free networks and is generally not observed in random networks. For the sake of simplicity and given the available information, we will focus here on the 50 most connected hubs. Indeed, the various characteristics that follow are difficult to collect for all the nodes of the two networks. In addition, the hubs are interesting to study because they are central in the networks and make it possible to extract characteristics of the most influential musicians in the jazz world.
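The selection of hubs described above can be sketched as follows. This is a minimal illustration, assuming that the degree of a musician is the number of distinct collaborators; the function names and the toy edge list are hypothetical and are not taken from the code of this thesis.

```python
def degrees(edges):
    """Degree of each node, counted as the number of distinct collaborators."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # a musician is not counted as their own collaborator
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {node: len(ns) for node, ns in neighbours.items()}


def top_hubs(edges, k):
    """The k nodes with the highest degree, as (node, degree) pairs."""
    deg = degrees(edges)
    # Sort by decreasing degree, then by name so that ties are deterministic.
    return sorted(deg.items(), key=lambda item: (-item[1], item[0]))[:k]


def above_average(edges):
    """Nodes whose degree is strictly higher than the mean degree."""
    deg = degrees(edges)
    mean = sum(deg.values()) / len(deg)
    return {node for node, d in deg.items() if d > mean}


# Hypothetical toy collaborations; the repeated pair a-b is counted only once.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "a")]
print(top_hubs(edges, 2))    # [('a', 3), ('b', 2)]
print(above_average(edges))  # {'a'}
```

With the real networks, `top_hubs(edges, 50)` would correspond to the set of 50 most connected nodes studied in the following subsections.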

4.2.2 Top hubs

Based on their degree, table 4.4 lists the 5 musicians who have made the most collaborations in each of the two networks². For the sake of brevity, in this subsection we will limit our analysis to the 3 most connected musicians. However, all the statistics in the following subsections are computed on the 50 most connected nodes.

| Rank | Wikipedia network | Montreux network |
|---|---|---|
| 1 | Duke Ellington (152) | George Duke (251) |
| 2 | Johnny Hodges (73) | Toots Thielemans (209) |
| 3 | Johnny Griffin (60) | Herbie Hancock (209) |
| 4 | Eric Dolphy (58) | Quincy Jones (207) |
| 5 | Barney Bigard (56) | Claude Nobs (198) |

TABLE 4.4: Top 5 hubs and their degree for both networks

Figures 4.13 and 4.14 are for information purposes only; they put a face on the most important names mentioned hereafter.

(A) Duke Ellington (B) Johnny Hodges (C) Johnny Grifﬁn

FIGURE 4.13: Top 3 hubs of album network

Regarding the album network, it is not surprising to find Duke Ellington (see 4.13a) as the most connected node. Indeed, he is an essential figure of jazz; he appears among the first results when we search for the most famous jazz musicians on Google. Moreover, his influence on the jazz scene through his numerous collaborations is beyond doubt. This first result can explain the presence of the second most connected node, namely Johnny Hodges (see 4.13b). Indeed, the latter was a soloist (alto saxophone) in Duke Ellington's big band. Thus, his particular connection with the most connected node of the network allowed him to acquire many connections in common with this hub. This singular place in relation to Duke Ellington seems to explain the gap between the degrees of the two musicians (79). Finally, the third most connected node in the album network is Johnny Griffin (see 4.13c). This high ranking seems to be due to the fact that he released many albums during his career and collaborated with other famous names such as pianist Thelonious Monk, drummer Art Blakey or tenor saxophonist Eddie "Lockjaw" Davis.

² For more details, the list of the 200 top hubs of each of the two networks is available via this link: GitHub: List of the top 200 hubs.

(A) George Duke (B) Toots Thielemans (C) Herbie Hancock

FIGURE 4.14: Top 3 hubs of festival network

Regarding the collaboration network during the Montreux Jazz Festival, the musician who made the most collaborations is George Duke (see 4.14a), with a degree of 251. Like Duke Ellington in the album network, this musician is one of the most prestigious in the world of jazz: he was nominated 7 times for the Grammy Awards and won twice. In addition, he also has a very extensive discography. It seems that these factors allowed him to perform many times with other musicians at the festival, bringing him to first place in terms of collaborations. Then, we find Toots Thielemans (see 4.14b) and Herbie Hancock (see 4.14c), both with a degree of 209. Although both are very famous, which justifies their ranking in the number of collaborations at the festival, the former has the particularity of being one of the rare musicians of this importance to play the harmonica, in addition to being the only Belgian musician among the top hubs. Herbie Hancock, for his part, has often played, not necessarily during the Montreux Jazz Festival, with other famous jazz musicians such as Clark Terry, Miles Davis or Wayne Shorter.

4.2.3 Instrument

Figure 4.15a shows the instruments most played by the most connected nodes of the album network.

We notice that the trumpet leads with 22.4% of the occurrences, followed by the saxophone and the piano with 20.4% and 14.3% of the occurrences respectively. It also appears that the two least represented instruments among the most connected nodes are the French horn and the vibraphone. Figure 4.15b shows the instruments most played by the most connected nodes of the live performance network. For this network, we see that the most popular instrument among the hubs is the voice with 22.4%, followed by the trumpet with 12.4%, and finally the trombone and the saxophone, both with 10.2%.

(A) Top hub instrument in album network (B) Top hub instrument in live network

FIGURE 4.15: Instruments of the top hubs

Since the instruments played by the musicians are only available for the hubs of the album network, it is impossible to compare the results presented above with the percentage of all musicians playing each instrument in that network. However, we do have this data for the whole Montreux Jazz Festival network. Thus, figure 4.16 shows the distribution of instruments among all the nodes of this network, not only the hubs. We notice that the most popular instruments at the Montreux Jazz Festival are the voice with 20.4%, the bass with 10.6%, the drums with 10.2%, the guitar with 10% and the trumpet with 9.2%. Thus, we observe that, except for the voice, the most represented instruments among all the nodes of this network are not the same as the most represented instruments among its hubs. This is notably the case for the guitar, the piano, the bass and the trombone. This last result leads us to think that the instrument plays a significant role in becoming a hub of the network³.

FIGURE 4.16: All nodes instrument distribution in the Montreux

network

The instrumental composition of a jazz band depends heavily on the style of jazz played by the band and on the number of musicians. There are several jazz ensemble compositions; the most common are duets, trios, quartets, quintets and sextets, and beyond 12 musicians we generally speak of a big band or orchestra. All these compositions can take very different forms. The most common form of trio in jazz includes a pianist, a double bass player and a drummer (e.g. The Bad Plus). A wind instrument such as a saxophone or a trumpet is often added to this form of band.

³ Unfortunately, this type of comparison relative to all the nodes of the network is only possible for the instruments of the Montreux Jazz Festival network. Thus, this type of analysis will not be done in the following sections studying other parameters. But given the results obtained here, we can assume that the characteristics studied are specific to the hubs and not to all the nodes of the network.

Several hypotheses can explain the popularity of wind instruments and voice among the hubs of the two networks. One of them is based on the fact that they tend to be lead instruments in the composition of a group. Indeed, they are usually added on top of a rhythmic pattern already established by the other instruments, making them instruments that could be described as soloist and inclined to entertain. The most obvious example in this group of instruments is undoubtedly the saxophone. Invented in Belgium by Adolphe Sax, it was popularized by Sidney Bechet in the 1920s to the point of becoming inseparable from jazz.

Finally, there appears to be a correlation between the fact that a node is strongly connected and the fact that it plays a so-called solo instrument. However, it seems unlikely that the instrument is the only factor favoring collaborations, given that some soloists are not very connected, for example Lester Young (saxophone) and Chet Baker (trumpet), who do not appear among the hubs of the two networks.

4.2.4 Geographical information

The figures below show the percentage of high-degree nodes according to their birthplace (i.e. city and country). What stands out the most in figure 4.17a is the predominance of the United States of America, with 98% of the 50 most connected musicians coming from this country. Concerning the most connected musicians of the Montreux Jazz Festival network (see 4.17b), as in the previous case, we note a predominance of the USA (42.9%) among their countries of origin. Then come England and France with 16.3% and 14.3% of the musicians respectively, and then Germany and Switzerland, both with 8.2%.

(A) Top hub country in album network (B) Top hub country in live network

FIGURE 4.17: Countries of the top hubs

It appears that New York is the birthplace of 10.4% of the most collaborative jazz musicians of the album network (see 4.18a). This city is followed by Chicago with 8.3% of the musicians. Concerning the cities of origin of the Montreux Jazz Festival network's musicians (see 4.18b), we notice that the 4 most represented cities are all in the USA, namely Chicago (6.4%), Los Angeles (4.3%), Detroit (4.3%) and San Rafael (2.1%). This result is consistent with the result obtained in figure 4.17b.

(A) Top hub city in album network (B) Top hub city in live network

FIGURE 4.18: Cities of the top hubs

As explained in section 1.3, it is generally accepted that the geographical origin of jazz is in the United States of America, and more precisely in New Orleans, Louisiana. This origin could explain the significant presence of the United States among the countries of birth of the most connected nodes.

The predominance of New York and Chicago as birthplaces of the most connected musicians seems to be explained by the fact that many of the most famous jazz clubs were located in these cities. For instance, the Savoy Ballroom and the Cotton Club are both in Harlem, a neighborhood of New York.

Moreover, the presence of more European countries and cities for the Montreux Jazz Festival can be explained both by the fact that this festival is held in Europe and by the fact that some countries, such as France, are also homes of jazz music. Indeed, among the most prolific jazzmen at this festival we can cite names such as Louis-Herve Maton or Guillaume Dionnet. We can also underline the presence of well-known jazz clubs in France such as the Blue Note, the Tabou or the Club Saint-Germain. The strong presence of English musicians among the most connected nodes seems to be explained by the pop/rock stars who often perform at this festival, such as James Morrison and Mick Hucknall, the singer of Simply Red.

4.2.5 Birthyear

It seems important to specify that the age of the musicians and the age of the

node in the networks are two distinct parameters. For example, a musician

might start collaborating at an older age, so the node representing him or her

will be recent and the age of that node will be low. However, the older a musi-

cian is, the more likely he or she is to join the network early in its construction

and thus have an older node as well.

This section will deal with the actual age of the musicians, rather than the

age of the nodes that represent these musicians.

(A) Top hub birthyear in album network (B) Top hub birthyear in live network

FIGURE 4.19: Birthyears of the top hubs

We notice that, on average, the hubs of the Montreux Jazz Festival network are younger (avg. birth year 1951) than those of the album network from Wikipedia (avg. birth year 1920). Several factors seem to explain the difference in average age between the musicians of the two networks. First, since the Wikipedia pages used as a source list the most influential jazz albums and these are often so-called historical albums, it is very likely that they were released a long time ago, so their musicians are also older. Secondly, two factors can explain why the musicians of the Montreux Jazz Festival are younger. On the one hand, this festival continues to be organized, so in the long term the musicians who participate will inevitably be born later and later.

On the other hand, as said before, this festival welcomes musicians of all styles: not only jazz, for which the average age is 52 years old [30], but also other styles like rap, for which the average age of artists charting in the Hot 100 is 26.6 years old [31].

4.2.6 Gender

Finally, one of the last criteria that seems interesting to analyze for hubs is their

gender. Since the author of this paper is not an expert in gender studies, this

section will be limited to highlighting the male/female ratio among the hubs

of the two networks.

(A) Top hub gender in album network (B) Top hub gender in live network

FIGURE 4.20: Genders of the top hubs

Men are strongly predominant among the hubs of both networks. This is nothing more than a reflection of the weaker presence of women in the universe of jazz, although this is greatly improving with time. There are 42 men against 8 women among the Montreux Jazz Festival top hubs, whereas there is no female hub in the album network. For the latter network, this can be explained by the age of these musicians and the era in which they performed, which left little room for women in music. As we have seen in section 4.2.5, on average, the hubs of the Montreux Jazz Festival network are younger than those of the album network. This suggests that mentalities evolve over time and that women tend to be more and more represented among influential jazz musicians.

Moreover, it is interesting to notice the absence in the album network of some great ladies of jazz such as Ella Fitzgerald, Billie Holiday or Nina Simone. This can be explained either by incomplete data or by the fact that the Wikipedia pages serving as sources do not properly reference and credit the various women in the jazz world. This absence may also be explained by what the three women previously mentioned have in common: they are all singers. Finally, for information, the first woman who appears among the hubs of the album network is Margie Hyams, an American vibraphonist, ranked 96th among the hubs with a degree of 24.

4.3 Comparison of musician’s networks at festivals

In this section, the goal will be to compare the network of musicians at the

Montreux Jazz Festival to another festival in order to know if they have common

characteristics or not. To do this, a network has been created from the data

available online for the New Orleans festival. Figure 4.21a shows the network

obtained with these data.

(A) New Orleans festival network (B) Montreux festival network

FIGURE 4.21: Festival networks

The dataset used to build the network for the New Orleans Festival has a quality score (see 3.1) equivalent to that of the Montreux Jazz Festival, namely 1. We can therefore compare these two networks on a solid foundation.

This new network has 11,283 nodes, which is relatively close to the number of nodes in the Montreux Jazz Festival network. However, these two networks differ mainly in the maximum degree among the nodes, which is 47 for the New Orleans Festival network against 251 for the Montreux Jazz Festival network. Thus, we notice that this new network is less connected. This conclusion is confirmed by the number of weakly connected components, which rises to 7,573 whereas it was only 1,136 for the Montreux Jazz Festival network. It is also interesting to note that the average number of musicians present on stage is 6.46 for the Montreux Jazz Festival while it is only 1.44 for the New Orleans Festival.
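The number of components used in this comparison can be obtained with a simple graph traversal. The sketch below is only an illustration, assuming an undirected collaboration network (in which case weakly connected components are simply connected components); the function name and the toy data are hypothetical.

```python
def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal from a node that has not been reached yet.
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        components.append(component)
    return components


# Hypothetical toy network: one trio, one duo and one musician playing alone.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = connected_components(nodes, edges)
print(len(components))                        # 3
print(sorted(len(c) for c in components))     # [1, 2, 3]
```

A musician who always performs alone forms a component of size 1, which is consistent with the observation that a low average number of musicians on stage goes together with a high number of components.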

In addition to highlighting the fact that both networks are scale-free, the two panels of figure 4.22 illustrate the difference that exists regarding the number of weakly connected components.

(A) Degree distribution of New Orleans Festival network (B) Degree distribution of Montreux network

FIGURE 4.22: Degree distributions

However, we notice that these two networks have a rather similar structure.

It seems that the networks of musicians collaborations during jazz festivals

generally adopt the same general form.

The next ﬁgure 4.23 compares the number of new nodes per year for the

two festival networks.

(A) New nodes per year - New Orleans

(B) New nodes per year - Montreux

FIGURE 4.23: Number of new nodes per year

Moreover, to support the hypothesis stated above, we notice in figure 4.23 that the number of new nodes evolves similarly over time for the two festival networks. Indeed, in both cases it tends to grow, in contrast to the evolution followed by the album network illustrated in figure 4.1.
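The number of new nodes per year can be computed by recording, for each musician, the first year in which he or she appears. The following sketch assumes that the data is available as (year, musician) pairs, one per album or concert credit; this input format, the function name and the toy data are assumptions made for the illustration.

```python
from collections import Counter


def new_nodes_per_year(appearances):
    """Count, for each year, the musicians who appear for the first time.

    appearances: iterable of (year, musician) pairs, in any order.
    """
    first_year = {}
    for year, musician in appearances:
        if musician not in first_year or year < first_year[musician]:
            first_year[musician] = year
    # Years without any new musician are simply absent from the result.
    return dict(sorted(Counter(first_year.values()).items()))


# Hypothetical credits: musician "a" plays in 1970 and 1971, "c" in 1969 and 1971.
appearances = [(1970, "a"), (1970, "b"), (1971, "a"), (1971, "c"), (1969, "c")]
print(new_nodes_per_year(appearances))  # {1969: 1, 1970: 2}
```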

However, to conﬁrm the hypothesis that the festival networks have the

same general form, it is necessary to have data on more than two festivals

and possibly to compare different types of festivals. Indeed, it is very likely

that an electronic music festival, where collaborations are rare, does not adopt

the same structure as the one presented here.

4.4 Analysis of meta-networks

In this section, we will analyze the meta-networks obtained for the instruments of all the musicians of the Montreux Jazz Festival, for the countries of origin of its top hubs, and for the years of album releases for the Wikipedia network and the years of concert performances for the Montreux Jazz Festival.

The interest of this type of network lies in the fact that it allows us to highlight properties that are not obvious at first sight when we simply look at the network or the statistics of the top hubs. Indeed, meta-networks establish a relationship between the different instruments and countries, making it possible to detect possible affinities between them.

As a reminder to the reader, these meta-networks are constructed as fol-

lows: if we take the musicians’ country of origin as a characteristic to build our

meta-network, the nodes will be the different existing countries. Two countries

A and B will then be linked if a musician born in country A has collaborated

with another musician born in country B. Thus the strength of the link between

countries A and B will be an indicator of the afﬁnity that musicians from these

two countries have with each other.
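The construction described above can be sketched as follows. This is a minimal illustration and not the code used in this thesis: the function name, the musician labels and the choice of measuring the strength of a link by the number of collaborations between the two attribute values are assumptions.

```python
from collections import Counter


def build_meta_network(edges, attribute):
    """Collapse a musician network into a network of attribute values.

    edges: iterable of (musician_u, musician_v) collaborations.
    attribute: dict mapping a musician to a value (e.g. country of birth).
    Returns (node_size, link_weight), where node_size counts the musicians
    per value and link_weight counts the collaborations per pair of values.
    """
    node_size = Counter(attribute.values())
    link_weight = Counter()
    for u, v in edges:
        if u not in attribute or v not in attribute:
            continue  # skip musicians whose attribute is unknown
        a, b = sorted((attribute[u], attribute[v]))
        link_weight[(a, b)] += 1  # a == b records an intra-country collaboration
    return node_size, link_weight


# Hypothetical musicians m1..m4 and their countries of birth.
country = {"m1": "USA", "m2": "USA", "m3": "France", "m4": "Brazil"}
edges = [("m1", "m2"), ("m1", "m3"), ("m2", "m3"), ("m3", "m4")]
node_size, link_weight = build_meta_network(edges, country)
print(node_size["USA"])                # 2
print(link_weight[("France", "USA")])  # 2
```

The same function can be reused for the other meta-networks by replacing the `country` dictionary with a mapping from musicians to instruments.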

4.4.1 Montreux instruments meta-network

The meta-network shown in figure 4.24 comprises a set of 25 instruments⁴. As explained previously, each node of this network represents an instrument, so all the musicians playing the same instrument are grouped in the same node; the number of musicians playing this instrument defines the size of the node. Note that the stronger the link between two instruments, the more common this combination of instruments is at the Montreux Jazz Festival.

Note that since the album dataset does not have information about the in-

struments of the musicians, this meta-network could only be realized for the

Montreux Jazz Festival collaboration network.

⁴ Note that, for simplicity, the orchestra conductor has been considered as an instrument here.

FIGURE 4.24: Instruments meta-network for Montreux festival

Thus we notice that, in the specific case of this festival, the most common association combines the guitar with the voice. There is nothing surprising in this result given the popularity of this type of association in the music world. Then, with almost equal weight or affinity, the following instruments are often associated together: the drums, the bass, the keyboard, the guitar and the voice. This first group seems to represent the classic composition of pop/rock bands. Among the bands using this composition we can mention Genesis, Queen or Supertramp⁵. A second group seems to be grafted onto this first one, including the following instruments: the saxophone, the piano, the trumpet, the trombone and the percussion. This second group seems to be closer to the instruments traditionally associated with jazz music.

Moreover, for the remaining instruments, we note, among others, an affinity between the violin and the clarinet and between the violin and the cello. In addition, there is also a group of instruments that have almost no affinity with the others: the n'goni⁶, the harp, the bandoneon⁷, the accordion, the tuba, the French horn, the harmonica and the organ. This can be explained by the lesser popularity of these instruments in general and particularly at the Montreux Jazz Festival. Finally, if we compare the results obtained here with those obtained in section 4.2.3, we can deduce that, in spite of their great popularity among the musicians of the festival, the guitar and the bass are not the most popular among the top hubs of this same network.

⁵ Although all the bands taken as examples are British, there are obviously others such as Lynyrd Skynyrd, The Beach Boys or Kraftwerk.

4.4.2 Years meta-networks

In this section, the parameter for which we will study the affinity is the year of release of the albums and the year of performance of the concerts. In this network, each node represents a year, and the size of a node depends on the number of musicians who played that year. A link between two nodes is established if two musicians belonging to these nodes have played together during albums or concerts that took place in different years. Thus, the stronger the link, the greater the relative affinity between the two years. Note that, given the way it is constructed and given the concept of overlapping years, this meta-network is less straightforward to analyze than the one presented in the previous section.

Figure 4.25 represents the meta-network described above for the album col-

laboration network.

⁶ A traditional guitar of Mali.

⁷ A kind of accordion popular in Uruguay and Argentina.

FIGURE 4.25: Years meta-network for album collaboration network

The years with the most musicians are 1961 with a degree of 518, 1960 with a degree of 574, and finally 1956 with a degree of 648. We can clearly see here the domination of the late 1950s and early 1960s. Indeed, we notice a strong affinity between multiple nodes of the years from 1956 to 1963, which seems to indicate that the musicians who participated in the making of albums during this period often tended to collaborate.

Moreover, it is interesting to note that the year 1976 is strongly linked to the years 1956 and 2007, which are added to the period of time described earlier; this last result has not yet been explained by the author.

Figure 4.26 represents the meta-network interconnecting the years for the

collaborative network during live performance.

FIGURE 4.26: Years meta-network for Montreux festival

The years with the most musicians are 1993 with a degree of 514, 1994 with a degree of 600, and finally 1991 with a degree of 666. We can clearly see here the domination of the 1990s. Here, we observe two distinct sets: there is an affinity between the years from 1973 to 1995 and an affinity between the years from 2001 to 2007. It would also seem that the year 2001 acts as a bridge between these two sets. These results seem to show that musicians who performed from 1973 to 1995 often tended to collaborate with each other and to participate in each other's concerts. The same is true for musicians who performed at the festival in the first decade of the 2000s.

4.5 Analysis of hub’s meta-networks

4.5.1 Country

In this section, the same type of meta-network has been constructed but this

time on the basis of the country of the musicians’ origin rather than on the

basis of their instrument.

Note that, as said before, since this geographical data was not available for

all the musicians of the Montreux Jazz Festival network, this meta-network has

been built only from the top hubs (for which these data were available). Thus,

the results obtained should be interpreted with caution.

FIGURE 4.27: Countries meta-network for Montreux festival

As for the results obtained in section 4.2.4, figure 4.27 allows us to highlight the omnipresence of the USA among the most important nodes of the Montreux Jazz Festival network. Indeed, we notice that the American top hubs collaborate with the top hubs of all the other countries except Jamaica. Moreover, apart from the existing relationship between Switzerland and England and the intra-country collaborations, the top hubs of the other countries collaborate exclusively with the American top hubs. Finally, it can be noted that there seems to be a stronger affinity between the USA and England and between the USA and Switzerland than between the USA and other countries.

Note also that this type of meta-network would not be very useful for the top hubs of the album network. Indeed, the results obtained in section 4.2.4 indicate that there are only two countries among the top hubs of this network (USA and Canada), so an affinity analysis would be of limited interest.
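The construction of such a meta-network can be sketched as follows: each musician is replaced by the value of one attribute (here the country) and the weight of a meta-edge counts the collaborations between the two groups. This is only a minimal sketch. The function name `build_meta_network`, the musicians and the edges are made up for illustration and are not taken from the datasets of this thesis, and the actual construction (for instance a possible normalisation of the weights) may differ.

```python
from collections import Counter

def build_meta_network(edges, attribute):
    """Collapse a musician network into a meta-network over one attribute.

    edges: iterable of (musician_a, musician_b) collaborations.
    attribute: dict mapping a musician to a label (e.g. a country).
    Returns a Counter mapping a sorted label pair to the number of
    collaborations between the two labels; a pair (x, x) counts
    intra-group collaborations. Musicians without a label are skipped.
    """
    weights = Counter()
    for a, b in edges:
        if a not in attribute or b not in attribute:
            continue  # missing data: ignore this collaboration
        pair = tuple(sorted((attribute[a], attribute[b])))
        weights[pair] += 1
    return weights

# Hypothetical toy data (not from the thesis datasets).
country = {"A": "USA", "B": "USA", "C": "England", "D": "Switzerland"}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "E")]

meta = build_meta_network(edges, country)
```

In this toy example the musician "E" has no known country, so the collaboration ("A", "E") is ignored, which mirrors the restriction to the top hubs for which the data were available.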

Chapter 5

Discussion

In this section, the different limitations encountered during the development of this research will be discussed, whether they are of a technical or logistical nature. In addition, we will also discuss the improvements that the author would like to introduce to this document in the future.

5.1 Limitations

5.1.1 Datasets

As explained many times in this thesis, the main problem encountered was the availability of online data to establish the datasets. Indeed, due to this lack of information, it was only possible to collect the instruments associated with all the musicians for the Montreux Jazz Festival network and not for the album network. Thus it was not possible to compare the proportion of hubs playing a given instrument with the proportion of nodes playing that instrument for the album network. The same is true for both networks for the other parameters studied in section 4.2.2.

Moreover, information such as geographical data, birth years and instruments had to be collected by hand for the top hubs, thus greatly limiting the number of hubs considered for this study.


5.1.2 Labels

In the first version of this work, which dealt only with the album network, a study of the different labels with which the musicians were associated had been made. This analysis was abandoned for two reasons: firstly, the notion of label does not make sense for the live performance network added in this version of the document; secondly, it is quite possible that the same musician has worked with many different labels, making this parameter complex to study correctly.

5.1.3 Styles of jazz

One of the important limitations to consider is, as music lovers will have rightly noticed, that all the different styles of jazz have been grouped here into one and the same set. Thus the different variations of this music, explained in section 1.3, were not taken into account when studying the top hubs. The only reference to jazz style is the one made in section 4.8 with an assumption about the grouping into communities according to the sub-genre of jazz practiced by the musicians. Style data could have told us whether there exists a correlation between the fact that a musician is a top hub and the fact that this musician plays a particular style of jazz.

This data could not be collected because, on the one hand, it was not available in the different datasets used here and, on the other hand, in music, as in other artistic domains, there are many cases of hybrid creation that are impossible to classify in a single style, making the task even more complex.

5.1.4 Time

Obviously, as in all research work, the biggest limitation is time. Indeed, we would always like to do more and go further. Unfortunately, one day we have to write down what we have found. Thus the next section, 5.2, summarizes the different topics that the author would have loved to cover in this document.

5.2 Further improvement

5.2.1 Data collection

A possible solution to the problem of incomplete datasets presented in section 5.1.1 would have been to use the API of Spotify, or of another streaming platform, in order to obtain data about the musicians. Moreover, with this technique, it would also have been possible to study parameters such as the average tempo and the main key of the albums. Above all, if this API allowed it, it would have completely solved the problem of labels explained in section 5.1.2. However, this would not have solved the problem of lack of data for the live performance network.

Additionally, a future improvement that would greatly serve the reliability of this research would be to collect data related to another album network in order to compare it with the Wikipedia network, as was done in section 4.3 for the Montreux Jazz Festival network, which was compared to the New Orleans live network.

Finally, a last point regarding data collection that could be re-evaluated in the future, as a substitute for the Spotify API, would be the possible use of the DBpedia framework for the data collection of the album network. Indeed, this tool, discovered too late by the author, would make it possible to avoid the multiple web scraping methods presented in section 3.1, increasing efficiency and possibly limiting the errors related to the data collection. However, due to the particular nature of the links between musicians, it is not certain that this tool would be well suited to this task, but it remains an avenue to explore.
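To give an idea of what such a DBpedia-based collection could look like, the sketch below builds a SPARQL query for musicians tagged with the jazz genre, together with the URL that would submit it. It was not part of this work and is not executed against the endpoint here. The class and property names (`dbo:MusicalArtist`, `dbo:genre`, `dbo:birthPlace`, `dbo:instrument`), the `format` parameter and the helper names are assumptions that should be checked against the DBpedia documentation, and the query only retrieves attributes of the musicians, not the collaboration links between them.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed reachable; not queried here).
ENDPOINT = "https://dbpedia.org/sparql"

def jazz_musicians_query(limit=100):
    """Build a SPARQL query listing jazz musicians with optional metadata.

    The class and property names used below are assumptions to be
    checked against the DBpedia ontology before use.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?musician ?birthPlace ?instrument WHERE {{
  ?musician a dbo:MusicalArtist ;
            dbo:genre dbr:Jazz .
  OPTIONAL {{ ?musician dbo:birthPlace ?birthPlace }}
  OPTIONAL {{ ?musician dbo:instrument ?instrument }}
}} LIMIT {int(limit)}
""".strip()

def request_url(query):
    """Return the GET URL that would send `query` to the endpoint as JSON."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)

query = jazz_musicians_query(limit=10)
url = request_url(query)
```

The resulting `url` could then be fetched with any HTTP client, and the returned bindings would have to be cleaned and matched with the nodes of the album network.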


5.2.2 Confidence in the characteristics carried by the hubs

Although in section 4.2.3 the instrument characteristics of the hubs were compared to those of the other nodes of the Montreux Jazz Festival network, thus confirming that these results are specific to the hubs, this type of comparison could not be performed for the other characteristics of both networks.

For both networks, a possible solution to this problem would be to randomly take nodes in the lower part of the first quartile and in the upper part of the third quartile of the data, and then to collect the characteristics of each of these nodes (by hand, because this information is not in the datasets) in order to compare them with those of the hubs. This method would allow us to make sure that the characteristics highlighted are specific to the hubs and not common to all the nodes of the network.
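The sampling step described above could be sketched as follows. This is one possible reading of the procedure and not an implementation used in this work: it assumes that the quartiles are taken on the node degrees, it keeps the nodes at or below the first quartile and at or above the third quartile, and the function name `sample_control_nodes` as well as the toy degrees are hypothetical.

```python
import random
import statistics

def sample_control_nodes(degree, k, seed=0):
    """Draw k low-degree and k high-degree nodes for manual annotation.

    degree: dict mapping a node to its degree.
    Low-degree nodes are those at or below the first quartile (Q1),
    high-degree nodes those at or above the third quartile (Q3).
    """
    q1, _, q3 = statistics.quantiles(degree.values(), n=4)
    low = sorted(n for n, d in degree.items() if d <= q1)
    high = sorted(n for n, d in degree.items() if d >= q3)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return (rng.sample(low, min(k, len(low))),
            rng.sample(high, min(k, len(high))))

# Hypothetical degrees: node "m<i>" has degree i + 1.
degree = {f"m{i}": i + 1 for i in range(20)}
low_sample, high_sample = sample_control_nodes(degree, k=3)
```

The characteristics of the sampled nodes would then be collected by hand and compared with those of the hubs.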

5.2.3 Popularity of musicians

One parameter that seems to be important and that has not been taken into account in this study is the popularity of the musicians with the public. Indeed, it would have been interesting to study the impact of the popularity of a musician on his place in the network, i.e. whether there is a correlation between the fact that a musician is a hub and the fact that he is popular with the public.

The reason why this parameter has not been addressed in this paper is that this data is very complex to quantify. Of course, it would be possible to use the number of plays (streams/views) of a given album of a musician on the streaming platforms to approximate this value, but this would remain reductive. Indeed, by proceeding in such a way, we would neglect the popularity of musicians during live performances. To take it into account, we would have to count the number of spectators at each performance, a practically impossible task given the inaccessibility of these data. Moreover, by using this technique to quantify the popularity of musicians, we would leave out a large number of musicians who do not have any songs or albums on the streaming platforms.

Another existing popularity measure is the Q-score [32]. It is often calculated on the basis of groups sharing a common characteristic, such as age, making it possible, for example, to know how popular a celebrity is among a certain segment of the population. However, this measure seems difficult to apply here because it would require surveying a large group of people about jazz musicians, which is closer to sociology than to computer science.

Thus, in a possible continuation of this study, one of the important points would be to find a way to quantify the popularity of a musician.

5.2.4 Evolution of racial segregation

An interesting analysis to be done in a future version of this document would

be to see if the racial segregation between 1912 and 1940 mentioned by Gleiser

and Danon [23] is still happening in the jazz world today. Unfortunately, this

comparison could not be made because the data concerning the racial origin

of the musicians could not be retrieved for all the nodes of both networks. The

intuition of the author is that this segregation is less pronounced nowadays

than it was then, but this has still to be demonstrated.

5.2.5 Meta-networks

In the ideal case where the API mentioned in section 5.2.1 could fill in all the missing data, it would be interesting to produce meta-networks for all the musicians of the two networks for the other parameters studied in section 4.2.2, i.e. the country of origin, the city of origin and possibly the labels. Indeed, up to now, the study of affinities between instruments is only possible for the Montreux Jazz Festival network, and the study of affinities between the other parameters is currently only possible for the top hubs of the two networks.

Chapter 6

Conclusion

The final chapter presented here summarizes the different results obtained in chapter 4. The purpose of this last part is to provide a general answer to the second question stated in section 1.2, which is: How to better understand the universe of jazz by means of its musicians networks?

6.1 Topology

First, we will look in detail at the conclusions that can be drawn from the topology of the different networks studied.

6.1.1 Album and Live networks

An important observation is that both networks, the album and the concert one, are scale-free. This is an expected result for so-called social collaboration networks, and it has been documented many times in the literature on networks.
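As a reminder of what the scale-free property means in practice, the exponent of a power-law degree distribution can be estimated from the list of node degrees. The sketch below uses a standard approximate maximum-likelihood estimator for discrete data, alpha = 1 + n / sum(ln(k_i / (k_min - 1/2))). It is given for illustration only: it is not necessarily the fitting procedure used in this work, the function name and the degree sequence are hypothetical, and this approximation tends to be less accurate when k_min is small.

```python
import math

def power_law_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood exponent of a discrete power law.

    Uses alpha = 1 + n / sum(ln(k / (k_min - 0.5))) over the n degrees
    k that satisfy k >= k_min.
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degree is >= k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1 + len(tail) / log_sum

# Hypothetical degree sequence: many weakly connected nodes, a few hubs.
degrees = [1, 1, 1, 1, 2, 2, 4, 8]
alpha = power_law_exponent(degrees)  # about 1.77 for this toy sequence
```

On real data, the choice of k_min and a goodness-of-fit test would be needed before concluding that a network is scale-free.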

Furthermore, for the two collaborative networks, the rich-club coefficient shows that the most connected nodes (i.e. the hubs) are strongly connected to each other, which could result in some robustness. Thus, if we removed some hubs, the general connectivity of the networks would not be significantly affected. This result can be understood as follows: if one of the highly connected musicians present in the studied networks passed away, the majority of the collaborations between musicians would not be greatly affected.
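One way to check this kind of claim on a given network is to compute the rich-club coefficient and then to measure the size of the largest connected component before and after removing a hub. The sketch below does this on a small hypothetical network (three hubs forming a clique, six other musicians each tied to two different hubs). It uses the non-normalised coefficient phi(k) = 2 E_k / (N_k (N_k - 1)), where N_k is the number of nodes of degree greater than k and E_k the number of edges among them. The normalisation and the data used in this work may differ, and all names are illustrative.

```python
from collections import deque

def rich_club(adj, k):
    """Rich-club coefficient phi(k): link density among nodes of degree > k."""
    rich = {n for n, nbrs in adj.items() if len(nbrs) > k}
    if len(rich) < 2:
        return 0.0
    # Each edge inside the club is seen from both endpoints, hence the // 2.
    links = sum(1 for n in rich for m in adj[n] if m in rich) // 2
    return 2 * links / (len(rich) * (len(rich) - 1))

def largest_component(adj, removed=()):
    """Size of the largest connected component once `removed` is deleted."""
    removed, seen, best = set(removed), set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in removed and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best

# Hypothetical toy network (not from the thesis datasets).
adj = {}

def add_edge(a, b):
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

hubs = ["h0", "h1", "h2"]
for i, a in enumerate(hubs):
    for b in hubs[i + 1:]:
        add_edge(a, b)
for i in range(6):
    add_edge(f"m{i}", hubs[i % 3])
    add_edge(f"m{i}", hubs[(i + 1) % 3])

phi = rich_club(adj, k=2)                       # density among the three hubs
before = largest_component(adj)                 # all nodes present
after = largest_component(adj, removed=["h0"])  # one hub removed
```

In this toy case the hubs are fully interconnected and the network stays in one piece after the removal of one hub; on the real networks the same measurement would have to be repeated for several hubs.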

However, the two collaborative networks seem to differ on several points. Firstly, regarding the evolution of the number of new nodes added to the networks, we notice that for the album network this number has tended to decrease since 2010, suggesting a correlation with the progressive death of the CD format. This phenomenon has not been observed for the live performance network. Secondly, the two networks are also distinguished by a large difference in the number of nodes, the live performance network being much more populated than the album network. This difference has the direct consequence of increasing the number of communities, the average and maximum degree and the number of connected components of the live performance network.
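The number of new nodes per year mentioned above can be obtained by recording, for each musician, the first year in which he or she appears in the data. The following lines are a minimal sketch of this counting. The event list and the function name are hypothetical and the format of the actual datasets may differ.

```python
from collections import Counter

def new_nodes_per_year(events):
    """Count, for each year, the musicians entering the network that year.

    events: iterable of (year, musicians) pairs, one per album or concert.
    """
    first_seen = {}
    # Process events chronologically so that the first year wins.
    for year, musicians in sorted(events, key=lambda event: event[0]):
        for musician in musicians:
            first_seen.setdefault(musician, year)
    return Counter(first_seen.values())

# Hypothetical toy events (not from the thesis datasets).
events = [
    (2011, ["A", "C"]),
    (2009, ["A", "B"]),
    (2011, ["D"]),
    (2012, ["B"]),
]
counts = new_nodes_per_year(events)
```

Here two musicians enter in 2009 and two in 2011, while 2012 brings no new node since "B" was already present.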

6.1.2 Comparison between live networks

Thanks to the results obtained in section 4.3, in which we compared two collaboration networks built from live performances, we have shown that there seems to be a similarity between these types of networks. Indeed, firstly, these two networks are once again scale-free. Secondly, the evolution of the number of new nodes entering the two networks over time is similar. This result suggests that collaborative networks of jazz music festivals evolve in a similar way. However, these two networks differ on the following points: the maximum degree, the number of weakly connected components and the average number of musicians present on stage during a performance.

6.1.3 Communities

As for the communities, the two networks differ mainly in their number. This result is due to the fact that the Montreux Jazz Festival collaboration network has many more nodes and weakly connected components than the Wikipedia collaboration network.

However, it seems that for both networks the communities are formed around the same criteria. The intuition of the author is that the communities detected in this study are related to the following two parameters: the style of jazz/music played by the nodes and their geographical origin.

6.2 Top hubs

Thanks to the multiple analyses carried out on the top hubs of the two networks, the following conclusions can be drawn about these particular structures.

6.2.1 Archetypal hub

First, it seems that the archetypal highly connected musician for the album collaboration network is an American male from New York born around 1920. He would also have a particular attraction for the trumpet or the saxophone. As for the archetype of the highly connected musician in a live collaboration network, he is very likely to be an American man born around 1951 and originating from Chicago. This top hub would probably be a singer and, thanks to the instrument meta-network, we can see that he is likely to collaborate mainly with guitarists, themselves probably American. It is also possible that his favorite instrument is the trumpet, in which case he would be more likely to collaborate mainly with musicians playing the saxophone.

Thanks to the meta-network of years (see 4.25 and 4.26), we can hypothesize that the archetypal highly connected musician in the album network most likely recorded the majority of his performances between 1956 and 1963. Similarly, the hub archetype of the live collaboration network most likely played mainly between 1973 and 1995 or between 2001 and 2016.

6.2.2 Well-known figures

It also seems that the majority of top hubs of both networks are already well-known figures in the jazz world (see 4.4). However, it is difficult to know whether they have attracted many collaborations over the years because of their position as figureheads or whether, on the contrary, they have become essential musicians thanks to their multiple collaborations.

6.2.3 Evolution of mentalities

Furthermore, thanks to the gender distribution of the two networks’ hubs (see 4.2.6) and to their average age (see 4.2.5), there seems to be an evolution of the place given to women in the world of jazz. Indeed, we notice that no woman is present among the hubs of the Wikipedia album network, who are older on average, while among the hubs of the Montreux Jazz Festival live performance network the share of women rises to 16%.

6.2.4 Research question

Finally, to conclude this research, this last section answers the first research question introduced in section 1.2, which is: What are the parameters favoring the preferential attachment among jazz musicians within a collaborative network? It seems that the parameters influencing the preferential attachment of nodes, and thus the creation of important hubs within the network, are mainly the instrument played by the musician, his country of origin, his gender and possibly the reputation of the artist.

References

[1] M.E.J. Newman. “Scientific Collaboration Networks. II. Shortest Paths, Weighted Networks, and Centrality”. In: Physical review. E, Statistical, nonlinear, and soft matter physics 64 (Aug. 2001), p. 016132. DOI: 10.1103/PhysRevE.64.016132.

[2] Laurent Beauguitte and César Ducruet. “Scale-free, small-world networks et géographie”. In: Geography (June 17, 2011).

[3] Wikipedia. Histoire du jazz. 2014. URL: https://fr.wikipedia.org/wiki/Histoire_du_jazz (visited on 01/30/2022).

[4] Google AI Blog. Explore the history of Pop – and Punk, Jazz, and Folk – with the Music Timeline. 2014. URL: https://ai.googleblog.com/2014/01/explore-history-of-pop-and-punk-jazz.html (visited on 01/30/2022).

[5] D. König. “Theorie der endlichen und unendlichen Graphen”. In: Math. in Monogr. und Lehrb. XVI (1936).

[6] Jeffrey Travers and Stanley Milgram. “An Experimental Study of the Small World Problem”. In: Sociometry 32 (Dec. 1969), pp. 425–443. DOI: 10.2307/2786545.

[7] Albert-László Barabási and Réka Albert. “Emergence of Scaling in Random Networks”. In: Science 286.5439 (Oct. 1999), pp. 509–512. DOI: 10.1126/science.286.5439.509. URL: https://doi.org/10.1126%2Fscience.286.5439.509.


[8] Réka Albert and Albert-László Barabási. “Statistical mechanics of complex networks”. In: Reviews of Modern Physics 74.1 (Jan. 2002), pp. 47–97. DOI: 10.1103/revmodphys.74.47. URL: https://doi.org/10.1103%2Frevmodphys.74.47.

[9] Gezhi Weng, Upinder S. Bhalla, and Ravi Iyengar. “Complexity in Biological Signaling Systems”. In: Science 284.5411 (1999), pp. 92–96. DOI: 10.1126/science.284.5411.92. eprint: https://www.science.org/doi/pdf/10.1126/science.284.5411.92. URL: https://www.science.org/doi/abs/10.1126/science.284.5411.92.

[10] Christof Koch and Gilles Laurent. Complexity and the Nervous System. 1999. DOI: 10.1126/science.284.5411.96. eprint: https://www.science.org/doi/pdf/10.1126/science.284.5411.96. URL: https://www.science.org/doi/abs/10.1126/science.284.5411.96.

[11] Réka Albert, Hawoong Jeong, and Albert-László Barabási. “Diameter of the World-Wide Web”. In: Nature 401 (Sept. 1999), pp. 130–131. DOI: 10.1038/43601.

[12] Ginestra Bianconi and Albert-László Barabási. “Competition and multiscaling in evolving Networks”. In: EPL (Europhysics Letters) 54 (May 2001), p. 436. DOI: 10.1209/epl/i2001-00260-6.

[13] H. Jeong, Z. Néda, and A. L. Barabási. “Measuring preferential attachment in evolving networks”. In: Europhysics Letters (EPL) 61.4 (Feb. 2003), pp. 567–572. DOI: 10.1209/epl/i2003-00166-9. URL: https://doi.org/10.1209%2Fepl%2Fi2003-00166-9.

[14] E. Ben-Naim and P. L. Krapivsky. “Stratification in the preferential attachment network”. In: Journal of Physics A: Mathematical and Theoretical 42.47 (Nov. 2009), p. 475001. DOI: 10.1088/1751-8113/42/47/475001. URL: https://doi.org/10.1088%2F1751-8113%2F42%2F47%2F475001.


[15] Alexandru Topirceanu, Mihai Udrescu, and Radu Marculescu. “Weighted Betweenness Preferential Attachment: A New Mechanism Explaining Social Network Formation and Evolution”. In: Scientific Reports 8 (July 2018). DOI: 10.1038/s41598-018-29224-w.

[16] Sushmita Ruj and Arindam Pal. “Preferential Attachment Model with Degree Bound and its Application to Key Predistribution in WSN”. In: (2016). DOI: 10.48550/ARXIV.1604.00590. URL: https://arxiv.org/abs/1604.00590.

[17] M. Girvan and M. E. J. Newman. “Community structure in social and biological networks”. In: Proceedings of the National Academy of Sciences 99.12 (June 2002), pp. 7821–7826. DOI: 10.1073/pnas.122653799. URL: https://doi.org/10.1073%2Fpnas.122653799.

[18] Aaron Clauset, M. Newman, and Cristopher Moore. “Finding community structure in very large networks”. In: Physical review. E, Statistical, nonlinear, and soft matter physics 70 (Jan. 2005), p. 066111. DOI: 10.1103/PhysRevE.70.066111.

[19] Pascal Pons and Matthieu Latapy. “Computing Communities in Large Networks Using Random Walks”. In: Computer and Information Sciences - ISCIS 2005. Ed. by Pınar Yolum et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 284–293. ISBN: 978-3-540-32085-2.

[20] Ken Wakita and Toshiyuki Tsurumi. “Finding Community Structure in Mega-scale Social Networks”. In: vol. 105. Mar. 2007, pp. 1275–1276. DOI: 10.1145/1242572.1242805.

[21] Vincent Blondel et al. “Fast Unfolding of Communities in Large Networks”. In: Journal of Statistical Mechanics Theory and Experiment 2008 (Apr. 2008). DOI: 10.1088/1742-5468/2008/10/P10008.


[22] Santo Fortunato and Marc Barthélemy. “Resolution limit in community detection”. In: Proceedings of the National Academy of Sciences 104.1 (Jan. 2007), pp. 36–41. DOI: 10.1073/pnas.0605965104. URL: https://doi.org/10.1073%2Fpnas.0605965104.

[23] Pablo M. Gleiser and Leon Danon. “Community structure in jazz”. In: Advances in Complex Systems 06.04 (Dec. 2003), pp. 565–573. DOI: 10.1142/s0219525903001067. URL: https://doi.org/10.1142%2Fs0219525903001067.

[24] Ashish Vaswani et al. “Attention Is All You Need”. In: CoRR abs/1706.03762 (2017). arXiv: 1706.03762. URL: http://arxiv.org/abs/1706.03762.

[25] Steven Bird, Ewan Klein, and Edward Loper. “Natural Language Processing with Python”. In: (2009).

[26] David Liben-Nowell and Jon Kleinberg. “The Link Prediction Problem for Social Networks”. In: Proceedings of the Twelfth International Conference on Information and Knowledge Management. CIKM ’03. New Orleans, LA, USA: Association for Computing Machinery, 2003, pp. 556–559. ISBN: 1581137230. DOI: 10.1145/956863.956972. URL: https://doi.org/10.1145/956863.956972.

[27] Ken Wakita and Toshiyuki Tsurumi. “Finding Community Structure in Mega-Scale Social Networks: [Extended Abstract]”. In: Proceedings of the 16th International Conference on World Wide Web. WWW ’07. Banff, Alberta, Canada: Association for Computing Machinery, 2007, pp. 1275–1276. ISBN: 9781595936547. DOI: 10.1145/1242572.1242805. URL: https://doi.org/10.1145/1242572.1242805.

[28] Pasquale De Meo et al. “Generalized Louvain method for community detection in large networks”. In: 2011 11th International Conference on Intelligent Systems Design and Applications (2011), pp. 88–93.


[29] Rachid Djerbi, Rabah Imache, and Mourad Amad. “Communities’ Detection in Social Networks: State of the art and perspectives”. In: 2018 International Symposium on Networks, Computers and Communications (ISNCC) (2018), pp. 1–6. DOI: 10.1109/ISNCC.2018.8531055.

[30] Joan Jeffri. “Changing the Beat: A Study of the Worklife of Jazz Musicians”. In: National endowment for the arts (2003).

[31] Brian Zisook. How Old is the Average Hot 100-Charting Rapper Right Now? 2018. URL: https://djbooth.net/features/2018-02-02-billboard-rapper-age (visited on 01/30/2022).

[32] Wikipedia. Q Score. 2021. URL: https://en.wikipedia.org/wiki/Q_Score (visited on 04/26/2022).
