Available via license: CC BY-NC-SA 4.0
A new network-based high-level data classification methodology (Quipus) by
modeling attribute-attribute interactions
Esteban Wilfredo Vilca Zúñiga
Dept. of Computing and Mathematics
FFCLRP-USP
Ribeirão Preto, Brazil
evilcazu@usp.br
Liang Zhao
Dept. of Computing and Mathematics
FFCLRP-USP
Ribeirão Preto, Brazil
zhao@usp.br
Abstract—High-level classification algorithms focus on the interactions between instances. These provide a new way to evaluate and classify data. In this process, the core is the complex network building methodology. Current methodologies use variations of kNN to produce these graphs. However, these techniques ignore some hidden patterns between attributes and require normalization to be accurate. In this paper, we propose a new methodology for network building based on attribute-attribute interactions that does not require normalization. The current results show that this approach improves the accuracy of the high-level classification algorithm based on betweenness centrality.
1. Introduction
Machine learning classification algorithms are low-level when they classify using only physical features, usually distance measures like the Euclidean distance, and ignore the possible interactions between instances as a system [1]. In contrast, high-level classification algorithms focus on exploiting these characteristics, using complex networks as the data representation [2] [3].
There are a variety of techniques to build networks, but usually they use kNN as the core [1] [4] [5]. These algorithms produce a network where each node represents an instance and each edge links a node to one of its kNN neighbors [6].
A complex network is defined as a non-trivial graph [7]. Usually, the quantity of instances in the dataset and their interactions generate a large graph with numerous edges. These large graphs present special characteristics that are exploited by many techniques to classify data, like betweenness centrality [5], clustering coefficient [8], assortativity [6], and so on.
The current methodologies produce one network, reducing each instance to a node. This approach presents some problems, like the need to normalize the data and the omission of hidden patterns in attribute-attribute interactions.
In this paper, we present a new methodology that captures these attribute-attribute interactions by building a network for each attribute, removing the networks without useful information, and optimizing the importance of each one. We use the high-level classification technique presented in [5] to evaluate the results.
2. Model Description
In this section, we describe our algorithm. First, we make a literature review. Then, we describe our algorithm step by step.
2.1. Literature Review
2.1.1. Complex networks as data representation. In order to capture the instance interactions, we need to represent the data in a network structure. Different authors present building methodologies using the k nearest neighbors algorithm. They transform each instance into a node, and the nearest neighbors become the neighborhood of that node [3].
Figure 1. Image of two related instances transformed in two linked nodes.
In figure 2, we can observe how each instance in the dataset is represented as a node. The links depend on the k nearest neighbors. Since this methodology depends on kNN, dataset normalization is needed.
In equation 1, the general methodology to build the network representation of the data is described [6].

\[
N(X_i) =
\begin{cases}
\epsilon\text{-radius}(X_i, y_i), & \text{if } |\epsilon\text{-radius}(X_i, y_i)| > k \\
kNN(X_i, y_i), & \text{otherwise}
\end{cases}
\tag{1}
\]

Where X_i is the instance i and y_i is its label. kNN returns the k nearest nodes related to the instance X_i that have the same label y_i, using a similarity function like the Euclidean distance [8]. ε-radius returns the set of nodes {V_j ∈ V : distance(X_i, X_j) < ε ∧ y_i = y_j}. V_j is the node representation of the instance X_j. The value ε is a percentile of the distances calculated with kNN; some authors consider just the median [6].

arXiv:2009.13511v1 [cs.LG] 28 Sep 2020
Following equation 1, we build a complex network G from a dataset D = {(X_1, y_1), ..., (X_n, y_n)}, where each class is represented as a subgraph g_i.
Figure 2. Image of a complex network G with three classes g_red, g_green, g_blue, based on the Wine UCI dataset, with modularity Q = 0.6562.
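As a rough sketch of how the building rule in equation 1 can be implemented (toy data and helper names such as `build_network` are ours, not the authors' code), the fragment below links each instance to its same-label ε-radius neighbors when there are more than k of them, and to its k nearest same-label neighbors otherwise:

```python
import numpy as np
import networkx as nx

def neighbors(i, X, y, k, eps):
    """Neighbor indices for instance i following equation 1."""
    same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
    dist = {j: float(np.linalg.norm(X[i] - X[j])) for j in same}
    in_radius = [j for j in same if dist[j] < eps]
    if len(in_radius) > k:
        return in_radius                    # epsilon-radius branch
    return sorted(same, key=dist.get)[:k]   # kNN branch otherwise

def build_network(X, y, k=2, eps=0.5):
    """One node per instance; edges given by the rule above."""
    G = nx.Graph()
    G.add_nodes_from(range(len(X)))
    for i in range(len(X)):
        for j in neighbors(i, X, y, k, eps):
            G.add_edge(i, j)
    return G

# Toy 1-D dataset: two well-separated classes of three instances each.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
G = build_network(X, y, k=2, eps=0.5)
# Every edge stays inside one class, giving one subgraph g_i per class.
```

Because only same-label neighbors are linked during training, each class forms its own connected subgraph, which is the structure the high-level measures below operate on.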
2.1.2. Important network measures. To exploit the high-level approach, we need measures that capture node interactions, like structure, centrality, and connectivity measures.
• Clustering Coefficient (CC): This metric gives information about the connectivity between the neighbors of a node [7]. It is between 0 (no connections between the neighbors) and 1 (fully connected neighbors).
• Betweenness Centrality (BC): This metric captures the communication between nodes using the shortest paths [2]. For each node, we calculate the number of geodesic paths where this node is present. A node with a higher BC plays an important role in the network communication.

\[
B(i) = \sum_{s \neq i \in V} \sum_{t \neq i \in V} \frac{\eta_{st}^{i}}{\eta_{st}}
\tag{2}
\]

where η^i_st is 1 when the node i is part of the geodesic path from s to t and 0 otherwise, and η_st is the total number of shortest paths between s and t.
• Modularity (Q): This metric provides information about the quality of a given partition [9]. Usually, it is between 0 and 1, where 0 means poor community structure and 1 represents a strong differentiation between the communities. In supervised learning classification, the communities are the classes. Higher modularity represents a better separation of the subgraphs g_i and probably a better classification.

\[
Q = \frac{1}{2|E|} \sum_{i,j \in V} \left( A_{ij} - \frac{k_i k_j}{2|E|} \right) \delta(c_i, c_j)
\tag{3}
\]

Where A_ij is the weight of the edge that links the vertices i and j, |E| and V represent the number of edges and the nodes respectively, and k_i is the degree of the node i. δ(c_i, c_j) is 1 when the nodes i and j belong to the same community and 0 otherwise.
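The three measures above are all available in networkx; a small sketch on a toy graph (two triangles joined by a single bridge edge) illustrates them. Note that networkx normalizes betweenness centrality by default, unlike the raw sum in equation 2, and that the graph and partition here are ours, chosen for illustration:

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)])

# Clustering coefficient: node 0's two neighbors are linked (CC = 1),
# while only one of the three neighbor pairs of node 2 is linked (CC = 1/3).
cc = nx.clustering(G)

# Betweenness centrality: the bridge endpoints 2 and 3 lie on every
# shortest path between the two triangles, so they score highest.
bc = nx.betweenness_centrality(G)

# Modularity of the partition that treats each triangle as a community.
Q = modularity(G, [{0, 1, 2}, {3, 4, 5}])  # = 5/14 for this graph
```

In the Quipus setting, the communities passed to `modularity` are simply the class subgraphs g_i.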
2.1.3. High-level classification algorithms. These kinds of algorithms apply the network measures above on complex networks to classify data.
• Based on Impact Score (NBHL): This algorithm uses different network measures before and after an insertion [6]. The node is classified into the class whose insertion produces the smallest variation in the metrics.
• Based on Importance (PgRkNN): This algorithm uses the PageRank algorithm to evaluate the importance of the nodes [1]. For each subgraph g_i, we measure the importance of the nodes related to the node to be classified. The neighbors with higher importance will capture the new node.
• Based on Betweenness (HLNB-BC): This algorithm inserts the new node into each subgraph g_i and searches for the nodes with a similar BC in each subgraph [5]. These are compared and evaluated to provide a probability for each class, and the highest one gives the selected label.

\[
H \approx \alpha W_n + (1 - \alpha) T_n
\tag{4}
\]

where H is a list of probabilities, one per label, for one node, W_n is the average BC difference of the b closest nodes, T_n is the list of the numbers of links for each subgraph g_i, and α controls the weight between the structural information W_n and the number of links T_n.
2.2. High-Level Classification Algorithm Using
Attribute-Attribute Interaction (Quipus)
2.2.1. Attribute-attribute interactions. In section 2.1.1, we analyzed how each instance is represented as a node, but that approach ignores some hidden patterns in attribute-attribute interactions.
Figure 3. Image with 4 graphs that capture the interactions within each attribute.
In this paper, we create a graph for each attribute to detect these hidden patterns. In figure 3, we can appreciate how each attribute is represented as an independent graph. Using this approach, we can capture the attribute-attribute interactions. Since we use attributes that have the same scale to produce the graphs, our method increases its resistance to non-normalized data. However, there are attributes that by themselves do not provide relevant information and require others to be useful. Thus, we use the modularity Q to evaluate each graph.
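To illustrate how Q can separate an informative attribute from a noisy one, the sketch below (toy data and helper names of our own) builds an unsupervised kNN graph per attribute column and measures the modularity of the class partition; note that the paper's actual construction follows equation 1, which also uses the labels:

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import modularity

def attribute_graph(values, k=2):
    """kNN graph over one attribute column (labels ignored on purpose,
    so Q of the class partition reflects the attribute's quality)."""
    G = nx.Graph()
    n = len(values)
    G.add_nodes_from(range(n))
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: abs(values[i] - values[j]))[:k]
        G.add_edges_from((i, j) for j in nearest)
    return G

rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
informative = y * 5.0 + rng.normal(0, 0.1, 20)  # separates the classes
noise = rng.normal(0, 1.0, 20)                  # carries no class signal

communities = [{i for i in range(len(y)) if y[i] == c} for c in (0, 1)]
Q_info = modularity(attribute_graph(informative), communities)
Q_noise = modularity(attribute_graph(noise), communities)
# Q_info is clearly higher than Q_noise, so the noisy attribute's
# graph would be discarded by the modularity filter.
```

This is exactly the role Q plays in the methodology: attribute graphs whose class partition scores poorly are dropped before classification.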
2.2.2. Proposed methodology. In supervised learning, we split the dataset into two parts, X_training and X_testing. In the training phase, we produce a model that will help us to predict the testing dataset. In the training phase, we also need to split the data again into X_net and X_opt, because we have an optimization phase. The proportion depends on the quantity of data available.
The next steps are for the training phase:
• First, we build a graph for each attribute to capture the hidden patterns between them, following equation 1 on X_net. Then, we build one more graph using each instance as a node to capture the instance-instance interactions.
• Second, we calculate Q for each graph. To avoid possible noise from attributes without relevant information, we ignore the graphs with a modularity lower than that of the instance-instance network.
• Third, we insert each instance from X_opt into the networks, following the same strategy described in step 1. However, we keep the links between different labels, because we want to simulate a real insertion. Each attribute is introduced into its corresponding graph and the complete instance into the instance-instance graph.
• Fourth, we obtain the probability of being part of each class in each graph using the high-level algorithm HLNB-BC. For example, in a dataset with 3 classes and 4 attributes, it gives us a list with three probabilities for each graph (12 probabilities in total).
• Fifth, we give a weight from 0 to 1 to each graph. This gives us a way to reduce or increase the classification probability of each graph. Then, we use an optimization algorithm like particle swarm optimization to determine the best weights for each graph, so as to increase the accuracy on the predicted instances in X_opt.
• Finally, we save the weights and produce the final graphs, following the same procedure as step 1 with X_training.
In the testing phase, we insert each instance into the graphs following the same process as step 4 of the training phase and multiply the probabilities for each graph by the weights defined in step 5.
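The weighting and optimization in steps 4–6 can be sketched as follows. The probability values below are invented for illustration, and a dependency-free random search stands in for the particle swarm optimizer the paper uses via Pyswarms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HLNB-BC outputs on X_opt: probs[g, i] is the class-probability
# list that graph g assigns to instance i (values made up here).
probs = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.3, 0.7]],  # attribute graph A
    [[0.6, 0.4], [0.2, 0.8], [0.5, 0.5], [0.6, 0.4]],  # attribute graph B
    [[0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]],  # instance-instance
])
y_opt = np.array([0, 1, 0, 1])

def accuracy(weights):
    """Step 5: scale each graph's probabilities by its weight, sum
    over graphs, and predict the class with the highest total."""
    combined = np.tensordot(weights, probs, axes=1)  # shape (4 instances, 2 classes)
    return float(np.mean(combined.argmax(axis=1) == y_opt))

# Random search over weights in [0, 1]^3 stands in for PSO.
best_w, best_acc = None, -1.0
for _ in range(200):
    w = rng.random(probs.shape[0])
    acc = accuracy(w)
    if acc > best_acc:
        best_w, best_acc = w, acc
# best_w is then saved and reused, with the rebuilt graphs, at test time.
```

The objective (accuracy on X_opt) and the per-graph weight vector are exactly what the paper optimizes; only the optimizer differs in this sketch.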
3. Results
In this section, we present the performance of our methodology Quipus. We use Python as the programming language, the Scikit-learn library for the machine learning algorithms [?], networkx as the graph library [10], and Pyswarms as the PSO optimizer [11]. Each algorithm was tested using 10-fold cross validation 10 times over the training and testing datasets, and grid search to tune the parameters. We searched k from 1 to 30, the percentile ε with the values {0.1, 0.2, 0.3, 0.4, 0.5}, the b nearest nodes from 1 to 7, and α with the values {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.
The datasets used, with their numbers of instances, attributes, and classes, are described in table 1.
Dataset  Instances  Attributes  Classes
Iris     150        4           3
Wine     178        13          3
Zoo      101        16          7
TABLE 1. Information about the UCI classification datasets used in this project.
In section 2.2.2, we split the training data into X_net and X_opt. In our tests, we use a stratified random split of 80% and 20% respectively. This value could be modified according to the quantity of data.
Then, we build a network for each attribute and one network for the instance-instance interactions. We calculate their modularities (Q) and compare each attribute network with the instance-instance network. The networks with lower modularity are ignored in the rest of the process. Table 2 shows the modularities for each network.
Network            Modularity Q
Instance-instance  0.3181
Attribute 1        0.3189
Attribute 2        0.0924
Attribute 3        0.0500
Attribute 4        0.1689
...
Attribute 10       0.2288
Attribute 11       0.3008
Attribute 12       0.3333
TABLE 2. Modularities of attribute networks in the UCI Wine dataset.
For instance, the modularities of the attribute 1 and attribute 12 networks are higher than the modularity of the instance-instance network. So, these networks will be used for optimization, classification, and insertion. The others will be ignored, because they do not have a strong community structure.
The insertion of the nodes into each graph follows equation 1, but preserving the links with nodes of different labels, given that we want to capture the insertion probability for each class. Then, we create a weight for each graph's probabilities and start an optimization phase. We use particle swarm optimization from the Pyswarms library with these parameters: {c1 = 0.5, c2 = 0.1, w = 0.9, iterations = 500}. These could be optimized, but we use fixed values for these experiments.
For example, in one iteration, our algorithm captured the weights in table 3.
Network            Modularity Q  Weight  Ignored
Instance-instance  0.3181        0.9083  False
Attribute 1        0.3189        0.8065  False
Attribute 2        0.0924        -       True
Attribute 3        0.0500        -       True
Attribute 4        0.1689        -       True
...
Attribute 10       0.2288        -       True
Attribute 11       0.3008        -       True
Attribute 12       0.3333        0.1746  False
TABLE 3. Modularities and weights of attribute networks in the UCI Wine dataset.
Once the weights are defined, we proceed to rebuild the graphs, but using the entire training dataset X_training. Finally, the classification phase follows the same process as the optimization phase, but using the optimized weights.
Figure 4. Image of the instance-instance network from one iteration of the UCI Wine dataset classification.
Figure 5. Image of the attribute 1 network from one iteration, with a higher modularity than its instance-instance network, in the UCI Wine dataset classification.
In figure 4, we can observe the instance-instance network from one iteration of the Wine dataset classification. The black nodes represent classified instances. The network presents a structure where the red nodes are on one side, the blue nodes on the other side, and the green nodes in the middle of them. In figure 5, we observe the network from the first attribute of the Wine dataset, which had a modularity of 0.3189. Once the nodes are inserted, this graph presents a higher modularity, Q = 0.6553. Without this methodology, we would lose these attribute-attribute interactions. These networks give us an accuracy of 91.11%.
Dataset  k   ε    b  α
Iris     12  0.0  3  1.0
Wine     7   0.0  3  1.0
Zoo      1   0.0  1  1.0
TABLE 4. Parameter values used by HLNB-BC with the Quipus methodology on the UCI datasets.
In table 5, we observe the accuracy of Quipus against the network building technique from the literature, kNN+ε-radius. That technique presents problems with non-normalized data, like the Wine UCI dataset. Using Quipus, we reduce this problem: the attribute networks build their relations on the same scale, and the optimized weights manage the strength of each probability, reducing its impact on the final classification.
Results of 10 times 10-fold cross validation
Dataset  Prediction  Building (k)      Accuracy
Iris     HLNB-BC     kNN+ε-radius (7)  95.33 ± 11.85
                     Quipus (12)       95.80 ± 09.36
Wine     HLNB-BC     kNN+ε-radius (1)  75.84 ± 19.15
                     Quipus (7)        93.03 ± 13.08
Zoo      HLNB-BC     kNN+ε-radius (1)  96.36 ± 12.98
                     Quipus (1)        96.87 ± 04.97
TABLE 5. Accuracy of different building methodologies on the UCI datasets without normalization.
4. Conclusion
The new classification methodology proposed here exploits the hidden patterns in attribute-attribute interactions, building networks for each attribute and ignoring the ones with lower modularity. It also uses the high-level classification technique HLNB-BC and introduces resilience to the model against non-normalized data.
Many different modifications, tests, and experiments have been left for future work, like testing with other high-level techniques (NBHL, PgRkNN), identifying a way to optimize the parameters of particle swarm optimization, and applying the Quipus methodology to other real datasets.
References
[1] M. Carneiro and L. Zhao, “Organizational data classification based
on the importance concept of complex networks,” IEEE Transactions
on Neural Networks and Learning Systems, vol. 29, pp. 3361–3373,
2018.
[2] T. Christiano Silva and L. Zhao, Machine Learning in Complex
Networks. Springer International Publishing, 2016.
[3] M. Carneiro and L. Zhao, “Analysis of graph construction methods
in supervised data classification,” in 2018 7th Brazilian Conference
on Intelligent Systems (BRACIS), Oct 2018, pp. 390–395.
[4] S. A. Fadaee and M. A. Haeri, “Classification using link prediction,”
Neurocomputing, vol. 359, pp. 395 – 407, 2019.
[5] E. Vilca and L. Zhao, “A network-based high-level data classification
algorithm using betweenness centrality,” 2020.
[6] T. Colliri, D. Ji, H. Pan, and L. Zhao, “A network-based high level
data classification technique,” in 2018 International Joint Conference
on Neural Networks (IJCNN), July 2018, pp. 1–8.
[7] R. Albert and A.-L. Barabási, “Statistical mechanics of complex networks,” Rev. Mod. Phys., vol. 74, pp. 47–97, Jan 2002.
[8] T. C. Silva and L. Zhao, “Network-based high level data classifica-
tion,” IEEE Transactions on Neural Networks and Learning Systems,
vol. 23, no. 6, pp. 954–970, June 2012.
[9] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, pp. 1–6, 2004. [Online]. Available: www.ece.unm.edu/ifis/papers/community-moore.pdf
[10] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network
structure, dynamics, and function using networkx,” in Proceedings of
the 7th Python in Science Conference, G. Varoquaux, T. Vaught, and
J. Millman, Eds., Pasadena, CA USA, 2008, pp. 11 – 15.
[11] L. J. V. Miranda, “PySwarms, a research-toolkit for Particle Swarm
Optimization in Python,” Journal of Open Source Software, vol. 3,
2018. [Online]. Available: https://doi.org/10.21105/joss.00433